
Server Admin Log/Archive 47

From Wikitech

2021-12-29

  • 10:30 elukey: kill tcpdump process on kubestagemaster1001 (it kept a large pcap file open that kept growing)

2021-12-28

  • 11:27 godog: powercycle ms-be1059 -- was powered down
  • 02:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn

2021-12-24

  • 20:08 mforns@deploy1002: Finished deploy [airflow-dags/analytics@e282d2d]: (no justification provided) (duration: 00m 06s)
  • 20:08 mforns@deploy1002: Started deploy [airflow-dags/analytics@e282d2d]: (no justification provided)
  • 04:20 legoktm: depooled cp2029 now that it's up
  • 04:16 legoktm: powercycling cp2029 via mgmt
  • 00:57 ejegg: updated fundraising CiviCRM from 47dd67f2 to aaceb4ab
  • 00:28 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudbackup1003.eqiad.wmnet with OS buster
  • 00:28 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudbackup1004.eqiad.wmnet with OS buster
  • 00:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1004.eqiad.wmnet with OS buster
  • 00:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1003.eqiad.wmnet with OS buster
  • 00:19 legoktm: repooling mw1450 (forgot to after benchmarking finished)
  • 00:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox

2021-12-23

  • 21:41 taavi@deploy1002: Finished deploy [horizon/deploy@ff82962] (dev): (no justification provided) (duration: 04m 06s)
  • 21:37 taavi@deploy1002: Started deploy [horizon/deploy@ff82962] (dev): (no justification provided)
  • 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:14 legoktm@deploy1002: Synchronized php-1.38.0-wmf.13/includes/changetags/ChangeTags.php: Disable querying the 'wikieditor' change tag temporarily (T298225) (duration: 00m 59s)
  • 18:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:50 mforns@deploy1002: Finished deploy [airflow-dags/analytics@363651a]: (no justification provided) (duration: 00m 06s)
  • 17:50 mforns@deploy1002: Started deploy [airflow-dags/analytics@363651a]: (no justification provided)
  • 17:36 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2041.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:19 volans@cumin2002: START - Cookbook sre.hosts.provision for host mc2041.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:14 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:10 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:04 ejegg: updated standalone SmashPig deploy from 5a7d0c2c to 96c7b03e
  • 16:46 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2038.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:40 volans@cumin1001: START - Cookbook sre.hosts.provision for host mc2038.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:26 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2038.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:13 volans@cumin1001: START - Cookbook sre.hosts.provision for host mc2038.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:01 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2038.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:01 volans@cumin1001: START - Cookbook sre.hosts.provision for host mc2038.mgmt.codfw.wmnet with reboot policy FORCED
  • 11:34 volans: upgraded spicerack to v1.1.1 on cumin1001,cumin2002
  • 02:11 ejegg: updated fundraising CiviCRM from aa90dd3a to 47dd67f2
  • 00:04 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart without plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin1001 - T297986

2021-12-22

  • 23:01 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin1001 - T297986
  • 21:27 volans: uploaded spicerack_1.1.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 18:42 inflatador: T297735 removing/banning elastic1039 and elastic1043 from all EQIAD prod clusters
  • 18:16 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1088.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:11 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1088.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:51 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1085.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:45 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1085.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:42 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1085.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:42 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1085.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:35 tzatziki: removing one file for legal compliance
  • 17:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 17:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 17:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 17:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 15:37 mforns@deploy1002: Finished deploy [airflow-dags/analytics@f1522be]: (no justification provided) (duration: 00m 06s)
  • 15:37 mforns@deploy1002: Started deploy [airflow-dags/analytics@f1522be]: (no justification provided)
  • 15:24 mforns@deploy1002: Finished deploy [analytics/refinery@fcf104e] (hadoop-test): Adhoc train for anomaly detection queries TEST [analytics/refinery@fcf104e] (duration: 06m 44s)
  • 15:17 mforns@deploy1002: Started deploy [analytics/refinery@fcf104e] (hadoop-test): Adhoc train for anomaly detection queries TEST [analytics/refinery@fcf104e]
  • 15:17 mforns@deploy1002: Finished deploy [analytics/refinery@fcf104e] (thin): Adhoc train for anomaly detection queries THIN [analytics/refinery@fcf104e] (duration: 00m 07s)
  • 15:17 mforns@deploy1002: Started deploy [analytics/refinery@fcf104e] (thin): Adhoc train for anomaly detection queries THIN [analytics/refinery@fcf104e]
  • 15:15 mforns@deploy1002: Finished deploy [analytics/refinery@fcf104e]: Adhoc train for anomaly detection queries [analytics/refinery@fcf104e] (duration: 21m 25s)
  • 14:54 mforns@deploy1002: Started deploy [analytics/refinery@fcf104e]: Adhoc train for anomaly detection queries [analytics/refinery@fcf104e]
  • 12:31 mutante: LDAP added uid=zabe to group nda (T297323)
  • 02:38 legoktm: restarted zuul on contint2001; it was totally stuck. (T298177)

2021-12-21

  • 22:53 eileen: civicrm rev:aa90dd3a conf:449c8de8
  • 21:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1028.eqiad.wmnet with OS buster
  • 21:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1026.eqiad.wmnet with OS buster
  • 21:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1027.eqiad.wmnet with OS buster
  • 21:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1028.eqiad.wmnet with OS buster
  • 21:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1026.eqiad.wmnet with OS buster
  • 21:13 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1027.eqiad.wmnet with OS buster
  • 21:10 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1026.eqiad.wmnet with OS buster
  • 21:00 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1028.eqiad.wmnet with OS buster
  • 20:59 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1027.eqiad.wmnet with OS buster
  • 20:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1025.eqiad.wmnet with OS buster
  • 20:32 mforns@deploy1002: Finished deploy [airflow-dags/analytics@053bfc0]: (no justification provided) (duration: 00m 06s)
  • 20:32 mforns@deploy1002: Started deploy [airflow-dags/analytics@053bfc0]: (no justification provided)
  • 20:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1028.eqiad.wmnet with OS buster
  • 20:30 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti1028.eqiad.wmnet with OS buster
  • 20:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1026.eqiad.wmnet with OS buster
  • 20:30 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti1026.eqiad.wmnet with OS buster
  • 20:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1025.eqiad.wmnet with OS buster
  • 20:29 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti1025.eqiad.wmnet with OS buster
  • 20:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1027.eqiad.wmnet with OS buster
  • 20:29 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti1027.eqiad.wmnet with OS buster
  • 20:13 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1028.eqiad.wmnet with OS buster
  • 20:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1027.eqiad.wmnet with OS buster
  • 20:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1026.eqiad.wmnet with OS buster
  • 20:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1025.eqiad.wmnet with OS buster
  • 19:58 mforns@deploy1002: Finished deploy [airflow-dags/analytics@e970bd0]: (no justification provided) (duration: 00m 06s)
  • 19:57 mforns@deploy1002: Started deploy [airflow-dags/analytics@e970bd0]: (no justification provided)
  • 19:57 mforns@deploy1002: Finished deploy [airflow-dags/analytics@27a4f7a]: (no justification provided) (duration: 00m 03s)
  • 19:57 mforns@deploy1002: Started deploy [airflow-dags/analytics@27a4f7a]: (no justification provided)
  • 19:53 mforns@deploy1002: Finished deploy [airflow-dags/analytics@27a4f7a]: (no justification provided) (duration: 00m 03s)
  • 19:53 mforns@deploy1002: Started deploy [airflow-dags/analytics@27a4f7a]: (no justification provided)
  • 19:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on graphite1004.eqiad.wmnet with reason: update firmware
  • 19:26 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 0:40:00 on graphite1004.eqiad.wmnet with reason: update firmware
  • 18:07 mforns@deploy1002: Finished deploy [airflow-dags/analytics-test@27a4f7a]: (no justification provided) (duration: 00m 07s)
  • 18:07 mforns@deploy1002: Started deploy [airflow-dags/analytics-test@27a4f7a]: (no justification provided)
  • 17:44 mutante: LDAP - added uid=spatel to wmf group (T297927)
  • 17:11 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1088.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:06 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1088.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:06 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1085.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:05 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1085.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:04 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1084.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:03 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1084.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:00 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1085.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:00 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1088.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:00 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1085.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:59 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1088.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:59 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1087.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:57 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1086.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:53 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1087.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:52 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1086.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:49 otto@deploy1002: Finished deploy [airflow-dags/analytics@27a4f7a]: (no justification provided) (duration: 01m 53s)
  • 16:47 otto@deploy1002: Started deploy [airflow-dags/analytics@27a4f7a]: (no justification provided)
  • 16:06 otto@deploy1002: Finished deploy [airflow-dags/analytics-test@fa11cb4]: (no justification provided) (duration: 00m 07s)
  • 16:06 otto@deploy1002: Started deploy [airflow-dags/analytics-test@fa11cb4]: (no justification provided)
  • 16:05 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1085.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:57 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1085.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:57 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1084.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:52 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1084.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:36 Amir1: running sudo perf record -ag -F 99 -- sleep 3600 on integration-agent-docker-1008 and 1009 (T225730)
  • 15:12 _joe_: pruning docker images on deneb
  • 14:57 _joe_: upgrading php 7.2 everywhere, T297667
  • 14:48 moritzm: installing squashfs-tools security updates
  • 14:20 moritzm: installing lldpd security updates on bullseye
  • 13:41 moritzm: installing vim security updates on bullseye
  • 13:24 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1088.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:19 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1088.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:19 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1087.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:11 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1087.mgmt.eqiad.wmnet with reboot policy FORCED
  • 11:58 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1087.mgmt.eqiad.wmnet with reboot policy FORCED
  • 11:52 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1087.mgmt.eqiad.wmnet with reboot policy FORCED
  • 11:50 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1084.mgmt.eqiad.wmnet with reboot policy FORCED
  • 11:45 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1084.mgmt.eqiad.wmnet with reboot policy FORCED
  • 11:45 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1086.mgmt.eqiad.wmnet with reboot policy FORCED
  • 11:44 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1086.mgmt.eqiad.wmnet with reboot policy FORCED
  • 11:26 moritzm: imported cas 6.4.4.2 to apt.wikimedia.org/buster-wikimedia
  • 11:10 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1086.mgmt.eqiad.wmnet with reboot policy FORCED
  • 11:02 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1086.mgmt.eqiad.wmnet with reboot policy FORCED
  • 11:00 jynus: reenabled puppet on mx1001 T298038
  • 10:59 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1084.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:53 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1084.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:35 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1085.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:29 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1085.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:28 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:23 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 10:13 jynus: disabling puppet on mx1001 T298038
  • 10:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2007.codfw.wmnet
  • 09:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2007.codfw.wmnet
  • 09:38 oblivian@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:38 oblivian@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:37 oblivian@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:37 oblivian@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2007.codfw.wmnet with OS buster
  • 09:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2007.codfw.wmnet with OS buster
  • 08:50 ema: cp3051: pool with single backend experiment reverted T288106
  • 08:45 ema: cp3051: depool to revert single backend experiment T288106
  • 08:29 ema: cp4021: pool with single backend experiment reverted T288106
  • 08:14 ema: cp4021: depool to revert single backend experiment T288106
  • 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:29 legoktm: depooling mw1450
  • 02:27 legoktm: repooling mw1312
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 01:06 legoktm: depooling mw1312 for benchmarking
  • 00:35 ejegg: updated fundraising CiviCRM from 2826afc3 to e1ffa75a

2021-12-20

  • 23:38 ryankemper: [WCQS Deploy] Deploy complete of version `0.3.97`
  • 23:27 ejegg: updated fundraising CiviCRM from 07efd9fb to 2826afc3
  • 23:26 bking@deploy1002: Finished deploy [wdqs/wdqs@81ee634] (wcqs): Deploy 0.3.97 to WCQS (duration: 02m 46s)
  • 23:23 bking@deploy1002: Started deploy [wdqs/wdqs@81ee634] (wcqs): Deploy 0.3.97 to WCQS
  • 23:23 ejegg: updated SmashPig standalone (IPN listener) from 235a261b to 5a7d0c2c
  • 23:04 inflatador: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 23:04 ryankemper: [WDQS] `ryankemper@wdqs1006:~$ sudo depool` (catching up on ~14.5 hours of lag)
  • 23:04 inflatador: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 23:03 inflatador: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 22:50 bking@deploy1002: Finished deploy [wdqs/wdqs@81ee634]: 0.3.97 (duration: 09m 22s)
  • 22:48 inflatador: [WDQS Deploy] Tests passing following deploy of `0.3.97` on canary `wdqs1003`; proceeding to rest of fleet
  • 22:41 bking@deploy1002: Started deploy [wdqs/wdqs@81ee634]: 0.3.97
  • 22:39 inflatador: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.97`. Pre-deploy tests passing on canary `wdqs1003`
  • 21:47 andrewbogott: deleting the 'exported' rackspace container in IAD -- pretty sure this is left over from wikitech-static DC migration
  • 20:12 mforns@deploy1002: Finished deploy [airflow-dags/analytics@febf1c5] (hadoop-test): (no justification provided) (duration: 00m 03s)
  • 20:12 mforns@deploy1002: Started deploy [airflow-dags/analytics@febf1c5] (hadoop-test): (no justification provided)
  • 20:11 otto@deploy1002: Finished deploy [airflow-dags/analytics@febf1c5] (hadoop-test): (no justification provided) (duration: 00m 07s)
  • 20:11 otto@deploy1002: Started deploy [airflow-dags/analytics@febf1c5] (hadoop-test): (no justification provided)
  • 20:06 otto@deploy1002: Finished deploy [airflow-dags/analytics@febf1c5] (hadoop-test): (no justification provided) (duration: 00m 30s)
  • 20:06 otto@deploy1002: Started deploy [airflow-dags/analytics@febf1c5] (hadoop-test): (no justification provided)
  • 20:06 otto@deploy1002: Finished deploy [airflow-dags/analytics@febf1c5]: (no justification provided) (duration: 00m 06s)
  • 20:06 otto@deploy1002: Started deploy [airflow-dags/analytics@febf1c5]: (no justification provided)
  • 20:04 mforns@deploy1002: Finished deploy [airflow-dags/analytics@febf1c5] (hadoop-test): (no justification provided) (duration: 00m 11s)
  • 20:04 mforns@deploy1002: Started deploy [airflow-dags/analytics@febf1c5] (hadoop-test): (no justification provided)
  • 20:02 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/python3-imagecatalog/imagecatalog_0.0.3-1_amd64.changes
  • 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:44 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Expand $wgLocalVirtualHosts, enable $wgLocalHTTPProxy on Kubernetes (duration: 00m 57s)
  • 19:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:37 otto@deploy1002: Finished deploy [analytics/refinery@e29c9f0] (hadoop-test): (no justification provided) (duration: 05m 07s)
  • 19:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:32 otto@deploy1002: Started deploy [analytics/refinery@e29c9f0] (hadoop-test): (no justification provided)
  • 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:31 otto@deploy1002: Finished deploy [analytics/refinery@e29c9f0] (hadoop-test): (no justification provided) (duration: 00m 22s)
  • 19:30 otto@deploy1002: Started deploy [analytics/refinery@e29c9f0] (hadoop-test): (no justification provided)
  • 19:26 mforns@deploy1002: Finished deploy [analytics/refinery@e29c9f0] (hadoop-test): Add anomaly detection queries TEST [analytics/refinery@e29c9f0] (duration: 00m 04s)
  • 19:26 mforns@deploy1002: Started deploy [analytics/refinery@e29c9f0] (hadoop-test): Add anomaly detection queries TEST [analytics/refinery@e29c9f0]
  • 19:25 mforns@deploy1002: Finished deploy [analytics/refinery@e29c9f0] (hadoop-test): Add anomaly detection queries TEST [analytics/refinery@e29c9f0] (duration: 00m 05s)
  • 19:24 mforns@deploy1002: Started deploy [analytics/refinery@e29c9f0] (hadoop-test): Add anomaly detection queries TEST [analytics/refinery@e29c9f0]
  • 19:18 mforns@deploy1002: Finished deploy [analytics/refinery@e29c9f0] (hadoop-test): Add anomaly detection queries TEST [analytics/refinery@e29c9f0] (duration: 00m 09s)
  • 19:18 mforns@deploy1002: Started deploy [analytics/refinery@e29c9f0] (hadoop-test): Add anomaly detection queries TEST [analytics/refinery@e29c9f0]
  • 18:47 jynus: reenabling puppet on mx servers T298038
  • 18:46 mforns@deploy1002: Started deploy [analytics/refinery@e29c9f0] (hadoop-test): Add anomaly detection queries TEST [analytics/refinery@e29c9f0]
  • 18:43 mforns@deploy1002: Started deploy [analytics/refinery@e29c9f0] (hadoop-test): Add anomaly detection queries TEST [analytics/refinery@e29c9f0]
  • 18:43 mforns@deploy1002: Finished deploy [analytics/refinery@e29c9f0] (thin): Add anomaly detection queries THIN [analytics/refinery@e29c9f0] (duration: 00m 07s)
  • 18:43 mforns@deploy1002: Started deploy [analytics/refinery@e29c9f0] (thin): Add anomaly detection queries THIN [analytics/refinery@e29c9f0]
  • 18:42 mforns@deploy1002: Finished deploy [analytics/refinery@e29c9f0]: Add anomaly detection queries [analytics/refinery@e29c9f0] (duration: 23m 07s)
  • 18:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 18:38 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 18:19 mforns@deploy1002: Started deploy [analytics/refinery@e29c9f0]: Add anomaly detection queries [analytics/refinery@e29c9f0]
  • 17:58 jynus: reloading exim configuration with extra rule on mx2001 T298038
  • 17:46 jynus: disabling puppet on mail servers T298038
  • 17:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2040.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:06 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2040.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:04 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:59 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:43 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2038.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 16:36 volans@cumin2002: START - Cookbook sre.hosts.provision for host mc2038.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 16:21 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:16 bblack@cumin1001: START - Cookbook sre.dns.netbox
  • 16:03 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:58 bblack@cumin1001: START - Cookbook sre.dns.netbox
  • 15:38 urbanecm: Deploy security patch for T298019
  • 14:42 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:39 bblack@cumin1001: START - Cookbook sre.dns.netbox
  • 14:13 moritzm: installing wireshark security updates on buster
  • 13:33 moritzm: fail over master in codfw to ganeti2021 T296622
  • 11:29 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti1025.eqiad.wmnet with OS buster
  • 11:24 volans@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1025.eqiad.wmnet with OS buster
  • 09:56 moritzm: switch ml-etcd2001 to DRBD storage to allow eventual migration for reimage of ganeti2019
  • 09:56 marostegui: Stop mysql on db2135 to check new haproxy on bullseye T295965
  • 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd2001.codfw.wmnet with reason: switch to drbd storage
  • 09:54 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd2001.codfw.wmnet with reason: switch to drbd storage
  • 09:46 marostegui: Stop mysql on db2078:3325 to check new haproxy on bullseye T295965
  • 09:35 moritzm: switch kubetcd2006 to DRBD storage to allow eventual migration for reimage of ganeti2019
  • 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2006.codfw.wmnet with reason: switch to drbd storage
  • 09:28 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2006.codfw.wmnet with reason: switch to drbd storage
  • 09:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2004.codfw.wmnet with OS bullseye
  • 09:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2015.codfw.wmnet
  • 09:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2015.codfw.wmnet
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2015.codfw.wmnet with OS buster
  • 08:41 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2004.codfw.wmnet with OS bullseye
  • 08:40 moritzm: updated bullseye installer images for 11.2 point release
  • 08:14 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2015.codfw.wmnet with OS buster
  • 07:39 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy2004.codfw.wmnet with OS bullseye
  • 07:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2004.codfw.wmnet with OS bullseye
  • 07:08 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy2004.codfw.wmnet with OS bullseye
  • 06:41 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2004.codfw.wmnet with OS bullseye

2021-12-19

  • 17:10 Amir1: restart apache2 on lists1001 (T293826)

2021-12-18

  • 13:57 dcausse: restarting blazegraph on wdqs1013 (JVM stuck for 10 hours)

2021-12-17

  • 23:14 ryankemper: T297986 Beep boop testing 1 2 3 disregard me
  • 23:13 dduvall@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 23:13 ryankemper: T297910 foobar testing 1 2 3
  • 23:12 dduvall@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 23:09 Reedy: Testing T297987
  • 23:07 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 22:30 bblack@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host prometheus6001.drmrs.wmnet
  • 21:28 bblack@cumin1001: START - Cookbook sre.ganeti.makevm for new host prometheus6001.drmrs.wmnet
  • 21:21 bblack@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host prometheus6001.drmrs.wmnet
  • 21:17 bblack@cumin1001: START - Cookbook sre.ganeti.makevm for new host prometheus6001.drmrs.wmnet
  • 21:08 legoktm: repooling wtp1025
  • 20:56 mutante: puppetmaster - revoking and recreating TLS cert for miscweb one more time because "tendril-static" isn't "static-tendril" ;Pp
  • 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:34 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Set $wgMaxImageArea = false; (T291014) (duration: 00m 59s)
  • 19:46 mutante: adding dbtree.wikimedia.org and tendril.wikimedia.org to TLS cert for webserver-misc-apps.discovery.wmnet - recreating cert T297605
  • 19:44 ryankemper: T297910 `ryankemper@mwmaint1002:~$ sudo modify-ldap-group wmf` to add `bking`
  • 19:43 ryankemper: T297910 `ryankemper@mwmaint1002:~$ sudo modify-ldap-group ops` to add `bking`
  • 19:39 mutante: puppetmaster1001 - sudo puppet cert clean webserver-misc-apps.discovery.wmnet - Revoked certificate with serial 8502
  • 19:26 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 19:15 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/python3-imagecatalog/imagecatalog_0.0.2-1_amd64.changes
  • 19:06 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti1025.eqiad.wmnet with OS buster
  • 18:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1025.eqiad.wmnet with OS buster
  • 17:58 bblack@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install6001.wikimedia.org
  • 17:57 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 17:43 bblack@cumin1001: START - Cookbook sre.ganeti.makevm for new host install6001.wikimedia.org
  • 17:24 milimetric@deploy1002: Finished deploy [analytics/refinery@0778d1e] (thin): Proper fix for mediawiki_skin_diff [THIN] (duration: 00m 06s)
  • 17:24 milimetric@deploy1002: Started deploy [analytics/refinery@0778d1e] (thin): Proper fix for mediawiki_skin_diff [THIN]
  • 17:21 bblack: bast6001: shutdown->start (again)
  • 17:20 milimetric@deploy1002: Finished deploy [analytics/refinery@0778d1e]: Proper fix for mediawiki_skin_diff (duration: 20m 45s)
  • 17:07 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 16:59 milimetric@deploy1002: Started deploy [analytics/refinery@0778d1e]: Proper fix for mediawiki_skin_diff
  • 16:59 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 16:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 16:53 bblack: bast6001: shutdown->start
  • 16:44 bblack: ganeti6003 - rebooting
  • 16:39 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 16:33 godog: remove /var/log/swift/server.log.1 from thanos-be* - T297959
  • 16:29 milimetric@deploy1002: Finished deploy [analytics/refinery@5c3bce1] (thin): Fix refine sanitize allowlist, remove mediawiki_skin_diff schema for now [THIN] (duration: 00m 07s)
  • 16:29 milimetric@deploy1002: Started deploy [analytics/refinery@5c3bce1] (thin): Fix refine sanitize allowlist, remove mediawiki_skin_diff schema for now [THIN]
  • 16:28 bblack: reboot bast6001 (downtimed)
  • 16:26 milimetric@deploy1002: Finished deploy [analytics/refinery@5c3bce1]: Fix refine sanitize allowlist, remove mediawiki_skin_diff schema for now (duration: 69m 48s)
  • 16:02 godog: root@thanos-be2004:/srv/log/swift# rm server.log.1 - T297959
  • 15:35 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:35 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 15:35 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:34 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:34 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 15:33 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:16 milimetric@deploy1002: Started deploy [analytics/refinery@5c3bce1]: Fix refine sanitize allowlist, remove mediawiki_skin_diff schema for now
  • 14:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2001.codfw.wmnet with OS buster
  • 14:35 milimetric@deploy1002: Finished deploy [analytics/refinery@e9f04c3] (hadoop-test): Fix sanitize allowlist problem [TEST] (duration: 69m 41s)
  • 14:22 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2001.codfw.wmnet with OS buster
  • 13:25 milimetric@deploy1002: Started deploy [analytics/refinery@e9f04c3] (hadoop-test): Fix sanitize allowlist problem [TEST]
  • 13:20 milimetric@deploy1002: Finished deploy [analytics/refinery@e9f04c3] (thin): Fix sanitize allowlist problem [THIN] (duration: 00m 07s)
  • 13:20 milimetric@deploy1002: Started deploy [analytics/refinery@e9f04c3] (thin): Fix sanitize allowlist problem [THIN]
  • 13:20 milimetric@deploy1002: Finished deploy [analytics/refinery@e9f04c3]: Fix sanitize allowlist problem (duration: 26m 19s)
  • 13:06 mmandere@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast6001.wikimedia.org
  • 13:04 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: name=druid1008.eqiad.wmnet,service=druid-public-broker
  • 13:02 btullis: upgraded druid packages on druid1008
  • 13:01 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: name=druid1008.eqiad.wmnet,service=druid-public-broker
  • 13:00 btullis: upgraded druid packages on an-druid1005
  • 12:58 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: name=druid1007.eqiad.wmnet,service=druid-public-broker
  • 12:55 btullis: upgrading druid packages on druid1007
  • 12:54 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: name=druid1007.eqiad.wmnet,service=druid-public-broker
  • 12:53 milimetric@deploy1002: Started deploy [analytics/refinery@e9f04c3]: Fix sanitize allowlist problem
  • 12:52 btullis: upgrading druid packages on an-druid1004
  • 12:51 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: name=druid1006.eqiad.wmnet,service=druid-public-broker
  • 12:50 btullis: upgrading druid packages on druid1006
  • 12:48 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: name=druid1006.eqiad.wmnet,service=druid-public-broker
  • 12:47 btullis: Upgrading druid packages on an-druid1003
  • 12:44 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: name=druid1005.eqiad.wmnet,service=druid-public-broker
  • 12:43 btullis: Upgraded druid packages on druid1005.
  • 12:43 btullis: Upgraded druid packages on an-druid1002.
  • 12:43 mmandere@cumin1001: START - Cookbook sre.ganeti.makevm for new host bast6001.wikimedia.org
  • 12:41 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: name=druid1005.eqiad.wmnet,service=druid-public-broker
  • 12:40 mmandere@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host bast6001.wikimedia.org
  • 12:37 btullis: upgrading druid packages on an-druid1002
  • 12:32 mmandere@cumin1001: START - Cookbook sre.ganeti.makevm for new host bast6001.wikimedia.org
  • 12:31 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: name=druid1004.eqiad.wmnet,service=druid-public-broker
  • 12:31 btullis: Upgraded druid packages on druid1004
  • 12:07 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: name=druid1004.eqiad.wmnet,service=druid-public-broker
  • 11:42 btullis: Upgrading druid packages on an-druid1001.
  • 11:18 btullis: updating reprepro with new druid packages for buster-wikimedia to pick up new log4j jar files
  • 10:26 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1010.eqiad.wmnet
  • 09:57 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:57 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2002.codfw.wmnet with OS buster
  • 09:51 jelto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:50 jelto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:37 dcausse: restart blazegraph on wdqs1007 (jvm stuck for 6 hours)
  • 09:23 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2002.codfw.wmnet with OS buster
  • 09:04 godog: previous message refers to ms-be2065
  • 09:03 godog: set sdq as offline, showing errors. megacli -PDOffline -PhysDrv '[32:14]' -aALL
  • 08:50 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:30 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:03 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:15 legoktm: repooling mw1456
  • 02:07 legoktm: depooling wtp1025 for benchmarking (T297259)
  • 00:48 brennen: end of UTC late backport and config window
  • 00:45 brennen@deploy1002: Synchronized php-1.38.0-wmf.13/extensions/MediaSearch/templates/SERPWidget.mustache: Backport: Don't boot users with title="Special:MediaSearch" back to old search page (T297877) (duration: 00m 57s)
  • 00:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:35 brennen@deploy1002: Synchronized logos/config.yaml: Config: Change logo in abwiki (T297810) (duration: 00m 57s)
  • 00:34 brennen@deploy1002: Synchronized wmf-config/logos.php: Config: Change logo in abwiki (T297810) (duration: 00m 56s)
  • 00:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:33 brennen@deploy1002: Synchronized static/images/project-logos/: Config: Change logo in abwiki (T297810) (duration: 00m 57s)
  • 00:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:31 brennen@deploy1002: Synchronized static/images/project-logos/abwiki-1.5x.png: Config: Change logo in abwiki (T297810) (duration: 00m 57s)
  • 00:17 brennen@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Fix wordmark to outreachwiki (T297580) (duration: 00m 57s)
  • 00:16 brennen@deploy1002: Synchronized static/images/mobile/copyright/outreach-wordmark.svg: Config: Fix wordmark to outreachwiki (T297580) (duration: 00m 57s)
  • 00:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-12-16

  • 23:08 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@4552dff] (codfw): Move maxzoom configuration to the proper field (duration: 01m 28s)
  • 23:06 mbsantos@deploy1002: Started deploy [kartotherian/deploy@4552dff] (codfw): Move maxzoom configuration to the proper field
  • 23:06 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@4552dff] (eqiad): Move maxzoom configuration to the proper field (duration: 02m 31s)
  • 23:04 mbsantos@deploy1002: Started deploy [kartotherian/deploy@4552dff] (eqiad): Move maxzoom configuration to the proper field
  • 22:57 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@2dc8b8b] (codfw): Update kartotherian-package to e843e8f (duration: 02m 23s)
  • 22:55 mbsantos@deploy1002: Started deploy [kartotherian/deploy@2dc8b8b] (codfw): Update kartotherian-package to e843e8f
  • 22:54 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@2dc8b8b] (eqiad): Update kartotherian-package to e843e8f (duration: 02m 27s)
  • 22:52 mbsantos@deploy1002: Started deploy [kartotherian/deploy@2dc8b8b] (eqiad): Update kartotherian-package to e843e8f
  • 22:13 ejegg: updated fundraising CiviCRM from d4cea6a9 to 07efd9fb
  • 22:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti2007.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 22:01 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti2007.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 21:29 sbassett: Reverted previous mitigation for T297416
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:19 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.13 refs T293954
  • 19:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:58 urbanecm: UTC evening B&C window done
  • 19:58 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.13/extensions/GrowthExperiments/includes/Mentorship/MentorPageMentorManager.php: b8e64fe: MentorManager: Only invalidate cache when mentor list exists (T297827) (duration: 01m 06s)
  • 19:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:51 legoktm: depooling mw1456 for benchmarking (T297259)
  • 19:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:31 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.13/extensions/Wikibase/: 7799383: bridge: Reenable scrolling by mounting into parent (duration: 01m 12s)
  • 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:23 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enwiki config: remove autopatrol from sysop (T297058) (duration: 01m 06s)
  • 19:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:17 ejegg: updated standalone SmashPig (IPN listener) from 9e885819 to 235a261b
  • 19:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:13 catrope@deploy1002: Synchronized wmf-config: Config: Enable VectorLanguageInMainPageHeader on main page (T293470) (duration: 01m 06s)
  • 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:52 mmandere@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=93) for new host bast6001.wikimedia.org
  • 18:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:49 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.13/includes/Revision/RevisionStore.php: Backport: Revision: Bypass checking the cache if it's not found (duration: 01m 06s)
  • 18:48 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.12/includes/Revision/RevisionStore.php: Backport: Revision: Bypass checking the cache if it's not found (duration: 01m 06s)
  • 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:05 hashar@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.13 refs T293954 (duration: 01m 05s)
  • 18:03 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.13 refs T293954
  • 17:29 cwhite: pruned jndilookup.class from log4j-core on logstash 7 instances T297468
  • 17:29 mmandere@cumin1001: START - Cookbook sre.ganeti.makevm for new host bast6001.wikimedia.org
  • 17:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:56 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.12/includes/Revision/RevisionStore.php: Backport: Revision: Add two caching layers to loadSlotRecords for template pages (T297147) (duration: 01m 06s)
  • 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:55 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.13/includes/Revision/RevisionStore.php: Backport: Revision: Add two caching layers to loadSlotRecords for template pages (T297147) (duration: 01m 06s)
  • 16:49 cwhite: pruned jndilookup.class from log4j-core on logstash 5 instances T297468
  • 16:48 mmandere@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host bast6001.wikimedia.org
  • 16:48 mmandere@cumin1001: START - Cookbook sre.ganeti.makevm for new host bast6001.wikimedia.org
  • 16:48 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1026.eqiad.wmnet with OS buster
  • 16:47 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1025.eqiad.wmnet with OS buster
  • 16:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1028.eqiad.wmnet with OS buster
  • 16:39 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1027.eqiad.wmnet with OS buster
  • 16:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:32 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.12/extensions/intersection: Backport: Set a maximum allowed time for db queries (T297708) (duration: 01m 06s)
  • 16:30 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.13/extensions/intersection: Backport: Set a maximum allowed time for db queries (T297708) (duration: 01m 05s)
  • 16:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1022.eqiad.wmnet with OS bullseye
  • 16:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus1006.eqiad.wmnet with OS bullseye
  • 16:26 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:25 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.13/extensions/MediaSearch/: Backport: Filter out non-string keys/values from query string before using (T297828) (duration: 01m 06s)
  • 16:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus1005.eqiad.wmnet with OS bullseye
  • 16:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1028.eqiad.wmnet with OS buster
  • 16:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1027.eqiad.wmnet with OS buster
  • 16:07 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1026.eqiad.wmnet with OS buster
  • 16:07 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1025.eqiad.wmnet with OS buster
  • 16:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1022.eqiad.wmnet with OS bullseye
  • 16:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host prometheus1006.eqiad.wmnet with OS bullseye
  • 15:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host prometheus1005.eqiad.wmnet with OS bullseye
  • 15:54 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.13/extensions/Wikibase/client/data-bridge/: Backport: bridge: fix terms of service and copyright missing (duration: 01m 06s)
  • 15:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:42 elukey: shutdown kafka-main2002 for BIOS+NIC firmware upgrades
  • 15:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:36 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Gradual roll out of $wgMaxExecutionTimeForExpensiveQueries (T297708) (duration: 01m 06s)
  • 15:35 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.12/includes/: Backport: Allow setting max execution time to several special pages (gerrit:747696, T297708), Part II (duration: 01m 11s)
  • 15:34 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.12/includes/DefaultSettings.php: Backport: Allow setting max execution time to several special pages (gerrit:747696, T297708), Part I (duration: 01m 05s)
  • 15:32 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.13/includes/: Backport: Allow setting max execution time to several special pages (gerrit:747695, T297708), Part II (duration: 01m 12s)
  • 15:30 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.13/includes/DefaultSettings.php: Backport: Allow setting max execution time to several special pages (gerrit:747695, T297708), Part I (duration: 01m 06s)
  • 15:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:26 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1084.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:25 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1084.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:24 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1084.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:24 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1084.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:19 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1084.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:19 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1084.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:15 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1084.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:07 volans@cumin1001: START - Cookbook sre.hosts.provision for host elastic1084.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:03 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.12/includes/libs/rdbms/database/: Backport: rdbms: add query timeout support to Database::select() (T129093 T195792) (duration: 01m 11s)
  • 15:02 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1001.wikimedia.org
  • 14:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1001.wikimedia.org
  • 14:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:55 elukey: shutdown kafka-main2001 for BIOS+NIC firmware upgrades
  • 14:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:44 moritzm: drain primary/secondary instances off ganeti2007 T296622
  • 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:33 volans: upgraded spicerack to v1.1.0 on cumin[1001,2001]
  • 13:27 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@e36c241] (eqiad): Change osm-intl and osm source to get MVT from Tegola (Full production for Tegola) (duration: 01m 39s)
  • 13:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@e36c241] (eqiad): Change osm-intl and osm source to get MVT from Tegola (Full production for Tegola)
  • 13:24 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@e36c241] (codfw): (no justification provided) (duration: 03m 12s)
  • 13:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@e36c241] (codfw): (no justification provided)
  • 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
  • 13:17 volans: uploaded spicerack_1.1.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 13:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
  • 12:45 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:39 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:34 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:33 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:14 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable WelcomeSurvey Interaction schema (T267273 T297858) (duration: 01m 07s)
  • 12:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 12:11 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 11:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2003.codfw.wmnet with OS buster
  • 10:59 btullis: pushed new packages for druid version 0.19.0-2 on buster using reprepro
  • 10:31 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2003.codfw.wmnet with OS buster
  • 10:28 elukey: second attempt to reimage kafka-main2003 to buster
  • 10:09 moritzm: drain primary/secondary instances off ganeti2007 T296622
  • 10:04 moritzm: switched kubetcd2004 to DRBD-based storage to allow migration for reimages
  • 09:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2004.codfw.wmnet with reason: switch to drbd storage
  • 09:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2004.codfw.wmnet with reason: switch to drbd storage
  • 09:46 moritzm: added ganeti2028 to ganeti codfw cluster T294139
  • 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on ganeti2015.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 09:38 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on ganeti2015.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 09:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2024.codfw.wmnet with OS buster
  • 08:56 moritzm: drain primary/secondary instances off ganeti2015 T296622
  • 08:43 moritzm: switch ml-etcd2003 to DRBD-based storage to allow migration for reimages
  • 08:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:31 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.13/extensions/GrowthExperiments/: 35c055c: MentorPageMentorManager: Do not fail hard with no mentor list configured (T297827) (duration: 01m 09s)
  • 08:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2024.codfw.wmnet with OS buster
  • 08:07 dcausse: restart blazegraph on wdqs1013 (jvm stuck for 4 hours)
  • 03:27 hoo: Stopped rebuildItemsPerSite on mwmaint1002 (was slightly beyond item Q72056756), as it has a memory leak (and would OOM in a few days)
  • 01:53 mutante: miscweb1002 / miscweb2002 - both backends 'PASS: 26 requests sent to miscweb1002.eqiad.wmnet. All assertions passed.' again after fixing httpbb tests and T297605
  • 01:50 mutante: miscweb1002 - re-enabling puppet after deployment for T297605
  • 01:03 legoktm: removing current dump from static-codereview to replace it with a new one
  • 00:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:22 legoktm: upgraded php7.2 on mw1414 for mysqlnd memory leak fix part 2 (T297667)
  • 00:19 legoktm: uploaded 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf5 to buster-wikimedia for T297667
  • 00:18 ejegg: updated payments-wiki from df3ded67 to 55e605dd
  • 00:16 mutante: miscweb1002 - disable puppet, deploying gerrit:747600 on miscweb2002 first, indeed puppet problem detected T297605
  • 00:05 legoktm: published new versions of php7.{2,4}-fpm-multiversion-base image with php-yaml extension (T296331)

2021-12-15

  • 23:38 milimetric@deploy1002: Finished deploy [analytics/refinery@0d74de0] (thin): Pushing 0.1.23 for SparkSQLNCLIDriver job (THIN) (duration: 00m 07s)
  • 23:37 milimetric@deploy1002: Started deploy [analytics/refinery@0d74de0] (thin): Pushing 0.1.23 for SparkSQLNCLIDriver job (THIN)
  • 23:26 milimetric@deploy1002: Finished deploy [analytics/refinery@0d74de0]: Pushing 0.1.23 for SparkSQLNCLIDriver job (duration: 15m 35s)
  • 23:10 milimetric@deploy1002: Started deploy [analytics/refinery@0d74de0]: Pushing 0.1.23 for SparkSQLNCLIDriver job
  • 22:50 legoktm: installing php-yaml on parsoid, jobrunners and maint servers
  • 20:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:52 hashar@deploy1002: Synchronized php-1.38.0-wmf.13/includes/skins/Skin.php: Remove migration script - T297484 (duration: 01m 06s)
  • 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:46 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.38.0-wmf.13 refs T293954"
  • 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:04 hashar@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.13 refs T293954 (duration: 01m 05s)
  • 20:03 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.13 refs T293954
  • 19:52 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-test-coord1002.eqiad.wmnet with OS buster
  • 19:45 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS buster
  • 19:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:27 hashar: UTC evening backport window completed
  • 19:25 hashar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable tegola on enwiki T2980767 (duration: 01m 06s)
  • 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2024.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 19:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2024.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-test-coord1002.eqiad.wmnet with OS buster
  • 18:26 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS buster
  • 18:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:50 dancy@deploy1002: Finished scap: testing (duration: 02m 01s)
  • 17:48 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1016.eqiad.wmnet with OS buster
  • 17:48 dancy@deploy1002: Started scap: testing
  • 17:47 elukey: kafka-main2003 up and running (dcops maintenance done)
  • 17:44 jayme: deployed imagecatalog RBAC rules to all k8s clusters - T287130
  • 17:44 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:43 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 17:43 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 17:43 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 17:43 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 17:43 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 17:43 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 17:42 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 17:40 dancy@deploy1002: Finished scap: testing (duration: 02m 07s)
  • 17:38 dancy@deploy1002: Started scap: testing
  • 17:37 dancy@deploy1002: Synchronized README: testing (duration: 01m 06s)
  • 17:37 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:37 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 17:35 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:33 Amir1: removing grant on letter a on all of s3 hosts (T296537)
  • 17:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2008.codfw.wmnet with OS buster
  • 17:32 bblack@cumin1001: START - Cookbook sre.dns.netbox
  • 17:25 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS buster
  • 16:42 moritzm: import wmf-log4j 2.16.0-1 for stretch-wikimedia (stub package to provide log4j jars for the ELK5 cluster)
  • 16:39 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@132455b] (eqiad): apply overzoom on tegola (duration: 02m 33s)
  • 16:36 mbsantos@deploy1002: Started deploy [kartotherian/deploy@132455b] (eqiad): apply overzoom on tegola
  • 16:34 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@132455b] (codfw): apply overzoom on tegola (duration: 04m 11s)
  • 16:32 ladsgroup: Deployed security patch for T297731
  • 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2028.codfw.wmnet
  • 16:30 mbsantos@deploy1002: Started deploy [kartotherian/deploy@132455b] (codfw): apply overzoom on tegola
  • 16:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
  • 16:14 Deployed: security patch for T297731
  • 16:12 elukey: shutdown kafka-main2003 to allow work for DCops (firmware upgrade)
  • 16:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host backup2008.codfw.wmnet with OS buster
  • 15:14 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:11 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:48 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:47 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 14:47 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:46 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:46 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:45 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:44 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:44 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:44 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:23 kormat: uploaded wmfdb 0.1.1 to apt.wm.o for buster+bullseye T297618
  • 14:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:15 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.12/extensions/WikimediaMaintenance/blameStartupRegistry.php: Backport: blameStartupRegistry: Fix clash in $startupBytes variable name (T295413) (duration: 01m 07s)
  • 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 13:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 13:29 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:28 kormat: uploaded wmfdb 0.1 to apt.wm.o for buster+bullseye T297618
  • 13:24 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:50 moritzm: drain primary/secondary instances off ganeti2024 T296622
  • 12:43 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: T297454
  • 12:43 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: T297454
  • 12:43 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: T297454
  • 12:43 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: T297454
  • 12:40 Lucas_WMDE: UTC morning backport+config window done
  • 12:36 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Lexeme Lua access on first four wikis (T294159) happy holidays :) (duration: 01m 06s)
  • 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 12:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 12:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove redundant project namespace aliases (T296643) (no-op) (duration: 01m 07s)
  • 12:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:08 moritzm: added ganeti2025 to codfw ganeti cluster T282603
  • 11:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 11:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 11:39 _joe_: repooling mw1414 T297667
  • 11:36 _joe_: upgrading php7.2 on mw1414, T297667
  • 11:35 _joe_: uploading php 7.2 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf4 to buster T297667
  • 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2018.codfw.wmnet
  • 11:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2018.codfw.wmnet
  • 10:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:27 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:00 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:57 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:53 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 09:43 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:42 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:28 vgutierrez: pool cp4025 - T271421
  • 09:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2018.codfw.wmnet with OS buster
  • 08:44 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2018.codfw.wmnet with OS buster
  • 07:04 marostegui: Enable full_crc32 on db2094 (s1, s3, s5 and s8) T287244
  • 05:48 eileen: revision 1ede5365 -> d4cea6a9 civicrm
  • 05:07 eileen: revision d0ac9184 -> 1ede5365 civicrm
  • 00:59 dancy@deploy1002: Finished scap: testing (duration: 03m 38s)
  • 00:55 dancy@deploy1002: Started scap: testing
  • 00:52 dancy@deploy1002: Synchronized /: testing (duration: 00m 37s)
  • 00:47 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:31 catrope@deploy1002: Synchronized php-1.38.0-wmf.13/extensions/MediaSearch/resources/store/index.js: Backport: Remove multiple instance of VUEX initialization (T297690) (duration: 01m 04s)
  • 00:29 catrope@deploy1002: Synchronized php-1.38.0-wmf.13/extensions/MediaSearch/resources/components/SearchResults.vue: Backport: Don't attempt to scroll to a non-existing result (duration: 01m 05s)
  • 00:28 catrope@deploy1002: Synchronized php-1.38.0-wmf.13/includes/: Backport: Revert "Replace deprecated methods IContextSource::getWikiPage && IContextSource::canUseWikiPage" (T297744) (duration: 01m 12s)
  • 00:26 catrope@deploy1002: Synchronized php-1.38.0-wmf.12/includes/: Backport: Revert "Replace deprecated methods IContextSource::getWikiPage && IContextSource::canUseWikiPage" (T297744) (duration: 01m 11s)
  • 00:01 bblack: lvs1015: start pybal, back to normal

2021-12-14

  • 23:49 bblack: lvs1015 (internal services) - disabling pybal, will fail over traffic to lvs1020 (to test lvs1020 sanity)
  • 23:44 bblack: lvs1013 (text) restart pybal, back to normal
  • 23:28 bblack: lvs1013 (text) - disabling pybal, will fail over traffic to lvs1020 (to test lvs1020 sanity)
  • 23:26 bblack: lvs1014 (upload) restart pybal, back to normal
  • 23:15 bblack: lvs1014 (upload) - disabling pybal, will fail over traffic to lvs1020 (to test lvs1020 sanity)
  • 23:10 legoktm: deploying patch for T297416
  • 21:18 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.13 refs T293954
  • 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:09 hashar@deploy1002: Finished scap: testwiki to php-1.38.0-wmf.13 and rebuild l10n cache (duration: 33m 47s)
  • 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:35 hashar@deploy1002: Started scap: testwiki to php-1.38.0-wmf.13 and rebuild l10n cache
  • 20:34 urbanecm: Manually rollback group0 to wmf.12 by running `sudo -u mwdeploy cp /srv/mediawiki-staging/wikiversions.json /srv/mediawiki/wikiversions.json && scap wikiversions-compile && cp /srv/mediawiki/wikiversions.php /srv/mediawiki-staging/wikiversions.php && scap sync-file --force wikiversions.php 'rollback group0'`
  • 20:34 hashar: Group 0 wikis are available again and still on 1.38.0-wmf.12
  • 20:31 urbanecm@deploy1002: Synchronized wikiversions.php: rollback group0 (duration: 00m 41s)
  • 20:28 hashar: group0 wikis (eg mediawiki.org) are unavailable due to a deployment issue. We are working on it # T293954
  • 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:16 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.13 refs T293954
  • 20:15 eileen: a88cd178 -> d0ac9184
  • 20:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e127f4c: zhwiki: Promote Growth features out of dark mode (T287884) (duration: 00m 57s)
  • 19:54 urbanecm: UTC evening B&C window done
  • 19:53 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.12/skins/Vector/resources/skins.vector.es6/AB.js: 62e84e7: Prevent A/B test enrollment hook from firing for unsampled (T297662) (duration: 00m 56s)
  • 19:51 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 40f0cff: VE on zh.wiki: Enable single-edit-tab mode, and other config like en.wiki (T296269) (duration: 00m 57s)
  • 19:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7f4ae4c: kartographer: Enable tegola on jawiki (T280767) (duration: 00m 58s)
  • 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:19 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart without plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic rolling restart - ryankemper@cumin1001 - T297468
  • 19:18 bblack: lvs1020 - rebooting on new config
  • 19:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:08 milimetric@deploy1002: Finished deploy [analytics/refinery@92c63c9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@92c63c9] (duration: 06m 54s)
  • 19:01 milimetric@deploy1002: Started deploy [analytics/refinery@92c63c9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@92c63c9]
  • 19:01 milimetric@deploy1002: Finished deploy [analytics/refinery@92c63c9] (thin): Regular analytics weekly train THIN [analytics/refinery@92c63c9] (duration: 00m 07s)
  • 19:01 milimetric@deploy1002: Started deploy [analytics/refinery@92c63c9] (thin): Regular analytics weekly train THIN [analytics/refinery@92c63c9]
  • 18:59 bblack: lvs1020: running puppet agent with lvs role + config for first time
  • 18:58 milimetric@deploy1002: Finished deploy [analytics/refinery@92c63c9]: Regular analytics weekly train [analytics/refinery@92c63c9] (duration: 19m 49s)
  • 18:40 bblack: lvs1016: puppet agent disabled, pybal stopped
  • 18:39 bblack: lvs1016: downtimed for attempt at moving its role to lvs1020 (expect a few minor related alerts, such as BGP ones for eqiad routers)
  • 18:38 milimetric@deploy1002: Started deploy [analytics/refinery@92c63c9]: Regular analytics weekly train [analytics/refinery@92c63c9]
  • 18:34 majavah: deployed updated patch for T297322
  • 18:28 ryankemper: T297468 [Elastic] `ryankemper@relforge1003:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad.service elasticsearch_6@relforge-eqiad-small-alpha.service logstash.service`
  • 18:25 otto@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main,name=codfw
  • 18:25 ottomata: repooling eventgate-main discovery to include codfw - T296699 - confctl --object-type discovery select 'dnsdisc=eventgate-main,name=codfw' set/pooled=true
  • 18:21 ryankemper: T297468 [Elastic] Performing manual rolling restart of `relforge`. Starting with `ryankemper@relforge1004:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad.service elasticsearch_6@relforge-eqiad-small-alpha.service logstash.service` (non-master node)
  • 18:17 ryankemper: T297468 `sudo cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic rolling restart" --nodes-per-run 3 --start-datetime 2021-12-14T01:27:58 --task-id T297468` on `ryankemper@cumin1001` tmux `elastic_restarts`
  • 18:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic rolling restart - ryankemper@cumin1001 - T297468
  • 17:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs1015.eqiad.wmnet
  • 17:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs1015.eqiad.wmnet
  • 17:48 mutante: people2002 - apt-get install --reinstall linux-image-5.10.0-9-amd64 to fix Icinga DPKG alert
  • 17:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs1014.eqiad.wmnet
  • 17:41 topranks: Temporarily deactivated BGP peering to AS8932 at AMS-IX (cr2-esams), as the peer has been constantly tripping the max-prefix limit for a few days, although according to PeeringDB it should be within the limit.
  • 17:41 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs1014.eqiad.wmnet
  • 17:39 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:38 mutante: aphlict1001 - (Phabricator realtime notifications) - out of disk, attempting to gzip a large log
  • 17:35 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:35 bblack@cumin1001: START - Cookbook sre.dns.netbox
  • 17:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs1013.eqiad.wmnet
  • 17:31 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 17:27 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs1013.eqiad.wmnet
  • 17:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs1012.eqiad.wmnet
  • 17:23 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:23 mutante: elastic1043 has been down and alerting for > 6h
  • 17:21 mutante: icinga - re-enabling active monitoring checks on mx2001 (T297128)
  • 17:18 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs1012.eqiad.wmnet
  • 17:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:15 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs1011.eqiad.wmnet
  • 17:10 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:06 hnowlan@puppetmaster1001: conftool action : set/weight=10:pooled=yes; selector: name=restbase2026.codfw.wmnet
  • 16:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:57 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mirror1001.wikimedia.org with OS bullseye
  • 16:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:43 Amir1: rolling restart of php-fpm on all mediawiki hosts (T297517 T297667)
  • 16:33 jhathaway@cumin1001: START - Cookbook sre.hosts.reimage for host mirror1001.wikimedia.org with OS bullseye
  • 16:30 jhathaway@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mirror1001.wikimedia.org with OS bullseye
  • 16:30 jhathaway@cumin1001: START - Cookbook sre.hosts.reimage for host mirror1001.wikimedia.org with OS bullseye
  • 16:24 jhathaway@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mirror1001.wikimedia.org with OS bullseye
  • 16:24 jhathaway@cumin1001: START - Cookbook sre.hosts.reimage for host mirror1001.wikimedia.org with OS bullseye
  • 16:21 jhathaway@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mirror1001.wikimedia.org with OS bullseye
  • 16:21 jhathaway@cumin1001: START - Cookbook sre.hosts.reimage for host mirror1001.wikimedia.org with OS bullseye
  • 16:20 accraze@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 16:00 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs1011.eqiad.wmnet
  • 15:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs1010.eqiad.wmnet
  • 15:54 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.12/includes/cache/LinkCache.php: Backport: cache: Add four fields to LinkCache::getSelectFields (T297669) (duration: 00m 57s)
  • 15:53 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs1010.eqiad.wmnet
  • 15:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:50 moritzm: drain primary/secondary instances off ganeti2023 T296622
  • 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:49 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host aqs1010.eqiad.wmnet
  • 15:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs1010.eqiad.wmnet
  • 15:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2018.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 15:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2018.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 15:42 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:21 hashar@deploy1002: Finished scap: Push wmf.13 without promoting any wikis (duration: 29m 31s)
  • 15:15 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1019.eqiad.wmnet with OS buster
  • 15:13 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1018.eqiad.wmnet with OS buster
  • 15:09 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host cp4025.ulsfo.wmnet with OS buster
  • 14:52 hashar@deploy1002: Started scap: Push wmf.13 without promoting any wikis
  • 14:49 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1019.eqiad.wmnet with OS buster
  • 14:47 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1018.eqiad.wmnet with OS buster
  • 14:16 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.12/includes/OutputPage.php: Backport: Reuse the query result in addCategoryLinks instead of relying on cache (T297669) (duration: 00m 57s)
  • 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:02 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4025.ulsfo.wmnet with OS buster
  • 14:01 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T277354)', diff saved to https://phabricator.wikimedia.org/P18229 and previous config saved to /var/cache/conftool/dbconfig/20211214-135601-marostegui.json
  • 13:55 Lucas_WMDE: Deployed patch for T297570
  • 13:51 vgutierrez: depool cp4025 to be reimaged as cache::upload_envoy - T271421
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P18228 and previous config saved to /var/cache/conftool/dbconfig/20211214-134056-marostegui.json
  • 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
  • 13:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
  • 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P18227 and previous config saved to /var/cache/conftool/dbconfig/20211214-132551-marostegui.json
  • 13:16 moritzm: installing libsamplerate security updates
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T277354)', diff saved to https://phabricator.wikimedia.org/P18226 and previous config saved to /var/cache/conftool/dbconfig/20211214-131047-marostegui.json
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T277354)', diff saved to https://phabricator.wikimedia.org/P18225 and previous config saved to /var/cache/conftool/dbconfig/20211214-130853-marostegui.json
  • 13:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 13:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T277354)', diff saved to https://phabricator.wikimedia.org/P18224 and previous config saved to /var/cache/conftool/dbconfig/20211214-130845-marostegui.json
  • 12:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netflow5001.eqsin.wmnet
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P18223 and previous config saved to /var/cache/conftool/dbconfig/20211214-125340-marostegui.json
  • 12:45 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netflow5001.eqsin.wmnet
  • 12:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netflow3001.esams.wmnet
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P18222 and previous config saved to /var/cache/conftool/dbconfig/20211214-123836-marostegui.json
  • 12:27 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netflow3001.esams.wmnet
  • 12:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netflow1001.eqiad.wmnet
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T277354)', diff saved to https://phabricator.wikimedia.org/P18221 and previous config saved to /var/cache/conftool/dbconfig/20211214-122331-marostegui.json
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T277354)', diff saved to https://phabricator.wikimedia.org/P18220 and previous config saved to /var/cache/conftool/dbconfig/20211214-122137-marostegui.json
  • 12:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 12:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T277354)', diff saved to https://phabricator.wikimedia.org/P18219 and previous config saved to /var/cache/conftool/dbconfig/20211214-122129-marostegui.json
  • 12:19 moritzm: drain primary/secondary instances off ganeti2018 T296622
  • 12:14 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netflow1001.eqiad.wmnet
  • 12:12 _joe_: regenerating pybal config-master files
  • 12:11 Lucas_WMDE: UTC morning backport+config window done
  • 12:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:07 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set wgLexemeEnableDataTransclusion to false everywhere (T294159) (no-op) (duration: 00m 59s)
  • 12:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P18218 and previous config saved to /var/cache/conftool/dbconfig/20211214-120624-marostegui.json
  • 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2017.codfw.wmnet
  • 12:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netflow2001.codfw.wmnet
  • 11:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2017.codfw.wmnet
  • 11:51 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netflow2001.codfw.wmnet
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P18217 and previous config saved to /var/cache/conftool/dbconfig/20211214-115119-marostegui.json
  • 11:41 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2025.codfw.wmnet
  • 11:40 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=restbase2025.codfw.wmnet
  • 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T277354)', diff saved to https://phabricator.wikimedia.org/P18216 and previous config saved to /var/cache/conftool/dbconfig/20211214-113615-marostegui.json
  • 11:35 hnowlan: joining restbase2026-b to cassandra
  • 11:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2008.codfw.wmnet
  • 11:34 jbond: move idp service to eqiad
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T277354)', diff saved to https://phabricator.wikimedia.org/P18215 and previous config saved to /var/cache/conftool/dbconfig/20211214-113428-marostegui.json
  • 11:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db[1106,1154].eqiad.wmnet with reason: Maintenance
  • 11:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db[1106,1154].eqiad.wmnet with reason: Maintenance
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T277354)', diff saved to https://phabricator.wikimedia.org/P18214 and previous config saved to /var/cache/conftool/dbconfig/20211214-113417-marostegui.json
  • 11:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2008.codfw.wmnet
  • 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P18213 and previous config saved to /var/cache/conftool/dbconfig/20211214-111912-marostegui.json
  • 11:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2021.codfw.wmnet
  • 11:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2021.codfw.wmnet
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P18212 and previous config saved to /var/cache/conftool/dbconfig/20211214-110407-marostegui.json
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T277354)', diff saved to https://phabricator.wikimedia.org/P18211 and previous config saved to /var/cache/conftool/dbconfig/20211214-104903-marostegui.json
  • 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T277354)', diff saved to https://phabricator.wikimedia.org/P18210 and previous config saved to /var/cache/conftool/dbconfig/20211214-104717-marostegui.json
  • 10:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 10:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 10:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 10:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 10:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T277354)', diff saved to https://phabricator.wikimedia.org/P18209 and previous config saved to /var/cache/conftool/dbconfig/20211214-104538-marostegui.json
  • 10:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netflow4001.ulsfo.wmnet
  • 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P18208 and previous config saved to /var/cache/conftool/dbconfig/20211214-103033-marostegui.json
  • 10:29 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netflow4001.ulsfo.wmnet
  • 10:22 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts netflow2001.codfw.wmnet
  • 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2021.codfw.wmnet with OS buster
  • 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P18207 and previous config saved to /var/cache/conftool/dbconfig/20211214-101528-marostegui.json
  • 10:10 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netflow2001.codfw.wmnet
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T277354)', diff saved to https://phabricator.wikimedia.org/P18206 and previous config saved to /var/cache/conftool/dbconfig/20211214-100023-marostegui.json
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T277354)', diff saved to https://phabricator.wikimedia.org/P18205 and previous config saved to /var/cache/conftool/dbconfig/20211214-095837-marostegui.json
  • 09:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 09:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T277354)', diff saved to https://phabricator.wikimedia.org/P18204 and previous config saved to /var/cache/conftool/dbconfig/20211214-095829-marostegui.json
  • 09:46 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2021.codfw.wmnet with OS buster
  • 09:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2017.codfw.wmnet with OS buster
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P18203 and previous config saved to /var/cache/conftool/dbconfig/20211214-094324-marostegui.json
  • 09:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P18202 and previous config saved to /var/cache/conftool/dbconfig/20211214-092820-marostegui.json
  • 09:15 ryankemper@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw rolling restart - ryankemper@cumin2001 - T297468
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T277354)', diff saved to https://phabricator.wikimedia.org/P18201 and previous config saved to /var/cache/conftool/dbconfig/20211214-091315-marostegui.json
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T277354)', diff saved to https://phabricator.wikimedia.org/P18200 and previous config saved to /var/cache/conftool/dbconfig/20211214-091130-marostegui.json
  • 09:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 09:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 09:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T277354)', diff saved to https://phabricator.wikimedia.org/P18199 and previous config saved to /var/cache/conftool/dbconfig/20211214-090948-marostegui.json
  • 09:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netflow1002.eqiad.wmnet
  • 08:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2017.codfw.wmnet with OS buster
  • 08:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netflow5002.eqsin.wmnet
  • 08:54 moritzm: failover Ganeti master to ganeti2016 T296622
  • 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P18198 and previous config saved to /var/cache/conftool/dbconfig/20211214-085443-marostegui.json
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2008.codfw.wmnet with OS buster
  • 08:49 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host netflow1002.eqiad.wmnet
  • 08:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netflow4002.ulsfo.wmnet
  • 08:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netflow3002.esams.wmnet
  • 08:43 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host netflow5002.eqsin.wmnet
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P18197 and previous config saved to /var/cache/conftool/dbconfig/20211214-083938-marostegui.json
  • 08:35 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host netflow4002.ulsfo.wmnet
  • 08:34 dcausse: restart blazegraph on wdqs1013 (jvm stuck for 5h)
  • 08:33 ayounsi@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host netflow5002.eqsin.wmnet
  • 08:33 ayounsi@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host netflow4002.ulsfo.wmnet
  • 08:30 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host netflow5002.eqsin.wmnet
  • 08:30 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host netflow4002.ulsfo.wmnet
  • 08:29 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host netflow3002.esams.wmnet
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T277354)', diff saved to https://phabricator.wikimedia.org/P18196 and previous config saved to /var/cache/conftool/dbconfig/20211214-082433-marostegui.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T277354)', diff saved to https://phabricator.wikimedia.org/P18195 and previous config saved to /var/cache/conftool/dbconfig/20211214-082249-marostegui.json
  • 08:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 08:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T277354)', diff saved to https://phabricator.wikimedia.org/P18194 and previous config saved to /var/cache/conftool/dbconfig/20211214-082241-marostegui.json
  • 08:22 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2008.codfw.wmnet with OS buster
  • 08:09 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P18193 and previous config saved to /var/cache/conftool/dbconfig/20211214-080736-marostegui.json
  • 08:03 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P18192 and previous config saved to /var/cache/conftool/dbconfig/20211214-075232-marostegui.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T277354)', diff saved to https://phabricator.wikimedia.org/P18191 and previous config saved to /var/cache/conftool/dbconfig/20211214-073727-marostegui.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T277354)', diff saved to https://phabricator.wikimedia.org/P18190 and previous config saved to /var/cache/conftool/dbconfig/20211214-073541-marostegui.json
  • 07:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 07:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T277354)', diff saved to https://phabricator.wikimedia.org/P18189 and previous config saved to /var/cache/conftool/dbconfig/20211214-073534-marostegui.json
  • 07:24 ryankemper: T297468 `sudo cookbook sre.elasticsearch.rolling-operation search_codfw "codfw rolling restart" --nodes-per-run 3 --start-datetime 2021-12-14T01:27:58 --task-id T297468` on `ryankemper@cumin2001` tmux `elastic_restarts`
  • 07:24 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw rolling restart - ryankemper@cumin2001 - T297468
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P18188 and previous config saved to /var/cache/conftool/dbconfig/20211214-072029-marostegui.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P18187 and previous config saved to /var/cache/conftool/dbconfig/20211214-070524-marostegui.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T277354)', diff saved to https://phabricator.wikimedia.org/P18186 and previous config saved to /var/cache/conftool/dbconfig/20211214-065019-marostegui.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T277354)', diff saved to https://phabricator.wikimedia.org/P18185 and previous config saved to /var/cache/conftool/dbconfig/20211214-064833-marostegui.json
  • 06:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 06:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T277354)', diff saved to https://phabricator.wikimedia.org/P18184 and previous config saved to /var/cache/conftool/dbconfig/20211214-064825-marostegui.json
  • 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P18183 and previous config saved to /var/cache/conftool/dbconfig/20211214-063321-marostegui.json
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P18182 and previous config saved to /var/cache/conftool/dbconfig/20211214-061816-marostegui.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T277354)', diff saved to https://phabricator.wikimedia.org/P18181 and previous config saved to /var/cache/conftool/dbconfig/20211214-060311-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T277354)', diff saved to https://phabricator.wikimedia.org/P18180 and previous config saved to /var/cache/conftool/dbconfig/20211214-060125-marostegui.json
  • 06:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 06:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 05:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 15 hosts with reason: Maintenance
  • 05:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 15 hosts with reason: Maintenance
  • 05:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 05:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 05:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 05:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 05:09 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.12/extensions/DiscussionTools/includes/Hooks/HookUtils.php: Backport: Cache page properties in memory to avoid extra queries (T297132 T297669) (duration: 00m 57s)
  • 04:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad rolling restart - ryankemper@cumin1001 - T297468
  • 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 01:42 ryankemper: T297468 `sudo cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad rolling restart" --nodes-per-run 3 --start-datetime 2021-12-14T01:27:58 --task-id T297468` on `ryankemper@cumin1001` tmux `elastic_restarts`
  • 01:41 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad rolling restart - ryankemper@cumin1001 - T297468
  • 01:41 tgr: UTC late deploys done
  • 01:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 01:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 01:36 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: wgEventStreams: Add WelcomeSurvey Interaction schema (T267273) (duration: 00m 56s)
  • 01:24 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Clean up readers web team config (duration: 00m 55s)
  • 01:20 tgr@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: Config: Clean up readers web team config (duration: 00m 55s)
  • 01:11 tgr@deploy1002: Synchronized dblists/mobile-anon-talk.dblist: Config: Clean up readers web team config (duration: 00m 55s)
  • 01:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 01:10 tgr@deploy1002: Synchronized wmf-config/config/: Config: Clean up readers web team config (duration: 00m 56s)
  • 01:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 01:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:58 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [Attempt 2] MinervaDonateLink is enabled in production (duration: 00m 57s)
  • 00:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:41 tgr@deploy1002: Synchronized images/mobile/: Config: Remove broken wikipedia-wordmark-en.png symlink (T278193) (duration: 00m 56s)
  • 00:36 tgr@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Default commons search experience is MediaSearch (T297484) (duration: 00m 56s)
  • 00:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-12-13

  • 23:56 hnowlan: joining restbase2026-a to cassandra cluster
  • 23:28 jhathaway@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mirror1001.wikimedia.org with OS bullseye
  • 23:06 jhathaway@cumin1001: START - Cookbook sre.hosts.reimage for host mirror1001.wikimedia.org with OS bullseye
  • 22:06 sbassett: deployed security patch for T297571 (sync-file 2)
  • 22:02 sbassett: deployed security patch for T297571 (sync-file 1)
  • 21:54 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1004.eqiad.wmnet with OS buster
  • 21:46 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1003.eqiad.wmnet with OS buster
  • 21:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1020.eqiad.wmnet with OS buster
  • 21:21 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1004.eqiad.wmnet with OS buster
  • 21:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS buster
  • 21:04 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1003.eqiad.wmnet with OS buster
  • 20:57 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS buster
  • 20:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1020.eqiad.wmnet with OS buster
  • 20:51 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
  • 20:45 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1020.eqiad.wmnet with OS buster
  • 20:45 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS buster
  • 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:34 mdipietro@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1014.eqiad.wmnet
  • 20:22 majavah: deployed patch for T297322
  • 20:16 mdipietro@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1014.eqiad.wmnet
  • 20:15 mdipietro@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1013.eqiad.wmnet
  • 20:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1020.eqiad.wmnet with OS buster
  • 20:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS buster
  • 19:48 mdipietro@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1013.eqiad.wmnet
  • 19:47 mdipietro@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1012.eqiad.wmnet
  • 19:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:42 urbanecm: UTC evening B&C window done
  • 19:40 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.12/extensions/VisualEditor/includes/VisualEditorHooks.php: fa01add: Check VisualEditorDisableForAnons in getEditPageEditor() (T296269) (duration: 00m 56s)
  • 19:39 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/VisualEditor/includes/VisualEditorHooks.php: 8144ab6: Check VisualEditorDisableForAnons in getEditPageEditor() (T296269) (duration: 00m 56s)
  • 19:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:33 mdipietro@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1012.eqiad.wmnet
  • 19:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: fd325c5: kartographer: Enable tegola on ruwiki (T280767) (duration: 00m 57s)
  • 19:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: bb98942: Fix format of VectorWebABTestEnrollment (T295972) (duration: 00m 57s)
  • 19:14 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1020.eqiad.wmnet with OS bullseye
  • 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:11 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye
  • 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:55 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1019.eqiad.wmnet with OS bullseye
  • 18:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:52 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1020.eqiad.wmnet with OS bullseye
  • 18:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1018.eqiad.wmnet with OS bullseye
  • 18:51 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye
  • 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:45 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1020.eqiad.wmnet with OS bullseye
  • 18:34 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye
  • 18:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1020.eqiad.wmnet with OS bullseye
  • 18:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1019.eqiad.wmnet with OS bullseye
  • 18:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1018.eqiad.wmnet with OS bullseye
  • 18:25 dancy: dancy@deploy1002 rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.12 T293953
  • 18:25 dancy@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.12 refs T293954
  • 18:24 hnowlan: joining restbase2025-c to cassandra cluster
  • 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:21 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.12/includes/: Backport: DeprecationHelper: avoid closures (T297236) (duration: 01m 02s)
  • 18:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye
  • 18:13 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 18:13 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye
  • 18:12 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 18:10 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 18:09 jbond: upload cas_6.4.4-1+wmf10u2
  • 18:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye
  • 18:01 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:51 accraze@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 17:49 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikibaseLexeme: Backport: Add form:hasGrammaticalFeature() method (T297478) (no-op because Lexeme Lua is not yet enabled in prod) (duration: 00m 57s)
  • 17:47 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:47 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikibaseLexeme: Backport: Add lexeme:getLemma(), sense:getGloss(), form:getRepresentation() (T297024) (no-op because Lexeme Lua is not yet enabled in prod) (duration: 00m 57s)
  • 17:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:45 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikibaseLexeme: Backport: Remove most of mw.wikibase.lexeme Lua module (T297404) (no-op because Lexeme Lua is not yet enabled in prod) (duration: 00m 58s)
  • 17:43 accraze@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 17:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2021.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 17:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2021.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 17:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:37 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:37 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 17:34 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:34 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:55 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 16:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 16:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T277354)', diff saved to https://phabricator.wikimedia.org/P18177 and previous config saved to /var/cache/conftool/dbconfig/20211213-161414-marostegui.json
  • 15:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P18176 and previous config saved to /var/cache/conftool/dbconfig/20211213-155909-marostegui.json
  • 15:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:53 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.12/extensions/WikibaseLexeme: Backport: Add form:hasGrammaticalFeature() method (T297478) (no-op because Lexeme Lua is not yet enabled in prod) (duration: 00m 57s)
  • 15:51 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.12/extensions/WikibaseLexeme: Backport: Add lexeme:getLemma(), sense:getGloss(), form:getRepresentation() (T297024) (no-op because Lexeme Lua is not yet enabled in prod) (duration: 00m 57s)
  • 15:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:49 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.12/extensions/WikibaseLexeme: Backport: Remove most of mw.wikibase.lexeme Lua module (T297404) (no-op because Lexeme Lua is not yet enabled in prod) (duration: 00m 58s)
  • 15:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P18175 and previous config saved to /var/cache/conftool/dbconfig/20211213-154404-marostegui.json
  • 15:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T277354)', diff saved to https://phabricator.wikimedia.org/P18174 and previous config saved to /var/cache/conftool/dbconfig/20211213-152859-marostegui.json
  • 15:25 robh: dns6001 returned to service (icinga checks going green) via T286507
  • 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T277354)', diff saved to https://phabricator.wikimedia.org/P18173 and previous config saved to /var/cache/conftool/dbconfig/20211213-151657-marostegui.json
  • 15:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db[1112,1154].eqiad.wmnet with reason: Maintenance
  • 15:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db[1112,1154].eqiad.wmnet with reason: Maintenance
  • 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T277354)', diff saved to https://phabricator.wikimedia.org/P18172 and previous config saved to /var/cache/conftool/dbconfig/20211213-151645-marostegui.json
  • 15:15 hnowlan: joining restbase2025-b to cassandra cluster
  • 15:15 robh: dns6002 BIOS update done, returned to green in Icinga, dns6001 coming down next for firmware update via T286507
  • 15:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P18171 and previous config saved to /var/cache/conftool/dbconfig/20211213-150141-marostegui.json
  • 14:55 jbond: upload cas 6.4.4 deb package
  • 14:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2024.codfw.wmnet
  • 14:54 robh: dns6002 rebooting for firmware updates via T286507
  • 14:54 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=restbase2024.codfw.wmnet
  • 14:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P18170 and previous config saved to /var/cache/conftool/dbconfig/20211213-144636-marostegui.json
  • 14:34 moritzm: imported fastnetmon 1.1.7+deb11u1 for bullseye-wikimedia https://phabricator.wikimedia.org/T297595
  • 14:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T277354)', diff saved to https://phabricator.wikimedia.org/P18169 and previous config saved to /var/cache/conftool/dbconfig/20211213-143131-marostegui.json
  • 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T277354)', diff saved to https://phabricator.wikimedia.org/P18168 and previous config saved to /var/cache/conftool/dbconfig/20211213-142141-marostegui.json
  • 14:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 14:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 14:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 14:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T277354)', diff saved to https://phabricator.wikimedia.org/P18167 and previous config saved to /var/cache/conftool/dbconfig/20211213-141052-marostegui.json
  • 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P18166 and previous config saved to /var/cache/conftool/dbconfig/20211213-135547-marostegui.json
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P18165 and previous config saved to /var/cache/conftool/dbconfig/20211213-134042-marostegui.json
  • 13:31 moritzm: installing wireshark security updates
  • 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T277354)', diff saved to https://phabricator.wikimedia.org/P18164 and previous config saved to /var/cache/conftool/dbconfig/20211213-132537-marostegui.json
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T277354)', diff saved to https://phabricator.wikimedia.org/P18163 and previous config saved to /var/cache/conftool/dbconfig/20211213-131538-marostegui.json
  • 13:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 13:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T277354)', diff saved to https://phabricator.wikimedia.org/P18162 and previous config saved to /var/cache/conftool/dbconfig/20211213-131529-marostegui.json
  • 13:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netflow2002.codfw.wmnet
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P18161 and previous config saved to /var/cache/conftool/dbconfig/20211213-130024-marostegui.json
  • 12:48 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host netflow2002.codfw.wmnet
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P18160 and previous config saved to /var/cache/conftool/dbconfig/20211213-124519-marostegui.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T277354)', diff saved to https://phabricator.wikimedia.org/P18159 and previous config saved to /var/cache/conftool/dbconfig/20211213-123014-marostegui.json
  • 12:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:25 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.12/extensions/MediaSearch/resources/components/SearchResults.vue: 7d0fa97: Disable event logging for Quickview interactions (T297400) (duration: 00m 56s)
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T277354)', diff saved to https://phabricator.wikimedia.org/P18158 and previous config saved to /var/cache/conftool/dbconfig/20211213-121811-marostegui.json
  • 12:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 12:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T277354)', diff saved to https://phabricator.wikimedia.org/P18157 and previous config saved to /var/cache/conftool/dbconfig/20211213-121803-marostegui.json
  • 12:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 361214b: Enable Disambiguator notifications on more wikis (T297175) (duration: 00m 56s)
  • 12:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:07 hnowlan: joining restbase2025-a to cassandra cluster
  • 12:05 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on restbase[2025-2026].codfw.wmnet with reason: New cassandra hosts awaiting syncing
  • 12:05 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on restbase[2025-2026].codfw.wmnet with reason: New cassandra hosts awaiting syncing
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P18156 and previous config saved to /var/cache/conftool/dbconfig/20211213-120259-marostegui.json
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P18155 and previous config saved to /var/cache/conftool/dbconfig/20211213-114754-marostegui.json
  • 11:39 majavah: deployed patch for T297574
  • 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T277354)', diff saved to https://phabricator.wikimedia.org/P18154 and previous config saved to /var/cache/conftool/dbconfig/20211213-113249-marostegui.json
  • 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T277354)', diff saved to https://phabricator.wikimedia.org/P18153 and previous config saved to /var/cache/conftool/dbconfig/20211213-112245-marostegui.json
  • 11:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 11:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 11:22 urbanecm: Run namespaceDupes.php --wiki=$WIKI --fix --add-prefix=BROKEN for wikis in P18152
  • 11:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:53 urbanecm: mwscript namespaceDupes.php --wiki={mswiki,sqwiki,bclwiki,idwiki,siwiki,tlwiki,rowiki} --add-prefix=BROKEN --fix
  • 10:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 7 hosts with reason: Maintenance
  • 10:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 7 hosts with reason: Maintenance
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1172', diff saved to https://phabricator.wikimedia.org/P18151 and previous config saved to /var/cache/conftool/dbconfig/20211213-101707-marostegui.json
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T277354)', diff saved to https://phabricator.wikimedia.org/P18150 and previous config saved to /var/cache/conftool/dbconfig/20211213-101427-marostegui.json
  • 10:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 10:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 10:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 10:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1126', diff saved to https://phabricator.wikimedia.org/P18149 and previous config saved to /var/cache/conftool/dbconfig/20211213-101013-marostegui.json
  • 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T277354)', diff saved to https://phabricator.wikimedia.org/P18148 and previous config saved to /var/cache/conftool/dbconfig/20211213-100238-marostegui.json
  • 10:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 10:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 10:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 10:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1114', diff saved to https://phabricator.wikimedia.org/P18147 and previous config saved to /var/cache/conftool/dbconfig/20211213-100143-marostegui.json
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2130 and db2074', diff saved to https://phabricator.wikimedia.org/P18146 and previous config saved to /var/cache/conftool/dbconfig/20211213-095949-marostegui.json
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T277354)', diff saved to https://phabricator.wikimedia.org/P18145 and previous config saved to /var/cache/conftool/dbconfig/20211213-095851-marostegui.json
  • 09:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 09:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1111', diff saved to https://phabricator.wikimedia.org/P18144 and previous config saved to /var/cache/conftool/dbconfig/20211213-095728-marostegui.json
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T277354)', diff saved to https://phabricator.wikimedia.org/P18143 and previous config saved to /var/cache/conftool/dbconfig/20211213-094900-marostegui.json
  • 09:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 09:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T277354)', diff saved to https://phabricator.wikimedia.org/P18142 and previous config saved to /var/cache/conftool/dbconfig/20211213-094853-marostegui.json
  • 09:45 urbanecm: Staging at mwdebug1001 ended
  • 09:43 urbanecm: pwnwiki: Create DB tables for GrowthExperiments
  • 09:43 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 09:42 urbanecm: Staging at mwdebug1001
  • 09:38 Amir1: revoking DROP from centralauth grant of wikiadmin (T249683)
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P18141 and previous config saved to /var/cache/conftool/dbconfig/20211213-093348-marostegui.json
  • 09:32 dcausse@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 09:31 elukey@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:31 elukey@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 09:31 elukey@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:31 elukey@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 09:29 dcausse: restarting blazegraph on wdqs1012 (JVM stuck for 6h)
  • 09:29 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 09:28 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 09:25 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:25 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:24 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P18140 and previous config saved to /var/cache/conftool/dbconfig/20211213-091844-marostegui.json
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T277354)', diff saved to https://phabricator.wikimedia.org/P18139 and previous config saved to /var/cache/conftool/dbconfig/20211213-090339-marostegui.json
  • 08:48 Amir1: removing grant of '%a%' on db1123 (s3) T296537
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T277354)', diff saved to https://phabricator.wikimedia.org/P18138 and previous config saved to /var/cache/conftool/dbconfig/20211213-084729-marostegui.json
  • 08:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 08:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T277354)', diff saved to https://phabricator.wikimedia.org/P18137 and previous config saved to /var/cache/conftool/dbconfig/20211213-084721-marostegui.json
  • 08:45 Amir1: fixing centralauth grants of wikiuser on all of s7 T296537
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P18136 and previous config saved to /var/cache/conftool/dbconfig/20211213-083217-marostegui.json
  • 08:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:20 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: flaggedrevs: Fix idwiki's autoreview config (T288404) (duration: 00m 56s)
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P18135 and previous config saved to /var/cache/conftool/dbconfig/20211213-081712-marostegui.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T277354)', diff saved to https://phabricator.wikimedia.org/P18133 and previous config saved to /var/cache/conftool/dbconfig/20211213-080207-marostegui.json
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T277354)', diff saved to https://phabricator.wikimedia.org/P18132 and previous config saved to /var/cache/conftool/dbconfig/20211213-080101-marostegui.json
  • 08:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T277354)', diff saved to https://phabricator.wikimedia.org/P18131 and previous config saved to /var/cache/conftool/dbconfig/20211213-080054-marostegui.json
  • 07:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:46 moritzm: drain primary/secondary instances off ganeti2021 T296622
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P18130 and previous config saved to /var/cache/conftool/dbconfig/20211213-074549-marostegui.json
  • 07:40 Amir1: start of clean up of flaggedtemplates on all flaggedrevs wikis: T296380
  • 07:35 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: Change logic of pruneChange to allow deleting rows more flexibly (T296380) (duration: 00m 57s)
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P18129 and previous config saved to /var/cache/conftool/dbconfig/20211213-073044-marostegui.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T277354)', diff saved to https://phabricator.wikimedia.org/P18128 and previous config saved to /var/cache/conftool/dbconfig/20211213-071539-marostegui.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T277354)', diff saved to https://phabricator.wikimedia.org/P18127 and previous config saved to /var/cache/conftool/dbconfig/20211213-071433-marostegui.json
  • 07:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 07:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 07:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 13 hosts with reason: Maintenance
  • 07:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 13 hosts with reason: Maintenance
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T277354)', diff saved to https://phabricator.wikimedia.org/P18126 and previous config saved to /var/cache/conftool/dbconfig/20211213-070430-marostegui.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P18125 and previous config saved to /var/cache/conftool/dbconfig/20211213-070204-root.json
  • 06:51 elukey: run `apt-get clean` on aphlict1001 to free some space
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P18124 and previous config saved to /var/cache/conftool/dbconfig/20211213-064926-marostegui.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P18123 and previous config saved to /var/cache/conftool/dbconfig/20211213-064700-root.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P18122 and previous config saved to /var/cache/conftool/dbconfig/20211213-063421-marostegui.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P18121 and previous config saved to /var/cache/conftool/dbconfig/20211213-063156-root.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T277354)', diff saved to https://phabricator.wikimedia.org/P18120 and previous config saved to /var/cache/conftool/dbconfig/20211213-061916-marostegui.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T277354)', diff saved to https://phabricator.wikimedia.org/P18119 and previous config saved to /var/cache/conftool/dbconfig/20211213-061756-marostegui.json
  • 06:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 06:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P18118 and previous config saved to /var/cache/conftool/dbconfig/20211213-061652-root.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123 for a restart', diff saved to https://phabricator.wikimedia.org/P18117 and previous config saved to /var/cache/conftool/dbconfig/20211213-060343-marostegui.json

2021-12-12

  • 14:35 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1004.eqiad.wmnet
  • 14:30 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
  • 14:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on graphite1004.eqiad.wmnet with reason: powercycle
  • 14:23 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on graphite1004.eqiad.wmnet with reason: powercycle
  • 14:08 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host graphite1004.eqiad.wmnet
  • 14:08 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
  • 04:17 ejegg: updated SmashPig standalone (IPN listener) from 211f8e65 to 9e885819

2021-12-11

  • 19:03 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
  • 00:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-12-10

  • 22:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:09 dancy@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.9 refs T293953
  • 21:10 rzl: sudo cumin -b7 -s10 -p0 'A:mw-eqiad and not P{mw1414.eqiad.wmnet}' restart-php7.2-fpm - T297517
  • 21:09 rzl: rzl@mw1414:~$ sudo depool - preserving for investigation, T297517
  • 20:43 rzl: sudo cumin -b2 -s10 -p0 'A:parsoid and not P{wtp1025.eqiad.wmnet}' restart-php7.2-fpm - T297517
  • 20:38 rzl: rzl@wtp1025:~$ sudo restart-php7.2-fpm - T297517 - rolling restart to follow
  • 18:50 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts copernicium.wikimedia.org
  • 18:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti2017.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 18:11 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti2017.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 18:04 jhathaway@cumin1001: START - Cookbook sre.hosts.decommission for hosts copernicium.wikimedia.org
  • 17:21 dancy@deploy1002: Synchronized php-1.38.0-wmf.12/extensions/Cite/modules/ve-cite/ve.ui.MWReferencesListDialog.js: Backport: ve.ui.MWReferencesListDialog: Fix exception caused by a copy-paste mistake (T297418) (duration: 00m 58s)
  • 17:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:56 dancy@deploy1002: Synchronized php-1.38.0-wmf.12/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: Backport: Fix PageRecord lookup (T297431) (duration: 00m 58s)
  • 16:56 jynus: increase backup2007's allocated disk space
  • 16:43 dancy@deploy1002: Synchronized php-1.38.0-wmf.12/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: Backport: Fix PageRecord lookup (T297431) (duration: 00m 58s)
  • 16:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:54 jynus: increase backup2006's allocated disk space
  • 15:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T277354)', diff saved to https://phabricator.wikimedia.org/P18112 and previous config saved to /var/cache/conftool/dbconfig/20211210-152410-marostegui.json
  • 15:19 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 15:16 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:15 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:15 jelto: remove tiller from eqiad Kubernetes cluster
  • 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P18111 and previous config saved to /var/cache/conftool/dbconfig/20211210-150906-marostegui.json
  • 15:08 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 15:06 jynus: increase backup2005's allocated disk space
  • 15:01 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:01 jelto: remove tiller from codfw Kubernetes cluster
  • 15:01 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:55 moritzm: drain primary/secondary instances off ganeti2017 T296622
  • 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P18110 and previous config saved to /var/cache/conftool/dbconfig/20211210-145401-marostegui.json
  • 14:50 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 14:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on restbase2026.codfw.wmnet with reason: New cassandra hosts awaiting syncing
  • 14:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on restbase2026.codfw.wmnet with reason: New cassandra hosts awaiting syncing
  • 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti2008.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 14:48 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti2008.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 14:48 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:48 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:48 jelto: remove tiller from staging-eqiad Kubernetes cluster
  • 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T277354)', diff saved to https://phabricator.wikimedia.org/P18109 and previous config saved to /var/cache/conftool/dbconfig/20211210-143856-marostegui.json
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T277354)', diff saved to https://phabricator.wikimedia.org/P18108 and previous config saved to /var/cache/conftool/dbconfig/20211210-143636-marostegui.json
  • 14:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 14:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T277354)', diff saved to https://phabricator.wikimedia.org/P18107 and previous config saved to /var/cache/conftool/dbconfig/20211210-143628-marostegui.json
  • 14:36 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2026.codfw.wmnet with OS buster
  • 14:33 jelto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:33 jelto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:33 jelto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:33 jelto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:29 jelto: remove tiller from staging-codfw Kubernetes cluster
  • 14:28 jelto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:27 jelto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18106 and previous config saved to /var/cache/conftool/dbconfig/20211210-142123-marostegui.json
  • 14:17 jelto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:17 jelto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18105 and previous config saved to /var/cache/conftool/dbconfig/20211210-140618-marostegui.json
  • 14:01 jynus: increase backup2004's allocated disk space
  • 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T277354)', diff saved to https://phabricator.wikimedia.org/P18104 and previous config saved to /var/cache/conftool/dbconfig/20211210-135114-marostegui.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T277354)', diff saved to https://phabricator.wikimedia.org/P18103 and previous config saved to /var/cache/conftool/dbconfig/20211210-134953-marostegui.json
  • 13:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155,1158].eqiad.wmnet with reason: Maintenance
  • 13:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155,1158].eqiad.wmnet with reason: Maintenance
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T277354)', diff saved to https://phabricator.wikimedia.org/P18102 and previous config saved to /var/cache/conftool/dbconfig/20211210-134941-marostegui.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18101 and previous config saved to /var/cache/conftool/dbconfig/20211210-133437-marostegui.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18100 and previous config saved to /var/cache/conftool/dbconfig/20211210-131932-marostegui.json
  • 13:17 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2025.codfw.wmnet with OS buster
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T277354)', diff saved to https://phabricator.wikimedia.org/P18099 and previous config saved to /var/cache/conftool/dbconfig/20211210-130427-marostegui.json
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T277354)', diff saved to https://phabricator.wikimedia.org/P18098 and previous config saved to /var/cache/conftool/dbconfig/20211210-130051-marostegui.json
  • 13:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 13:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 12:56 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on restbase[2024-2025].codfw.wmnet with reason: New cassandra hosts awaiting syncing
  • 12:56 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on restbase[2024-2025].codfw.wmnet with reason: New cassandra hosts awaiting syncing
  • 12:53 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2026.codfw.wmnet with OS buster
  • 12:51 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2024.codfw.wmnet with OS buster
  • 12:37 hnowlan: including cassandra-tools in cassandra311 component of buster-wikimedia
  • 12:31 _joe_: manually modifying configmaps for rsyslog in mwdebug for live troubleshooting.
  • 12:28 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2025.codfw.wmnet with OS buster
  • 12:08 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:03 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:02 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2024.codfw.wmnet with OS buster
  • 10:03 jayme: published docker-registry.discovery.wmnet/cert-manager/cainjector:1.5.4-2 docker-registry.discovery.wmnet/cert-manager/webhook:1.5.4-2 docker-registry.discovery.wmnet/cert-manager/controller:1.5.4-2
  • 10:00 vgutierrez: repool cp5006
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T277354)', diff saved to https://phabricator.wikimedia.org/P18097 and previous config saved to /var/cache/conftool/dbconfig/20211210-095833-marostegui.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P18096 and previous config saved to /var/cache/conftool/dbconfig/20211210-094328-marostegui.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P18095 and previous config saved to /var/cache/conftool/dbconfig/20211210-092823-marostegui.json
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T277354)', diff saved to https://phabricator.wikimedia.org/P18094 and previous config saved to /var/cache/conftool/dbconfig/20211210-091319-marostegui.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T277354)', diff saved to https://phabricator.wikimedia.org/P18093 and previous config saved to /var/cache/conftool/dbconfig/20211210-091041-marostegui.json
  • 09:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 09:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T277354)', diff saved to https://phabricator.wikimedia.org/P18092 and previous config saved to /var/cache/conftool/dbconfig/20211210-091034-marostegui.json
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P18091 and previous config saved to /var/cache/conftool/dbconfig/20211210-085529-marostegui.json
  • 08:53 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 08:53 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P18090 and previous config saved to /var/cache/conftool/dbconfig/20211210-084024-marostegui.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T277354)', diff saved to https://phabricator.wikimedia.org/P18089 and previous config saved to /var/cache/conftool/dbconfig/20211210-082520-marostegui.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T277354)', diff saved to https://phabricator.wikimedia.org/P18088 and previous config saved to /var/cache/conftool/dbconfig/20211210-082041-marostegui.json
  • 08:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T277354)', diff saved to https://phabricator.wikimedia.org/P18087 and previous config saved to /var/cache/conftool/dbconfig/20211210-082034-marostegui.json
  • 08:13 moritzm: drain primary/secondary instance off ganeti2008 T296622
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P18086 and previous config saved to /var/cache/conftool/dbconfig/20211210-080529-marostegui.json
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P18085 and previous config saved to /var/cache/conftool/dbconfig/20211210-075024-marostegui.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T277354)', diff saved to https://phabricator.wikimedia.org/P18084 and previous config saved to /var/cache/conftool/dbconfig/20211210-073520-marostegui.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T277354)', diff saved to https://phabricator.wikimedia.org/P18083 and previous config saved to /var/cache/conftool/dbconfig/20211210-073342-marostegui.json
  • 07:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 07:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 07:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 07:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 11 hosts with reason: Maintenance
  • 06:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 11 hosts with reason: Maintenance
  • 05:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 05:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 05:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 05:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 01:54 eileen: civicrm revision c47cf762 -> a88cd178
  • 00:36 cjming: end of UTC late backport & config window
  • 00:34 cjming@deploy1002: Synchronized php-1.38.0-wmf.12/extensions/MediaSearch/resources/components/QuickView.vue: Backport: Search_result_page_id should be integer (T297400) (duration: 00m 55s)
  • 00:33 cjming@deploy1002: Synchronized php-1.38.0-wmf.12/skins/Vector: Backport: Update A/B test enrollment name (T292587) (duration: 00m 56s)
  • 00:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:23 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "VE on zh.wiki: Enable single-edit-tab mode" (T296269) (duration: 00m 56s)
  • 00:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:17 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Update WebABTestEnrollment name (T295972) (duration: 00m 57s)
  • 00:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-12-09

  • 22:14 dancy@deploy1002: Synchronized README: testing https://gerrit.wikimedia.org/r/745572 (duration: 00m 55s)
  • 22:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:05 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.12/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: Fix the mistake in passing parameter (T296380) (duration: 02m 11s)
  • 21:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:58 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:04 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.38.0-wmf.12 refs T293953
  • 19:54 legoktm: deployed patch for T297416
  • 19:40 majavah: deployed patch for T297416
  • 19:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2003.codfw.wmnet
  • 19:23 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2003.codfw.wmnet
  • 19:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4e6cba0: VE on zh.wiki: Enable single-edit-tab mode (T296269) (duration: 01m 05s)
  • 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 06bd3e6: kartographer: Enable tegola on frwiki (duration: 01m 05s)
  • 19:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c1e9551: Deploy sticky header and A/B test enrollment to office, test wikis (T295972) (duration: 01m 06s)
  • 18:51 cwhite: powercycle graphite2003 T297265
  • 18:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:15 elukey: kafka-main2003 back in service with the old OS (stretch). Re-created a new puppet host key and signed it on the puppet master
  • 18:11 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:48 cwhite: point kibana7 to OpenSearch in codfw T288621
  • 17:46 elukey@cumin1001: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for kafka-main2003.codfw.wmnet: Renew puppet certificate - elukey@cumin1001
  • 17:46 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for kafka-main2003.codfw.wmnet: Renew puppet certificate - elukey@cumin1001
  • 17:40 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2003.codfw.wmnet with OS buster
  • 17:15 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2003.codfw.wmnet with OS buster
  • 17:00 elukey: stop kafka* on kafka-main2003 as pre-step before reimaging
  • 16:56 hnowlan: remove restbase certificates and configuration entries for decommissioned hosts
  • 16:54 mvernon@deploy1002: Synchronized private/PrivateSettings.php: Update swift config T296767 (duration: 01m 05s)
  • 16:22 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:17 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:44 elukey: run `ipmitool -I lanplus -H "kafka-main2003.mgmt.codfw.wmnet" -U root -E mc reset cold` from cumin2001
  • 14:39 moritzm: installing postgres security updates on eqiad maps master (and replicas)
  • 14:37 jayme: updated calico chart to calico-0.1.15 on all kubernetes clusters (introducing IPAMConfig) - T296303
  • 14:36 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:30 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:30 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 14:29 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 14:26 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:22 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:21 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:20 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:20 moritzm: installing postgres security updates on codfw maps master (and replicas)
  • 14:17 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:15 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 14:14 moritzm: installing python-babel security updates
  • 14:07 moritzm: installing cups security updates on stretch hosts
  • 13:58 moritzm: installing postgres security updates on netboxdb hosts
  • 13:55 moritzm: installing postgres security updates on puppetdb2002
  • 13:39 moritzm: installing tar security updates on stretch
  • 13:31 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:29 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 13:29 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:29 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 13:29 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:28 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:23 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:23 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:59 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.12/includes/media/DjVuHandler.php: Backport: media: Invalidate all file-djvu WAN caches (T296001) (duration: 01m 05s)
  • 12:57 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.12/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: Change logic of pruneChange to allow deleting rows more flexibly (T296380) (duration: 01m 05s)
  • 12:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:55 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: Major fixes to maintenance/pruneRevData.php (T290769) (duration: 01m 05s)
  • 12:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:47 kharlan@deploy1002: Synchronized php-1.38.0-wmf.12/extensions/GrowthExperiments/includes/NewcomerTasks/TaskSuggester/CacheDecorator.php: Backport: CacheDecorator: Bump cache version (T297248) (duration: 01m 05s)
  • 12:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:23 ladsgroup@deploy1002: Synchronized wmf-config/config/zhwiki.yaml: Config: Enable VE on zh.wiki, but only for logged-in users (T296269) (duration: 01m 05s)
  • 12:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:16 ladsgroup@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: Config: Enable VE on zh.wiki, but only for logged-in users (T296269) (duration: 01m 05s)
  • 12:14 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable VE on zh.wiki, but only for logged-in users (T296269) (duration: 01m 06s)
  • 12:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:00 topranks: Changing export policy applied on ulsfo CRs for local confed to not rewrite next-hop for routes learnt from other WMF POPs (T295672)
  • 11:44 topranks: Re-enabling multihop BGP session from cr1-eqiad to cr2-eqord (T295672)
  • 11:38 moritzm: added ganeti2027 to ganeti codfw cluster T294139
  • 11:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:23 hnowlan@cumin1001: START - Cookbook sre.dns.netbox
  • 11:20 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 11:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:19 topranks: Changing export policy applied on eqiad CRs for local confed to not rewrite next-hop for routes learnt from other WMF POPs (T295672)
  • 11:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:13 elukey: reboot ores2001 (lost connectivity; we suspect some weird problem with the NIC, but there are no traces in the kernel logs)
  • 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:07 hnowlan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:07 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.12/extensions/WikibaseLexeme/resources/widgets/: Backport: Fix LexemeHeader and GlossWidget mounting (T297328) (duration: 01m 06s)
  • 11:07 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2027.codfw.wmnet
  • 11:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2027.codfw.wmnet
  • 11:00 hnowlan@cumin1001: START - Cookbook sre.dns.netbox
  • 10:58 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2027.codfw.wmnet
  • 10:52 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:48 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2027.codfw.wmnet
  • 10:45 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 10:02 vgutierrez: pool durum2002
  • 10:00 vgutierrez: depool durum2002
  • 09:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
  • 09:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
  • 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2020.codfw.wmnet with OS buster
  • 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2020.codfw.wmnet with OS buster
  • 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2020.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 08:12 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2020.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 03:37 cwhite: bounce superset on an-tool1010 and 1005 to pick up statsd changes T247963
  • 03:34 cwhite: bounce navtiming on webperf1001 to pick up statsd changes T297265
  • 03:32 cwhite@deploy1002: Synchronized wmf-config/ProductionServices.php: fail over statsd to graphite2003 T297265 (duration: 01m 05s)
  • 03:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 03:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:54 cwhite: failover statsd ingest host to graphite2003 T297265
  • 00:26 rzl: graphite1004.mgmt: /admin1-> racadm serveraction powercycle (T297265)
  • 00:17 legoktm: deployed updated patches for T297322
  • 00:11 rzl: rzl@graphite1004:~$ sudo shutdown -r now T297265

2021-12-08

  • 22:18 legoktm@deploy1002: Synchronized php-1.38.0-wmf.9/includes/actions/: T297322 (duration: 01m 05s)
  • 22:16 legoktm@deploy1002: Synchronized php-1.38.0-wmf.12/includes/actions/: T297322 (duration: 01m 05s)
  • 21:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:46 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:43 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.12 refs T293953 (duration: 01m 04s)
  • 21:41 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.12 refs T293953
  • 21:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:30 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.12/extensions/WikidataPageBanner/includes/WikidataPageBanner.php: Backport: Make sure 'enable-toc' key is set (T297318) (duration: 01m 05s)
  • 21:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:49 taavi@deploy1002: Synchronized php-1.38.0-wmf.12/extensions/3D/src/Hooks.php: Backport: Remove use of $wgUseAjax (duration: 01m 07s)
  • 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:28 jhathaway: enable exim on mx2001
  • 20:27 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 20:27 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 20:26 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:22 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.9 refs T293953 (duration: 01m 04s)
  • 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:21 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.9 refs T293953
  • 20:17 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.12 refs T293953 (duration: 01m 05s)
  • 20:16 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.12 refs T293953
  • 20:11 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:05 majavah: utc evening deploys done
  • 19:05 taavi@deploy1002: Synchronized wmf-config/interwiki.php: Config: Update interwiki cache (duration: 01m 06s)
  • 16:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:33 taavi@deploy1002: Synchronized php-1.38.0-wmf.12/extensions/CodeMirror/resources/modules/ve-cm/ve.ui.CodeMirror.init.less: Backport: Fix invalid reference to core resources/ directory (T296639) (duration: 01m 06s)
  • 15:49 krinkle@deploy1002: Synchronized php-1.38.0-wmf.12/resources/src/mediawiki.base/: Ie9fa768c0dc1 (duration: 01m 06s)
  • 15:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:04 Amir1: removing rest of wikiuser@localhost (T296537)
  • 14:17 moritzm: drain primary/secondary instance off ganeti2020 T296622
  • 14:01 moritzm: installing nss regression updates for stretch
  • 13:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd2003.codfw.wmnet with reason: switch to drbd storage
  • 13:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd2003.codfw.wmnet with reason: switch to drbd storage
  • 13:56 moritzm: drain primary/secondary instance off ganeti2015 T296622
  • 13:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
  • 13:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
  • 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2013.codfw.wmnet
  • 13:00 ema: powercycle cp5006 T290005
  • 12:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
  • 10:43 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5006.eqsin.wmnet
  • 10:42 ema: depool cp5006, the host is down T290005#7555417
  • 10:35 ema: cp3051: repool w/ single backend experiment enabled T288106
  • 10:23 ema: cp3051: stop ats-be and clear its cache T288106
  • 10:22 ema: cp3051: depool to enable single backend experiment T288106
  • 10:16 Amir1: ladsgroup@mwmaint1002:~$ mwscript createAndPromote.php --wiki=testwiki --custom-groups=steward --force "Dom walden"
  • 10:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2014.codfw.wmnet with OS buster
  • 09:58 majavah: remove all users from obsolete "shell" and "cloudadmin" groups on labtestwiki (labtestwikitech.wikimedia.org)
  • 09:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:29 dcausse: restarting blazegraph on wdqs1006 (jvm stuck for 24h)
  • 09:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:23 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2014.codfw.wmnet with OS buster
  • 09:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:18 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove UserMerge rights from labswiki (wikitech) (duration: 01m 07s)
  • 09:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2013.codfw.wmnet with OS buster
  • 08:34 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2013.codfw.wmnet with OS buster
  • 04:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1028.eqiad.wmnet with OS buster
  • 03:37 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
  • 02:09 legoktm: powercycle graphite1004 via mgmt
  • 00:51 ebernhardson@deploy1002: Synchronized php-1.38.0-wmf.12/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/AddImageSubmissionHandler.php: backport window for 744896 (duration: 01m 05s)
  • 00:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:09 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T296897 Move cirrus traffic back to eqiad (duration: 01m 08s)

2021-12-07

  • 23:21 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
  • 23:01 jgleeson: updated payments-wiki from 4a4ef51d to 2e164062
  • 22:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:15 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.12 refs T293953
  • 22:12 dancy@deploy1002: Pruned MediaWiki: 1.38.0-wmf.7 (duration: 04m 18s)
  • 22:07 dancy@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.12 refs T293953 (duration: 44m 14s)
  • 22:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
  • 22:06 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1028.eqiad.wmnet with OS buster
  • 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:23 dancy@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.12 refs T293953
  • 21:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
  • 19:58 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@c21117f] (wcqs): Deploy version 0.3.95 to wcqs (duration: 01m 48s)
  • 19:56 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@c21117f] (wcqs): Deploy version 0.3.95 to wcqs
  • 19:56 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
  • 19:49 eileen: revision civicrm 311382de -> c47cf762
  • 19:26 legoktm: upgrading scap to 4.1.0 everywhere (T296867)
  • 19:26 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
  • 19:18 herron: graphite1004.mgmt: racadm serveraction powercycle
  • 19:13 ebernhardson: start outage recovery for commonswiki against eqiad cirrus cluster after snapshot restore
  • 18:47 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 18:46 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 18:45 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 18:38 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 18:33 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 18:27 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
  • 17:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2002.codfw.wmnet
  • 17:41 herron: graphite1004.mgmt: racadm serveraction powercycle
  • 17:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
  • 17:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
  • 17:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
  • 17:35 root@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 17:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
  • 17:33 root@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 17:32 root@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 17:31 root@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 17:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:27 root@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 17:26 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:25 root@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 17:19 root@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 16:25 Amir1: deleting broken flaggedtemplates rows on dewiki (T297094)
  • 16:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
  • 16:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
  • 16:07 root@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 16:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
  • 16:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
  • 16:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
  • 16:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
  • 15:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2005.wikimedia.org
  • 15:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2005.wikimedia.org
  • 15:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2084.codfw.wmnet with reason: Reracking T296930
  • 15:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2084.codfw.wmnet with reason: Reracking T296930
  • 15:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2006.wikimedia.org
  • 15:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:47 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/FlaggedRevs/backend/FlaggedRevision.php: Backport: Do not inject rev id of template when it's empty (duration: 00m 57s)
  • 15:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2006.wikimedia.org
  • 15:33 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 10 hosts with reason: debugging bird/anycast-hc issues
  • 15:33 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 10 hosts with reason: debugging bird/anycast-hc issues
  • 15:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2026.codfw.wmnet with OS buster
  • 15:21 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on durum2002.codfw.wmnet with reason: debugging bird/anycast-hc issues
  • 15:21 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on durum2002.codfw.wmnet with reason: debugging bird/anycast-hc issues
  • 15:14 sukhe: running authdns-update for Gerrit:744094
  • 15:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2026.codfw.wmnet with OS buster
  • 14:38 jbond: re-enable puppet fleet-wide post monitoring refactor 744787
  • 14:28 godog: reboot graphite1004 - T297180
  • 14:15 Amir1: fixing heartbeat grants for wikiuser across the cluster (T296537)
  • 14:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti[2013-2014].codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 14:11 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti[2013-2014].codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 14:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2006.codfw.wmnet with reason: switch to drbd storage
  • 14:07 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2006.codfw.wmnet with reason: switch to drbd storage
  • 13:52 Amir1: removing wikiuser@localhost on s6 (T296537)
  • 13:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2026.codfw.wmnet with OS buster
  • 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2004.codfw.wmnet with reason: switch to drbd storage
  • 13:42 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2004.codfw.wmnet with reason: switch to drbd storage
  • 13:40 godog: reboot graphite2003 - T297180
  • 13:39 jbond: disable puppet fleet-wide to roll out 744787
  • 13:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2026.codfw.wmnet with OS buster
  • 13:16 jelto: update GitLab to 14.4.4-ce.0
  • 13:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti2014.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 13:07 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti2014.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 12:46 Lucas_WMDE: UTC morning backport+config window done
  • 12:46 Lucas_WMDE: deployed Update termbox to 2021-12-06-171243-production (T297006)
  • 12:44 lucaswerkmeister-wmde@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 12:42 lucaswerkmeister-wmde@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 12:39 lucaswerkmeister-wmde@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 12:39 lucaswerkmeister-wmde@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 12:24 jbond: merge refactor of monitoring classes 725045
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T277354)', diff saved to https://phabricator.wikimedia.org/P18071 and previous config saved to /var/cache/conftool/dbconfig/20211207-121655-marostegui.json
  • 12:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable reply tool by default on mediawikiwiki (T296444) (duration: 00m 57s)
  • 12:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P18070 and previous config saved to /var/cache/conftool/dbconfig/20211207-120150-marostegui.json
  • 11:51 moritzm: draining primary/secondary instances off ganeti2014 T296622
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P18069 and previous config saved to /var/cache/conftool/dbconfig/20211207-114645-marostegui.json
  • 11:38 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt1028.eqiad.wmnet
  • 11:32 cmooney@cumin1001: START - Cookbook sre.hosts.dhcp for host cloudvirt1028.eqiad.wmnet
  • 11:31 topranks: removing IP addressing on cloudvirt1028 manually and forcing DHCP to debug reimage failure (T296906)
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T277354)', diff saved to https://phabricator.wikimedia.org/P18068 and previous config saved to /var/cache/conftool/dbconfig/20211207-113140-marostegui.json
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T277354)', diff saved to https://phabricator.wikimedia.org/P18067 and previous config saved to /var/cache/conftool/dbconfig/20211207-113005-marostegui.json
  • 11:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Maintenance T277354
  • 11:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Maintenance T277354
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T277354)', diff saved to https://phabricator.wikimedia.org/P18066 and previous config saved to /var/cache/conftool/dbconfig/20211207-112707-marostegui.json
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2004.codfw.wmnet with reason: switch to drbd storage
  • 11:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2004.codfw.wmnet with reason: switch to drbd storage
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P18065 and previous config saved to /var/cache/conftool/dbconfig/20211207-111203-marostegui.json
  • 11:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2012.codfw.wmnet
  • 11:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2012.codfw.wmnet
  • 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P18064 and previous config saved to /var/cache/conftool/dbconfig/20211207-105658-marostegui.json
  • 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T277354)', diff saved to https://phabricator.wikimedia.org/P18063 and previous config saved to /var/cache/conftool/dbconfig/20211207-104153-marostegui.json
  • 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T277354)', diff saved to https://phabricator.wikimedia.org/P18062 and previous config saved to /var/cache/conftool/dbconfig/20211207-104018-marostegui.json
  • 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1182.eqiad.wmnet with reason: Maintenance T277354
  • 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1182.eqiad.wmnet with reason: Maintenance T277354
  • 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T277354)', diff saved to https://phabricator.wikimedia.org/P18061 and previous config saved to /var/cache/conftool/dbconfig/20211207-104010-marostegui.json
  • 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2016.codfw.wmnet
  • 10:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2016.codfw.wmnet
  • 10:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P18060 and previous config saved to /var/cache/conftool/dbconfig/20211207-102505-marostegui.json
  • 10:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti2013.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 10:13 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti2013.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2012.codfw.wmnet with OS buster
  • 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P18059 and previous config saved to /var/cache/conftool/dbconfig/20211207-101001-marostegui.json
  • 10:01 marostegui: Deploy schema change on mailman (m5) T286552
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T277354)', diff saved to https://phabricator.wikimedia.org/P18058 and previous config saved to /var/cache/conftool/dbconfig/20211207-095456-marostegui.json
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T277354)', diff saved to https://phabricator.wikimedia.org/P18057 and previous config saved to /var/cache/conftool/dbconfig/20211207-095319-marostegui.json
  • 09:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1170.eqiad.wmnet with reason: Maintenance T277354
  • 09:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1170.eqiad.wmnet with reason: Maintenance T277354
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T277354)', diff saved to https://phabricator.wikimedia.org/P18056 and previous config saved to /var/cache/conftool/dbconfig/20211207-095312-marostegui.json
  • 09:40 XioNoX: codfw, normalize VRRP - T289241
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P18055 and previous config saved to /var/cache/conftool/dbconfig/20211207-093807-marostegui.json
  • 09:38 XioNoX: cr2-codfw - FPC 1 PIC 1 Need bounce - T289241
  • 09:34 XioNoX: move all VRRP primary to cr1-codfw - T289241
  • 09:31 XioNoX: cr1-codfw - FPC 1 PIC 0 Need bounce - T289241
  • 09:29 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2012.codfw.wmnet with OS buster
  • 09:27 XioNoX: move all VRRP primary to cr2-codfw - https://phabricator.wikimedia.org/T289241
  • 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2016.codfw.wmnet with OS buster
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P18054 and previous config saved to /var/cache/conftool/dbconfig/20211207-092302-marostegui.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T277354)', diff saved to https://phabricator.wikimedia.org/P18053 and previous config saved to /var/cache/conftool/dbconfig/20211207-090758-marostegui.json
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T277354)', diff saved to https://phabricator.wikimedia.org/P18052 and previous config saved to /var/cache/conftool/dbconfig/20211207-090620-marostegui.json
  • 09:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance T277354
  • 09:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance T277354
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T277354)', diff saved to https://phabricator.wikimedia.org/P18051 and previous config saved to /var/cache/conftool/dbconfig/20211207-090613-marostegui.json
  • 08:55 moritzm: draining primary/secondary instances off ganeti2013 T296622
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P18050 and previous config saved to /var/cache/conftool/dbconfig/20211207-085108-marostegui.json
  • 08:47 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2016.codfw.wmnet with OS buster
  • 08:45 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2016.codfw.wmnet with OS buster
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P18049 and previous config saved to /var/cache/conftool/dbconfig/20211207-083604-marostegui.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T277354)', diff saved to https://phabricator.wikimedia.org/P18048 and previous config saved to /var/cache/conftool/dbconfig/20211207-082059-marostegui.json
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T277354)', diff saved to https://phabricator.wikimedia.org/P18047 and previous config saved to /var/cache/conftool/dbconfig/20211207-081936-marostegui.json
  • 08:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1129.eqiad.wmnet with reason: Maintenance T277354
  • 08:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1129.eqiad.wmnet with reason: Maintenance T277354
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T277354)', diff saved to https://phabricator.wikimedia.org/P18046 and previous config saved to /var/cache/conftool/dbconfig/20211207-081928-marostegui.json
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P18045 and previous config saved to /var/cache/conftool/dbconfig/20211207-080424-marostegui.json
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P18044 and previous config saved to /var/cache/conftool/dbconfig/20211207-074919-marostegui.json
  • 07:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:39 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 2178202: Deploy Growth mentor dashboard to all wikis (T278920) (duration: 00m 58s)
  • 07:37 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T277354)', diff saved to https://phabricator.wikimedia.org/P18043 and previous config saved to /var/cache/conftool/dbconfig/20211207-073413-marostegui.json
  • 07:33 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T277354)', diff saved to https://phabricator.wikimedia.org/P18042 and previous config saved to /var/cache/conftool/dbconfig/20211207-073252-marostegui.json
  • 07:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1162.eqiad.wmnet with reason: Maintenance T277354
  • 07:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1162.eqiad.wmnet with reason: Maintenance T277354
  • 07:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: Maintenance T277354
  • 07:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 8 hosts with reason: Maintenance T277354
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T277354)', diff saved to https://phabricator.wikimedia.org/P18041 and previous config saved to /var/cache/conftool/dbconfig/20211207-072311-marostegui.json
  • 07:16 marostegui: power off db2074, db2078, db2101, db2130, dbproxy2004 T296930
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P18040 and previous config saved to /var/cache/conftool/dbconfig/20211207-070806-marostegui.json
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P18039 and previous config saved to /var/cache/conftool/dbconfig/20211207-065301-marostegui.json
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T277354)', diff saved to https://phabricator.wikimedia.org/P18038 and previous config saved to /var/cache/conftool/dbconfig/20211207-063756-marostegui.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T277354)', diff saved to https://phabricator.wikimedia.org/P18037 and previous config saved to /var/cache/conftool/dbconfig/20211207-063621-marostegui.json
  • 06:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1105.eqiad.wmnet with reason: Maintenance T277354
  • 06:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1105.eqiad.wmnet with reason: Maintenance T277354
  • 06:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance T277354
  • 06:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance T277354
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1100 (T277354)', diff saved to https://phabricator.wikimedia.org/P18036 and previous config saved to /var/cache/conftool/dbconfig/20211207-063140-marostegui.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1100', diff saved to https://phabricator.wikimedia.org/P18035 and previous config saved to /var/cache/conftool/dbconfig/20211207-061635-marostegui.json
  • 06:14 marostegui: Apply SET GLOBAL innodb_checksum_algorithm=full_crc32; on db1107 T287244
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1100', diff saved to https://phabricator.wikimedia.org/P18034 and previous config saved to /var/cache/conftool/dbconfig/20211207-060130-marostegui.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2074 and db2130 T296930', diff saved to https://phabricator.wikimedia.org/P18033 and previous config saved to /var/cache/conftool/dbconfig/20211207-055808-marostegui.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1100 (T277354)', diff saved to https://phabricator.wikimedia.org/P18032 and previous config saved to /var/cache/conftool/dbconfig/20211207-054625-marostegui.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T277354)', diff saved to https://phabricator.wikimedia.org/P18031 and previous config saved to /var/cache/conftool/dbconfig/20211207-054506-marostegui.json
  • 05:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1100.eqiad.wmnet with reason: Maintenance T277354
  • 05:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1100.eqiad.wmnet with reason: Maintenance T277354
  • 00:10 cwhite: end codfw opensearch upgrade T288621

2021-12-06

  • 22:19 mstyles@deploy1002: Synchronized php-1.38.0-wmf.9/includes/content/ContentModelChange.php: Deploy security patch for T271037 (duration: 00m 56s)
  • 20:14 cwhite: begin codfw opensearch upgrade T288621
  • 20:14 cwhite: begin codfw opensearch upgrade T288612
  • 19:58 legoktm: trying new dump of Special:CodeReview on mwmaint1002 (T205361)
  • 19:26 legoktm: installing php-yaml on all appservers
  • 19:08 damilare: updated civicrm from b82183b9 to 311382de
  • 19:04 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: bnwikibooks: add autopatrolled and patroller user groups (T296640) (duration: 00m 56s)
  • 19:03 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1028.eqiad.wmnet with OS buster
  • 19:02 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
  • 19:02 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1028.eqiad.wmnet with OS buster
  • 19:00 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
  • 18:52 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
  • 18:45 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
  • 18:43 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
  • 18:34 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
  • 18:00 majavah: "foreachwiki namespaceDupes.php --fix | tee namespaceDupes-T293839-fix.txt" FINISHED about 15 minutes ago T293839
  • 17:27 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T296897 Move cirrus traffic to codfw (duration: 00m 56s)
  • 16:24 majavah: starting "foreachwiki namespaceDupes.php --fix | tee namespaceDupes-T293839-fix.txt" in mwmaint1002 screen session, T293839
  • 15:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti2012.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 15:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti2012.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 14:45 elukey: roll restart of nfacctd on netflow* nodes to pick up the new CA bundle for librdkafka
  • 14:19 moritzm: draining primary/secondary instances off ganeti2012 T296622
  • 14:06 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2016.codfw.wmnet with OS buster
  • 14:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4d8a75d: Deploy Growth features on zhwiki in dark mode (T287884) (duration: 00m 56s)
  • 13:56 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=zhwiki --phab=T287884
  • 13:52 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=zhwiki growthexperiments # T287884
  • 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti2016.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 13:31 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti2016.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 13:30 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:25 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:03 majavah: $ mwscript namespaceDupes.php --wiki barwiki --fix --add-prefix=BROKEN # T293839
  • 12:58 majavah: mwscript namespaceDupes.php --wiki skwiki --fix --add-prefix=BROKEN # T293839
  • 12:54 majavah: mwscript namespaceDupes.php --wiki skwiki --fix # T293839
  • 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2011.codfw.wmnet with reason: readding to cluster after reimage
  • 12:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2011.codfw.wmnet with reason: readding to cluster after reimage
  • 12:48 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set default two-letter NS_PROJECT aliases (T293839) (duration: 00m 55s)
  • 12:41 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Autopatroller level page protection for English Wiktionary (T296580) (duration: 00m 56s)
  • 12:28 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable SandboxLink extension for bnwikivoyage (T296637) (duration: 00m 55s)
  • 12:22 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable groups autopatrolled and patroller for bnwikivoyage (T296637) (duration: 00m 56s)
  • 12:15 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable SectionTranslation in Malayalam, Malay, Azerbaijani, Tamil, Bashkir and Albanian WPs (T285842) (duration: 00m 56s)
  • 12:08 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: hewiki: add "templateeditor" permission group (T296769) (duration: 00m 57s)
  • 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2011.codfw.wmnet
  • 11:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2011.codfw.wmnet
  • 11:28 Amir1: dropping wikiadmin@localhost from all of s3 (T296511)
  • 11:21 Amir1: dropping wikiadmin@localhost from all of s2 (T296511)
  • 11:12 moritzm: draining primary/secondary instances off ganeti2016 T296622
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd2003.codfw.wmnet with reason: switch to drbd storage
  • 10:38 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd2003.codfw.wmnet with reason: switch to drbd storage
  • 10:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2011.codfw.wmnet with OS buster
  • 10:31 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:23 moritzm: draining primary/secondary instances off ganeti2015 T296622
  • 09:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2011.codfw.wmnet with OS buster
  • 09:09 elukey: move kafka main codfw to fixed uid/gid for the kafka user (requires a stop/start of all daemons) - T296982
  • 08:13 moritzm: installing remaining icu security updates on buster

2021-12-04

  • 01:14 mutante: mx2001 - did not come back from reboot, did not get IP on interface, could not start ferm, logged in via console with root password, in /etc/network/interfaces replaced all "ens5" with "ens13", rebooted again, selected previous kernel version
  • 00:54 mutante: rebooting mx2001
  • 00:31 jynus: manually restarting clamav on otrs1001 after being killed

2021-12-03

  • 20:29 cstone: revision changed from 2c2e22cd to b82183b9
  • 17:56 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
  • 17:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
  • 17:47 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
  • 17:35 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
  • 17:35 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
  • 17:35 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 17:22 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 16:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
  • 16:56 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
  • 16:44 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
  • 16:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
  • 16:42 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
  • 16:39 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
  • 16:39 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
  • 14:25 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab-runner2001.codfw.wmnet
  • 14:10 jelto@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab-runner2001.codfw.wmnet
  • 12:53 moritzm: installing nss security updates on stretch
  • 12:37 moritzm: draining primary/secondary instances off ganeti2007 T296622
  • 12:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2022.codfw.wmnet to ganeti01.svc.codfw.wmnet
  • 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2022.codfw.wmnet to ganeti01.svc.codfw.wmnet
  • 12:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
  • 12:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
  • 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2022.codfw.wmnet with OS buster
  • 11:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2022.codfw.wmnet with OS buster
  • 11:27 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2011.codfw.wmnet with OS buster
  • 11:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2011.codfw.wmnet with OS buster
  • 11:06 jynus: stop and shutdown db1102 T296546
  • 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2011.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 11:01 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2011.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 09:38 moritzm: draining primary/secondary instances off ganeti2011 T296622
  • 09:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2009.codfw.wmnet to ganeti01.svc.codfw.wmnet
  • 09:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2009.codfw.wmnet to ganeti01.svc.codfw.wmnet
  • 09:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2009.codfw.wmnet
  • 09:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2009.codfw.wmnet
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1161 (T277354)', diff saved to https://phabricator.wikimedia.org/P18019 and previous config saved to /var/cache/conftool/dbconfig/20211203-091537-marostegui.json
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1161', diff saved to https://phabricator.wikimedia.org/P18018 and previous config saved to /var/cache/conftool/dbconfig/20211203-090033-marostegui.json
  • 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2009.codfw.wmnet with OS buster
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1161', diff saved to https://phabricator.wikimedia.org/P18017 and previous config saved to /var/cache/conftool/dbconfig/20211203-084528-marostegui.json
  • 08:44 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:43 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1161 (T277354)', diff saved to https://phabricator.wikimedia.org/P18016 and previous config saved to /var/cache/conftool/dbconfig/20211203-083023-marostegui.json
  • 08:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2009.codfw.wmnet with OS buster
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T277354)', diff saved to https://phabricator.wikimedia.org/P18015 and previous config saved to /var/cache/conftool/dbconfig/20211203-082859-marostegui.json
  • 08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db[1154,1161].eqiad.wmnet with reason: Maintenance T277354
  • 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db[1154,1161].eqiad.wmnet with reason: Maintenance T277354
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1110 (T277354)', diff saved to https://phabricator.wikimedia.org/P18014 and previous config saved to /var/cache/conftool/dbconfig/20211203-082848-marostegui.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18013 and previous config saved to /var/cache/conftool/dbconfig/20211203-081343-marostegui.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18012 and previous config saved to /var/cache/conftool/dbconfig/20211203-075839-marostegui.json
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1110 (T277354)', diff saved to https://phabricator.wikimedia.org/P18011 and previous config saved to /var/cache/conftool/dbconfig/20211203-074334-marostegui.json
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T277354)', diff saved to https://phabricator.wikimedia.org/P18010 and previous config saved to /var/cache/conftool/dbconfig/20211203-073910-marostegui.json
  • 07:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1110.eqiad.wmnet with reason: Maintenance T277354
  • 07:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1110.eqiad.wmnet with reason: Maintenance T277354
  • 07:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance T277354
  • 07:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance T277354
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P18009 and previous config saved to /var/cache/conftool/dbconfig/20211203-073404-marostegui.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18008 and previous config saved to /var/cache/conftool/dbconfig/20211203-071900-marostegui.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18007 and previous config saved to /var/cache/conftool/dbconfig/20211203-070355-marostegui.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P18006 and previous config saved to /var/cache/conftool/dbconfig/20211203-064850-marostegui.json
  • 06:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1125.eqiad.wmnet with OS bullseye
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P18005 and previous config saved to /var/cache/conftool/dbconfig/20211203-062019-marostegui.json
  • 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance T277354
  • 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance T277354
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P18004 and previous config saved to /var/cache/conftool/dbconfig/20211203-062011-marostegui.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P18003 and previous config saved to /var/cache/conftool/dbconfig/20211203-060506-marostegui.json
  • 06:02 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P18002 and previous config saved to /var/cache/conftool/dbconfig/20211203-055001-marostegui.json
  • 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P18001 and previous config saved to /var/cache/conftool/dbconfig/20211203-053457-marostegui.json
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P18000 and previous config saved to /var/cache/conftool/dbconfig/20211203-053032-marostegui.json
  • 05:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1113.eqiad.wmnet with reason: Maintenance T277354
  • 05:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1113.eqiad.wmnet with reason: Maintenance T277354
  • 01:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2025.codfw.wmnet with OS buster
  • 01:06 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2025.codfw.wmnet with OS buster
  • 01:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2024.codfw.wmnet with OS buster
  • 01:01 tgr: UTC late deploys done
  • 01:00 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments: Backport: Add an image: Add test version of GEInfoboxTemplates (T291232) (duration: 00m 57s)
  • 00:44 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/python3-imagecatalog/imagecatalog_0.0.1-1_amd64.changes
  • 00:37 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes: Backport: Avoid references to TemplateCollectionFeature step2 (duration: 00m 56s)
  • 00:36 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/Config/Validation/GrowthConfigValidation.php: Backport: Avoid references to TemplateCollectionFeature step 1 (duration: 00m 56s)
  • 00:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2024.codfw.wmnet with OS buster

2021-12-02

  • 20:05 legoktm: re-pooling mw1414 following testing
  • 19:35 legoktm: installing yaml PHP extension on canaries
  • 19:29 andrewbogott: upgrading wikitech-static deb packages as well as moving to mediawiki 1.37.0
  • 19:26 majavah: UTC evening deploys done
  • 19:26 taavi@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikimediaEvents/modules/ext.wikimediaEvents/webUIScroll.js: Backport: Update scroll instrument (T294246) (duration: 00m 56s)
  • 19:22 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Drop old config names for CentralAuth denylist controls (T277932) (duration: 00m 56s)
  • 19:12 taavi@deploy1002: Synchronized wmf-config: Config: GrowthExperiments configuration fixes (T294737) (duration: 00m 57s)
  • 18:56 legoktm: upgraded scap to 4.1.0 on A:mw-canary, A:parsoid-canary, A:mw-jobrunner-canary (T296867)
  • 18:45 legoktm: uploaded scap 4.1.0 to apt.wm.o (T296867)
  • 18:22 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
  • 18:19 vgutierrez: re-enable puppet on cp3064 - T296874
  • 18:14 hoo: Started Wikibase rebuildItemsPerSite on mwmaint1002 for wikidatawiki. Can be killed at any time, if necessary.
  • 17:51 vgutierrez: puppet disabled on cp3064 to manually increase number of maxconns in HAProxy - T296874
  • 17:38 ryankemper: [WDQS] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/743216/; as a result of the fix `'-Dwdqs.throttling-filter.time-bucket-capacity-in-seconds=240', '-Dwdqs.throttling-filter.time-bucket-refill-amount-in-seconds=120', '-Dwdqs.throttling-filter.ban-duration-in-minutes=60'` will now be in the `extra_jvm_opts` for `wdqs-internal` hosts
  • 15:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2022.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 15:38 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2022.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2137:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17997 and previous config saved to /var/cache/conftool/dbconfig/20211202-145151-marostegui.json
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2137:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17996 and previous config saved to /var/cache/conftool/dbconfig/20211202-143646-marostegui.json
  • 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2137:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17995 and previous config saved to /var/cache/conftool/dbconfig/20211202-142141-marostegui.json
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2137:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17994 and previous config saved to /var/cache/conftool/dbconfig/20211202-140636-marostegui.json
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17993 and previous config saved to /var/cache/conftool/dbconfig/20211202-140557-marostegui.json
  • 14:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2137.codfw.wmnet with reason: Maintenance T277354
  • 14:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2137.codfw.wmnet with reason: Maintenance T277354
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2128 (T277354)', diff saved to https://phabricator.wikimedia.org/P17992 and previous config saved to /var/cache/conftool/dbconfig/20211202-140548-marostegui.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2128 (T277354)', diff saved to https://phabricator.wikimedia.org/P17990 and previous config saved to /var/cache/conftool/dbconfig/20211202-135043-marostegui.json
  • 13:49 hnowlan: roll-restarting tilerator,tileratorui,kartotherian in eqiad
  • 13:37 hnowlan: roll-restarting tilerator,tileratorui,kartotherian in codfw
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2128 (T277354)', diff saved to https://phabricator.wikimedia.org/P17989 and previous config saved to /var/cache/conftool/dbconfig/20211202-133538-marostegui.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2128 (T277354)', diff saved to https://phabricator.wikimedia.org/P17988 and previous config saved to /var/cache/conftool/dbconfig/20211202-132034-marostegui.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T277354)', diff saved to https://phabricator.wikimedia.org/P17987 and previous config saved to /var/cache/conftool/dbconfig/20211202-131959-marostegui.json
  • 13:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2094,2128].codfw.wmnet with reason: Maintenance T277354
  • 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2094,2128].codfw.wmnet with reason: Maintenance T277354
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2113 (T277354)', diff saved to https://phabricator.wikimedia.org/P17986 and previous config saved to /var/cache/conftool/dbconfig/20211202-131949-marostegui.json
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2113 (T277354)', diff saved to https://phabricator.wikimedia.org/P17985 and previous config saved to /var/cache/conftool/dbconfig/20211202-130444-marostegui.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2113 (T277354)', diff saved to https://phabricator.wikimedia.org/P17983 and previous config saved to /var/cache/conftool/dbconfig/20211202-124940-marostegui.json
  • 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2113 (T277354)', diff saved to https://phabricator.wikimedia.org/P17982 and previous config saved to /var/cache/conftool/dbconfig/20211202-123435-marostegui.json
  • 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2113 (T277354)', diff saved to https://phabricator.wikimedia.org/P17981 and previous config saved to /var/cache/conftool/dbconfig/20211202-123356-marostegui.json
  • 12:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2113.codfw.wmnet with reason: Maintenance T277354
  • 12:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2113.codfw.wmnet with reason: Maintenance T277354
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2111 (T277354)', diff saved to https://phabricator.wikimedia.org/P17980 and previous config saved to /var/cache/conftool/dbconfig/20211202-123348-marostegui.json
  • 12:31 moritzm: installing NSS security updates
  • 12:27 Lucas_WMDE: UTC morning backport+config window done
  • 12:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Wikisource: enable proofreading change-tagging for all Wikisources (T289140) (duration: 00m 57s)
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2111 (T277354)', diff saved to https://phabricator.wikimedia.org/P17979 and previous config saved to /var/cache/conftool/dbconfig/20211202-121843-marostegui.json
  • 12:12 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2009.codfw.wmnet with OS buster
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2111 (T277354)', diff saved to https://phabricator.wikimedia.org/P17978 and previous config saved to /var/cache/conftool/dbconfig/20211202-120338-marostegui.json
  • 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2009.codfw.wmnet with OS buster
  • 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2111 (T277354)', diff saved to https://phabricator.wikimedia.org/P17977 and previous config saved to /var/cache/conftool/dbconfig/20211202-114833-marostegui.json
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T277354)', diff saved to https://phabricator.wikimedia.org/P17976 and previous config saved to /var/cache/conftool/dbconfig/20211202-114755-marostegui.json
  • 11:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2111.codfw.wmnet with reason: Maintenance T277354
  • 11:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2111.codfw.wmnet with reason: Maintenance T277354
  • 11:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2101.codfw.wmnet with reason: Maintenance T277354
  • 11:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2101.codfw.wmnet with reason: Maintenance T277354
  • 11:47 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2089:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17975 and previous config saved to /var/cache/conftool/dbconfig/20211202-114711-marostegui.json
  • 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2089:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17974 and previous config saved to /var/cache/conftool/dbconfig/20211202-113206-marostegui.json
  • 11:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:21 moritzm: draining primary/secondary instances off ganeti2022 T296622
  • 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2089:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17973 and previous config saved to /var/cache/conftool/dbconfig/20211202-111702-marostegui.json
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2089:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17972 and previous config saved to /var/cache/conftool/dbconfig/20211202-110157-marostegui.json
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2089:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17971 and previous config saved to /var/cache/conftool/dbconfig/20211202-110120-marostegui.json
  • 11:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2089.codfw.wmnet with reason: Maintenance T277354
  • 11:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2089.codfw.wmnet with reason: Maintenance T277354
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2075 (T277354)', diff saved to https://phabricator.wikimedia.org/P17970 and previous config saved to /var/cache/conftool/dbconfig/20211202-110110-marostegui.json
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2075 (T277354)', diff saved to https://phabricator.wikimedia.org/P17969 and previous config saved to /var/cache/conftool/dbconfig/20211202-104606-marostegui.json
  • 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2075 (T277354)', diff saved to https://phabricator.wikimedia.org/P17968 and previous config saved to /var/cache/conftool/dbconfig/20211202-103100-marostegui.json
  • 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2075 (T277354)', diff saved to https://phabricator.wikimedia.org/P17967 and previous config saved to /var/cache/conftool/dbconfig/20211202-101555-marostegui.json
  • 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2075 (T277354)', diff saved to https://phabricator.wikimedia.org/P17966 and previous config saved to /var/cache/conftool/dbconfig/20211202-101522-marostegui.json
  • 10:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2075.codfw.wmnet with reason: Maintenance T277354
  • 10:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2075.codfw.wmnet with reason: Maintenance T277354
  • 10:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Maintenance T277354
  • 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Maintenance T277354
  • 10:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance T277354
  • 10:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance T277354
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17964 and previous config saved to /var/cache/conftool/dbconfig/20211202-100307-marostegui.json
  • 09:52 moritzm: draining primary/secondary instances off ganeti2009 T296622
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17963 and previous config saved to /var/cache/conftool/dbconfig/20211202-094802-marostegui.json
  • 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17962 and previous config saved to /var/cache/conftool/dbconfig/20211202-093257-marostegui.json
  • 09:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2010.codfw.wmnet to ganeti01.svc.codfw.wmnet
  • 09:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2010.codfw.wmnet to ganeti01.svc.codfw.wmnet
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17961 and previous config saved to /var/cache/conftool/dbconfig/20211202-091753-marostegui.json
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T277354)', diff saved to https://phabricator.wikimedia.org/P17960 and previous config saved to /var/cache/conftool/dbconfig/20211202-091629-marostegui.json
  • 09:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1096.eqiad.wmnet with reason: Maintenance T277354
  • 09:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1096.eqiad.wmnet with reason: Maintenance T277354
  • 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
  • 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
  • 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2010.codfw.wmnet with OS buster
  • 08:29 dcausse: restarting blazegraph on wdqs1007 (jvm stuck for 4h)
  • 08:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2010.codfw.wmnet with OS buster
  • 02:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
  • 02:43 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
  • 02:40 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1028.eqiad.wmnet with OS buster
  • 02:15 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
  • 02:14 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1028.eqiad.wmnet with OS buster
  • 01:52 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
  • 01:21 ryankemper: T280001 Rolling restart of low-traffic pybal hosts complete. All of `wcqs` is pooled and the pybal / ipvs related alerts have cleared
  • 01:16 ryankemper: T280001 Pooled `wcqs200[1-3]` (had been left unpooled from when we last removed wcqs from production)
  • 01:12 ryankemper: T280001 Restarting pybal on low-traffic primaries `lvs2009` and `lvs1015`: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2009*,lvs1015*}' 'sudo systemctl restart pybal'`
  • 01:11 ryankemper: T280001 Waited 120s and checked https://icinga.wikimedia.org/alerts, proceeding to primary low-traffic hosts `lvs2009` and `lvs1015`
  • 01:08 ryankemper: T280001 Sanity check of `sudo ipvsadm -L -n` on backup `lvs2010` and `lvs1016` looks good (for ex `lvs1016` has `TCP 10.2.2.67:443 wrr`)
  • 01:07 ryankemper: T280001 Restarting pybal on low-traffic backups: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2010*,lvs1016*}' 'sudo systemctl restart pybal'`
  • 01:02 ryankemper: T280001 `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'`
  • 01:01 ryankemper: T280001 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/742841
  • 01:00 ryankemper: T280001 About to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/742841 to bring `wcqs` into state `lvs_setup`, after which I'll perform a rolling restart of pybal
  • 00:24 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/skins/Vector/: a7586cd: Update scroll observer to allow event logging (T292586) (duration: 00m 57s)

2021-12-01

  • 22:15 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 07s)
  • 22:15 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
  • 22:13 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 07s)
  • 22:13 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
  • 22:12 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 07s)
  • 22:12 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
  • 22:12 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 01m 23s)
  • 22:11 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
  • 22:10 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 03s)
  • 22:10 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
  • 22:10 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 03s)
  • 22:10 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
  • 22:09 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 03s)
  • 22:09 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
  • 21:12 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 03s)
  • 21:12 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
  • 21:11 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 16s)
  • 21:10 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
  • 21:10 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257]: (no justification provided)
  • 21:09 razzi@deploy1002: Finished deploy [analytics/refinery@3b1b794]: Regular analytics weekly train [analytics/refinery@3b1b794] (duration: 21m 18s)
  • 21:06 jynus: installing python-monotonic on ms-fe2011, ms-fe2012 (breaks swift-proxy)
  • 21:02 jynus: installing python-monotonic on ms-fe2010
  • 20:48 razzi@deploy1002: Started deploy [analytics/refinery@3b1b794]: Regular analytics weekly train [analytics/refinery@3b1b794]
  • 20:13 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:09 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 19:46 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 03s)
  • 19:46 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
  • 19:30 otto@deploy1002: Finished deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided) (duration: 00m 22s)
  • 19:30 otto@deploy1002: Started deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided)
  • 19:27 otto@deploy1002: Finished deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided) (duration: 02m 26s)
  • 19:25 otto@deploy1002: Started deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided)
  • 19:24 otto@deploy1002: Finished deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided) (duration: 00m 03s)
  • 19:24 otto@deploy1002: Started deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided)
  • 19:24 otto@deploy1002: Finished deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided) (duration: 00m 03s)
  • 19:24 otto@deploy1002: Started deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided)
  • 19:18 otto@deploy1002: Finished deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided) (duration: 00m 03s)
  • 19:18 otto@deploy1002: Started deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided)
  • 19:13 majavah: UTC evening deploys done
  • 19:11 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add mediawiki.web_ui_scroll stream (T292586) (duration: 00m 57s)
  • 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:41 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1089.eqiad.wmnet with OS buster
  • 18:39 vgutierrez: pool cp1089 using HAProxy as TLS terminator - T290005
  • 17:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1089.eqiad.wmnet with OS buster
  • 17:54 vgutierrez: depool cp1089 to be reimaged as cache::text_haproxy - T290005
  • 16:08 moritzm: installing postgresql-9.6 security updates
  • 15:54 godog: bounce logstash on eqiad/codfw to apply template changes
  • 15:53 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 15:42 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2010.codfw.wmnet with OS buster
  • 15:27 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2010.codfw.wmnet with OS buster
  • 15:27 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2010.codfw.wmnet with OS buster
  • 15:17 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2010.codfw.wmnet with OS buster
  • 15:15 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:08 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1180 (T277354)', diff saved to https://phabricator.wikimedia.org/P17955 and previous config saved to /var/cache/conftool/dbconfig/20211201-150853-marostegui.json
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1180 (T277354)', diff saved to https://phabricator.wikimedia.org/P17954 and previous config saved to /var/cache/conftool/dbconfig/20211201-145348-marostegui.json
  • 14:42 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2010.codfw.wmnet with OS buster
  • 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1180 (T277354)', diff saved to https://phabricator.wikimedia.org/P17953 and previous config saved to /var/cache/conftool/dbconfig/20211201-143843-marostegui.json
  • 14:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetboard1001.eqiad.wmnet
  • 14:29 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard1001.eqiad.wmnet
  • 14:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetboard2001.codfw.wmnet
  • 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1180 (T277354)', diff saved to https://phabricator.wikimedia.org/P17951 and previous config saved to /var/cache/conftool/dbconfig/20211201-142339-marostegui.json
  • 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T277354)', diff saved to https://phabricator.wikimedia.org/P17950 and previous config saved to /var/cache/conftool/dbconfig/20211201-142227-marostegui.json
  • 14:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1180.eqiad.wmnet with reason: Maintenance T277354
  • 14:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1180.eqiad.wmnet with reason: Maintenance T277354
  • 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1168 (T277354)', diff saved to https://phabricator.wikimedia.org/P17949 and previous config saved to /var/cache/conftool/dbconfig/20211201-142219-marostegui.json
  • 14:13 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard2001.codfw.wmnet
  • 14:13 jynus: started commonswiki codfw media backup at 8 threads of parallelism
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1168 (T277354)', diff saved to https://phabricator.wikimedia.org/P17948 and previous config saved to /var/cache/conftool/dbconfig/20211201-140715-marostegui.json
  • 13:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2010.codfw.wmnet with OS buster
  • 13:56 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1168 (T277354)', diff saved to https://phabricator.wikimedia.org/P17947 and previous config saved to /var/cache/conftool/dbconfig/20211201-135210-marostegui.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1168 (T277354)', diff saved to https://phabricator.wikimedia.org/P17946 and previous config saved to /var/cache/conftool/dbconfig/20211201-133705-marostegui.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T277354)', diff saved to https://phabricator.wikimedia.org/P17945 and previous config saved to /var/cache/conftool/dbconfig/20211201-133554-marostegui.json
  • 13:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1168.eqiad.wmnet with reason: Maintenance T277354
  • 13:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1168.eqiad.wmnet with reason: Maintenance T277354
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1131 (T277354)', diff saved to https://phabricator.wikimedia.org/P17944 and previous config saved to /var/cache/conftool/dbconfig/20211201-133546-marostegui.json
  • 13:30 moritzm: set "sudo gnt-cluster modify --hypervisor-parameters kvm:machine_version=pc-i440fx-2.8" for ganeti eqiad cluster T294120
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1131 (T277354)', diff saved to https://phabricator.wikimedia.org/P17942 and previous config saved to /var/cache/conftool/dbconfig/20211201-132041-marostegui.json
  • 13:19 vgutierrez: restore haproxy 2.2.9 on cp3064 - T290005
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1131 (T277354)', diff saved to https://phabricator.wikimedia.org/P17939 and previous config saved to /var/cache/conftool/dbconfig/20211201-130536-marostegui.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1131 (T277354)', diff saved to https://phabricator.wikimedia.org/P17938 and previous config saved to /var/cache/conftool/dbconfig/20211201-125031-marostegui.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T277354)', diff saved to https://phabricator.wikimedia.org/P17937 and previous config saved to /var/cache/conftool/dbconfig/20211201-124919-marostegui.json
  • 12:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1131.eqiad.wmnet with reason: Maintenance T277354
  • 12:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1131.eqiad.wmnet with reason: Maintenance T277354
  • 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1165 (T277354)', diff saved to https://phabricator.wikimedia.org/P17936 and previous config saved to /var/cache/conftool/dbconfig/20211201-122020-marostegui.json
  • 12:11 urbanecm: EU B&C window done
  • 12:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c8ab29b: enwikisource: enable anonymous talk page mobile tabs (T47955) (duration: 00m 56s)
  • 12:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 2bd14e8: Add templateeditor group and protection level at viwiki (T296154) (duration: 00m 56s)
  • 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1165 (T277354)', diff saved to https://phabricator.wikimedia.org/P17935 and previous config saved to /var/cache/conftool/dbconfig/20211201-120515-marostegui.json
  • 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1165 (T277354)', diff saved to https://phabricator.wikimedia.org/P17934 and previous config saved to /var/cache/conftool/dbconfig/20211201-115011-marostegui.json
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1165 (T277354)', diff saved to https://phabricator.wikimedia.org/P17933 and previous config saved to /var/cache/conftool/dbconfig/20211201-113506-marostegui.json
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T277354)', diff saved to https://phabricator.wikimedia.org/P17932 and previous config saved to /var/cache/conftool/dbconfig/20211201-113354-marostegui.json
  • 11:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db[1155,1165].eqiad.wmnet with reason: Maintenance T277354
  • 11:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db[1155,1165].eqiad.wmnet with reason: Maintenance T277354
  • 11:31 vgutierrez: test HAProxy 2.4.9 on cp3064 - T290005
  • 11:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1140.eqiad.wmnet with reason: Maintenance T277354
  • 11:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1140.eqiad.wmnet with reason: Maintenance T277354
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3316 (T277354)', diff saved to https://phabricator.wikimedia.org/P17931 and previous config saved to /var/cache/conftool/dbconfig/20211201-112952-marostegui.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3316 (T277354)', diff saved to https://phabricator.wikimedia.org/P17930 and previous config saved to /var/cache/conftool/dbconfig/20211201-111448-marostegui.json
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3316 (T277354)', diff saved to https://phabricator.wikimedia.org/P17929 and previous config saved to /var/cache/conftool/dbconfig/20211201-105943-marostegui.json
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3316 (T277354)', diff saved to https://phabricator.wikimedia.org/P17928 and previous config saved to /var/cache/conftool/dbconfig/20211201-104438-marostegui.json
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T277354)', diff saved to https://phabricator.wikimedia.org/P17927 and previous config saved to /var/cache/conftool/dbconfig/20211201-104316-marostegui.json
  • 10:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1113.eqiad.wmnet with reason: Maintenance T277354
  • 10:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1113.eqiad.wmnet with reason: Maintenance T277354
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1098:3316 (T277354)', diff saved to https://phabricator.wikimedia.org/P17926 and previous config saved to /var/cache/conftool/dbconfig/20211201-104308-marostegui.json
  • 10:29 Lucas_WMDE: Deployed patch for T296578
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1098:3316 (T277354)', diff saved to https://phabricator.wikimedia.org/P17925 and previous config saved to /var/cache/conftool/dbconfig/20211201-102804-marostegui.json
  • 10:23 vgutierrez: test haproxy_2.2.19-1~bpo10+1 on cp3064 - T290005
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1098:3316 (T277354)', diff saved to https://phabricator.wikimedia.org/P17924 and previous config saved to /var/cache/conftool/dbconfig/20211201-101259-marostegui.json
  • 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1098:3316 (T277354)', diff saved to https://phabricator.wikimedia.org/P17923 and previous config saved to /var/cache/conftool/dbconfig/20211201-095754-marostegui.json
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T277354)', diff saved to https://phabricator.wikimedia.org/P17922 and previous config saved to /var/cache/conftool/dbconfig/20211201-095632-marostegui.json
  • 09:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1098.eqiad.wmnet with reason: Maintenance T277354
  • 09:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1098.eqiad.wmnet with reason: Maintenance T277354
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3316 (T277354)', diff saved to https://phabricator.wikimedia.org/P17921 and previous config saved to /var/cache/conftool/dbconfig/20211201-095624-marostegui.json
  • 09:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:46 taavi@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: beta: Update mx host (duration: 00m 55s)
  • 09:43 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwiki extensions/CheckUser/maintenance/fixTrailingSpacesInLogs.php
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3316 (T277354)', diff saved to https://phabricator.wikimedia.org/P17920 and previous config saved to /var/cache/conftool/dbconfig/20211201-094120-marostegui.json
  • 09:39 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/FlaggedRevs/backend/FlaggedRevision.php: Backport: Drop using ft_title and ft_namespace (T296380) (duration: 00m 56s)
  • 09:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3316 (T277354)', diff saved to https://phabricator.wikimedia.org/P17919 and previous config saved to /var/cache/conftool/dbconfig/20211201-092615-marostegui.json
  • 09:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubetcd2005.codfw.wmnet with reason: Switch to DRBD for migration
  • 09:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubetcd2005.codfw.wmnet with reason: Switch to DRBD for migration
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3316 (T277354)', diff saved to https://phabricator.wikimedia.org/P17918 and previous config saved to /var/cache/conftool/dbconfig/20211201-091110-marostegui.json
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T277354)', diff saved to https://phabricator.wikimedia.org/P17917 and previous config saved to /var/cache/conftool/dbconfig/20211201-090948-marostegui.json
  • 09:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1096.eqiad.wmnet with reason: Maintenance T277354
  • 09:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1096.eqiad.wmnet with reason: Maintenance T277354
  • 09:03 vgutierrez: rolling restart of haproxy and varnish on O:cache::text_haproxy and O:cache::upload_haproxy - T290005
  • 08:56 moritzm: draining primary/secondary instance off ganeti2010 T296622
  • 08:51 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:41 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:32 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance T277354
  • 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance T277354
  • 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2124.codfw.wmnet with reason: Maintenance T277354
  • 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2124.codfw.wmnet with reason: Maintenance T277354
  • 06:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2117.codfw.wmnet with reason: Maintenance T277354
  • 06:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2117.codfw.wmnet with reason: Maintenance T277354
  • 00:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:32 catrope@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/NewcomerTasks/NewcomerTasksUserOptionsLookup.php: Backport: Newcomer tasks: Fix filtering of non-existent task types (T296366) (duration: 00m 56s)
  • 00:10 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable A/B test enrollment instrumentation. (T292587) (duration: 00m 56s)
  • 00:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-11-30

  • 23:59 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 23:57 mutante: deploy1002 - kube_env miscweb staging ; helmfile -e staging destroy
  • 23:56 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 23:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:09 mutante: gerrit - added Majavah to wmf-deployment group for T296777
  • 22:30 krinkle@deploy1002: Finished deploy [integration/docroot@2af7007]: Ia89b6591639e5 (duration: 00m 09s)
  • 22:30 krinkle@deploy1002: Started deploy [integration/docroot@2af7007]: Ia89b6591639e5
  • 22:21 mutante: welcome Majavah to MediaWiki deployers (T296777)
  • 20:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5443b78: uzwiki: Deploy Growth features to newcomers (T294245) (duration: 00m 57s)
  • 18:09 legoktm: uploaded php-yaml for component/php72 (T296331)
  • 18:08 vgutierrez: restart haproxy on cp3064 - T290005
  • 17:44 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1163 fully', diff saved to https://phabricator.wikimedia.org/P17912 and previous config saved to /var/cache/conftool/dbconfig/20211130-174434-jynus.json
  • 17:39 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1163 at 50%', diff saved to https://phabricator.wikimedia.org/P17911 and previous config saved to /var/cache/conftool/dbconfig/20211130-173935-jynus.json
  • 17:35 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1163 at 25%', diff saved to https://phabricator.wikimedia.org/P17910 and previous config saved to /var/cache/conftool/dbconfig/20211130-173517-jynus.json
  • 17:34 moritzm: installing libvorbis security updates
  • 17:15 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1163 at 5%', diff saved to https://phabricator.wikimedia.org/P17908 and previous config saved to /var/cache/conftool/dbconfig/20211130-171550-jynus.json
  • 17:00 jynus: move db1139:s1 under db1118
  • 16:57 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1163 fully', diff saved to https://phabricator.wikimedia.org/P17907 and previous config saved to /var/cache/conftool/dbconfig/20211130-165718-jynus.json
  • 16:29 XioNoX: Move cr2-codfw lumen transit link to BO cable - T289241
  • 16:26 XioNoX: Move cr2-codfw eqord link to BO cable - T289241
  • 16:23 XioNoX: Move cr2-codfw pfw3 link to BO cable - T289241
  • 16:20 Emperor: reboot ms-be2059 to fix device enumeration order re T295563
  • 16:14 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1163 at 25%', diff saved to https://phabricator.wikimedia.org/P17906 and previous config saved to /var/cache/conftool/dbconfig/20211130-161457-jynus.json
  • 16:13 XioNoX: cr2-codfw bounce fpc 1 pic 0 (vrrp backup) - T289241
  • 16:07 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1163 at 50%', diff saved to https://phabricator.wikimedia.org/P17905 and previous config saved to /var/cache/conftool/dbconfig/20211130-160748-jynus.json
  • 16:06 bblack: lvs2007 - repooling into service
  • 16:01 bblack: lvs2007 - depooling for network maint - do not push LVS config changes please!
  • 15:41 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts puppetboard2001.codfw.wmnet
  • 15:41 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard2001.codfw.wmnet
  • 15:38 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts puppetboard2001.codfw.wmnet
  • 15:37 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard2001.codfw.wmnet
  • 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:12 jforrester@deploy1002: Synchronized multiversion/MWMultiVersion.php: Add wikifunctions hard-coded value to setSiteInfoForWiki for Beta Cluster T284162 (duration: 00m 56s)
  • 15:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:45 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 13:25 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2114 (T277354)', diff saved to https://phabricator.wikimedia.org/P17904 and previous config saved to /var/cache/conftool/dbconfig/20211130-131124-marostegui.json
  • 13:05 topranks: Running homer against CR routers to adjust loopback4 filter enabling local NTP queries for status. T296623
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2114 (T277354)', diff saved to https://phabricator.wikimedia.org/P17903 and previous config saved to /var/cache/conftool/dbconfig/20211130-125620-marostegui.json
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2114 (T277354)', diff saved to https://phabricator.wikimedia.org/P17902 and previous config saved to /var/cache/conftool/dbconfig/20211130-124115-marostegui.json
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2114 (T277354)', diff saved to https://phabricator.wikimedia.org/P17901 and previous config saved to /var/cache/conftool/dbconfig/20211130-122610-marostegui.json
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2114 (T277354)', diff saved to https://phabricator.wikimedia.org/P17900 and previous config saved to /var/cache/conftool/dbconfig/20211130-122555-marostegui.json
  • 12:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2114.codfw.wmnet with reason: Maintenance T277354
  • 12:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2114.codfw.wmnet with reason: Maintenance T277354
  • 12:09 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts puppetboard1001.eqiad.wmnet
  • 12:02 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard1001.eqiad.wmnet
  • 11:50 moritzm: running "sudo gnt-cluster renew-crypto --new-node-certificates --new-rapi-certificate --new-spice-certificate" for Ganeti codfw cluster T296622
  • 11:01 hnowlan: restarting tilerator, kartotherian and tileratorui for updates in eqiad
  • 11:01 hnowlan: restarting tilerator, kartotherian and tileratorui in codfw
  • 10:39 elukey: rollout wmf-certificates 0~20211129-1 fleet wide (add group/others permissions to the cert bundle)
  • 10:30 lucaswerkmeister-wmde@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 10:29 lucaswerkmeister-wmde@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 09:58 moritzm: installing remaining ICU security updates
  • 09:06 Amir1: dropping wikiadmin@localhost from all pooled replicas of s6 (T296511)
  • 08:24 dcausse: restarting blazegraph on wdqs1006 (JVM stuck for 6 hours)
  • 08:14 Amir1: revoking DROP from wikiadmin on all pooled replicas (T249683)
  • 03:46 ejegg: updated payments-wiki from dbc92132 to 4a4ef51d
  • 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:17 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable scroll tracking for all users (T292586) (duration: 00m 55s)
  • 00:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:14 catrope@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikimediaEvents/modules/ext.wikimediaEvents/readingDepth.js: Backport: Provide fallback for config variable when not present (duration: 00m 55s)
  • 00:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:13 catrope@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: allow sysops to set/remove reviewer group on ckbwiki (T294696) (duration: 00m 55s)

2021-11-29

  • 22:32 sbassett@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/EntitySchema/src/MediaWiki/Specials/SetEntitySchemaLabelDescriptionAliases.php: Deploy security patch for T296578 (duration: 00m 55s)
  • 22:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:20 sbassett@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/FileImporter/src/Remote/MediaWiki/HttpApiLookup.php: Backport: SECURITY: Fix special page displaying unescaped user input (T296605) (duration: 00m 56s)
  • 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:46 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Fix wgWikiLambdaOrchestratorLocation service pointer typo (duration: 00m 55s)
  • 20:27 tgr: UTC evening deploys done
  • 20:26 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Start imagerecommendation variant experiment (duration: 00m 55s)
  • 20:23 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/AddImageSubmissionHandler.php: Backport: AddImage: Refresh user's task feed after undecided rejection (T296491) (duration: 00m 56s)
  • 20:21 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: SuggestedEdits: Drop isActivated() check in getJsData (T296626) (duration: 00m 56s)
  • 20:17 ejegg: updated payments-wiki from d1d6f024 -> dbc92132
  • 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:10 eileen: civicrm
  • 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:00 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T295705 Move CirrusSearch traffic back to eqiad (duration: 00m 56s)
  • 19:42 legoktm: uploaded php-yaml_2.2.1+2.1.0+2.0.4+1.3.2-2+wmf1~buster1_amd64.changes to apt.wm.o (T296331)
  • 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:16 vgutierrez: pool cp3064 - T290005
  • 18:55 bblack: repooling esams
  • 18:48 bblack: esams: shifting depool method to esams-offline (now that its config is fixed)
  • 18:42 legoktm: depooling esams
  • 18:17 vgutierrez: depool cp3064 - T290005
  • 17:58 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/libs/rdbms/: Backport: rdbms: Add DB host to TransactionProfiler logging and fix time fields (T295706) (duration: 00m 56s)
  • 17:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:40 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Initial Beta Cluster deployment of Wikifunctions: III - CS for T289315 (duration: 00m 55s)
  • 17:38 vgutierrez: pool cp3064 - T290005
  • 17:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:25 lucaswerkmeister-wmde@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 17:25 lucaswerkmeister-wmde@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 17:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:22 jforrester@deploy1002: Synchronized wmf-config/ProductionServices.php: Initial Beta Cluster deployment of Wikifunctions: II - Services for T289315 (duration: 00m 55s)
  • 17:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:18 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Initial Beta Cluster deployment of Wikifunctions: I - IS for T289315 (duration: 00m 55s)
  • 17:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 06d8d25: foundationwiki: Remove explicit wmgUseOAuth (duration: 00m 57s)
  • 16:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:56 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: bad34ed: Make foundationwiki a standard CentralAuth wiki (T205347) (duration: 00m 56s)
  • 16:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 567f2a9: Revert "foundationwiki: Set wmgLocalAuthLoginOnly=false temporarily" (T205347) (duration: 00m 56s)
  • 16:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:25 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2069.codfw.wmnet with OS buster
  • 16:04 moritzm: sudo gnt-cluster upgrade --to 2.16 for Ganeti codfw cluster
  • 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:52 vgutierrez: depool cp3064 - T290005
  • 15:51 James_F: Running mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=enwiki en wikimedia wikifunctionswiki wikifunctions.beta.wmflabs.org in Beta Cluster for T284162
  • 15:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2069.codfw.wmnet with OS buster
  • 15:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:47 papaul: power down logstash2028 for IDRAC reset
  • 15:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:15 moritzm: gnt-cluster renew-crypto --new-cluster-certificate for codfw Ganeti cluster T296622
  • 14:40 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:38 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:37 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 13:55 vgutierrez: repool cp3064 - T290005
  • 12:51 moritzm: upgrading ganeti codfw cluster to 2.16 backport T296622
  • 12:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:32 vgutierrez: depool cp3064 - T290005
  • 12:32 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: 0570440: Fix error handling in SuggestedEdits::getActionData (T296366) (duration: 05m 37s)
  • 12:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7fdea3e: Add planet4589.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T296136) (duration: 00m 56s)
  • 12:11 vgutierrez: pool cp3064 (text) using HAProxy as TLS terminator - T290005
  • 12:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3064.esams.wmnet with OS buster
  • 12:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:07 urbanecm@deploy1002: Synchronized docroot/: 4662224: Remove search.wikimedia.org files (T289224) (duration: 00m 56s)
  • 11:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:58 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/CentralAuth/includes/CentralAuthUser.php: 5fc6aaa: Fix "Mark entries as bot entries" feature(2/2; T296297) (duration: 00m 55s)
  • 10:57 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/CentralAuth/includes/Special/SpecialMultiLock.php: 5fc6aaa: Fix "Mark entries as bot entries" feature (1/2; T296297) (duration: 00m 56s)
  • 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d01652e: Disable Growth IP research survey (T294568) (duration: 00m 56s)
  • 10:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3064.esams.wmnet with OS buster
  • 10:45 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3064.esams.wmnet with OS buster
  • 10:02 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3064.esams.wmnet with OS buster
  • 10:01 vgutierrez: depool cp3064 to be reimaged as cache::text_haproxy - T290005
  • 09:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2041.codfw.wmnet with OS buster
  • 09:52 vgutierrez: pool cp2041 with HAProxy as TLS terminator - T290005
  • 09:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:34 moritzm: rolling restart of mediawiki canaries to pick up ICU security updates
  • 09:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NOOP: 3a89286: foundationwiki: Do not enable wmgUsePageViewInfo explicitly (duration: 00m 55s)
  • 09:32 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=foundationwiki 'inactive' # removing nonexistent group; backup left at P17888
  • 09:30 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 786313c: foundationwiki: Clear group add/remove declarations (duration: 00m 55s)
  • 09:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c3f47dc: foundationwiki: Disable hard redirects (duration: 00m 57s)
  • 08:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2041.codfw.wmnet with OS buster
  • 08:56 vgutierrez: depool cp2041 to be reimaged as cache::text_haproxy - T290005
  • 08:54 moritzm: installing ICU security updates on buster
  • 08:33 moritzm: installing bluez security updates
  • 08:26 moritzm: installing libvpx security updates
  • 08:19 moritzm: installing libntlm security updates
  • 08:07 elukey@deploy1002: Finished deploy [ores/deploy@69ed061]: Upgrade of mwparserfromhell - T296563 (duration: 07m 01s)
  • 08:00 marostegui: Restart db2078 and db1117
  • 08:00 elukey@deploy1002: Started deploy [ores/deploy@69ed061]: Upgrade of mwparserfromhell - T296563
  • 07:31 elukey@deploy1002: Finished deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - T296563 - (second attempt, no git update submodules the first time) (duration: 00m 04s)
  • 07:31 elukey@deploy1002: Started deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - T296563 - (second attempt, no git update submodules the first time)
  • 06:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2014.codfw.wmnet with OS bullseye
  • 05:39 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2014.codfw.wmnet with OS bullseye

2021-11-28

  • 17:14 elukey@deploy1002: Finished deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - T296563 (duration: 02m 11s)
  • 17:12 elukey@deploy1002: Started deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - T296563

2021-11-27

  • 19:55 andrew@deploy1002: Finished deploy [horizon/deploy@6115b3b]: network UI updates for T296548 (duration: 04m 14s)
  • 19:51 andrew@deploy1002: Started deploy [horizon/deploy@6115b3b]: network UI updates for T296548
  • 19:47 andrew@deploy1002: Finished deploy [horizon/deploy@6115b3b]: network UI tests in codfw1dev (duration: 02m 01s)
  • 19:45 andrew@deploy1002: Started deploy [horizon/deploy@6115b3b]: network UI tests in codfw1dev
  • 12:22 elukey: drop /var/tmp/core files from ores100[2,4], root partition full
  • 12:10 elukey: drop /var/tmp/core files from ores1009, root partition full
  • 11:55 elukey: disable coredumps for ORES celery units (will cause a roll restart of all celeries) - T296563
  • 11:46 elukey: drop ores coredumps from ores1008
  • 09:56 elukey: powercycle analytics1071, soft lockup stacktraces in the tty
  • 09:51 elukey: move ores coredump files from /var/cache/tmp to /srv/coredumps on ores100[6,7,8] and ores2003 to free space on the root partition

2021-11-26

  • 16:11 arnoldokoth: drain kubestage1002 node in prep for decommissioning
  • 16:05 arnoldokoth: drain kubestage1001 node in prep for decommissioning
  • 15:46 elukey: move /var/tmp/core/* to /srv/coredumps on ores1008 to free root space
  • 14:30 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 14:25 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 14:21 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 13:48 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 13:46 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 13:25 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:25 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:21 vgutierrez: restarting HAProxy on O:cache::upload_haproxy - T290005
  • 11:41 akosiaris: T296303 cleanup weird state of calico-codfw cluster
  • 11:41 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 11:41 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 11:39 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 11:25 vgutierrez: restarting HAProxy on O:cache::(text|upload)_haproxy - T290005
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after fixing users T296274', diff saved to https://phabricator.wikimedia.org/P17880 and previous config saved to /var/cache/conftool/dbconfig/20211126-102340-ladsgroup.json
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T296274)', diff saved to https://phabricator.wikimedia.org/P17879 and previous config saved to /var/cache/conftool/dbconfig/20211126-101714-ladsgroup.json
  • 10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1111.eqiad.wmnet with reason: Maintenance T296274
  • 10:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1111.eqiad.wmnet with reason: Maintenance T296274
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after fixing users T296274', diff saved to https://phabricator.wikimedia.org/P17878 and previous config saved to /var/cache/conftool/dbconfig/20211126-101423-ladsgroup.json
  • 10:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T296274)', diff saved to https://phabricator.wikimedia.org/P17877 and previous config saved to /var/cache/conftool/dbconfig/20211126-100547-ladsgroup.json
  • 10:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance T296274
  • 10:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance T296274
  • 10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance T296143
  • 10:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance T296143
  • 08:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1160 (T296143)', diff saved to https://phabricator.wikimedia.org/P17876 and previous config saved to /var/cache/conftool/dbconfig/20211126-082834-ladsgroup.json
  • 08:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1160 (T296143)', diff saved to https://phabricator.wikimedia.org/P17875 and previous config saved to /var/cache/conftool/dbconfig/20211126-081329-ladsgroup.json
  • 07:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1160 (T296143)', diff saved to https://phabricator.wikimedia.org/P17874 and previous config saved to /var/cache/conftool/dbconfig/20211126-075824-ladsgroup.json
  • 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1160 (T296143)', diff saved to https://phabricator.wikimedia.org/P17873 and previous config saved to /var/cache/conftool/dbconfig/20211126-074320-ladsgroup.json
  • 06:28 Amir1: killing extensions/MachineVision/maintenance/fetchSuggestions.php in mwmaint
  • 06:19 Amir1: killing lingering process from mwmaint connected to db1160, which was depooled nine hours ago

2021-11-25

  • 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T296143)', diff saved to https://phabricator.wikimedia.org/P17872 and previous config saved to /var/cache/conftool/dbconfig/20211125-204357-ladsgroup.json
  • 20:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance T296143
  • 20:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance T296143
  • 19:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance T296143
  • 19:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance T296143
  • 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1149 (T296143)', diff saved to https://phabricator.wikimedia.org/P17871 and previous config saved to /var/cache/conftool/dbconfig/20211125-192850-ladsgroup.json
  • 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1149 (T296143)', diff saved to https://phabricator.wikimedia.org/P17870 and previous config saved to /var/cache/conftool/dbconfig/20211125-191345-ladsgroup.json
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1149 (T296143)', diff saved to https://phabricator.wikimedia.org/P17869 and previous config saved to /var/cache/conftool/dbconfig/20211125-185841-ladsgroup.json
  • 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1149 (T296143)', diff saved to https://phabricator.wikimedia.org/P17868 and previous config saved to /var/cache/conftool/dbconfig/20211125-184336-ladsgroup.json
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T296143)', diff saved to https://phabricator.wikimedia.org/P17867 and previous config saved to /var/cache/conftool/dbconfig/20211125-172714-ladsgroup.json
  • 17:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1149.eqiad.wmnet with reason: Maintenance T296143
  • 17:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1149.eqiad.wmnet with reason: Maintenance T296143
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1148 (T296143)', diff saved to https://phabricator.wikimedia.org/P17866 and previous config saved to /var/cache/conftool/dbconfig/20211125-172707-ladsgroup.json
  • 17:12 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=inference
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1148 (T296143)', diff saved to https://phabricator.wikimedia.org/P17864 and previous config saved to /var/cache/conftool/dbconfig/20211125-171202-ladsgroup.json
  • 16:57 volans@deploy1002: Finished deploy [netbox/deploy@87a36a7]: Deploy v2.10.4-wmf6 (duration: 06m 59s)
  • 16:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1148 (T296143)', diff saved to https://phabricator.wikimedia.org/P17863 and previous config saved to /var/cache/conftool/dbconfig/20211125-165657-ladsgroup.json
  • 16:50 volans@deploy1002: Started deploy [netbox/deploy@87a36a7]: Deploy v2.10.4-wmf6
  • 16:49 jynus@cumin1001: dbctl commit (dc=all): 'Fully repool db1163', diff saved to https://phabricator.wikimedia.org/P17862 and previous config saved to /var/cache/conftool/dbconfig/20211125-164941-jynus.json
  • 16:46 volans@deploy1002: Finished deploy [netbox/deploy@87a36a7]: Test v2.10.4-wmf6 on netbox-next (duration: 01m 04s)
  • 16:45 volans@deploy1002: Started deploy [netbox/deploy@87a36a7]: Test v2.10.4-wmf6 on netbox-next
  • 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1148 (T296143)', diff saved to https://phabricator.wikimedia.org/P17861 and previous config saved to /var/cache/conftool/dbconfig/20211125-164153-ladsgroup.json
  • 16:18 jynus@cumin1001: dbctl commit (dc=all): 'Slowly repool db1163++', diff saved to https://phabricator.wikimedia.org/P17860 and previous config saved to /var/cache/conftool/dbconfig/20211125-161833-jynus.json
  • 16:14 jynus@cumin1001: dbctl commit (dc=all): 'Slowly repool db1163+', diff saved to https://phabricator.wikimedia.org/P17859 and previous config saved to /var/cache/conftool/dbconfig/20211125-161404-jynus.json
  • 16:10 klausman: restarting pybal on lvs2009 T289835
  • 15:57 vgutierrez: restarting pybal on lvs2010 - T289835
  • 15:55 jynus@cumin1001: dbctl commit (dc=all): 'Slowly repool db1163', diff saved to https://phabricator.wikimedia.org/P17856 and previous config saved to /var/cache/conftool/dbconfig/20211125-155538-jynus.json
  • 15:47 jynus: re-enable GTID on db1163
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T296143)', diff saved to https://phabricator.wikimedia.org/P17853 and previous config saved to /var/cache/conftool/dbconfig/20211125-152906-ladsgroup.json
  • 15:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1148.eqiad.wmnet with reason: Maintenance T296143
  • 15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1148.eqiad.wmnet with reason: Maintenance T296143
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1147 (T296143)', diff saved to https://phabricator.wikimedia.org/P17852 and previous config saved to /var/cache/conftool/dbconfig/20211125-152858-ladsgroup.json
  • 15:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping1001.eqiad.wmnet
  • 15:19 klausman@cumin1001: conftool action : set/pooled=yes:weight=1; selector: cluster=ml_serve,service=kubesvc
  • 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1147 (T296143)', diff saved to https://phabricator.wikimedia.org/P17851 and previous config saved to /var/cache/conftool/dbconfig/20211125-151354-ladsgroup.json
  • 15:13 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts ping1001.eqiad.wmnet
  • 15:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping3001.esams.wmnet
  • 15:05 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts ping3001.esams.wmnet
  • 15:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping2001.codfw.wmnet
  • 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1147 (T296143)', diff saved to https://phabricator.wikimedia.org/P17850 and previous config saved to /var/cache/conftool/dbconfig/20211125-145849-ladsgroup.json
  • 14:54 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts ping2001.codfw.wmnet
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1147 (T296143)', diff saved to https://phabricator.wikimedia.org/P17849 and previous config saved to /var/cache/conftool/dbconfig/20211125-144344-ladsgroup.json
  • 14:42 XioNoX: Update ping redirect to point to new ping VMs - T295767
  • 14:25 jayme: uncordoned kubestage1003.eqiad.wmnet kubestage1004.eqiad.wmnet - T293729
  • 14:17 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 14:16 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:12 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping1002.eqiad.wmnet
  • 13:32 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host ping1002.eqiad.wmnet
  • 13:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping2002.codfw.wmnet
  • 13:28 Amir1: killing a lingering process from mwmaint connected to the depooled db1147
  • 13:20 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host ping2002.codfw.wmnet
  • 13:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping3002.esams.wmnet
  • 13:05 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host ping3002.esams.wmnet
  • 12:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase202[1-3].codfw.wmnet: Restarting for certificate updates - hnowlan@cumin1001
  • 12:14 arturo: update repo bullseye-wikimedia/thirdparty/ceph-octopus (T296175)
  • 12:14 jynus: temporarily disable GTID on db1163
  • 12:11 jynus@cumin1001: dbctl commit (dc=all): 'Temp. depool db1163 fully', diff saved to https://phabricator.wikimedia.org/P17847 and previous config saved to /var/cache/conftool/dbconfig/20211125-121138-jynus.json
  • 12:04 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1163 load even more', diff saved to https://phabricator.wikimedia.org/P17846 and previous config saved to /var/cache/conftool/dbconfig/20211125-120435-jynus.json
  • 11:56 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase202[1-3].codfw.wmnet: Restarting for certificate updates - hnowlan@cumin1001
  • 11:56 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1163 load', diff saved to https://phabricator.wikimedia.org/P17845 and previous config saved to /var/cache/conftool/dbconfig/20211125-115602-jynus.json
  • 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T296143)', diff saved to https://phabricator.wikimedia.org/P17844 and previous config saved to /var/cache/conftool/dbconfig/20211125-110443-ladsgroup.json
  • 11:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1147.eqiad.wmnet with reason: Maintenance T296143
  • 11:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1147.eqiad.wmnet with reason: Maintenance T296143
  • 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1146:3314 (T296143)', diff saved to https://phabricator.wikimedia.org/P17843 and previous config saved to /var/cache/conftool/dbconfig/20211125-110435-ladsgroup.json
  • 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1146:3314 (T296143)', diff saved to https://phabricator.wikimedia.org/P17842 and previous config saved to /var/cache/conftool/dbconfig/20211125-104930-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1146:3314 (T296143)', diff saved to https://phabricator.wikimedia.org/P17841 and previous config saved to /var/cache/conftool/dbconfig/20211125-103425-ladsgroup.json
  • 10:25 vgutierrez: rolling restart of varnish and HAProxy on cp2042.codfw.wmnet,cp1090.eqiad.wmnet,cp[5012].eqsin.wmnet,cp3065.esams.wmnet,cp[4026,4032].ulsfo.wmnet to disable PROXY protocol - T290005
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1146:3314 (T296143)', diff saved to https://phabricator.wikimedia.org/P17840 and previous config saved to /var/cache/conftool/dbconfig/20211125-101921-ladsgroup.json
  • 09:55 jelto@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=(apertium|api-gateway|apple-search|blubberoid|citoid|cxserver|echostore|eventgate-analytics|eventgate-analytics-external|eventgate-logging-external|eventstreams|eventstreams-internal|linkrecommendation|mathoid|mobileapps|proton|push-notifications|recommendation-api|sessionstore|shellbox|shellbox-constraints|shellbox-media|shellbox-syntaxh
  • 09:45 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 09:43 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 09:39 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 09:37 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 09:34 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 09:31 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
  • 09:29 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
  • 09:27 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
  • 09:24 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
  • 09:23 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
  • 09:21 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 09:19 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 09:16 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 09:10 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 09:05 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 09:02 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 08:59 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:51 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:50 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T296143)', diff saved to https://phabricator.wikimedia.org/P17837 and previous config saved to /var/cache/conftool/dbconfig/20211125-084834-ladsgroup.json
  • 08:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance T296143
  • 08:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance T296143
  • 08:47 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 08:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance T296143
  • 08:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance T296143
  • 08:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance T296143
  • 08:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance T296143
  • 08:43 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 08:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance T296143
  • 08:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance T296143
  • 08:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance T296143
  • 08:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance T296143
  • 08:40 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 08:40 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 08:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance T296143
  • 08:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance T296143
  • 08:37 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
  • 08:34 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 08:34 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 08:31 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 08:31 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 08:28 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 08:28 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 08:25 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 08:25 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 08:22 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:22 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:21 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 08:21 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 08:21 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 08:18 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 08:17 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 08:14 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 08:13 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 08:09 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 08:08 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 08:05 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 08:03 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
  • 08:02 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
  • 08:00 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 07:57 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 07:56 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1128.eqiad.wmnet with OS bullseye
  • 07:51 jelto@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=(echostore|sessionstore)
  • 07:49 marostegui: Stop mysql on db1133 to clone db1128 as a test host T295965
  • 07:49 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 07:48 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 07:47 jayme: elevated MediaWiki exceptions and fatals (from ~07:35) due to a mistake during re-deploy of eventgate-main
  • 07:45 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 07:35 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 07:32 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 07:32 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 07:29 elukey_: elukey@mwdebug2002:~$ sudo systemctl reset-failed ifup@ens5.service
  • 07:27 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1128.eqiad.wmnet with OS bullseye
  • 07:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance T296143
  • 07:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance T296143
  • 07:20 jelto@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=(apertium|api-gateway|apple-search|blubberoid|citoid|cxserver|echostore|eventgate-analytics|eventgate-analytics-external|eventgate-logging-external|eventstreams|eventstreams-internal|linkrecommendation|mathoid|mobileapps|proton|push-notifications|recommendation-api|sessionstore|shellbox|shellbox-constraints|shellbox-media|shellbox-syntax
  • 07:17 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 32 hosts with reason: helm3 de-deploy T251305
  • 07:17 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 32 hosts with reason: helm3 de-deploy T251305
  • 07:10 jelto: downtime PyBal backends health check on lvs1015 and lvs1016 for helm3 de-deploy T251305. I'm keeping an eye on Icinga and will remove the downtime as soon as I'm finished
  • 07:09 jelto: start re-deploy procedure in eqiad Kubernetes T251305
  • 06:31 marostegui: Restart tendril's DB
  • 05:51 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
  • 04:45 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@29c5cd7] (wcqs): Deploy 0.3.93 to WCQS (duration: 05m 27s)
  • 04:43 ryankemper: [WCQS Deploy] Tests look good following deploy of `0.3.93` to canary `wcqs1002.eqiad.wmnet`, proceeding to rest of fleet
  • 04:40 ryankemper@deploy1002: Started deploy [wdqs/wdqs@29c5cd7] (wcqs): Deploy 0.3.93 to WCQS
  • 04:39 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 04:38 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 04:38 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 04:35 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@29c5cd7]: 0.3.93 (duration: 09m 23s)
  • 04:30 ryankemper: [Elastic] Cleaning up dangling apt packages: `ryankemper@cumin1001:~$ sudo cumin -b 4 'elastic*' 'sudo apt autoremove -y'`
  • 04:27 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.93` on canary `wdqs1003`; proceeding to rest of fleet
  • 04:25 ryankemper@deploy1002: Started deploy [wdqs/wdqs@29c5cd7]: 0.3.93
  • 04:25 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.93`. Pre-deploy tests passing on canary `wdqs1003`
  • 03:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2072.codfw.wmnet with OS buster
  • 02:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2072.codfw.wmnet with OS buster
  • 02:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2071.codfw.wmnet with OS buster
  • 02:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2070.codfw.wmnet with OS buster
  • 02:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2071.codfw.wmnet with OS buster
  • 01:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2070.codfw.wmnet with OS buster
  • 01:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2068.codfw.wmnet with OS buster
  • 01:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2067.codfw.wmnet with OS buster
  • 01:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2068.codfw.wmnet with OS buster
  • 01:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2067.codfw.wmnet with OS buster
  • 00:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2066.codfw.wmnet with OS buster

2021-11-24

  • 23:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS buster
  • 23:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2065.codfw.wmnet with OS buster
  • 23:44 mutante: [puppetmaster1001:~] $ sudo puppet cert sign gitlab-runner1001.eqiad.wmnet | sudo install_console gitlab-runner1001.eqiad.wmnet (T295481)
  • 23:26 mutante: ganeti - bringing up new VM - sudo gnt-instance start gitlab-runner1001.eqiad.wmnet ; ran puppet on install1003; installing OS T295481
  • 23:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2065.codfw.wmnet with OS buster
  • 23:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2064.codfw.wmnet with OS buster
  • 23:09 mutante: mwmaint1002 - sudo /usr/bin/find /var/lib/puppet/clientbucket/ -type f -size 1M -delete - to fix Icinga alert about large files in client bucket
  • 23:08 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab-runner1001.eqiad.wmnet
  • 23:03 mutante: wcqs1001 - sudo systemctl restart wcqs-blazegraph - after <+jinxer-wm> (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wcqs1001:9195 is burning free allocators
  • 22:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab-runner1001.eqiad.wmnet
  • 22:50 mutante: Creating a new Ganeti VM and wondering which row to put it in? [ganeti1009:~] $ for row in A B C D; do echo "row ${row}: $(sudo gnt-instance list -o name -F "pnode.group == 'row_${row}'" | wc -l) VMs"; done
  • 22:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab-runner1001.wikimedia.org
  • 22:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2064.codfw.wmnet with OS buster
  • 22:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2063.codfw.wmnet with OS buster
  • 22:38 mutante: running the decom cookbook on the gitlab-runner1001.wikimedia.org VM, which was in state "ADMIN_down" and not yet used, to make room to recreate it as gitlab-runner1001.eqiad.wmnet T295481
  • 22:36 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts gitlab-runner1001.wikimedia.org
  • 22:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2063.codfw.wmnet with OS buster
  • 22:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2062.codfw.wmnet with OS buster
  • 21:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:35 legoktm@deploy1002: Synchronized wmf-config/: Improve docs on $wmgUseGlobalAbuseFilters and sort list of wikis (duration: 00m 57s)
  • 21:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2062.codfw.wmnet with OS buster
  • 21:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2061.codfw.wmnet with OS buster
  • 21:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:54 legoktm@deploy1002: Synchronized wmf-config/: Update configuration related to disabling Score functionality (duration: 00m 57s)
  • 20:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2061.codfw.wmnet with OS buster
  • 19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3314 (T296143)', diff saved to https://phabricator.wikimedia.org/P17834 and previous config saved to /var/cache/conftool/dbconfig/20211124-194857-ladsgroup.json
  • 19:38 razzi: `sudo maintain-views --all-databases --replace-all` on clouddb1018 for T292594
  • 19:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3314 (T296143)', diff saved to https://phabricator.wikimedia.org/P17833 and previous config saved to /var/cache/conftool/dbconfig/20211124-193352-ladsgroup.json
  • 19:19 razzi: run `maintain-views --all-databases --replace-all` on clouddb1013 for T292594
  • 19:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3314 (T296143)', diff saved to https://phabricator.wikimedia.org/P17832 and previous config saved to /var/cache/conftool/dbconfig/20211124-191847-ladsgroup.json
  • 19:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3314 (T296143)', diff saved to https://phabricator.wikimedia.org/P17831 and previous config saved to /var/cache/conftool/dbconfig/20211124-190343-ladsgroup.json
  • 18:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir2002.codfw.wmnet
  • 18:51 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ncredir2002.codfw.wmnet
  • 18:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir2001.codfw.wmnet
  • 18:43 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ncredir2001.codfw.wmnet
  • 18:42 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM ncredir2001.codfw.wmnet
  • 18:42 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ncredir2001.codfw.wmnet
  • 18:42 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief-test2001.codfw.wmnet
  • 18:36 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM acmechief-test2001.codfw.wmnet
  • 18:36 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2001.codfw.wmnet
  • 18:30 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2001.codfw.wmnet
  • 17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T296143)', diff saved to https://phabricator.wikimedia.org/P17830 and previous config saved to /var/cache/conftool/dbconfig/20211124-174723-ladsgroup.json
  • 17:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1144.eqiad.wmnet with reason: Maintenance T296143
  • 17:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1144.eqiad.wmnet with reason: Maintenance T296143
  • 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1143 (T296143)', diff saved to https://phabricator.wikimedia.org/P17829 and previous config saved to /var/cache/conftool/dbconfig/20211124-174615-ladsgroup.json
  • 17:35 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/libs/rdbms/: Backport: rdbms: Add full query to transaction profiler (T295706) (duration: 00m 56s)
  • 17:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:34 jhathaway@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=puppetboard
  • 17:31 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1143 (T296143)', diff saved to https://phabricator.wikimedia.org/P17828 and previous config saved to /var/cache/conftool/dbconfig/20211124-173110-ladsgroup.json
  • 17:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:25 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:23 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2016.codfw.wmnet
  • 17:22 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=codfw
  • 17:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:21 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2016.codfw.wmnet
  • 17:20 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM chartmuseum2001.codfw.wmnet
  • 17:20 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2015.codfw.wmnet
  • 17:17 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2015.codfw.wmnet
  • 17:17 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM chartmuseum2001.codfw.wmnet
  • 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1143 (T296143)', diff saved to https://phabricator.wikimedia.org/P17827 and previous config saved to /var/cache/conftool/dbconfig/20211124-171604-ladsgroup.json
  • 17:11 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2006.codfw.wmnet
  • 17:11 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry2004.codfw.wmnet
  • 17:08 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=codfw
  • 17:06 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM registry2004.codfw.wmnet
  • 17:06 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2006.codfw.wmnet
  • 17:05 mforns@deploy1002: Finished deploy [analytics/refinery@6253399] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6253399] (duration: 06m 45s)
  • 17:05 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry2003.codfw.wmnet
  • 17:01 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM registry2003.codfw.wmnet
  • 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1143 (T296143)', diff saved to https://phabricator.wikimedia.org/P17826 and previous config saved to /var/cache/conftool/dbconfig/20211124-170100-ladsgroup.json
  • 17:00 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2005.codfw.wmnet
  • 16:58 mforns@deploy1002: Started deploy [analytics/refinery@6253399] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6253399]
  • 16:58 mforns@deploy1002: Finished deploy [analytics/refinery@6253399] (thin): Regular analytics weekly train THIN [analytics/refinery@6253399] (duration: 00m 07s)
  • 16:58 mforns@deploy1002: Started deploy [analytics/refinery@6253399] (thin): Regular analytics weekly train THIN [analytics/refinery@6253399]
  • 16:58 mforns@deploy1002: Finished deploy [analytics/refinery@6253399]: Regular analytics weekly train [analytics/refinery@6253399] (duration: 32m 50s)
  • 16:56 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2005.codfw.wmnet
  • 16:50 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 16:49 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 16:44 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 16:43 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubetcd2005.codfw.wmnet
  • 16:43 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 16:42 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 16:42 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 16:41 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubetcd2005.codfw.wmnet
  • 16:41 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 16:40 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 16:38 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubetcd2006.codfw.wmnet
  • 16:36 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagetcd2002.codfw.wmnet
  • 16:36 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubetcd2006.codfw.wmnet
  • 16:35 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/libs/rdbms/: Backport: rdbms: Make TransactionProfiler logs more useful (T295706) (duration: 00m 57s)
  • 16:33 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagetcd2002.codfw.wmnet
  • 16:33 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubetcd2004.codfw.wmnet
  • 16:33 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 16:33 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 16:31 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagetcd2003.codfw.wmnet
  • 16:31 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubetcd2004.codfw.wmnet
  • 16:29 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagetcd2003.codfw.wmnet
  • 16:25 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagetcd2001.codfw.wmnet
  • 16:25 mforns@deploy1002: Started deploy [analytics/refinery@6253399]: Regular analytics weekly train [analytics/refinery@6253399]
  • 16:23 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2001.codfw.wmnet
  • 16:23 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagetcd2001.codfw.wmnet
  • 16:21 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2001.codfw.wmnet
  • 16:19 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2002.codfw.wmnet
  • 16:16 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2002.codfw.wmnet
  • 16:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:13 Amir1: start of "foreachwikiindblist s3 migrateRevisionActorTemp.php --sleep=2" in mwmaint1002 in a screen. It will take a month or so (T275246)
  • 16:09 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 16:09 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 16:00 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 16:00 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 16:00 btullis: systemctl reset-failed ifup@ens5.service on schema2004 T273026
  • 15:48 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM schema2004.codfw.wmnet
  • 15:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T296143)', diff saved to https://phabricator.wikimedia.org/P17821 and previous config saved to /var/cache/conftool/dbconfig/20211124-154533-ladsgroup.json
  • 15:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1143.eqiad.wmnet with reason: Maintenance T296143
  • 15:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1143.eqiad.wmnet with reason: Maintenance T296143
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1142 (T296143)', diff saved to https://phabricator.wikimedia.org/P17820 and previous config saved to /var/cache/conftool/dbconfig/20211124-154236-ladsgroup.json
  • 15:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafkamon2002.codfw.wmnet
  • 15:39 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM schema2004.codfw.wmnet
  • 15:36 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM schema2003.codfw.wmnet
  • 15:36 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kafkamon2002.codfw.wmnet
  • 15:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM irc2001.wikimedia.org
  • 15:34 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM schema2003.codfw.wmnet
  • 15:32 papaul: reboot ms-be2058 for firmware upgrade
  • 15:31 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM irc2001.wikimedia.org
  • 15:30 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2001.codfw.wmnet
  • 15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1142 (T296143)', diff saved to https://phabricator.wikimedia.org/P17819 and previous config saved to /var/cache/conftool/dbconfig/20211124-152731-ladsgroup.json
  • 15:23 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2001.codfw.wmnet
  • 15:21 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM dragonfly-supernode2001.codfw.wmnet
  • 15:17 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM dragonfly-supernode2001.codfw.wmnet
  • 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab2001.wikimedia.org
  • 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1142 (T296143)', diff saved to https://phabricator.wikimedia.org/P17817 and previous config saved to /var/cache/conftool/dbconfig/20211124-151226-ladsgroup.json
  • 15:08 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 15:08 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 15:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM gitlab2001.wikimedia.org
  • 15:06 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 15:06 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 15:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow2001.codfw.wmnet
  • 15:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow2001.codfw.wmnet
  • 14:59 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2002.codfw.wmnet
  • 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1142 (T296143)', diff saved to https://phabricator.wikimedia.org/P17815 and previous config saved to /var/cache/conftool/dbconfig/20211124-145721-ladsgroup.json
  • 14:55 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum2002.codfw.wmnet
  • 14:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM search-loader2001.codfw.wmnet
  • 14:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM search-loader2001.codfw.wmnet
  • 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2001.codfw.wmnet
  • 14:49 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:49 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:49 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:44 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum2001.codfw.wmnet
  • 14:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh2002.wikimedia.org
  • 14:39 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2031.codfw.wmnet
  • 14:36 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2031.codfw.wmnet
  • 14:36 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh2002.wikimedia.org
  • 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh2001.wikimedia.org
  • 14:33 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:32 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:31 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2030.codfw.wmnet
  • 14:31 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:31 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:30 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh2001.wikimedia.org
  • 14:30 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:30 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:28 godog: systemctl reset-failed ifup@ens5.service on logstash2024 T273026
  • 14:28 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:28 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:27 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:27 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp2001.wikimedia.org
  • 14:26 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2030.codfw.wmnet
  • 14:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM idp2001.wikimedia.org
  • 14:21 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2025.codfw.wmnet
  • 14:19 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:19 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:15 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2025.codfw.wmnet
  • 14:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test2001.wikimedia.org
  • 14:10 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2024.codfw.wmnet
  • 14:06 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM idp-test2001.wikimedia.org
  • 14:00 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2024.codfw.wmnet
  • 13:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM serpens.wikimedia.org
  • 13:55 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2023.codfw.wmnet
  • 13:54 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM serpens.wikimedia.org
  • 13:49 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2023.codfw.wmnet
  • 13:41 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2006.codfw.wmnet
  • 13:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1142.eqiad.wmnet with reason: Maintenance T296143
  • 13:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1142.eqiad.wmnet with reason: Maintenance T296143
  • 13:39 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2006.codfw.wmnet
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T296143)', diff saved to https://phabricator.wikimedia.org/P17813 and previous config saved to /var/cache/conftool/dbconfig/20211124-133809-ladsgroup.json
  • 13:37 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2005.codfw.wmnet
  • 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica2006.wikimedia.org
  • 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1141 (T296143)', diff saved to https://phabricator.wikimedia.org/P17812 and previous config saved to /var/cache/conftool/dbconfig/20211124-133628-ladsgroup.json
  • 13:36 XioNoX: add Jayme r/o user to all network devices
  • 13:35 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2005.codfw.wmnet
  • 13:34 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica2006.wikimedia.org
  • 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica2005.wikimedia.org
  • 13:30 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2004.codfw.wmnet
  • 13:28 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica2005.wikimedia.org
  • 13:27 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2004.codfw.wmnet
  • 13:27 filippo@cumin1001: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM logstash2004.codfw.wmnet
  • 13:27 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2004.codfw.wmnet
  • 13:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-corp2001.wikimedia.org
  • 13:22 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-corp2001.wikimedia.org
  • 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1141 (T296143)', diff saved to https://phabricator.wikimedia.org/P17811 and previous config saved to /var/cache/conftool/dbconfig/20211124-131519-ladsgroup.json
  • 13:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1141 (T296143)', diff saved to https://phabricator.wikimedia.org/P17810 and previous config saved to /var/cache/conftool/dbconfig/20211124-130200-ladsgroup.json
  • 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM apt2001.wikimedia.org
  • 12:54 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM apt2001.wikimedia.org
  • 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM grafana2001.codfw.wmnet
  • 12:51 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM grafana2001.codfw.wmnet
  • 12:48 jbond: enable puppet post puppetdb reboot
  • 12:48 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:47 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetdb2002.codfw.wmnet
  • 12:46 jelto@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=(apertium|api-gateway|apple-search|blubberoid|citoid|cxserver|echostore|eventgate-analytics|eventgate-analytics-external|eventgate-logging-external|eventstreams|eventstreams-internal|linkrecommendation|mathoid|mobileapps|proton|push-notifications|recommendation-api|sessionstore|shellbox|shellbox-constraints|shellbox-media|shellbox-syntaxh
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1141 (T296143)', diff saved to https://phabricator.wikimedia.org/P17809 and previous config saved to /var/cache/conftool/dbconfig/20211124-124420-ladsgroup.json
  • 12:43 jbond@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM puppetdb2002.codfw.wmnet
  • 12:37 jbond: disable puppet for puppetdb reboot
  • 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM urldownloader2002.wikimedia.org
  • 12:29 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 12:29 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM urldownloader2002.wikimedia.org
  • 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM urldownloader2001.wikimedia.org
  • 12:25 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM urldownloader2001.wikimedia.org
  • 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM releases2002.codfw.wmnet
  • 12:23 awight: EU scap deployment finished
  • 12:21 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM releases2002.codfw.wmnet
  • 12:21 awight@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Replace global with parent scope (duration: 00m 55s)
  • 12:16 awight@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [lint] fully-qualify classname (duration: 00m 55s)
  • 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netboxdb2001.codfw.wmnet
  • 12:10 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netboxdb2001.codfw.wmnet
  • 12:10 awight@deploy1002: Synchronized wmf-config: Config: VisualEditor template dialog: new sidebar and inline descriptions (T284203, T286992) (duration: 00m 57s)
  • 12:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netbox2001.wikimedia.org
  • 12:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:03 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 12:03 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netbox2001.wikimedia.org
  • 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netbox-dev2001.wikimedia.org
  • 12:02 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 12:01 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 11:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netbox-dev2001.wikimedia.org
  • 11:58 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 11:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM rpki2002.codfw.wmnet
  • 11:56 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 11:54 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM rpki2002.codfw.wmnet
  • 11:53 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'similar-users' for release 'main' .
  • 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM poolcounter2003.codfw.wmnet
  • 11:50 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
  • 11:49 moritzm: systemctl reset-failed ifup@ens5.service on poolcounter2003 T273026
  • 11:48 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
  • 11:45 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
  • 11:45 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
  • 11:44 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
  • 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM poolcounter2003.codfw.wmnet
  • 11:42 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM poolcounter2004.codfw.wmnet
  • 11:40 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 11:38 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 11:37 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM poolcounter2004.codfw.wmnet
  • 11:36 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 11:35 godog: bounce apache2 on logstash1025
  • 11:35 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 11:32 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 11:27 Amir1: optimizing image.commonswiki in db1141 (T296143)
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T296143)', diff saved to https://phabricator.wikimedia.org/P17808 and previous config saved to /var/cache/conftool/dbconfig/20211124-112539-ladsgroup.json
  • 11:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1141.eqiad.wmnet with reason: Maintenance T296143
  • 11:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1141.eqiad.wmnet with reason: Maintenance T296143
  • 11:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter2004.codfw.wmnet
  • 11:23 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 11:21 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 11:19 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter2004.codfw.wmnet
  • 11:18 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 11:18 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter2003.codfw.wmnet
  • 11:15 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:13 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter2003.codfw.wmnet
  • 11:13 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:08 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM webperf2002.codfw.wmnet
  • 11:05 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 11:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM webperf2002.codfw.wmnet
  • 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM webperf2001.codfw.wmnet
  • 10:53 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 10:52 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 10:51 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM webperf2001.codfw.wmnet
  • 10:50 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 10:50 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 10:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM xhgui2001.codfw.wmnet
  • 10:48 XioNoX: rollback: disable ping-offload for codfw - T294119
  • 10:47 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM xhgui2001.codfw.wmnet
  • 10:47 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
  • 10:46 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
  • 10:44 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 10:42 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 10:42 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM people2002.codfw.wmnet
  • 10:40 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 10:40 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 10:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM people2002.codfw.wmnet
  • 10:38 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 10:38 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 10:36 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 10:36 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 10:33 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 10:33 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ping2001.codfw.wmnet
  • 10:28 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ping2001.codfw.wmnet
  • 10:27 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 10:25 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 10:25 XioNoX: disable ping-offload for codfw - T294119
  • 10:24 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 10:21 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:20 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:20 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:18 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 10:17 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:14 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:13 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:12 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 10:06 jelto: downtime PyBal backends health check for helm3 de-deploy T251305. I'm keeping an eye on Icinga and will remove the downtime as soon as I'm finished
  • 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetboard2002.codfw.wmnet
  • 10:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM puppetboard2002.codfw.wmnet
  • 10:02 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
  • 10:02 vgutierrez: repool cp5006 - T290005
  • 10:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetboard2001.codfw.wmnet
  • 10:00 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
  • 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM puppetboard2001.codfw.wmnet
  • 09:58 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
  • 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM debmonitor2002.codfw.wmnet
  • 09:56 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:56 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
  • 09:55 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM debmonitor2002.codfw.wmnet
  • 09:54 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:53 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
  • 09:53 vgutierrez: restart varnish/haproxy on cp5006 - T290005
  • 09:53 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
  • 09:52 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 09:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM install2003.wikimedia.org
  • 09:49 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
  • 09:46 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
  • 09:46 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM install2003.wikimedia.org
  • 09:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mx2001.wikimedia.org
  • 09:45 vgutierrez: depool cp5006 - T290005
  • 09:43 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
  • 09:41 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mx2001.wikimedia.org
  • 09:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM planet2002.codfw.wmnet
  • 09:30 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM planet2002.codfw.wmnet
  • 09:30 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=apple-search,name=eqiad
  • 09:24 jelto@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=(apertium|api-gateway|blubberoid|citoid|cxserver|echostore|eventgate-analytics|eventgate-analytics-external|eventgate-logging-external|eventstreams|eventstreams-internal|linkrecommendation|mathoid|mobileapps|proton|push-notifications|recommendation-api|sessionstore|shellbox|shellbox-constraints|shellbox-media|shellbox-syntaxhighlight|she
  • 09:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM failoid2002.codfw.wmnet
  • 09:20 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM failoid2002.codfw.wmnet
  • 09:19 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
  • 09:16 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
  • 09:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on zotero.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on zotero.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on wikifeeds.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on wikifeeds.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on termbox.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on termbox.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on tegola-vector-tiles.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on tegola-vector-tiles.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on similar-users.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on similar-users.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox-timeline.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox-timeline.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox-syntaxhighlight.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox-syntaxhighlight.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox-media.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox-media.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox-constraints.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox-constraints.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on sessionstore.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on sessionstore.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on recommendation-api.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on recommendation-api.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on push-notifications.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on push-notifications.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on proton.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on proton.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mobileapps.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mobileapps.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mathoid.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mathoid.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on linkrecommendation.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on linkrecommendation.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventstreams-internal.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventstreams-internal.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventstreams.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventstreams.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventgate-main.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventgate-main.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventgate-logging-external.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventgate-logging-external.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventgate-analytics-external.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventgate-analytics-external.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventgate-analytics.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventgate-analytics.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on echostore.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on echostore.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cxserver.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on cxserver.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on citoid.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on citoid.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on blubberoid.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on blubberoid.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on apple-search.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on apple-search.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on api-gateway.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on api-gateway.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:11 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on apertium.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:11 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on apertium.svc.codfw.wmnet with reason: helm3 de-deploy T251305
  • 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM deneb.codfw.wmnet
  • 09:08 _joe_: switching search.wikimedia.org to be served by the apple-search service
  • 09:04 jelto: start re-deploy procedure in codfw Kubernetes T251305
  • 09:01 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM deneb.codfw.wmnet
  • 08:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:56 _joe_: repooling cp2027
  • 08:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:55 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
  • 08:51 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set actor migration to write both on all wikis (T275246) (duration: 00m 57s)
  • 08:51 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
  • 08:41 vgutierrez: depool cp2027
  • 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1125.eqiad.wmnet with OS bullseye
  • 07:40 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
  • 07:23 elukey: reboot kubernetes1018 (role::insetup) to verify negotiated speed of eth interface
  • 07:12 elukey: drop /tmp/blockmgr-20fe4b2b-31fb-4a85-b5b1-bebe254120f8 and other blockmgr-* dirs on stat1006 to free space on the root partition
  • 06:47 Amir1: running optimize table with replication on db1155:3314 (T296143)
  • 06:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance (T296143)
  • 06:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance (T296143)
  • 06:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: After optimize table (T296143)', diff saved to https://phabricator.wikimedia.org/P17807 and previous config saved to /var/cache/conftool/dbconfig/20211124-063228-root.json
  • 06:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: After optimize table (T296143)', diff saved to https://phabricator.wikimedia.org/P17806 and previous config saved to /var/cache/conftool/dbconfig/20211124-061725-root.json
  • 06:05 marostegui: Upgrade db1128's kernel T288720
  • 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: After optimize table (T296143)', diff saved to https://phabricator.wikimedia.org/P17805 and previous config saved to /var/cache/conftool/dbconfig/20211124-060221-root.json
  • 05:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: After optimize table (T296143)', diff saved to https://phabricator.wikimedia.org/P17804 and previous config saved to /var/cache/conftool/dbconfig/20211124-054718-root.json
  • 00:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2012.codfw.wmnet with OS buster

2021-11-23

  • 23:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2012.codfw.wmnet with OS buster
  • 23:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2011.codfw.wmnet with OS buster
  • 23:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2011.codfw.wmnet with OS buster
  • 23:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2010.codfw.wmnet with OS buster
  • 22:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2010.codfw.wmnet with OS buster
  • 22:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2009.codfw.wmnet with OS buster
  • 21:58 tgr: UTC evening deploys done
  • 21:57 tgr@deploy1002: Finished scap: (no justification provided) (duration: 10m 03s)
  • 21:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2009.codfw.wmnet with OS buster
  • 21:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2009.codfw.wmnet with OS buster
  • 21:53 krinkle@deploy1002: Finished deploy [integration/docroot@a3435a7]: (no justification provided) (duration: 00m 07s)
  • 21:53 krinkle@deploy1002: Started deploy [integration/docroot@a3435a7]: (no justification provided)
  • 21:47 tgr@deploy1002: Started scap: (no justification provided)
  • 21:47 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments: Backport: Add Image: Validate GEInfoboxTemplates size (T294518) (duration: 00m 56s)
  • 21:39 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/Api/ApiQueryGrowthTasks.php: Backport: Structured task caching/filtering cherry-picks step 3 (duration: 00m 55s)
  • 21:35 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments: Backport: Structured task caching/filtering cherry-picks step 2 (duration: 00m 57s)
  • 21:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2009.codfw.wmnet with OS buster
  • 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:04 legoktm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/Echo/: re-enable cross-wiki notifications by default (T296270) (duration: 00m 57s)
  • 19:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:52 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:51 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/: 7d5f779: Structured task caching/filtering cherry-picks, step 1 (duration: 00m 56s)
  • 19:42 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/: c26e407: GrowthExperiments backports (duration: 01m 03s)
  • 19:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: bf82bfb: Add new icons, wordmarks & taglines for several wikis (T290091; 2/2) (duration: 00m 56s)
  • 19:17 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: bf82bfb: Add new icons, wordmarks & taglines for several wikis (T290091; 1/2) (duration: 00m 56s)
  • 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3993aac: Increase reading depth sampling rate to .1% (T294777) (duration: 00m 57s)
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:29 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 18:25 ejegg: updated SmashPig standalone (IPN listener) from be68299b -> 211f8e65
  • 18:18 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 18:18 cmjohnson1: upgrading msw-c1-eqiad T259758
  • 18:04 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 18:01 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 18:00 moritzm: systemctl reset-failed ifup@ens5.service on durum2001 T273026
  • 17:59 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 17:55 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2002.codfw.wmnet
  • 17:49 mutante: miscweb1002 - rm -rf /srv/deployments/scholarships (T243037)
  • 17:47 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM durum2002.codfw.wmnet
  • 17:42 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2001.codfw.wmnet
  • 17:35 ebernhardson: T295478 start snapshot of commonswiki_file from cirrus codfw -> swift eqiad
  • 17:34 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM durum2001.codfw.wmnet
  • 17:33 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh2002.wikimedia.org
  • 17:31 cmjohnson1: upgrading msw's in row D eqiad T259758
  • 17:28 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh2002.wikimedia.org
  • 17:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2012.codfw.wmnet with OS stretch
  • 17:16 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on durum2002.codfw.wmnet with reason: apply new KVM machine settings
  • 17:16 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on durum2002.codfw.wmnet with reason: apply new KVM machine settings
  • 17:16 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on durum2001.codfw.wmnet with reason: apply new KVM machine settings
  • 17:16 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on durum2001.codfw.wmnet with reason: apply new KVM machine settings
  • 17:15 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on doh2002.wikimedia.org with reason: apply new KVM machine settings
  • 17:15 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on doh2002.wikimedia.org with reason: apply new KVM machine settings
  • 17:15 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on doh2001.wikimedia.org with reason: apply new KVM machine settings
  • 17:15 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on doh2001.wikimedia.org with reason: apply new KVM machine settings
  • 17:15 sukhe@cumin1001: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM doh2001.wikimedia.org
  • 17:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mwdebug2002.codfw.wmnet
  • 17:14 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh2001.wikimedia.org
  • 17:14 sukhe@cumin1001: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM doh2001.wikimedia.org
  • 17:11 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh2001.wikimedia.org
  • 17:10 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mwdebug2002.codfw.wmnet
  • 17:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mwdebug2001.codfw.wmnet
  • 17:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mwdebug2001.codfw.wmnet
  • 16:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM miscweb2002.codfw.wmnet
  • 16:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM miscweb2002.codfw.wmnet
  • 16:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doc2001.codfw.wmnet
  • 16:53 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doc2001.codfw.wmnet
  • 16:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2012.codfw.wmnet with OS stretch
  • 16:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2001.codfw.wmnet
  • 16:47 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2001.codfw.wmnet
  • 16:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2003.codfw.wmnet
  • 16:39 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2003.codfw.wmnet
  • 16:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2011.codfw.wmnet with OS stretch
  • 16:13 cmjohnson1: updating mgmt switches in row C, racks C2-C8 eqiad T259758
  • 15:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2011.codfw.wmnet with OS stretch
  • 15:46 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
  • 15:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2010.codfw.wmnet with OS stretch
  • 15:41 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
  • 15:31 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
  • 15:27 Emperor: rolling restart of thanos frontends T294380
  • 15:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2010.codfw.wmnet with OS stretch
  • 14:40 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
  • 14:34 jbond@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=puppetboard
  • 14:30 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
  • 14:09 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 14:09 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 14:03 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 14:03 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 14:00 marostegui: Failover m5 from db1128 to db1132 - T288720
  • 14:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2006.codfw.wmnet with OS bullseye
  • 13:50 godog: powercycle (again) ms-be2058
  • 13:48 godog: add 80G to prometheus global in eqiad
  • 13:31 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host prometheus2006.codfw.wmnet with OS bullseye
  • 13:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2005.codfw.wmnet with OS bullseye
  • 13:01 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye
  • 12:58 btullis@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching O:aqs: restarting to pick up new JRE - btullis@cumin1001
  • 12:52 aborrero@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host cloudbackup1002-dev.eqiad.wmnet
  • 12:46 Lucas_WMDE: UTC morning backport+config window done
  • 12:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:43 aborrero@cumin1001: START - Cookbook sre.ganeti.makevm for new host cloudbackup1002-dev.eqiad.wmnet
  • 12:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Set up beta test environment for QuickSurveys (T293798) (beta only) (duration: 00m 55s)
  • 12:31 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
  • 12:29 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/modules/page/ext.proofreadpage.page.edit.js: Backport: OSD: Handle cases where the image srcset attr is not set (T296260) (duration: 00m 56s)
  • 12:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:27 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
  • 12:26 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/modules/page/ext.proofreadpage.page.edit.js: Backport: OSD: Add a ready hook for scripts (T180569) (duration: 00m 56s)
  • 12:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:21 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
  • 12:12 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
  • 12:09 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
  • 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2003.codfw.wmnet
  • 12:01 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2003.codfw.wmnet
  • 11:54 btullis@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching O:aqs: restarting to pick up new JRE - btullis@cumin1001
  • 11:51 btullis@cumin1001: END (ERROR) - Cookbook sre.aqs.roll-restart (exit_code=97) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 11:51 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2002.codfw.wmnet
  • 11:41 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2002.codfw.wmnet
  • 11:25 godog: powercycle ms-be2058 - down and nothing on console
  • 11:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5012.eqsin.wmnet with OS buster
  • 11:15 vgutierrez: pool cp5012 (text) using HAProxy as TLS terminator - T290005
  • 11:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:08 Amir1: start of mwscript migrateRevisionActorTemp.php --wiki=testwiki --sleep=5 (T275246)
  • 11:05 jayme: cordoned kubestage1003.eqiad.wmnet kubestage1004.eqiad.wmnet (we have issues with POD IP prefix allocation) - T293729
  • 11:05 jayme: uncordoned kubestage1001.eqiad.wmnet kubestage1002.eqiad.wmnet (we have issues with POD IP prefix allocation) - T293729
  • 11:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:02 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set test wikis to write both for actor temp table migration (T275246) (duration: 00m 56s)
  • 10:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance T296143
  • 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance T296143
  • 10:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1155.eqiad.wmnet with reason: Maintenance T296143
  • 10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1155.eqiad.wmnet with reason: Maintenance T296143
  • 10:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maintenance T296143
  • 10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maintenance T296143
  • 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T296143)', diff saved to https://phabricator.wikimedia.org/P17800 and previous config saved to /var/cache/conftool/dbconfig/20211123-102234-ladsgroup.json
  • 10:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1121.eqiad.wmnet with reason: Maintenance T296143
  • 10:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1121.eqiad.wmnet with reason: Maintenance T296143
  • 10:19 urbanecm@deploy1002: Finished scap: c98acaa: Backport localisation updates (duration: 11m 06s)
  • 10:19 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:18 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:08 urbanecm@deploy1002: Started scap: c98acaa: Backport localisation updates
  • 10:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5012.eqsin.wmnet with OS buster
  • 10:01 vgutierrez: depool cp5012 to be reimaged as cache::text_haproxy - T290005
  • 09:57 jayme: cordoned kubestage1001.eqiad.wmnet kubestage1002.eqiad.wmnet - T293729
  • 09:52 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 09:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1124.eqiad.wmnet with OS bullseye
  • 09:27 Amir1: dropping useless GRANTs on s6 eqiad replicas without replication (T296274)
  • 09:16 Amir1: dropping useless GRANTs on s6 eqiad master without replication (T296274)
  • 09:09 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1124.eqiad.wmnet with OS bullseye
  • 09:05 Amir1: fixing incorrect grants of wikiadmin on localhost in s6 master in codfw with replication
  • 07:52 topranks: Adjusting BGP on cr1-eqiad and cr2-eqiad to keep MED unchanged in iBGP.
  • 07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1125.eqiad.wmnet with OS bullseye
  • 06:41 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
  • 05:29 ryankemper: T295705 Downtimed `elastic2044` for one hour and doing a full reboot for good measure. Already ran the plugin upgrade: `DEBIAN_FRONTEND=noninteractive sudo apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install elasticsearch-oss wmf-elasticsearch-search-plugins`
  • 05:26 ryankemper: T295705 Rolling restart of `codfw` complete. `elastic2044` was manually restarted earlier today so the cookbook didn't restart it (b/c we pass in a datetime cutoff threshold) so I'm manually upgrading and restarting that host
  • 05:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart with plugin upgrade (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - T295705
  • 04:17 ryankemper: T295705 Properly disabled the sane-itizer; we don't want it running until after we (a) complete rolling restarts and (b) restore the missing `commonswiki_file` index (which is blocked on the restarts)
  • 03:42 Amir1: ladsgroup@mwmaint1002:~$ cat broken_imgs | xargs -I {} mwscript refreshImageMetadata.php --wiki=commonswiki --mediatype=OFFICE --verbose --mime 'image/*' --force --batch-size 1 --sleep 1 --start={} --end={} (T296001)
  • 03:37 Amir1: rebuilding metadata of all djvu files outside of commons (T296001)
  • 03:06 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - T295705
  • 02:58 ryankemper: T295705 `elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='search.svc.codfw.wmnet', port=9243): Read timed out. (read timeout=60))` Probably transient failure; will wait 10 mins and try again
  • 02:57 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart with plugin upgrade (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - T295705
  • 02:55 ryankemper: T295705 `ryankemper@cumin1001:~$ sudo cookbook sre.elasticsearch.rolling-operation codfw "codfw plugin upgrade + restart" --upgrade --nodes-per-run 2 --start-datetime 2021-11-18T18:55:54 --task-id T295705` on tmux `rolling_restarts_codfw`
  • 02:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - T295705
  • 02:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 01:17 urbanecm: UTC late window done
  • 01:17 urbanecm@deploy1002: Finished scap: 69aa4a7: 7c0e074: Revert "Create redirect Special Pages for delete and protect action" (T295611; T296203; 4/4) (duration: 25m 50s)
  • 01:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 01:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:51 urbanecm@deploy1002: Started scap: 69aa4a7: 7c0e074: Revert "Create redirect Special Pages for delete and protect action" (T295611; T296203; 4/4)
  • 00:50 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/autoload.php: 7c0e074: Revert "Create redirect Special Pages for delete and protect action" (T295611; T296203; 3/4) (duration: 00m 55s)
  • 00:49 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/includes/specials/: 7c0e074: Revert "Create redirect Special Pages for delete and protect action" (T295611; T296203; 2/4) (duration: 00m 55s)
  • 00:48 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/includes/specialpage/SpecialPageFactory.php: 7c0e074: Revert "Create redirect Special Pages for delete and protect action" (T295611; T296203; 1/4) (duration: 00m 56s)
  • 00:41 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b920943: Enable reading depth instrumentation at low sampling rate (T294777) (duration: 00m 56s)
  • 00:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:30 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikimediaEvents: 3f860c7: fa9fbf1: WikimediaEvents backports (2/2; T294777) (duration: 00m 55s)
  • 00:28 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikimediaEvents/extension.json: 3f860c7: Restore ReadingDepth instrument (1/2) (duration: 00m 56s)
  • 00:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:20 jeena: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/739908

2021-11-22

  • 23:55 mutante: acmechief1001, acmechief-test1001: sudo systemctl restart reload-acme-chief-backend.timer
  • 23:54 mutante: acmechief1001, acmechief-test1001: sudo systemctl start reload-acme-chief-backend.timer
  • 23:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe2011.codfw.wmnet with OS stretch
  • 23:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe2010.codfw.wmnet with OS stretch
  • 23:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2011.codfw.wmnet with OS stretch
  • 23:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2009.codfw.wmnet with OS stretch
  • 22:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2010.codfw.wmnet with OS stretch
  • 22:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2009.codfw.wmnet with OS stretch
  • 22:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2028.codfw.wmnet with OS buster
  • 21:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2028.codfw.wmnet with OS buster
  • 21:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2027.codfw.wmnet with OS buster
  • 21:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2027.codfw.wmnet with OS buster
  • 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:23 legoktm@deploy1002: Synchronized wmf-config/PoolCounterSettings.php: Lower CirrusSearch maxqueues to be closer to number of workers (duration: 00m 56s)
  • 20:01 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - T295705
  • 19:49 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - T295705
  • 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:46 urbanecm: Evening B&C window completed
  • 19:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:44 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/: 10b8440: Use the WikiEditor ready hook instead of using() the lib (T296033) (duration: 00m 56s)
  • 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b6b05e3: kswiki: set wgTranslateNumerals to false (T296055) (duration: 00m 55s)
  • 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4aa8d5b: Enable SandboxLink on lawiki (T296073) (duration: 00m 56s)
  • 19:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1c082be: Enable mapframe on the Indonesian Wikipedia (T295571) (duration: 00m 56s)
  • 19:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:11 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:05 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 19:01 vgutierrez: pool cp4032 (text) using HAProxy as TLS terminator - T290005
  • 18:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001
  • 17:50 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - T295705
  • 17:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:48 XioNoX: repool codfw
  • 17:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4032.ulsfo.wmnet with OS buster
  • 17:46 ejegg: updated fundraising python tools from d90f4c91 -> d1d7b100
  • 17:43 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:32 ebernhardson: restart both elasticsearch instances on elastic2044, reporting `connection refused` (after a brief period of `no route to host`) to masters even though the connection works outside elastic
  • 17:01 ryankemper: T295705 Beginning rolling restart w/ plugin upgrade of `cloudelastic`: `ryankemper@cumin1001:~$ sudo cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic plugin upgrade + restart" --upgrade --nodes-per-run 3 --start-datetime 2021-11-22T16:59:38 --task-id T295705` on tmux `rolling_restarts_cloudelastic`
  • 17:00 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001
  • 16:58 ryankemper: [Elastic] T295705 Rolling restart w/ plugin upgrade of `relforge` is complete
  • 16:55 ryankemper: [Elastic] T295705 Restarting second and final relforge host: `ryankemper@relforge1003:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad.service elasticsearch_6@relforge-eqiad-small-alpha.service logstash.service`
  • 16:55 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4032.ulsfo.wmnet with OS buster
  • 16:52 ryankemper: [Elastic] T295705 Restarting first relforge host: `ryankemper@relforge1004:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad.service elasticsearch_6@relforge-eqiad-small-alpha.service logstash.service`
  • 16:51 jayme: fleet wide updated wmf-certificates to 0~20211122-1
  • 16:50 vgutierrez: depool cp4032 to be reimaged as cache::text_haproxy - T290005
  • 16:49 ryankemper: [Elastic] T295705 Downtimed relforge* for 2 hours in order to perform a manual rolling restart of the two hosts `relforge1003` and `relforge1004`
  • 16:44 ryankemper: T295705 Upgrading `relforge` elasticsearch packages: `ryankemper@cumin1001:~$ sudo cumin -b 2 'relforge*' 'DEBIAN_FRONTEND=noninteractive sudo apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install elasticsearch-oss wmf-elasticsearch-search-plugins'`
  • 16:39 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - T295705
  • 16:15 urbanecm: Password reset for Miraki@arbcom_dewiki per private request
  • 16:15 moritzm: installing postgresql-13 security updates on bullseye
  • 15:56 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:55 XioNoX: Telia DDoS auto-mitigation enabled on all circuits - T288926
  • 15:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:28 Amir1: revoking DROP for wikiadmin from db1100 (T249683)
  • 15:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2006.codfw.wmnet with OS bullseye
  • 15:17 moritzm: set kvm:machine_version=pc-i440fx-2.8 for Ganeti cluster in codfw T294119
  • 15:16 jayme: imported wmf-certificates 0~20211122-1 to stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
  • 15:13 _joe_: restarting pybal low-traffic in codfw, eqiad
  • 15:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:58 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab-runner1001.wikimedia.org
  • 14:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable DPL on opt-in wikis where not in use (T287916) (duration: 00m 56s)
  • 14:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2006.codfw.wmnet with OS bullseye
  • 14:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:51 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable DPL on Wikiversities where not in use (T287916) (duration: 00m 56s)
  • 14:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:48 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable DPL on Wikisources where not in use (T287916) (duration: 00m 56s)
  • 14:44 jelto@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab-runner1001.wikimedia.org
  • 14:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:23 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restarting to pick up Java security updates - hnowlan@cumin1001
  • 14:06 akosiaris: repool wtp1025, wtp1041 to parsoid cluster. T296098
  • 14:05 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=parsoid,name=wtp1041.eqiad.wmnet
  • 14:05 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=parsoid,name=wtp1025.eqiad.wmnet
  • 13:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2005.codfw.wmnet with OS bullseye
  • 13:32 XioNoX: re-enable pybal on lvs2007 - T295118
  • 13:31 XioNoX: re-enable puppet on lvs2007
  • 13:30 XioNoX: re-enabling V6 between cr2-codfw and asw-b-codfw - T295118
  • 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye
  • 13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:20 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.9
  • 13:04 XioNoX: asw-b-codfw# set virtual-chassis member 7 mastership-priority 255 - T295118
  • 12:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:51 Lucas_WMDE: UTC morning backport+config window done
  • 12:51 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/includes/ProofreadPageLuaLibrary.php: Backport: Lua: use LinkBatch to speed up the template dependencies (T296092) (2/2) (duration: 01m 03s)
  • 12:49 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/includes/Pagination/Pagination.php: Backport: Lua: use LinkBatch to speed up the template dependencies (T296092) (1/2) (duration: 01m 04s)
  • 12:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:47 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.7/extensions/ProofreadPage/includes/ProofreadPageLuaLibrary.php: Backport: Lua: use LinkBatch to speed up the template dependencies (T296092) (2/2) (duration: 01m 03s)
  • 12:45 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.7/extensions/ProofreadPage/includes/Pagination/Pagination.php: Backport: Lua: use LinkBatch to speed up the template dependencies (T296092) (1/2) (duration: 01m 04s)
  • 12:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:19 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: ExtensionDistributor: 1.37.0 is out now, so there's no beta T289585 (duration: 01m 04s)
  • 12:11 hashar@deploy1002: Synchronized php-1.38.0-wmf.9/skins/MinervaNeue: Fix banners to show CentralNotice - T296077 (duration: 01m 04s)
  • 11:50 moritzm: installing qemu security updates on bullseye
  • 11:46 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:43 moritzm: installing krb5 security updates on stretch
  • 11:41 oblivian@cumin1001: START - Cookbook sre.dns.netbox
  • 11:39 oblivian@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 11:36 oblivian@cumin1001: START - Cookbook sre.dns.netbox
  • 11:34 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restarting to pick up Java security updates - hnowlan@cumin1001
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti-test2003.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
  • 11:26 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti-test2003.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
  • 11:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
  • 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
  • 11:20 XioNoX: re-enable LibertyGlobal in esams
  • 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1125.eqiad.wmnet with OS bullseye
  • 11:12 XioNoX: Revert "prepend_as_out for esams/knams"
  • 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test2003.codfw.wmnet with OS buster
  • 10:54 elukey: apt-get purge up to linux-image-4.9.0-14-amd64 on sodium to free /boot space
  • 10:49 elukey: `apt-get remove linux-image-4.9.0-5-amd64 linux-image-4.9.0-6-amd64` on sodium to free /boot
  • 10:45 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
  • 10:41 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2003.codfw.wmnet with OS buster
  • 10:25 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1125.eqiad.wmnet with OS bullseye
  • 10:16 jbond: restart snmp gracefully cr2-eqord
  • 10:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
  • 09:45 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
  • 09:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
  • 09:38 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1125.eqiad.wmnet with OS bullseye
  • 09:35 moritzm: installing Linux 4.9.272 updates on Stretch hosts
  • 09:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:24 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
  • 09:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 24b3a77: Growth: Disable filtering by unstarred mentees at arwiki, enwiki, fawiki (T293182) (duration: 01m 04s)
  • 09:14 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
  • 09:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:08 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1125.eqiad.wmnet with OS bullseye
  • 09:05 moritzm: installing 4.19.208-1 kernels on Stretch hosts with 4.19 kernels
  • 09:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:49 moritzm: drain ganeti-test2003 for forthcoming reimage
  • 08:44 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
  • 08:44 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/: 4418c43: ApiSetMentorStatus: Use READ_LATEST to request back timestamp (T295305) (duration: 01m 08s)
  • 08:42 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
  • 08:31 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17793 and previous config saved to /var/cache/conftool/dbconfig/20211122-082525-root.json
  • 08:15 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17792 and previous config saved to /var/cache/conftool/dbconfig/20211122-081022-root.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17791 and previous config saved to /var/cache/conftool/dbconfig/20211122-075518-root.json
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 40%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17790 and previous config saved to /var/cache/conftool/dbconfig/20211122-074015-root.json
  • 07:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance T296143
  • 07:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance T296143
  • 07:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance T296143
  • 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance T296143
  • 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance T296143
  • 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance T296143
  • 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance T296143
  • 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance T296143
  • 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance T296143
  • 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance T296143
  • 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance T296143
  • 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance T296143
  • 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance T296143
  • 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance T296143
  • 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance T296143
  • 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance T296143
  • 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2090.codfw.wmnet with reason: Maintenance T296143
  • 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2090.codfw.wmnet with reason: Maintenance T296143
  • 07:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2073.codfw.wmnet with reason: Maintenance T296143
  • 07:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2073.codfw.wmnet with reason: Maintenance T296143
  • 07:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance T296143
  • 07:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance T296143
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17789 and previous config saved to /var/cache/conftool/dbconfig/20211122-072511-root.json
  • 07:17 Amir1: running optimize table on image table in commonswiki on codfw with replication enabled; it'll cause replication lag (T296143)
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 20%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17788 and previous config saved to /var/cache/conftool/dbconfig/20211122-071006-root.json
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 10%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17787 and previous config saved to /var/cache/conftool/dbconfig/20211122-065502-root.json
  • 06:46 marostegui: Revoke dump grants for scholarships database T296166
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 5%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17786 and previous config saved to /var/cache/conftool/dbconfig/20211122-063959-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 1%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17785 and previous config saved to /var/cache/conftool/dbconfig/20211122-062455-root.json
  • 03:30 Amir1: run optimize table on db2140 for image table (T296143)

2021-11-21

  • 13:17 dcausse: restarting blazegraph on wdqs1007 (jvm stuck for 10h)
  • 07:26 XioNoX: cr1-eqiad# deactivate protocols bgp group Confed_eqord
  • 05:22 Amir1: running clean up of djvu files in all wikis (T275268)
  • 05:13 Amir1: end of djvu metadata maint script run (T275268)

2021-11-20

  • 01:02 mutante: lists1001 - restarted apache, icinga alerts for the web UI, but recovered
  • 00:27 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 00:26 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 00:25 bblack: lvs3005 - re-enabling puppet + pybal
  • 00:25 legoktm@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 00:25 legoktm@cumin1001: START - Cookbook sre.network.cf
  • 00:24 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 00:23 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 00:06 bblack: lvs3005 - disabling puppet and stopping pybal (traffic will go to lvs3007)

2021-11-19

  • 23:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus2005.codfw.wmnet with OS bullseye
  • 23:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye
  • 23:24 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host prometheus2005.codfw.wmnet with OS bullseye
  • 23:15 mutante: LDAP - added mmartorana to wmf (91354e9e-5706-4289-9a60-98e8a7632853) T295789
  • 22:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye
  • 20:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2018.codfw.wmnet with OS stretch
  • 20:21 mutante: phabricator - adding eigyan to WMF-NDA (phab project 61 - https://phabricator.wikimedia.org/project/members/61/ ) - since that is now standard when adding people to the wmf LDAP group (T295928)
  • 20:20 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor2002.codfw.wmnet
  • 20:05 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts thumbor2002.codfw.wmnet
  • 20:00 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2280.codfw.wmnet
  • 19:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2018.codfw.wmnet with OS stretch
  • 19:51 mutante: shutting down undead server mw2280 - not in icinga and puppetdb but in debmonitor, and still has IP and puppet cert
  • 19:45 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2280.codfw.wmnet
  • 18:54 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001
  • 18:10 andrew@deploy1002: Finished deploy [horizon/deploy@ba16257]: moving the proxy endpoint behind keystone (duration: 04m 19s)
  • 18:06 andrew@deploy1002: Started deploy [horizon/deploy@ba16257]: moving the proxy endpoint behind keystone
  • 17:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:41 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:25 andrew@deploy1002: Finished deploy [horizon/deploy@ee83e27]: fixing sudo rule editing (duration: 04m 10s)
  • 17:21 andrew@deploy1002: Started deploy [horizon/deploy@ee83e27]: fixing sudo rule editing
  • 17:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:42 thcipriani@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.38.0-wmf.9 refs T293950 T296098"
  • 16:35 thcipriani: rolling back to group0 for T296098
  • 16:20 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001
  • 15:31 akosiaris: roll restart wtp10* php7.2-fpm excluding wtp1025, wtp1041
  • 15:29 akosiaris: depooling wtp1041, wtp1025 from traffic. The entire parsoid cluster is under memory pressure; a rolling restart of php-fpm looks like it will alleviate the pressure and give us some time to drill into the problem before the pressure builds up again.
  • 15:28 akosiaris@cumin1001: conftool action : set/pooled=no; selector: cluster=parsoid,name=wtp1025.eqiad.wmnet
  • 15:28 akosiaris@cumin1001: conftool action : set/pooled=no; selector: cluster=parsoid,name=wtp1041.eqiad.wmnet
  • 14:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
  • 14:49 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
  • 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
  • 14:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
  • 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test2001.codfw.wmnet with OS buster
  • 14:15 jayme: fleet wide updated wmf-certificates to 0~20211119-1
  • 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2001.codfw.wmnet with OS buster
  • 13:23 moritzm: draining instances from ganeti-test2001 for reimage T284811
  • 13:02 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 12:10 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 12:06 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 11:54 hnowlan: roll-restarting cassandra on eqiad maps for java updates
  • 11:36 jayme: imported wmf-certificates 0~20211119-1 to stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
  • 09:53 XioNoX: run `commit full` on asw-b-codfw - T295118
  • 09:30 XioNoX: re-enable cr2-codfw<->asw-b7-codfw link after disabling inet6 on cr2-codfw:ae2 - T295118
  • 09:06 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 08:46 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 08:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: update wmf-netbox - ayounsi@cumin1001
  • 08:29 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: update wmf-netbox - ayounsi@cumin1001
  • 08:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:26 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes: Backport: Revert "Title: use PageStore instead of LinkCache" (duration: 01m 03s)
  • 08:23 ayounsi@deploy1002: Finished deploy [homer/deploy@dc007aa]: Homer CR738905 (duration: 01m 25s)
  • 08:22 ayounsi@deploy1002: Started deploy [homer/deploy@dc007aa]: Homer CR738905
  • 08:17 moritzm: installing mariadb-10.5 security updates on bullseye (as packaged in Debian, not the wmf-internal packages)
  • 06:55 marostegui: Reboot db1132 to pick up new kernel T288720
  • 06:23 marostegui: Upgrade clouddb1019
  • 05:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 04:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 04:55 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/media/DjVuImage.php: Backport: media: Store metadata of one-page documents correctly (T296001) (duration: 00m 56s)
  • 02:54 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/modules: Backport: Lazy-load structured task JS files (T296049) (duration: 00m 55s)
  • 02:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade + restart - ryankemper@cumin1001 - T295705
  • 02:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
  • 02:02 mutante: [puppetmaster1001:/var/run/confd-template] $ sudo rm .git-ssh*.err
  • 02:01 mutante: [puppetmaster2001:/var/run/confd-template] $ sudo rm .git-ssh*.err
  • 01:57 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor2001.codfw.wmnet
  • 01:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
  • 01:48 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=phab2001-vcs.codfw.wmnet
  • 01:45 mutante: I think git-ssh6_22 is down (see alerts lvs2008/2009) due to the v6 issue from the ongoing lvs maintenance. Depooled in conftool
  • 01:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
  • 01:40 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts thumbor2001.codfw.wmnet
  • 01:37 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 01:35 jhuneidi@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/Cite/modules/ve-cite/ve.dm.MWReferenceNode.js: Backport for T296044 (duration: 00m 55s)
  • 01:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 01:31 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 01:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 01:30 legoktm@cumin1001: conftool action : set/pooled=inactive; selector: name=thumbor2002.codfw.wmnet
  • 01:30 legoktm@cumin1001: conftool action : set/pooled=inactive; selector: name=thumbor2001.codfw.wmnet
  • 01:19 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 01:18 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 01:09 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor2002.codfw.wmnet
  • 01:09 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor2001.codfw.wmnet
  • 01:05 legoktm@cumin1001: conftool action : set/weight=10; selector: name=thumbor2006.codfw.wmnet
  • 01:05 legoktm@cumin1001: conftool action : set/weight=10; selector: name=thumbor2005.codfw.wmnet
  • 00:56 legoktm@cumin1001: conftool action : set/weight=5; selector: name=thumbor2006.codfw.wmnet
  • 00:56 legoktm@cumin1001: conftool action : set/weight=5; selector: name=thumbor2005.codfw.wmnet
  • 00:55 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor2006.codfw.wmnet
  • 00:55 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor2005.codfw.wmnet
  • 00:33 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 00:08 brennen: end of UTC late deployment training window

2021-11-18

  • 23:47 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade + restart - ryankemper@cumin1001 - T295705
  • 23:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1001.eqiad.wmnet,service=miscweb
  • 23:28 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 23:27 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 22:52 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - T295705
  • 22:48 XioNoX: asw-b-codfw> request system power-off member 7
  • 22:44 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - T295705
  • 22:28 mutante: icinga (alert1001) - manually fix IP of mw1488.mgmt (was: 0.0.0.0, is: 10.65.1.26) in /etc/icinga/objects/puppet_hosts.cfg, running puppet
  • 22:06 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor1003.eqiad.wmnet
  • 21:53 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts thumbor1003.eqiad.wmnet
  • 21:50 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor1004.eqiad.wmnet
  • 21:36 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts thumbor1004.eqiad.wmnet
  • 21:31 XioNoX: asw-b-codfw> request system power-off member 7
  • 21:30 legoktm@cumin1001: conftool action : set/pooled=inactive; selector: name=thumbor1004.eqiad.wmnet
  • 21:30 legoktm@cumin1001: conftool action : set/pooled=inactive; selector: name=thumbor1003.eqiad.wmnet
  • 21:01 ejegg: updated payments-wiki from abb2bd9d -> d1d6f024
  • 21:00 mutante: [puppetmaster1001:/var/run/confd-template] $ sudo rm .git-ssh*.err
  • 21:00 mutante: [puppetmaster2001:/var/run/confd-template] $ sudo rm .git-ssh*.err
  • 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
  • 20:51 dcausse: restart blazegraph on wdqs1006 (jvm stuck)
  • 20:51 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001 - T295705
  • 20:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
  • 20:45 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=phab2001-vcs.codfw.wmnet
  • 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:43 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.9 refs T293950
  • 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:31 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.9 refs T293950 (duration: 01m 03s)
  • 20:30 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.9 refs T293950
  • 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:27 jhuneidi@deploy1002: Synchronized php-1.38.0-wmf.9/tests/phpunit/includes/page/PageStoreTest.php: Backport for T295931 (duration: 01m 03s)
  • 20:25 jhuneidi@deploy1002: Synchronized php-1.38.0-wmf.9/includes/page/PageStore.php: Backport for T295931 (duration: 01m 04s)
  • 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:05 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001 - T295705
  • 20:01 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001 - T295705
  • 19:53 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1004.eqiad.wmnet
  • 19:52 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1003.eqiad.wmnet
  • 19:52 legoktm@cumin1001: conftool action : set/weight=10; selector: name=thumbor1006.eqiad.wmnet
  • 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
  • 19:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4b4c0bc: Enable DiscussionTools automatic topic subscriptions as beta feature on most wikis (T290500) (duration: 01m 04s)
  • 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:13 twentyafterfour: upgrading php7.3 packages on phab1001
  • 19:07 twentyafterfour: rebooting phab2001 to apply updated php and kernel packages
  • 19:06 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-test-coord1002.eqiad.wmnet with OS bullseye
  • 19:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2001.codfw.wmnet with reason: kernel upgrade
  • 19:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab2001.codfw.wmnet with reason: kernel upgrade
  • 18:57 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001 - T295705
  • 18:52 XioNoX: asw-b-codfw> request system reboot member 7 - T295118
  • 18:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS bullseye
  • 17:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:44 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:49 XioNoX: asw-b-codfw> request system power-off member 7 - T295118
  • 15:39 XioNoX: lvs2007:~$ sudo service pybal stop - T295118
  • 15:36 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 15:35 XioNoX: cr2-codfw# set interfaces et-1/0/3 disable - T295118
  • 15:34 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 15:33 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 15:16 hnowlan: roll restarting cassandra on codfw maps for java updates
  • 15:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001
  • 14:44 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 14:38 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001
  • 14:37 hnowlan: roll-restarting sessionstore for java updates
  • 14:19 moritzm: installing testvm2003
  • 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2003.codfw.wmnet
  • 13:34 moritzm: installing pam bugfix updates on bullseye hosts
  • 13:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2003.codfw.wmnet
  • 13:22 moritzm: failover ganeti master in test cluster to ganeti-test2002 T284811
  • 13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 12:23 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cloudcephosd1016.wikimedia.org
  • 12:23 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cloudcephosd1016.wikimedia.org
  • 12:21 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 12:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:20 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:20 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 12:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mc1025.eqiad.wmnet
  • 12:16 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mc1025.eqiad.wmnet
  • 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mc1026.eqiad.wmnet
  • 12:16 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mc1026.eqiad.wmnet
  • 12:15 marostegui: Upgrade dbstore1007 to 10.4.22 T290841 T295970
  • 12:15 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Tamil (ta) Section Translation in test wiki (T294223) (duration: 01m 05s)
  • 12:06 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6003.drmrs.wmnet with OS buster
  • 11:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6002.drmrs.wmnet with OS buster
  • 11:29 arturo: aborrero@apt1001:~$ sudo -i reprepro export
  • 11:27 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host lvs6003.drmrs.wmnet with OS buster
  • 11:26 arturo: aborrero@apt1001:~$ sudo -i reprepro processincoming default /srv/wikimedia/incoming/python-flask-keystone_0.2~git20201012.b5cd4da-1_amd64.changes (T295234)
  • 11:08 arturo: run aborrero@apt1001:~$ sudo -i reprepro processincoming default
  • 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
  • 11:07 arturo: added python-flask-oslolog_0.1~git20201012.7803a46-1 to bullseye-wikimedia (T295234)
  • 11:06 arturo: aborrero@apt1001:~ $ for i in $(ll /srv/wikimedia/incoming/ | grep aborrero | awk -F' ' '{print $NF}') ; do rm /srv/wikimedia/incoming/$i ; done
  • 11:05 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host lvs6002.drmrs.wmnet with OS buster
  • 11:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
  • 10:57 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6001.drmrs.wmnet with OS buster
  • 10:38 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:38 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test2002.codfw.wmnet with OS buster
  • 10:17 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host lvs6001.drmrs.wmnet with OS buster
  • 10:12 topranks: Re-pooling eqiad in DNS after completing iBGP policy changes on cr1-eqiad and cr2-eqiad T295672
  • 10:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:08 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:01 moritzm: updating perf on buster hosts
  • 10:00 topranks: Re-enabling Equinix IXP port on cr1-eqiad following iBGP changes to address T295650
  • 09:56 ema: cp4021: repool w/ single backend experiment enabled T288106
  • 09:54 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2002.codfw.wmnet with OS buster
  • 09:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:49 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:41 ema: cp4021: stop ats-be and clear its cache T288106
  • 09:35 ema: cp4021: depool to enable single backend experiment T288106
  • 09:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1090.eqiad.wmnet with OS buster
  • 09:32 vgutierrez: pool cp1090 (upload) running HAProxy as TLS terminator - T290005
  • 09:18 jayme: systemctl start prune-production-images.service on deneb - T287222
  • 08:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1090.eqiad.wmnet with OS buster
  • 08:46 vgutierrez: depool cp1090 to be reimaged as cache::upload_haproxy - T290005
  • 08:45 moritzm: installing mariadb-10.3 security updates on buster (as packaged in Debian, not the wmf-internal packages)
  • 08:27 topranks: De-pool of eqiad seems to be ok; transit/peering/transport links changed BW profile but nothing maxed, total LVS connections steady but have shifted to codfw. Proceeding to reconfigure iBGP policy on cr1-eqiad and cr2-eqiad manually.
  • 08:01 topranks: Depooling eqiad in authdns to allow for reconfiguration of CR routers on site (T295672)
  • 07:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:35 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/maintenance/migrateRevisionActorTemp.php: Backport: maintenance: Add waitForReplication and sleep in migrateRevisionActorTemp (T275246) (duration: 01m 04s)
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17772 and previous config saved to /var/cache/conftool/dbconfig/20211118-073507-root.json
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17771 and previous config saved to /var/cache/conftool/dbconfig/20211118-072004-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist from s5 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P17770 and previous config saved to /var/cache/conftool/dbconfig/20211118-070620-marostegui.json
  • 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: After fixing GRANTs', diff saved to https://phabricator.wikimedia.org/P17769 and previous config saved to /var/cache/conftool/dbconfig/20211118-070559-root.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17768 and previous config saved to /var/cache/conftool/dbconfig/20211118-070500-root.json
  • 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: After fixing GRANTs', diff saved to https://phabricator.wikimedia.org/P17767 and previous config saved to /var/cache/conftool/dbconfig/20211118-065055-root.json
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 40%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17766 and previous config saved to /var/cache/conftool/dbconfig/20211118-064957-root.json
  • 06:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: After fixing GRANTs', diff saved to https://phabricator.wikimedia.org/P17765 and previous config saved to /var/cache/conftool/dbconfig/20211118-063552-root.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17764 and previous config saved to /var/cache/conftool/dbconfig/20211118-063453-root.json
  • 06:31 Amir1: revoked all grants from wikiadmin and gave back an explicit list on db1102:3312 (T249683)
  • 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: After fixing GRANTs', diff saved to https://phabricator.wikimedia.org/P17763 and previous config saved to /var/cache/conftool/dbconfig/20211118-062048-root.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 20%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17762 and previous config saved to /var/cache/conftool/dbconfig/20211118-061949-root.json
  • 06:17 Amir1: revoked all grants from wikiadmin and gave back an explicit list on db1156 (T249683)
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17761 and previous config saved to /var/cache/conftool/dbconfig/20211118-060446-root.json
  • 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 5%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17760 and previous config saved to /var/cache/conftool/dbconfig/20211118-054942-root.json
  • 05:47 marostegui: Upgrade clouddb1014
  • 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 1%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17759 and previous config saved to /var/cache/conftool/dbconfig/20211118-053438-root.json
  • 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 due to network issues (T295952)', diff saved to https://phabricator.wikimedia.org/P17758 and previous config saved to /var/cache/conftool/dbconfig/20211118-050802-ladsgroup.json
  • 04:23 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 02:08 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor2006.codfw.wmnet
  • 02:08 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor2005.codfw.wmnet
  • 01:56 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2006.codfw.wmnet
  • 01:48 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2006.codfw.wmnet
  • 01:47 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2005.codfw.wmnet
  • 01:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 01:42 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2005.codfw.wmnet
  • 01:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 01:35 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NOOP - Config: Revert "Stop setting wgActorTableSchemaMigrationStage, no longer read in core" (T275246) (duration: 01m 04s)
  • 00:54 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thumbor2006.codfw.wmnet with OS stretch
  • 00:28 legoktm@cumin1001: START - Cookbook sre.hosts.reimage for host thumbor2006.codfw.wmnet with OS stretch
  • 00:26 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thumbor2005.codfw.wmnet with OS stretch
  • 00:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:20 ryankemper: T290902 Test host looks good, proceeding to rest of fleet `ryankemper@cumin1001:~$ sudo cumin -b 4 '*elastic*' 'sudo run-puppet-agent --force'`
  • 00:18 urbanecm: UTC late B&C finished
  • 00:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:18 ryankemper: T290902 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/739379; running puppet agent on arbitrary elastic host: `ryankemper@elastic1051:~$ sudo run-puppet-agent --force`
  • 00:17 ryankemper: T290902 Disabling puppet across all elastic*: `ryankemper@cumin1001:~$ sudo cumin '*elastic*' 'sudo disable-puppet "Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/739379"'`
  • 00:16 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 5110fe7: Migrate wmfHostnames to wmgHostnames (T45956) (duration: 01m 03s)
  • 00:12 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/brwikimedia.png and respective HD variants
  • 00:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:08 urbanecm@deploy1002: Synchronized static/images/project-logos: 59c3fe6: Lossless optimization of the brwikimedia logo (duration: 01m 04s)
  • 00:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:00 legoktm@cumin1001: START - Cookbook sre.hosts.reimage for host thumbor2005.codfw.wmnet with OS stretch

2021-11-17

  • 23:53 eileen: * revision 8054869b -> b3e2a122 (latest)
  • 23:51 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1003.eqiad.wmnet
  • 23:49 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1001.eqiad.wmnet
  • 23:45 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1001.eqiad.wmnet
  • 23:45 legoktm@cumin1001: conftool action : set/weight=5; selector: name=thumbor1006.eqiad.wmnet
  • 23:44 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1006.eqiad.wmnet
  • 23:43 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1006.eqiad.wmnet
  • 23:35 legoktm@cumin1001: conftool action : set/weight=10; selector: name=thumbor1005.eqiad.wmnet
  • 23:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 22:42 mutante: miscweb1002/2002 - moved /srv/deployment/scholarships to /root/ (T243037)
  • 21:42 ayounsi@deploy1002: Finished deploy [homer/deploy@dc007aa]: Homer CR738905 (duration: 01m 27s)
  • 21:41 ayounsi@deploy1002: Started deploy [homer/deploy@dc007aa]: Homer CR738905
  • 21:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:00 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 20:47 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:45 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:33 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.38.0-wmf.7"
  • 20:23 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.9 refs T293950 (duration: 01m 03s)
  • 20:22 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.9 refs T293950
  • 19:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:42 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/export/WikiExporter.php: Backport: export: Ignore rev_page_id index (T285149) (duration: 01m 04s)
  • 19:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8e167a5: Disable local file upload on the Chinese Wikisource (T295265) (duration: 01m 05s)
  • 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7b3a1d9: Make reply tool available as opt-out on commonswiki (T295838) (duration: 01m 05s)
  • 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2042.codfw.wmnet with OS buster
  • 18:57 ejegg: updated fundraising CiviCRM from 9c5f0b69 -> 8054869b
  • 18:56 vgutierrez: pool cp2042 (upload) running HAProxy as TLS terminator - T290005
  • 18:06 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2042.codfw.wmnet with OS buster
  • 18:05 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:01 bblack@cumin1001: START - Cookbook sre.dns.netbox
  • 17:59 vgutierrez: depool cp2042 to be reimaged as an HAProxy cache upload node - T290005
  • 17:41 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 17:25 cmooney@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host rpki2002.codfw.wmnet
  • 17:11 XioNoX: repool Telia eqiad-codfw transport
  • 17:10 cmooney@cumin2002: START - Cookbook sre.ganeti.makevm for new host rpki2002.codfw.wmnet
  • 16:34 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts rpki2001.codfw.wmnet
  • 16:32 mutante: LDAP - added jkieserman to wmf (T295693)
  • 16:28 cmooney@cumin2002: START - Cookbook sre.hosts.decommission for hosts rpki2001.codfw.wmnet
  • 16:28 XioNoX: drain Telia eqiad-codfw link
  • 16:27 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts rpki2001.codfw.wmnet
  • 16:21 XioNoX: move cr1-codfw<->cr2-eqdfw link to BO cable
  • 16:19 cmooney@cumin2002: START - Cookbook sre.hosts.decommission for hosts rpki2001.codfw.wmnet
  • 16:06 XioNoX: move cr1-codfw:xe-5/3/0 to BO cable
  • 16:04 XioNoX: re-enable Telia BGP on cr1-codfw
  • 16:01 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 15:59 bblack: netbox: added ganeti01 and ganeti02 cluster definitions for drmrs
  • 15:58 XioNoX: disable Telia BGP on cr1-codfw
  • 15:55 XioNoX: move codfw-ulsfo link to break-out cable
  • 15:46 mutante: restarting pybal on lvs1015
  • 15:43 _joe_: restarting pybal on lvs2009
  • 15:42 mutante: restarting pybal on lvs1016
  • 15:39 _joe_: restarting pybal on lvs2010
  • 15:35 XioNoX: drain ulsfo-codfw link
  • 14:47 moritzm: installing perl bugfix updates from Bullseye point release
  • 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Ganeti update tests
  • 14:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Ganeti update tests
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights on s5 special slaves in eqiad T263127', diff saved to https://phabricator.wikimedia.org/P17755 and previous config saved to /var/cache/conftool/dbconfig/20211117-134942-marostegui.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchanges from s5 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P17754 and previous config saved to /var/cache/conftool/dbconfig/20211117-134835-marostegui.json
  • 13:20 aborrero@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host cloudbackup1001-dev.eqiad.wmnet
  • 13:10 aborrero@cumin1001: START - Cookbook sre.ganeti.makevm for new host cloudbackup1001-dev.eqiad.wmnet
  • 13:02 Lucas_WMDE: UTC morning backport+config window done
  • 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
  • 12:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
  • 12:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:24 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable disambiguator notifications on 6 Wikipedias (T293319) (duration: 01m 04s)
  • 12:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:22 btullis@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 12:17 topranks: Re-pooling ulsfo after completing routing changes on cr3-ulsfo and cr4-ulsfo (T295672)
  • 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:11 btullis@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 12:11 moritzm: failover ganeti master in test cluster to ganeti-test2003
  • 12:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable more languages for Section Translation in testwiki (T294223) (duration: 01m 52s)
  • 12:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:09 moritzm: installing testvm2002
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchangeslinked from s5 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P17753 and previous config saved to /var/cache/conftool/dbconfig/20211117-105120-marostegui.json
  • 10:45 dcausse: restarting blazegraph on wdqs1013 (jvm stuck)
  • 10:45 topranks: Commencing manual config on cr3-ulsfo and cr4-ulsfo (site depooled) to reconfigure iBGP (T295672)
  • 10:42 hnowlan: replaced all references to deploy1001 with deploy1002 in all .git/DEPLOY_HEAD directories on deploy1002:/srv/deployment
  • 10:41 ema: A:cp re-enable puppet after testing https://gerrit.wikimedia.org/r/c/operations/puppet/+/738949/ T293879
  • 10:37 jayme: imported wmf-certificates 0~20211110-1 to stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
  • 10:31 ema: A:cp disable-puppet to merge and test https://gerrit.wikimedia.org/r/c/operations/puppet/+/738949/ T293879
  • 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
  • 10:18 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6016.drmrs.wmnet with OS buster
  • 10:18 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
  • 10:14 topranks: De-pool ulsfo in DNS to allow safe reconfiguration / test of changes to CR routers iBGP (T295672)
  • 10:01 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:01 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:00 moritzm: running "gnt-cluster upgrade --to 2.16" on ganeti test cluster
  • 09:59 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:59 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 09:53 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6015.drmrs.wmnet with OS buster
  • 09:48 moritzm: running "gnt-cluster renew-crypto --new-cluster-certificate" on ganeti test cluster
  • 09:39 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6016.drmrs.wmnet with OS buster
  • 09:35 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6014.drmrs.wmnet with OS buster
  • 09:19 _joe_: removing php 7.3 images from docker-registry.wikimedia.org
  • 09:13 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6015.drmrs.wmnet with OS buster
  • 09:11 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6013.drmrs.wmnet with OS buster
  • 09:03 moritzm: installing ffmpeg security updates on stretch
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17752 and previous config saved to /var/cache/conftool/dbconfig/20211117-090124-root.json
  • 08:56 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6014.drmrs.wmnet with OS buster
  • 08:54 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6012.drmrs.wmnet with OS buster
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17751 and previous config saved to /var/cache/conftool/dbconfig/20211117-084621-root.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17750 and previous config saved to /var/cache/conftool/dbconfig/20211117-083117-root.json
  • 08:30 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6013.drmrs.wmnet with OS buster
  • 08:24 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6011.drmrs.wmnet with OS buster
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 40%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17749 and previous config saved to /var/cache/conftool/dbconfig/20211117-081613-root.json
  • 08:14 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6012.drmrs.wmnet with OS buster
  • 08:11 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6010.drmrs.wmnet with OS buster
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17748 and previous config saved to /var/cache/conftool/dbconfig/20211117-080110-root.json
  • 07:49 elukey: restart coal, navtiming, statsv (refreshed by puppet) after https://gerrit.wikimedia.org/r/737970
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 20%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17747 and previous config saved to /var/cache/conftool/dbconfig/20211117-074606-root.json
  • 07:44 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6011.drmrs.wmnet with OS buster
  • 07:34 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6009.drmrs.wmnet with OS buster
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17746 and previous config saved to /var/cache/conftool/dbconfig/20211117-073102-root.json
  • 07:31 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6010.drmrs.wmnet with OS buster
  • 07:29 elukey: `apt-get clean` on an-tool1005 to free space in the root partition
  • 07:28 elukey: `sudo pkill -U jmixter` on stat100[5,8] to allow puppet to run and remove the offboarded user
  • 07:22 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6008.drmrs.wmnet with OS buster
  • 07:20 Amir1: started cleanup of autoreview logs on ruwiki, deleting 3.5M rows (T285608)
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 5%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17745 and previous config saved to /var/cache/conftool/dbconfig/20211117-071559-root.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 1%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17744 and previous config saved to /var/cache/conftool/dbconfig/20211117-070055-root.json
  • 06:58 marostegui: Upgrade db1180 to 10.4.22
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 for upgrade', diff saved to https://phabricator.wikimedia.org/P17743 and previous config saved to /var/cache/conftool/dbconfig/20211117-065740-marostegui.json
  • 06:52 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6009.drmrs.wmnet with OS buster
  • 06:43 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6008.drmrs.wmnet with OS buster
  • 06:38 Amir1: started deleting autoreview logs on arwiki, 23M rows (T285608)
  • 06:33 marostegui: Upgrade clouddb1018
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Remove logpager from s5 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P17742 and previous config saved to /var/cache/conftool/dbconfig/20211117-060426-marostegui.json
  • 03:16 eileen: checkout revision (c67b18b9 -> 9c5f0b69)
  • 02:10 eileen: revision 817e514a -> c67b18b9 (latest) civicrm
  • 00:19 ryankemper: T276198 `ryankemper@cumin1001:~$ sudo cumin -b 3 '*elastic*' 'sudo run-puppet-agent --force'` Change looks good (no complaints from systemd); rolling out to the rest of the fleet and re-enabling puppet
  • 00:15 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1006.eqiad.wmnet
  • 00:06 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1006.eqiad.wmnet

2021-11-16

  • 23:59 ryankemper: T276198 `ryankemper@elastic1049:~$ sudo run-puppet-agent --force` to test out https://gerrit.wikimedia.org/r/c/operations/puppet/+/739375
  • 23:57 ejegg: updated payments-wiki from 49ad5962 -> abb2bd9d
  • 23:27 ryankemper: T276198 `ryankemper@elastic1049:~$ sudo run-puppet-agent --force`; `elasticsearch_6@production-search-eqiad.service` didn't restart, but there may be something slightly wrong with the new `ExecStartPre` line => `Executable path is not absolute, ignoring: systemd-tmpfiles --create /usr/lib/tmpfiles.d/elasticsearch.conf`
  • 23:27 legoktm@cumin1001: conftool action : set/weight=5; selector: name=thumbor1005.eqiad.wmnet
  • 23:25 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1005.eqiad.wmnet
  • 23:22 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1005.eqiad.wmnet
  • 23:22 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1005.eqiad.wmnet
  • 23:21 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1005.eqiad.wmnet
  • 23:19 ryankemper: T276198 `ryankemper@cumin1001:~$ sudo cumin '*elastic*' 'sudo disable-puppet "Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/721644"'` (done a few mins ago)
  • 20:51 mutante: [miscweb2002:/var/cache] $ sudo rm -rf scholarships/
  • 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:39 dcausse: restarting blazegraph on wdqs1005 (jvm stuck)
  • 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:09 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.9 refs T293950
  • 19:52 cmjohnson1: moving mgmt cables from old msw to new msw in b7-eqiad
  • 19:51 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6002.drmrs.wmnet with OS bullseye
  • 19:51 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6003.drmrs.wmnet with OS bullseye
  • 19:46 joal@deploy1002: Finished deploy [analytics/refinery@194b11b] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@194b11b] (duration: 06m 53s)
  • 19:43 cmjohnson1: moving mgmt cables from old msw to new msw in b5-eqiad
  • 19:40 joal@deploy1002: Started deploy [analytics/refinery@194b11b] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@194b11b]
  • 19:39 joal@deploy1002: Finished deploy [analytics/refinery@194b11b] (thin): Regular analytics weekly train THIN [analytics/refinery@194b11b] (duration: 00m 07s)
  • 19:39 joal@deploy1002: Started deploy [analytics/refinery@194b11b] (thin): Regular analytics weekly train THIN [analytics/refinery@194b11b]
  • 19:38 joal@deploy1002: Finished deploy [analytics/refinery@194b11b]: Regular analytics weekly train [analytics/refinery@194b11b] (duration: 22m 14s)
  • 19:34 cmjohnson1: moving mgmt cables from old msw to new msw in b3-eqiad
  • 19:27 cmjohnson1: moving mgmt cables from old msw to new msw in b2-eqiad
  • 19:18 cmjohnson1: moving mgmt cables from old msw to new msw in b1-eqiad
  • 19:16 joal@deploy1002: Started deploy [analytics/refinery@194b11b]: Regular analytics weekly train [analytics/refinery@194b11b]
  • 19:15 jhuneidi@deploy1002: Pruned MediaWiki: 1.38.0-wmf.6 (duration: 03m 17s)
  • 19:14 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:14 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6001.drmrs.wmnet with OS bullseye
  • 19:11 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.9 refs T293950 (duration: 36m 32s)
  • 19:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6004.drmrs.wmnet with OS bullseye
  • 19:11 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6002.drmrs.wmnet with OS bullseye
  • 19:11 cmjohnson1: moving mgmt cables from old msw to new msw in a7-eqiad
  • 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:10 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6003.drmrs.wmnet with OS bullseye
  • 19:08 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 19:06 cmjohnson1: moving mgmt cables from old msw to new msw in a5-eqiad
  • 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:01 cmjohnson1: moving mgmt cables from old msw to new msw in a4-eqiad
  • 18:56 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti6003.drmrs.wmnet with OS bullseye
  • 18:55 cmjohnson1: moving mgmt cables from old msw to new msw in a3-eqiad
  • 18:51 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti6002.drmrs.wmnet with OS bullseye
  • 18:41 cmjohnson1: moving mgmt cables from old msw to new msw in a2-eqiad
  • 18:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:35 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.9 refs T293950
  • 18:31 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6003.drmrs.wmnet with OS bullseye
  • 18:31 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6001.drmrs.wmnet with OS bullseye
  • 18:31 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6002.drmrs.wmnet with OS bullseye
  • 18:31 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6004.drmrs.wmnet with OS bullseye
  • 18:28 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:26 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:17 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 17:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:20 mutante: removing scholarships.wikimedia.org from DNS - T243037
  • 17:15 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:11 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:56 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 16:27 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 16:23 herron: systemctl reset-failed ifup@ens13 on prometheus5001 T273026
  • 16:22 moritzm: systemctl reset-failed ifup@ens13 on durum5001 after restart T273026
  • 16:12 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 16:05 moritzm: powercycling ganeti5002
  • 15:53 andrewbogott: merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/525220 which makes read-only ldap the default for ldap clients
  • 14:44 cmooney@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host rpki2001.codfw.wmnet
  • 14:31 cmooney@cumin2002: START - Cookbook sre.ganeti.makevm for new host rpki2001.codfw.wmnet
  • 14:31 cmooney@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host rpki2002.codfw.wmnet
  • 14:24 jynus: re-adding backup user to db1108:analytics_meta T284150
  • 14:22 cmooney@cumin2002: START - Cookbook sre.ganeti.makevm for new host rpki2002.codfw.wmnet
  • 14:18 cmooney@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rpki2001.codfw.wmnet
  • 14:09 cmooney@cumin2002: START - Cookbook sre.hosts.decommission for hosts rpki2001.codfw.wmnet
  • 13:58 cmooney@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host rpki2001.codfw.wmnet
  • 13:51 cmooney@cumin2002: START - Cookbook sre.ganeti.makevm for new host rpki2001.codfw.wmnet
  • 13:23 moritzm: installing debconf bugfix updates on buster
  • 13:21 moritzm: prune unused packages from ping3001 T295767
  • 13:18 moritzm: prune unused packages from ping1001/ping2001 T295767
  • 13:05 moritzm: installing psmisc bugfix updates on buster hosts
  • 13:04 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6007.drmrs.wmnet with OS buster
  • 12:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:29 moritzm: installing Linux 4.19.208 updates on buster hosts (no reboots)
  • 12:24 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6007.drmrs.wmnet with OS buster
  • 12:22 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 12:13 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6006.drmrs.wmnet with OS buster
  • 11:55 moritzm: failover ganeti master in test cluster to ganeti-test2002
  • 11:34 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6006.drmrs.wmnet with OS buster
  • 11:31 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
  • 11:03 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 10:30 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6005.drmrs.wmnet with OS buster
  • 10:21 ema: A:cp re-enable puppet after successful test on cp402[17] T293879
  • 10:20 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 10:15 moritzm: installing testvm2001
  • 10:06 arturo: updating deb packages on stretch-wikimedia/thirdparty/kubeadm-k8s-1-21 (T282942)
  • 10:02 ema: A:cp disable puppet to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/738910 on cp4021 T293879
  • 09:51 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6005.drmrs.wmnet with OS buster
  • 09:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6004.drmrs.wmnet with OS buster
  • 09:40 ayounsi@deploy1002: Finished deploy [homer/deploy@c570af3]: Homer CR738905 (duration: 01m 25s)
  • 09:39 ayounsi@deploy1002: Started deploy [homer/deploy@c570af3]: Homer CR738905
  • 09:09 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6004.drmrs.wmnet with OS buster
  • 08:54 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6003.drmrs.wmnet with OS buster
  • 08:14 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6003.drmrs.wmnet with OS buster
  • 08:04 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6002.drmrs.wmnet with OS buster
  • 07:25 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6002.drmrs.wmnet with OS buster
  • 02:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:28 urbanecm: UTC late window done
  • 00:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:23 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.7/extensions/WikimediaEvents/: 738399: 739004: WikimediaEvents backports (T294738) (duration: 00m 56s)
  • 00:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 50d9f26: GrowthExperiments: Set up GEHomepageNewAccountVariantsByPlatform (T294737) (duration: 00m 56s)
  • 00:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-11-15

  • 23:10 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1005.eqiad.wmnet
  • 22:59 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1005.eqiad.wmnet
  • 22:58 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on thumbor1005.eqiad.wmnet with reason: reboot after first puppet run
  • 22:58 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on thumbor1005.eqiad.wmnet with reason: reboot after first puppet run
  • 21:46 bblack: dns6002 - reboot for another round of BIOS fixups
  • 21:32 bblack: dns6001 - reboot for another round of BIOS fixups
  • 21:21 legoktm: uploaded php7.4_7.4.25-1+wmf2+buster1_amd64.changes to apt.wm.o with patch for T293568
  • 21:19 mutante: removing mediawiki font packages from remaining regular appservers globally (T294378)
  • 20:49 mutante: retiring https://scholarships.wikimedia.org - removing from ATS (T243037)
  • 20:49 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6001.drmrs.wmnet with OS buster
  • 20:09 Amir1: revoked all grants from wikiadmin and gave back an explicit list on clouddb1013:3311 (T249683)
  • 20:08 Amir1: revoked all grants from wikiadmin and gave back an explicit list on clouddb1021:3311 (T249683)
  • 20:07 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS buster
  • 20:03 Amir1: revoked all grants from wikiadmin and gave back an explicit list on db1102:3312 (T249683)
  • 19:57 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp6001.drmrs.wmnet with OS buster
  • 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:46 Amir1: revoked all grants from wikiadmin and gave back an explicit list on db2101:3315 (T249683)
  • 19:46 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:45 urbanecm: UTC evening B&C window done
  • 19:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 898ebb1: Enable talk for mobile users on enwiki (T293946) (duration: 00m 57s)
  • 19:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:31 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS buster
  • 19:30 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cdac608: Change votewiki language back to English (T292685) (duration: 00m 56s)
  • 19:06 mutante: removing font packages from MW API appservers T294378
  • 18:58 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp6001.drmrs.wmnet with OS buster
  • 18:52 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 40 days, 0:00:00 on ps1-d1-codfw with reason: Testing new PDU devices T265435
  • 18:52 volans@cumin2002: START - Cookbook sre.hosts.downtime for 40 days, 0:00:00 on ps1-d1-codfw with reason: Testing new PDU devices T265435
  • 18:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:32 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS buster
  • 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1be2d39: Growth IP research survey: Fix platforms (T294568) (duration: 00m 55s)
  • 18:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d15948e: foundationwiki: Restrict editing in more namespaces (T294900) (duration: 00m 56s)
  • 18:19 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikimedia.org/T294580
  • 18:19 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikimedia.org/T294580
  • 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:07 arnoldokoth: upgrading gitlab version on gitlab2001 (T294580)
  • 18:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0065075: foundationwiki: Revoke edit from * (T294900) (duration: 00m 56s)
  • 16:35 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-coord1002.eqiad.wmnet with OS bullseye
  • 16:34 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS bullseye
  • 16:34 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/createLocalAccount.php --wiki=enwiki 'MU test T244635 1'
  • 16:06 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:46 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.7/includes/media/DjVuHandler.php: Backport: media: Avoid logspam in case of lack of 'data' in metadata (duration: 00m 55s)
  • 15:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 734f3b0: uzwiki: Enable Growth features in dark mode (T294245; 3/3) (duration: 00m 55s)
  • 15:28 urbanecm@deploy1002: Synchronized wmf-config/config/uzwiki.yaml: 734f3b0: uzwiki: Enable Growth features in dark mode (T294245; 2/3) (duration: 00m 55s)
  • 15:26 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 734f3b0: uzwiki: Enable Growth features in dark mode (T294245; 1/3) (duration: 00m 55s)
  • 15:26 urbanecm: mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=uzwiki --phab=T294245 # T294245
  • 15:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:24 elukey: import AMD ROCm 4.5 in thirdparty/amd-rocm45 for buster-wikimedia - T295661
  • 15:18 urbanecm: uzwiki: Create growthexperiments tables (T294245)
  • 15:15 elukey: `reprepro --delete clearvanished` on apt1001 to clean-up thirdparty/amd-rocm38 (buster and stretch) - T295661
  • 14:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4f17e85: GrowthExperiments: Disable link recommendation frontend on dewiki (duration: 00m 56s)
  • 14:45 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 14:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:15 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable DPL on Wikiquotes where not in use (T287916) (duration: 00m 56s)
  • 13:55 moritzm: installing java-atk-wrapper bugfix updates
  • 13:51 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 13:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:40 Amir1: start of djvu clean up in commons in a screen. Gonna take a couple of days (T275268)
  • 13:40 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.7/includes: Backport: Revert "media: Port DjVuImage::retrieveMetaData() to use BoxedCommand" (duration: 01m 01s)
  • 13:36 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 13:34 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 13:25 topranks: Adding new policy-statement to CR routers via homer to set next-hop self on iBGP sessions (not yet configured for any peers).
  • 12:46 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 12:45 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:02 urbanecm@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: 6b3bacd: uzwiki: Enable VisualEditor by default (T294245) (duration: 00m 56s)
  • 11:59 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
  • 11:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
  • 11:07 cmooney@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host rpki1001.eqiad.wmnet
  • 11:04 urbanecm: wikiadmin@10.64.0.164(ukwiki)> delete from growthexperiments_mentor_mentee where gemm_mentee_id = 464811 /* Martin Urbanec (WMF) */;
  • 11:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
  • 10:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:57 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.7/includes/media/DjVuHandler.php: Backport: media: Make new DjVu metadata handler more defensive (duration: 00m 54s)
  • 10:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
  • 10:54 cmooney@cumin1001: START - Cookbook sre.ganeti.makevm for new host rpki1001.eqiad.wmnet
  • 10:53 volans: upgrading python3-wmflib to 1.0.0-1 on all hosts buster+
  • 10:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:43 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rpki1001.eqiad.wmnet
  • 10:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:35 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.7/extensions/GrowthExperiments/: 05d6550: MenteeOverviewDataUpdater: Use UserOptionsManager::saveOptions (T295339) (duration: 00m 56s)
  • 10:34 cmooney@cumin1001: START - Cookbook sre.hosts.decommission for hosts rpki1001.eqiad.wmnet
  • 10:34 topranks: Rebuilding rpki1001.eqiad.wmnet with a larger disk: going to decom and then re-create via cookbooks.
  • 10:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:23 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.7/includes/media/: Backport: media: Build and use JSON for metadata of djvu instead of XML (T275268 T192866) (duration: 00m 56s)
  • 10:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:00 moritzm: update Java on Hadoop and Presto nodes
  • 09:59 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.7/includes/media/: Backport: media: Port DjVuImage::retrieveMetaData() to use BoxedCommand (T289228) (duration: 00m 56s)
  • 09:45 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:45 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:39 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 08:49 moritzm: installing glibc bugfix updates from bullseye point release
  • 08:07 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp6002.drmrs.wmnet with OS buster
  • 07:41 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6002.drmrs.wmnet with OS buster

2021-11-14

  • 11:48 paravoid: disable cr1-eqiad:xe-3/0/6 (IXP port) to mitigate T295650

2021-11-13

  • 18:43 AndyRussG: Enabled debug logging for PayPal IPN listener (updated SmashPig config a9e30591 -> 9567cc4a on frpig1001)
  • 02:59 ryankemper: [Elastic] `relforge` cluster's back to green, rolling restarts complete
  • 02:57 ryankemper: [Elastic] `ryankemper@relforge1003:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad.service elasticsearch_6@relforge-eqiad-small-alpha.service`
  • 02:56 ryankemper: [Elastic] Cluster's green, proceeding to next and final host
  • 02:52 ryankemper: [Elastic] `ryankemper@relforge1004:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad.service elasticsearch_6@relforge-eqiad-small-alpha.service`
  • 02:52 ryankemper: [Elastic] Downtimed relforge* for 2 hours in order to perform a rolling restart of the two hosts `relforge1003` and `relforge1004`

2021-11-12

  • 21:00 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:57 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:09 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 18:08 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 17:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:35 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 17:33 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:23 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 17:15 ottomata: restarting and arming keyholder on deploy1002 - T295380
  • 17:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:59 otto@deploy1002: Finished deploy [airflow-dags/analytics@093f067] (hadoop-test): (no justification provided) (duration: 00m 04s)
  • 16:59 otto@deploy1002: Started deploy [airflow-dags/analytics@093f067] (hadoop-test): (no justification provided)
  • 16:52 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:38 otto@deploy1002: Finished deploy [airflow-dags/analytics@093f067] (hadoop-test): (no justification provided) (duration: 01m 12s)
  • 16:36 otto@deploy1002: Started deploy [airflow-dags/analytics@093f067] (hadoop-test): (no justification provided)
  • 16:15 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:11 bblack@cumin1001: START - Cookbook sre.dns.netbox
  • 14:38 moritzm: installing 5.10.70 kernels on bullseye systems (just the update, no coordinated reboot)
  • 11:05 jynus@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2100.codfw.wmnet with OS buster
  • 10:47 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2100.codfw.wmnet with OS buster
  • 10:46 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:45 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:42 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 10:41 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1139.eqiad.wmnet with OS buster
  • 10:35 ema: A:cp re-enable puppet after successful testing of https://gerrit.wikimedia.org/r/c/operations/puppet/+/737424 on cp4027 T293879
  • 10:25 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
  • 10:17 ema: A:cp disable-puppet to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/737424 on cp4027 T293879
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 100%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17736 and previous config saved to /var/cache/conftool/dbconfig/20211112-084813-root.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 75%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17735 and previous config saved to /var/cache/conftool/dbconfig/20211112-083310-root.json
  • 08:27 moritzm: imported openjdk-8 8u312-b07-1~deb11u1 to component/jdk8 for bullseye-wikimedia (rebuild of latest Java 8 security release for Bullseye)
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 50%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17734 and previous config saved to /var/cache/conftool/dbconfig/20211112-081806-root.json
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 40%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17733 and previous config saved to /var/cache/conftool/dbconfig/20211112-080302-root.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 25%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17732 and previous config saved to /var/cache/conftool/dbconfig/20211112-074759-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 20%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17731 and previous config saved to /var/cache/conftool/dbconfig/20211112-073255-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 10%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17730 and previous config saved to /var/cache/conftool/dbconfig/20211112-071752-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Add weight for db1104', diff saved to https://phabricator.wikimedia.org/P17729 and previous config saved to /var/cache/conftool/dbconfig/20211112-070236-marostegui.json
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 5%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17728 and previous config saved to /var/cache/conftool/dbconfig/20211112-070141-root.json
  • 00:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:15 tgr: UTC late deploys done
  • 00:14 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable GrowthExperiments image recommendations on eswiki (T294878) (duration: 00m 56s)

2021-11-11

  • 16:56 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1139.eqiad.wmnet with OS buster
  • 16:30 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
  • 16:28 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1139.eqiad.wmnet with OS buster
  • 16:28 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
  • 16:26 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1139.eqiad.wmnet with OS buster
  • 16:26 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
  • 16:26 jynus@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1139.eqiad.wmnet with OS buster
  • 16:15 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp6001.drmrs.wmnet with OS buster
  • 16:12 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
  • 15:49 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS buster
  • 15:44 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp6001.drmrs.wmnet with OS buster
  • 15:18 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS buster
  • 15:16 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1139.eqiad.wmnet with OS buster
  • 14:59 moritzm: installing krb5 security updates on buster/bullseye (client-side libs/tools only, KDCs already fixed)
  • 14:55 moritzm: installing PHP 7.0 security updates
  • 14:52 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
  • 14:50 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - btullis@cumin1001
  • 14:46 moritzm: installing sqlalchemy security updates on stretch
  • 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:41 moritzm: installing libxstream-java security updates
  • 14:38 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - btullis@cumin1001
  • 14:33 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
  • 14:32 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1139.eqiad.wmnet with OS buster
  • 14:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:21 volans: uploaded python3-wmflib_1.0.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 14:15 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
  • 14:12 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1139.eqiad.wmnet with OS buster
  • 14:10 jynus@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2100.codfw.wmnet with OS buster
  • 14:05 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
  • 13:59 moritzm: installing bind9 security updates (only client-side-tools/libs)
  • 13:48 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
  • 13:45 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2100.codfw.wmnet with OS buster
  • 13:38 root@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 13:38 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Load Wikibase Client before other Wikibase extensions (T294224) (duration: 00m 55s)
  • 13:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:01 Lucas_WMDE: UTC morning backport+config window formally over (I’ll do one more config change shortly)
  • 13:00 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Add campaign pattern for control group (T295068) (duration: 00m 55s)
  • 12:50 lucaswerkmeister-wmde@deploy1002: Synchronized multiversion/buildConfigCache.php: Config: Don't need to keep all config in memory (resync, previous deploy for this file was missing `git rebase`) (duration: 00m 55s)
  • 12:47 kharlan@deploy1002: Synchronized php-1.38.0-wmf.7/extensions/GrowthExperiments/includes/Specials/SpecialCreateAccountCampaign.php: Backport: CreateAccountCampaign: Show/hide new HTML based on query param (T295068) (2/2 SpecialCreateAccountCampaign.php) (duration: 00m 55s)
  • 12:46 kharlan@deploy1002: Synchronized php-1.38.0-wmf.7/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: CreateAccountCampaign: Show/hide new HTML based on query param (T295068) (1/2 HomepageHooks.php) (duration: 00m 54s)
  • 12:37 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1116.eqiad.wmnet with OS buster
  • 12:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:30 jynus@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2097.codfw.wmnet with OS buster
  • 12:28 kharlan@deploy1002: Synchronized php-1.38.0-wmf.7/includes/specialpage/LoginSignupSpecialPage.php: Backport: LoginSignup: Add function for overriding benefits container (T295068) (duration: 00m 57s)
  • 12:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:22 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 12:21 moritzm: imported openjdk-8 8u312-b07-1~deb10u1 to component/jdk8 for buster-wikimedia (rebuild of latest Java 8 security release for Buster)
  • 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:15 awight@deploy1002: Synchronized multiversion/buildConfigCache.php: Config: Don't need to keep all config in memory (duration: 00m 55s)
  • 12:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:13 awight@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: Config: Avoid error suppression (duration: 00m 55s)
  • 12:10 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2097.codfw.wmnet with OS buster
  • 12:10 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1116.eqiad.wmnet with OS buster
  • 12:08 awight@deploy1002: Synchronized multiversion/buildConfigCache.php: Config: Anchor relative import (duration: 00m 56s)
  • 11:32 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1004.wikimedia.org with reason: working on network tests
  • 11:31 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1004.wikimedia.org with reason: working on network tests
  • 11:28 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbprov1001.eqiad.wmnet with OS buster
  • 11:04 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host dbprov1001.eqiad.wmnet with OS buster
  • 10:56 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbprov2001.codfw.wmnet with OS buster
  • 10:37 moritzm: updated routinator in thirdparty/routinator for bullseye-wikimedia to 0.10.12 T292503
  • 10:24 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host dbprov2001.codfw.wmnet with OS buster
  • 10:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3065.esams.wmnet with OS buster
  • 10:15 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1004.wikimedia.org with reason: working on network tests
  • 10:15 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1004.wikimedia.org with reason: working on network tests
  • 10:15 vgutierrez: pool cp3065 running haproxy - T290005
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Remove contributions from s5 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P17725 and previous config saved to /var/cache/conftool/dbconfig/20211111-092528-marostegui.json
  • 09:13 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3065.esams.wmnet with OS buster
  • 09:10 vgutierrez: depool cp3065 to be reimaged as cache::upload_haproxy - T290005
  • 09:03 arturo: pull all packages for buster-wikimedia/thirdparty/kubeadm-k8s-1-21 (T282942)
  • 08:17 marostegui: Upgrade db2078 T288720
  • 08:13 marostegui: Restart db1132 T288720
  • 06:56 elukey: `systemctl start prometheus-mysqld-exporter@analytics_meta` on db1108
  • 06:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1104.eqiad.wmnet with OS buster
  • 06:10 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1104.eqiad.wmnet with OS buster
  • 06:06 marostegui: Stop replication on db1104 (old master) T294321
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (old master) T294321', diff saved to https://phabricator.wikimedia.org/P17723 and previous config saved to /var/cache/conftool/dbconfig/20211111-060242-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1109 to s8 primary and set section read-write T294321', diff saved to https://phabricator.wikimedia.org/P17722 and previous config saved to /var/cache/conftool/dbconfig/20211111-060102-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s8 eqiad as read-only for maintenance - T294321', diff saved to https://phabricator.wikimedia.org/P17721 and previous config saved to /var/cache/conftool/dbconfig/20211111-060031-marostegui.json
  • 06:00 marostegui: Starting s8 eqiad failover from db1104 to db1109 - T294321
  • 05:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 31 hosts with reason: Primary switchover s8 T294321
  • 05:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 31 hosts with reason: Primary switchover s8 T294321
  • 02:52 eileen: civicrm revision 7e38867f -> 817e514a (latest)
  • 00:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:18 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Set wgForeignUploadTargets on officewiki T295510 (duration: 00m 56s)

2021-11-10

  • 23:46 ebernhardson: start test backup/restore of 1tb commonswiki from relforge to swift in eqiad
  • 23:33 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript updateSpecialPages.php --wiki=foundationwiki --only=DoubleRedirects
  • 23:33 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript updateSpecialPages.php --wiki=foundationwiki --only=BrokenRedirects
  • 22:06 bblack: dns2002 - restart ntp.service to fix drmrs peering
  • 22:01 bblack: dns1002 - restart ntp.service to fix drmrs peering
  • 21:56 bblack: dns2001 - restart ntp.service to fix drmrs peering
  • 21:53 bblack: dns1001 - restart ntp.service to see if drmrs associations cleared up after dns changes, etc
  • 21:24 bblack: asw1-b1[23]-drmrs: added ipv6 router-advertisement clauses, which work, but probably imperfectly :)
  • 19:52 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns6001.wikimedia.org with OS buster
  • 19:51 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns6002.wikimedia.org with OS buster
  • 19:51 ottomata: altering {eqiad,codfw}.maps.tiles_change to increase to 6 partitions in kafka main-eqiad, main-codfw and jumbo-eqiad: https://phabricator.wikimedia.org/T293366#7497076
  • 19:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:46 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:43 cjming: end of UTC evening backport & config window
  • 19:42 cjming: end of UTC late backport & config window
  • 19:41 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Lower mobile web click tracking rate (T295432) (duration: 00m 55s)
  • 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:35 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Lower mobile web click tracking rate (T295432) (duration: 00m 57s)
  • 19:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:23 legoktm: uploaded php-pcov_1.0.6-4+wmf1~buster1_amd64.changes to apt.wm.o (T243847)
  • 18:57 mutante: removing mediawiki font packages from parsoid hosts - T294378
  • 18:37 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host dns6002.wikimedia.org with OS buster
  • 18:37 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host dns6001.wikimedia.org with OS buster
  • 18:19 dancy@deploy1002: Finished scap: Config: Get rid of obsolete train-versions.json file (duration: 15m 57s)
  • 18:09 bblack: drmrs - rebooting a bunch of hosts to BIOS for further settings; please ignore any accidental alerts (they do *look* like they're alert-disabled)
  • 18:08 vgutierrez: restart haproxy on cp4026 and cp5006 to enable hitless reloads - T290005
  • 18:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:03 dancy@deploy1002: Started scap: Config: Get rid of obsolete train-versions.json file
  • 17:10 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns6001.wikimedia.org with OS buster
  • 16:49 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns6002.wikimedia.org with OS buster
  • 16:47 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:32 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T295480: Move all cirrussearch traffic to codfw (duration: 00m 55s)
  • 16:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:28 elukey: move atskafka to the new CA bundle - T291905
  • 16:26 elukey: move kafkatee instances (analytics-test,centralog) to the new CA bundle - T291905
  • 16:14 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host dns6002.wikimedia.org with OS buster
  • 16:12 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host dns6001.wikimedia.org with OS buster
  • 15:52 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T295480: Move all cirrussearch traffic to codfw (duration: 00m 56s)
  • 14:09 legoktm: restarted mailman3/mailman3-web to pick up new DNS for m5-master
  • 14:08 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
  • 14:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:48 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
  • 13:47 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 13:46 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons. - elukey@cumin1001
  • 13:36 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons. - elukey@cumin1001
  • 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:03 Lucas_WMDE: UTC morning backport+config window done
  • 13:01 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable the visual editor on the 2022 namespace on Wikimania wiki (T295267) (duration: 00m 55s)
  • 12:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:53 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Update $wgNamespacesToBeSearchedDefault for Wikimania 2022 (T295267) (duration: 00m 55s)
  • 12:46 XioNoX: delete route6 object for 2a02:ec80::/32 (split in two /48s)
  • 12:46 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bea7fa6] (eqiad): Update kartotherian-package to 006c027 (duration: 01m 20s)
  • 12:45 XioNoX: delete ROA for 2a02:ec80::/32
  • 12:45 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bea7fa6] (eqiad): Update kartotherian-package to 006c027
  • 12:43 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bea7fa6] (codfw): Update kartotherian-package to 006c027 (duration: 01m 31s)
  • 12:41 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bea7fa6] (codfw): Update kartotherian-package to 006c027
  • 12:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:38 mbsantos@deploy1002: Finished deploy [tilerator/deploy@ba00d7a] (eqiad): Update tilerator-package to 1221976 (duration: 01m 15s)
  • 12:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:36 mbsantos@deploy1002: Started deploy [tilerator/deploy@ba00d7a] (eqiad): Update tilerator-package to 1221976
  • 12:36 mbsantos@deploy1002: Finished deploy [tilerator/deploy@ba00d7a] (codfw): Update tilerator-package to 1221976 (duration: 02m 06s)
  • 12:34 mbsantos@deploy1002: Started deploy [tilerator/deploy@ba00d7a] (codfw): Update tilerator-package to 1221976
  • 12:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Remove tmpUseRequestLanguagesForRdfOutput Wikibase setting (T285795) (2/2) (duration: 00m 56s)
  • 12:32 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove tmpUseRequestLanguagesForRdfOutput Wikibase setting (T285795) (1/2) (duration: 00m 56s)
  • 12:30 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:30 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 12:25 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php wikimaniawiki --fix # T295267 (0 to fix, 0 resolvable, 0 deleted, looks good)
  • 12:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:20 urbanecm: Connect `Jbuatti (WMF)@foundationwiki` to SUL
  • 12:19 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: create 2022 namespace for wikimaniawiki (T295267) (duration: 00m 56s)
  • 12:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:07 urbanecm: wikiadmin@10.64.48.109(centralauth)> delete from globalnames where gn_name='DJemielniak (WMF)'; # to let OIT create that account globally, SULification of foundationwiki, T205347
  • 12:07 urbanecm: wikiadmin@10.64.48.109(centralauth)> delete from localnames where ln_name='DJemielniak (WMF)' and ln_wiki='foundationwiki'; # to let OIT create that account globally, SULification of foundationwiki, T205347
  • 12:07 urbanecm: wikiadmin@10.64.48.109(centralauth)> delete from localnames where ln_wiki='foundationwiki' and ln_name='AAnctil (WMF)'; # to let OIT create that account globally, SULification of foundationwiki, T205347
  • 12:06 urbanecm: wikiadmin@10.64.48.109(centralauth)> select * from localnames where ln_name='AAnctil (WMF)'; # to let OIT create that account globally, SULification of foundationwiki, T205347
  • 12:06 urbanecm: wikiadmin@10.64.48.109(centralauth)> delete from globalnames where gn_name='AAnctil (WMF)'; # to let OIT create that account globally, SULification of foundationwiki, T205347
  • 09:38 marostegui: Upgrade db1124, db1125, db1133 and pc2014 to mariadb 10.4.22
  • 09:22 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6004.drmrs.wmnet with OS buster
  • 08:43 volans@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6004.drmrs.wmnet with OS buster
  • 08:39 volans@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host ganeti6004.drmrs.wmnet
  • 08:22 volans@cumin1001: START - Cookbook sre.hosts.dhcp for host ganeti6004.drmrs.wmnet
  • 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1109 with weight 0 T294321', diff saved to https://phabricator.wikimedia.org/P17715 and previous config saved to /var/cache/conftool/dbconfig/20211110-064120-root.json
  • 04:15 tgr: T283606: running foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --search-index
  • 01:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 01:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:54 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Scale down the foundation wiki logo (T295303) (duration: 00m 56s)
  • 00:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:48 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add mobile logo and wordmark for metawiki (T295303) (duration: 00m 55s)
  • 00:47 thcipriani@deploy1002: Synchronized static/images/mobile/copyright/: Config: Add mobile logo and wordmark for metawiki (T295303) (duration: 00m 56s)
  • 00:42 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add mobile wordmark for foundation-wiki (T295303) (duration: 00m 55s)
  • 00:41 thcipriani@deploy1002: Synchronized static/images/mobile/copyright/wikimedia-wordmark.svg: Config: Add mobile wordmark for foundation-wiki (T295303) (duration: 00m 56s)
  • 00:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:29 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add enwikibooks in wgImportSources to bnwikibooks (T295051) (duration: 00m 56s)
  • 00:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-11-09

  • 20:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable DPL on Wikinews where not in use (T287916) (duration: 00m 57s)
  • 19:53 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:50 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable DPL on Wikibooks where not in use (T287916) (duration: 00m 56s)
  • 19:11 Reedy: echo "https://wikipedia.org/.well-known/assetlinks.json" | mwscript purgeList.php enwiki
  • 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:45 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:55 mutante: re-enabled puppet on mw* after deploying and testing gerrit:736595 on canary
  • 17:37 mmandere@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns6001.wikimedia.org with OS buster
  • 17:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:08 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host dns6001.wikimedia.org with OS buster
  • 16:55 mmandere@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti6004.drmrs.wmnet with OS buster
  • 16:50 mutante: snapshot* - disabling puppet - converting some crons
  • 16:41 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6004.drmrs.wmnet with OS buster
  • 16:38 mmandere@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti6004.drmrs.wmnet with OS buster
  • 16:16 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 16:12 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 16:07 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6004.drmrs.wmnet with OS buster
  • 16:07 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 15:49 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti6004.drmrs.wmnet with OS buster
  • 15:08 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6004.drmrs.wmnet with OS buster
  • 14:52 bblack: rebooting ganeti6003
  • 14:21 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons. - elukey@cumin1001
  • 14:19 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6003.drmrs.wmnet with OS buster
  • 14:11 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons. - elukey@cumin1001
  • 14:08 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
  • 13:51 vgutierrez: pool cp5006 (upload) running haproxy-tls - T290005
  • 13:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5006.eqsin.wmnet with OS buster
  • 13:47 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
  • 13:15 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6003.drmrs.wmnet with OS buster
  • 13:09 mmandere@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti6003.drmrs.wmnet with OS buster
  • 13:02 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6003.drmrs.wmnet with OS buster
  • 12:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:22 Lucas_WMDE: UTC morning backport+config window done
  • 12:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Remove unused `global` statement (duration: 00m 55s)
  • 12:18 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
  • 12:12 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti6003.drmrs.wmnet with OS buster
  • 12:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:07 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add language codes agq and mcn to wmgExtraLanguageNames (T288335, T293884) (duration: 00m 56s)
  • 12:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:57 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
  • 11:48 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Release v0.2.9 - volans@cumin2002
  • 11:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5006.eqsin.wmnet with OS buster
  • 11:47 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Release v0.2.9 - volans@cumin2002
  • 11:45 vgutierrez: depool cp5006 to be reimaged as cache::upload_haproxy - T290005
  • 11:40 volans@deploy1002: Finished deploy [homer/deploy@c570af3]: Homer release v0.2.9 (duration: 01m 29s)
  • 11:39 volans@deploy1002: Started deploy [homer/deploy@c570af3]: Homer release v0.2.9
  • 11:32 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6003.drmrs.wmnet with OS buster
  • 10:22 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6002.drmrs.wmnet with OS buster
  • 09:31 vgutierrez: pool cp4026 - T290005
  • 09:03 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6002.drmrs.wmnet with OS buster
  • 08:43 elukey: drop istio 1.6.* and kubeflow-kfserving-build images from the docker registry
  • 07:23 elukey: `apt-get clean` on stat1006 to free some space (root partition full)
  • 02:43 ejegg: updated fundraising CiviCRM from ac6f333d -> 7e38867f
  • 02:38 ejegg: updated payments-wiki 73de4731 -> 49ad5962
  • 02:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-11-08

  • 23:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c09793f: kswiki: Adding wordmark and tagline to IS.php (T294093) (duration: 00m 55s)
  • 20:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:05 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: 5f7864f: 54e7f74: kswiki: Adding wordmark and tagline files (T294093) (duration: 00m 54s)
  • 20:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e66bd53: Enable TheWikipediaLibrary on meta & testwiki (T288070) (duration: 00m 55s)
  • 19:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:52 ottomata: an-coord1002: drop user 'admin'@'localhost'; start slave; to fix broken replication - T284150
  • 19:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1ca184b: Add a new "all assessments" option to MediaSearch assessments dropdown (T285349) (duration: 00m 55s)
  • 19:46 sukhe: upload pdns-recursor 4.5.7-1wm1 to apt.wm.o (buster)
  • 19:42 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.7/skins/MinervaNeue/resources/: 8375e38: Instrument mobile talk page clicks (T294738) (duration: 00m 54s)
  • 19:41 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.7/skins/MinervaNeue/includes/Skins/SkinMinerva.php: 8375e38: Instrument mobile talk page clicks (T294738) (duration: 00m 54s)
  • 19:39 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.7/extensions/WikidataPageBanner/includes/WikidataPageBanner.php: 2c74457: WikidataPageBanner should disable table of contents using public functions (T295003) (duration: 00m 55s)
  • 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:31 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.7/extensions/VisualEditor/modules/ve-mw/preinit/ve.init.mw.ArticleTargetSaver.js: 9d7cde4: ArticleTargetSaver: ve.init may be undefined (T294981) (duration: 00m 55s)
  • 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: bf70a8b: Make reply tool available as opt-out on dewiki (T294591) (duration: 00m 56s)
  • 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:51 vgutierrez: depool cp4026 - T290005
  • 17:39 vgutierrez: pool cp4026 - T290005
  • 17:31 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 17:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:27 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:59 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:59 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:34 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 16:33 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 16:23 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:23 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:18 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:18 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:13 vgutierrez: depool cp4026 - T290005
  • 16:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:08 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 16:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:06 vgutierrez: pool cp4026 using haproxy as the TLS termination layer - T290005
  • 16:00 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:00 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:51 XioNoX: remove ROA for 185.15.58.0/23
  • 15:50 XioNoX: create RIPE RPKI ROA for 2a02:ec80:600::/48 and 2a02:ec80:500::/48
  • 15:34 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:34 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:18 bblack: asw1-b13-drmrs: "delete forwarding-options dhcp-relay forward-only" to fix dhcp+installer issues in this rack.
  • 15:12 ema: A:cp re-enable puppet after testing https://gerrit.wikimedia.org/r/c/operations/puppet/+/737385 on cp4021 T293879
  • 15:02 ema: merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/737385 with puppet disabled on A:cp T293879
  • 13:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
  • 13:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
  • 13:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4026.ulsfo.wmnet with OS buster
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
  • 13:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
  • 13:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:43 Lucas_WMDE: UTC morning backport+config window done
  • 12:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4026.ulsfo.wmnet with OS buster
  • 12:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:19 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Update autonyms in wmgExtraLanguageNames (T284870) (duration: 00m 56s)
  • 12:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust weights for s5 codfw replicas after removing special groups from them T263127', diff saved to https://phabricator.wikimedia.org/P17708 and previous config saved to /var/cache/conftool/dbconfig/20211108-120203-marostegui.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Remove contributions logpager recentchanges recentchangeslinked watchlist from s5 codfw T263127', diff saved to https://phabricator.wikimedia.org/P17707 and previous config saved to /var/cache/conftool/dbconfig/20211108-115945-marostegui.json
  • 11:41 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti6002.drmrs.wmnet with OS buster
  • 11:32 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4026.ulsfo.wmnet with OS buster
  • 11:01 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6002.drmrs.wmnet with OS buster
  • 10:53 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:53 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:49 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:49 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4026.ulsfo.wmnet with OS buster
  • 10:27 vgutierrez: depool cp4026 to be reimaged as a haproxy-tls test node - T290005
  • 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
  • 10:17 Lucas_WMDE: Deployed patch for T294693
  • 09:47 XioNoX: all core routers: add drmrs to prefix lists + confed
  • 09:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
  • 09:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
  • 09:23 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:22 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:22 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
  • 09:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
  • 09:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
  • 08:51 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:51 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:24 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 05:53 rzl: rebooted wikitech-static via rackspace web UI - T295266

2021-11-06

2021-11-05

  • 23:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:32 dduvall: re-rolling 1.38.0-wmf.7 to all wikis, as UBN T295187 is the lesser of two evil regressions (refs T293948)
  • 22:32 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.7 refs T293948
  • 22:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:21 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0/group1 to 1.38.0-wmf.7 refs T293948"
  • 22:19 dduvall: rolling back 1.38.0-wmf.7 from group1 and group0 due to UBN T295187 (refs T293948)
  • 20:17 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert "all wikis to 1.38.0-wmf.7 refs T293948"
  • 20:09 dduvall: rolling back 1.38.0-wmf.7 from all wikis due to UBN T295187 (refs T293948)
  • 18:41 mutante: removing mediawiki font packages from labweb* (wikitech wiki)
  • 18:35 XioNoX: cr2-codfw> request chassis fpc online slot 0 - T294789
  • 18:20 legoktm: upgrading scap to 4.0.3 everywhere (T294966)
  • 18:01 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6001.drmrs.wmnet with OS buster
  • 17:22 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6001.drmrs.wmnet with OS buster
  • 16:52 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti6001.drmrs.wmnet with OS buster
  • 16:30 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6001.drmrs.wmnet with OS buster
  • 16:21 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
  • 16:01 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
  • 15:38 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:38 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:30 jayme: published docker-registry.discovery.wmnet/golang1.17:1.17-1
  • 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2001.codfw.wmnet
  • 13:28 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
  • 12:50 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti6001.drmrs.wmnet with OS buster
  • 12:22 moritzm: renamed Ganeti group of test cluster from "default" to "row_A" (following conventions in main DCs) T286206
  • 12:10 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6001.drmrs.wmnet with OS buster
  • 12:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
  • 11:40 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
  • 11:09 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti6001.drmrs.wmnet with OS buster
  • 10:29 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6001.drmrs.wmnet with OS buster
  • 09:53 ema: cp[4033-4036]: upgrade varnish to 6.0.8-1wm2 T295120
  • 09:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2002.codfw.wmnet
  • 09:39 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti6001.drmrs.wmnet with OS buster
  • 09:27 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
  • 09:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
  • 09:19 Amir1: Upgrade db1151 T295026
  • 09:09 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
  • 09:01 ema: apt.wm.org: remove varnish 6.0.8-1wm1 from component main of buster-wikimedia, we use component/varnish6 instead
  • 08:59 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6001.drmrs.wmnet with OS buster
  • 08:52 moritzm: set kvm::machine_version for ganeti-test cluster to pc-i440fx-2.8 T286206
  • 08:46 Amir1: Upgrade db2142 T295026
  • 08:43 moritzm: installing reportbug bugfix updates from Bullseye 11.1 point release
  • 08:41 moritzm: installing tmux bugfix updates from Bullseye 11.1 point release
  • 08:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 6 hosts with reason: Upgrade x2 masters T295026
  • 08:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 6 hosts with reason: Upgrade x2 masters T295026
  • 08:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Upgrade x2 masters T295026
  • 08:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Upgrade x2 masters T295026
  • 07:44 XioNoX: restart scs-a8-eqiad
  • 05:31 marostegui: Upgrade clouddb1016
  • 05:31 marostegui: Upgrade clouddb1020
  • 00:16 mutante: phab1001 - sudo systemctl start phabricator_clean_tmp_files.service because Icinga alerted it had failed... worked fine
  • 00:06 mutante: https://labtestwikitech.wikimedia.org - purging mediawiki font packages from backend server
  • 00:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-11-04

  • 23:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:51 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: XWD timeout testing T293568 (duration: 00m 54s)
  • 23:49 tstarling@deploy1002: Synchronized src/XWikimediaDebug.php: XWD timeout testing (duration: 00m 54s)
  • 23:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:44 cjming: end of UTC late backport & config window
  • 23:44 cjming@deploy1002: Synchronized wmf-config: Config: Disable upcoming DiscussionTools mobile interface, enable on beta (T270536) (duration: 00m 55s)
  • 23:38 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Fix value of wgDTSchemaEditAttemptStepSamplingRate (T295052) (duration: 00m 55s)
  • 23:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:22 cjming@deploy1002: Synchronized php-1.38.0-wmf.7/extensions/RelatedArticles: Backport: Fix loading of related articles via IntersectionObserver (T223844) (duration: 00m 55s)
  • 23:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:19 mutante: wtp1025, wtp1026, parse2001, parse2002 (parsoid-canary): purging mediawiki font packages (T294378)
  • 23:16 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Allow bureaucrats to grant and revoke the importer rights to enwikiversity (T294930) (duration: 00m 56s)
  • 23:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:26 bblack: cpNNNN: manual (cumin) removal of outdated digicert-2020 ocsp configuration and output files, to avoid icinga alerts and clean up
  • 20:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1153.eqiad.wmnet with reason: Maintenance T295026
  • 20:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1153.eqiad.wmnet with reason: Maintenance T295026
  • 19:29 dduvall: 1.38.0-wmf.7 on all wikis. no new errors or increase in error rates (refs T293948)
  • 19:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:16 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.7 refs T293948
  • 18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1153 (re)pooling @ 100%: After upgrade T295026', diff saved to https://phabricator.wikimedia.org/P17703 and previous config saved to /var/cache/conftool/dbconfig/20211104-182655-root.json
  • 18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1153 (re)pooling @ 50%: After upgrade T295026', diff saved to https://phabricator.wikimedia.org/P17701 and previous config saved to /var/cache/conftool/dbconfig/20211104-181151-root.json
  • 18:11 legoktm: upgrading to scap 4.0.3 on canaries again (T294966)
  • 18:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:08 legoktm: uploaded scap 4.0.3-2 to apt.wm.o for buster/stretch (T294966)
  • 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:06 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 03s)
  • 18:05 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 04s)
  • 17:58 Amir1: Upgrade db1153 T295026
  • 17:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1153.eqiad.wmnet with reason: Maintenance T295026
  • 17:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1153.eqiad.wmnet with reason: Maintenance T295026
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1153 for mysql upgrade T295026', diff saved to https://phabricator.wikimedia.org/P17700 and previous config saved to /var/cache/conftool/dbconfig/20211104-175606-ladsgroup.json
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1152 (re)pooling @ 100%: After upgrade T295026', diff saved to https://phabricator.wikimedia.org/P17699 and previous config saved to /var/cache/conftool/dbconfig/20211104-175429-root.json
  • 17:50 volans: restarted puppetdb.service on puppetdb2002
  • 17:47 ryankemper: T288620 [Elastic] Rebooting `elastic1049.eqiad.wmnet` to pick up the new GELF settings change
  • 17:46 hnowlan: enabling puppet on C:cassandra after profile::java transition
  • 17:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1152 (re)pooling @ 50%: After upgrade T295026', diff saved to https://phabricator.wikimedia.org/P17698 and previous config saved to /var/cache/conftool/dbconfig/20211104-173926-root.json
  • 17:33 Amir1: Upgrade db1152 T295026
  • 17:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1152.eqiad.wmnet with reason: Maintenance T295026
  • 17:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1152.eqiad.wmnet with reason: Maintenance T295026
  • 17:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1152 for mysql upgrade T295026', diff saved to https://phabricator.wikimedia.org/P17697 and previous config saved to /var/cache/conftool/dbconfig/20211104-172950-ladsgroup.json
  • 17:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:24 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 17:23 ryankemper: T294961 [WCQS] Installed kernel version `Linux 5.10.0-0.bpo.9-amd64` on all wcqs* hosts
  • 16:48 ryankemper: T294961 [WCQS] Power cycled all 6 wcqs* hosts via the mgmt console (`racadm serveraction powercycle`)
  • 16:42 mutante: scandium (parsoid::testing) - purging MW font packages
  • 16:08 ppchelko@deploy1002: Finished deploy [restbase/deploy@0848b15]: Add new wikis T292422 T294587 T294588 (duration: 16m 06s)
  • 16:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2143 (re)pooling @ 100%: After upgrade T295026', diff saved to https://phabricator.wikimedia.org/P17696 and previous config saved to /var/cache/conftool/dbconfig/20211104-160047-root.json
  • 15:52 ppchelko@deploy1002: Started deploy [restbase/deploy@0848b15]: Add new wikis T292422 T294587 T294588
  • 15:50 jbond: disable puppet fleet-wide to deploy a puppet change
  • 15:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2143 (re)pooling @ 50%: After upgrade T295026', diff saved to https://phabricator.wikimedia.org/P17695 and previous config saved to /var/cache/conftool/dbconfig/20211104-154543-root.json
  • 15:37 Amir1: Upgrade db2143 T295026
  • 15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2143.codfw.wmnet with reason: Maintenance T295026
  • 15:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2143.codfw.wmnet with reason: Maintenance T295026
  • 15:30 XioNoX: drain codfw-ulsfo link
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2143 for mysql upgrade T295026', diff saved to https://phabricator.wikimedia.org/P17694 and previous config saved to /var/cache/conftool/dbconfig/20211104-152919-ladsgroup.json
  • 15:26 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti-test2003.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
  • 15:26 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti-test2003.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
  • 15:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
  • 15:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
  • 15:04 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 15:03 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 14:50 XioNoX: disable cr1-codfw:et-0/0/0
  • 14:49 hashar: Upgrading CI Jenkins
  • 14:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
  • 14:44 moritzm: imported jenkins 2.303.3 to thirdparty/ci for buster-wikimedia T294838
  • 14:40 hnowlan: disabling puppet on C:cassandra in advance of merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/631789
  • 14:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
  • 14:37 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ganeti-test01.svc.codfw.wmnet on all recursors
  • 14:36 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ganeti-test01.svc.codfw.wmnet on all recursors
  • 14:36 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) codfw on all recursors
  • 14:36 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache codfw on all recursors
  • 14:32 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 14:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
  • 14:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:25 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 1e5b250: Add Image: Do not use proxy in Beta (T294987) (duration: 01m 05s)
  • 14:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
  • 14:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
  • 14:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
  • 14:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:58 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 13:54 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:52 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 13:52 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 13:47 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 13:46 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 13:46 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 13:44 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 13:43 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
  • 13:41 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
  • 13:40 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
  • 13:40 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
  • 13:39 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 13:38 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 13:37 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
  • 13:36 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 13:35 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 13:33 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 13:29 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 13:28 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 13:26 vgutierrez: update eqiad & esams cp nodes to ATS 8.0.8-1wm5 - T294897
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2144 (re)pooling @ 100%: After upgrade T295026', diff saved to https://phabricator.wikimedia.org/P17691 and previous config saved to /var/cache/conftool/dbconfig/20211104-131916-root.json
  • 13:17 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 13:16 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 13:15 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 13:14 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
  • 13:14 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 13:12 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:11 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:10 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 13:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1124.eqiad.wmnet with reason: Testing with the test host
  • 13:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1124.eqiad.wmnet with reason: Testing with the test host
  • 13:09 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 13:09 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:08 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
  • 13:06 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 13:05 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 13:04 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2144 (re)pooling @ 50%: After upgrade T295026', diff saved to https://phabricator.wikimedia.org/P17690 and previous config saved to /var/cache/conftool/dbconfig/20211104-130412-root.json
  • 13:03 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 13:03 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 13:02 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:01 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
  • 12:44 Amir1: Upgrade db2144 (kernel and mariadb) T295026
  • 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2144 for mysql upgrade T295026', diff saved to https://phabricator.wikimedia.org/P17689 and previous config saved to /var/cache/conftool/dbconfig/20211104-122504-ladsgroup.json
  • 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:53 mmandere: pool cp4036.ulsfo.wmnet - T290694
  • 11:28 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4036.ulsfo.wmnet with OS buster
  • 11:24 sukhe: update dnsdist on O:wikidough
  • 11:01 sukhe: upload dnsdist 1.6.1-1wm1 to apt.wm.o (buster) - T273679
  • 10:28 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4036.ulsfo.wmnet with OS buster
  • 10:27 mmandere: depool cp4036.ulsfo.wmnet - T290694
  • 10:21 mmandere: pool cp4034.ulsfo.wmnet - T290694
  • 10:01 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4034.ulsfo.wmnet with OS buster
  • 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17688 and previous config saved to /var/cache/conftool/dbconfig/20211104-093247-root.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17687 and previous config saved to /var/cache/conftool/dbconfig/20211104-091744-root.json
  • 09:12 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
  • 09:09 mmandere: depool cp4034.ulsfo.wmnet - T290694
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17686 and previous config saved to /var/cache/conftool/dbconfig/20211104-090240-root.json
  • 08:56 dcausse: restarting blazegraph on wdqs1012 (stuck for the past 6 hours)
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17685 and previous config saved to /var/cache/conftool/dbconfig/20211104-084736-root.json
  • 08:37 _joe_: ipvsadm -Dt 10.2.2.67:443 on lvs101{5,6}
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17684 and previous config saved to /var/cache/conftool/dbconfig/20211104-083233-root.json
  • 08:29 _joe_: restarting pybal on low-traffic nodes in eqiad and codfw
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17683 and previous config saved to /var/cache/conftool/dbconfig/20211104-081729-root.json
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1163', diff saved to https://phabricator.wikimedia.org/P17682 and previous config saved to /var/cache/conftool/dbconfig/20211104-081726-marostegui.json
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17681 and previous config saved to /var/cache/conftool/dbconfig/20211104-074346-root.json
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for the old special replicas T263127', diff saved to https://phabricator.wikimedia.org/P17679 and previous config saved to /var/cache/conftool/dbconfig/20211104-055419-marostegui.json
  • 00:26 tgr: UTC late deploys done
  • 00:25 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add Wikivoyage in wgImportSources to enwikiversity (T294928) (duration: 01m 05s)
  • 00:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:09 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable GrowthExperiments image recommendations on ar,bn,cs,vi (T294878) (duration: 01m 03s)
  • 00:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:01 tgr@deploy1002: Synchronized php-1.38.0-wmf.6/extensions/GrowthExperiments: Backport: Add Image: add HTTP proxy config (T290949) Add Image: Harden API response parsing (duration: 01m 05s)

2021-11-03

  • 23:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:22 legoktm: reverted canaries back to scap 4.0.2
  • 23:20 legoktm: uploaded scap 4.0.3-1+really4.0.2 to apt.wm.o for buster/stretch
  • 23:02 legoktm@deploy1002: Finished deploy [restbase/deploy@664a2f8]: (no justification provided) (duration: 00m 50s)
  • 23:01 legoktm@deploy1002: Started deploy [restbase/deploy@664a2f8]: (no justification provided)
  • 22:48 ppchelko@deploy1002: Finished deploy [restbase/deploy@664a2f8]: Add new wikis T292422 T294587 T294588 (duration: 00m 10s)
  • 22:48 ppchelko@deploy1002: Started deploy [restbase/deploy@664a2f8]: Add new wikis T292422 T294587 T294588
  • 22:47 legoktm: upgraded scap on A:restbase (T294936)
  • 22:38 legoktm: upgrading scap on canaries (T294966)
  • 22:34 legoktm: upgraded apache2 on lists1001
  • 22:32 legoktm: uploaded scap 4.0.3 to apt.wm.o for buster and stretch (T294966)
  • 22:24 twentyafterfour: restarted php7.3-fpm on phab1001
  • 22:24 twentyafterfour: restarting phabricator to apply updates.
  • 22:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=wcqs2002.codfw.wmnet
  • 22:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=wcqs2001.codfw.wmnet
  • 21:56 ryankemper: T294961 [WCQS] Forcing recheck of `PyBal IPVS diff check` and `PyBal backends health check`
  • 21:53 ryankemper: T294961 [WCQS] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/736564 and successfully ran `ryankemper@cumin1001:~$ sudo cumin 'A:icinga or A:dns-auth' run-puppet-agent`
  • 21:47 ryankemper: T294961 [WCQS] DNS changes rolled out, proceeding to the `lvs_setup` step: https://gerrit.wikimedia.org/r/c/operations/puppet/+/736564
  • 21:45 ryankemper: T294961 [WCQS] Merged https://gerrit.wikimedia.org/r/c/operations/dns/+/736585, running `ryankemper@authdns1001:~$ sudo -i authdns-update`
  • 21:38 legoktm: upgrading/restarting apache2 on A:all-mw-eqiad
  • 21:26 legoktm: upgrading/restarting apache2 on A:all-mw-codfw
  • 21:12 legoktm: upgrading PHP 7.2 on labweb, deployment-servers
  • 21:00 legoktm: upgrading PHP 7.2 on A:snapshot
  • 20:55 legoktm: upgrading PHP 7.2 on A:parsoid
  • 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:04 eileen: civicrm revision changed from 93caef68ef to ac6f333db6, config revision is d3bb9999e7
  • 20:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:52 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.7 refs T293948 (duration: 01m 03s)
  • 19:51 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.7 refs T293948
  • 19:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:43 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=wcqs2003.codfw.wmnet
  • 19:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:35 mutante: depooled wcqs2003 (pooled=inactive) because Icinga alerted that the servers were down but pooled; not in production yet, but having issues (T294961)
  • 19:33 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=wcqs2003.codfw.wmnet
  • 19:33 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=wcqs2003.codfw.wmnet
  • 19:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:26 mmandere: pool cp4035.ulsfo.wmnet - T290694
  • 19:19 dduvall: 1.38.0-wmf.7 now on group0. no new errors. leaving ~ 30 minutes before promoting group1 (T293948)
  • 19:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:15 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.7 refs T293948
  • 19:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:10 tgr: UTC evening deploys done
  • 19:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:59 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
  • 18:55 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
  • 18:51 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4035.ulsfo.wmnet with OS buster
  • 18:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:40 legoktm: re-enabling puppet on lists1001
  • 18:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:34 urbanecm: Purge https://en.wikipedia.org/.well-known/assetlinks.json, https://www.wikipedia.org/.well-known/assetlinks.json and https://wikipedia.org/.well-known/assetlinks.json (T294776)
  • 18:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:24 volans: rebooting ganeti-test2002 with fixed /etc/network/interfaces
  • 18:22 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 18:22 urbanecm@deploy1002: Synchronized docroot/wikipedia.org/: 2331d06: Add Android site association file (T294776) (duration: 01m 02s)
  • 18:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:18 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 18:17 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 18:15 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 18:15 ppchelko@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Clean up temporary variable wgMathUseRestBase (T274436) (duration: 01m 02s)
  • 18:15 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 18:15 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 18:13 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 18:12 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
  • 18:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:09 ppchelko@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Clean up temporary variable wgMathUseRestBase (T274436) (duration: 01m 03s)
  • 18:09 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
  • 18:08 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
  • 18:08 Amir1: ran set session sql_log_bin=0; RENAME TABLE wb_changes_dispatch TO T294121_DROP_wb_changes_dispatch; on db1111 (T294121)
  • 18:07 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
  • 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:06 ppchelko@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Remove hook set for incident response in 2020 (duration: 01m 03s)
  • 18:04 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 18:03 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 18:02 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
  • 17:50 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4035.ulsfo.wmnet with OS buster
  • 17:49 vgutierrez: update codfw cp instances to ATS 8.0.8-1wm5 - T294897
  • 17:48 mmandere: depool cp4035.ulsfo.wmnet - T290694
  • 17:47 topranks: adding BGP peering session to "Liquid Telecommunications" AS30844 on cr2-esams (AMS-IX)
  • 17:46 legoktm: upgrading PHP 7.2 on A:all-mw-eqiad
  • 17:33 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 17:32 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 17:31 topranks: adding BGP peering session to "P Foundation" / AS399728 on cr2-eqiad [Equinix Ashburn IXP]
  • 17:30 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:24 legoktm: upgrading PHP 7.2 on A:all-mw-codfw
  • 17:06 mmandere: pool cp4033.ulsfo.wmnet - T290694
  • 17:05 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 17:02 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 17:01 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 16:59 razzi@deploy1002: Finished deploy [analytics/superset/deploy@5b8de4c]: Upgrade superset to 1.3.1 (duration: 00m 31s)
  • 16:58 razzi@deploy1002: Started deploy [analytics/superset/deploy@5b8de4c]: Upgrade superset to 1.3.1
  • 16:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
  • 16:52 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4033.ulsfo.wmnet with OS buster
  • 16:31 hnowlan: installing wikidiff2-1.13.0-1 to A:mw-jobrunner
  • 16:27 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 16:23 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 16:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:17 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 16:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:04 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4033.ulsfo.wmnet with OS buster
  • 15:59 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 15:58 mmandere: depool cp4033.ulsfo.wmnet - T290694
  • 15:57 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 15:51 hnowlan: rolling restart-php7.2-fpm on A:mw-api-codfw to pick up wikidiff2 upgrade
  • 15:47 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
  • 15:22 ppchelko@deploy1002: Finished deploy [restbase/deploy@664a2f8]: Add new wikis T292422 T294587 T294588 (duration: 00m 36s)
  • 15:22 ppchelko@deploy1002: Started deploy [restbase/deploy@664a2f8]: Add new wikis T292422 T294587 T294588
  • 15:21 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:21 ppchelko@deploy1002: Started deploy [restbase/deploy@664a2f8]: Add new wikis T292422 T294587 T294588
  • 15:21 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:11 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 15:10 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
  • 15:09 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:08 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 15:06 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 15:06 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 15:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
  • 14:54 elukey@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:40 moritzm: installing elfutils security updates on stretch
  • 14:37 elukey@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:37 elukey@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:33 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
  • 14:32 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 14:31 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 14:31 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 14:30 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 14:21 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:20 hnowlan: rolling restart-php7.2-fpm on A:mw-eqiad and A:mw-api-eqiad
  • 14:17 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
  • 14:16 hnowlan: deploying wikidiff2-1.13.0-1 to A:mw-eqiad and A:mw-api-eqiad
  • 14:13 moritzm: installing remaining tiff security updates for buster
  • 14:10 moritzm: initialising ganeti-test01.svc.codfw.wmnet cluster on ganeti-test2001 T286206
  • 14:07 XioNoX: move cr2-codfw access switches link to working linecard - T289241
  • 14:04 vgutierrez: update eqsin and ulsfo cp instances to ATS 8.0.8-1wm5 - T294897
  • 13:38 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 13:34 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp403[3456].*,service=ats-be
  • 13:34 bblack: cp403[3456] - depool ats-be service (upcoming re-reimage)
  • 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:21 vgutierrez: update trafficserver on cp4027 to 8.0.8-1wm5 - T294897
  • 12:20 vgutierrez: update trafficserver on cp4021 to 8.0.8-1wm5 - T294897
  • 12:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:18 vgutierrez: upload trafficserver 8.0.8-1wm5 to apt.wm.org (buster) - T294897
  • 12:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 9ca753b: Revert "Adjust AF config for ukwiki" (T272330) (duration: 01m 03s)
  • 12:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 667ef0b: foundationwiki: Increase AF throttle requirements (duration: 01m 13s)
  • 11:58 hnowlan: rolling restart-php7.2-fpm on A:mw-codfw and A:mw-api-codfw
  • 11:56 hnowlan: deploying wikidiff2-1.13.0-1 to A:mw-codfw and A:mw-api-codfw
  • 11:37 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https
  • 11:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7fdf3f5: Wikisource: allow copy-uploads from Commons (T294824) (duration: 01m 04s)
  • 11:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:23 XioNoX: re-enable eqiad Equinix IXP peerings - T290877
  • 08:55 XioNoX: Disable eqiad Equinix IXP peerings - T290877
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Remove logpager replicas from s6 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P17660 and previous config saved to /var/cache/conftool/dbconfig/20211103-075801-marostegui.json
  • 07:58 elukey@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 07:57 elukey@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 07:50 marostegui: Drop oauth2_access_tokens oauth_accepted_consumer oauth_registered_consumer from foundationwiki T294595
  • 06:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1163.eqiad.wmnet with OS buster
  • 06:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 06:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 06:35 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 34888b0: Growth IP research survey: Fix coverage (T294568) (duration: 01m 04s)
  • 06:13 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1163.eqiad.wmnet with OS buster
  • 06:10 marostegui: Stop replication on db1163 T290865
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1163 until it's reimaged to buster T293964', diff saved to https://phabricator.wikimedia.org/P17659 and previous config saved to /var/cache/conftool/dbconfig/20211103-060644-root.json
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1118 to s1 primary and set section read-write T293964', diff saved to https://phabricator.wikimedia.org/P17658 and previous config saved to /var/cache/conftool/dbconfig/20211103-060201-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - T293964', diff saved to https://phabricator.wikimedia.org/P17657 and previous config saved to /var/cache/conftool/dbconfig/20211103-060114-root.json
  • 06:00 marostegui: Starting s1 eqiad failover from db1163 to db1118 - T293964
  • 05:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s1 T293964
  • 05:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s1 T293964
  • 02:22 milimetric@deploy1002: Finished deploy [analytics/refinery@cf6095c] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@cf6095c] (duration: 05m 36s)
  • 02:16 milimetric@deploy1002: Started deploy [analytics/refinery@cf6095c] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@cf6095c]
  • 02:16 milimetric@deploy1002: Finished deploy [analytics/refinery@cf6095c] (thin): Regular analytics weekly train THIN [analytics/refinery@cf6095c] (duration: 00m 07s)
  • 02:16 milimetric@deploy1002: Started deploy [analytics/refinery@cf6095c] (thin): Regular analytics weekly train THIN [analytics/refinery@cf6095c]
  • 02:15 milimetric@deploy1002: Finished deploy [analytics/refinery@cf6095c]: Regular analytics weekly train [analytics/refinery@cf6095c] (duration: 22m 30s)
  • 01:53 milimetric@deploy1002: Started deploy [analytics/refinery@cf6095c]: Regular analytics weekly train [analytics/refinery@cf6095c]

2021-11-02

  • 23:47 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:46 tgr: UTC late deploys done
  • 23:45 tgr@deploy1002: Synchronized wmf-config: Config: Use page id for GrowthExperiments image recommendations, except for testwiki (736314 736317) (T290949 T292154) (duration: 01m 03s)
  • 23:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:34 tgr@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Use url-downloader proxy for GrowthExperiments (T290949) (duration: 01m 14s)
  • 23:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:14 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-db1002.eqiad.wmnet with OS buster
  • 21:50 robh@cumin1001: START - Cookbook sre.hosts.reimage for host an-db1002.eqiad.wmnet with OS buster
  • 21:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-db1002.eqiad.wmnet with OS buster
  • 21:03 robh@cumin1001: START - Cookbook sre.hosts.reimage for host an-db1002.eqiad.wmnet with OS buster
  • 20:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-db1001.eqiad.wmnet with OS buster
  • 20:28 robh@cumin1001: START - Cookbook sre.hosts.reimage for host an-db1001.eqiad.wmnet with OS buster
  • 20:01 thcipriani: 1.38.0-wmf.7 on testwikis, leaving it there for today for US holiday (T293948)
  • 19:58 thcipriani@deploy1002: Pruned MediaWiki: 1.38.0-wmf.5 (duration: 04m 08s)
  • 19:53 thcipriani@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.7 refs T293948 (duration: 50m 13s)
  • 19:50 moritzm: imported ganeti 2.16.0-1~bpo9+1+wmf1 to component/ganeti216 for stretch-wikimedia (with additional cherry-picked patches for compat with KVM 3.1) T284811
  • 19:47 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:39 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 19:35 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts an-db1002.eqiad.wmnet
  • 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:08 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-db1001.eqiad.wmnet with OS buster
  • 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:02 thcipriani@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.7 refs T293948
  • 18:46 thcipriani: starting to stage train for 1.38.0-wmf.7 (T293948)
  • 18:33 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts an-db1002.eqiad.wmnet
  • 18:32 robh@cumin1001: START - Cookbook sre.hosts.reimage for host an-db1001.eqiad.wmnet with OS buster
  • 18:23 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:18 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 18:15 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-db1001.eqiad.wmnet
  • 18:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:01 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:59 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.6/extensions/DiscussionTools/modules/dt-ve/dt.ui.UsernameCompletionAction.js: 494af12: UsernameCompletion: Filter out users with indefinite sitewide blocks from API results (T294783) (duration: 00m 55s)
  • 17:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:57 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts an-db1001.eqiad.wmnet
  • 17:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:44 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 339be07: foundationwiki: Set wgCentralAuthCookies to true (T205347) (duration: 00m 54s)
  • 17:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:33 moritzm: installing opencv security updates
  • 17:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e322770: Revert "Revert "foundationwiki: Enable Translate extension"" (T205349) (duration: 00m 55s)
  • 17:22 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.6/includes/cache/LinkCache.php: 1e78aea: LinkCache: Try invalidating cache before throwing (T205349) (duration: 00m 56s)
  • 17:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:38 mmandere: pool cp4036.ulsfo.wmnet - T290694
  • 16:30 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4036.ulsfo.wmnet with OS buster
  • 15:41 mmandere: pool cp4034.ulsfo.wmnet - T290694
  • 15:38 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4036.ulsfo.wmnet with OS buster
  • 15:32 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4034.ulsfo.wmnet with OS buster
  • 15:12 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 15:11 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 15:07 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 14:34 mmandere: pool cp4035.ulsfo.wmnet - T290694
  • 14:31 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
  • 14:24 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4035.ulsfo.wmnet with OS buster
  • 14:19 hnowlan: roll-restart restart-php7.2-fpm on A:mw-app-canary and A:mw-api-canary
  • 14:15 hnowlan: debdeploying wikidiff2-1.13.0-1 to A:mw-app-canary and A:mw-api-canary for T285857
  • 14:05 hashar@deploy1002: Finished deploy [integration/docroot@4e4d14a]: Add landing page for code metrics (duration: 00m 09s)
  • 14:05 hashar@deploy1002: Started deploy [integration/docroot@4e4d14a]: Add landing page for code metrics
  • 13:45 mmandere: pool cp4033.ulsfo.wmnet - T290694
  • 11:26 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
  • 11:06 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1003.eqiad.wmnet
  • 11:00 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1003.eqiad.wmnet
  • 11:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1004.eqiad.wmnet
  • 10:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1008.eqiad.wmnet
  • 10:54 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1004.eqiad.wmnet
  • 10:53 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2002-dev.codfw.wmnet
  • 10:48 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw2002-dev.codfw.wmnet
  • 10:48 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2001-dev.codfw.wmnet
  • 10:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1008.eqiad.wmnet
  • 10:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1005.eqiad.wmnet
  • 10:41 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw2001-dev.codfw.wmnet
  • 10:40 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudgw2001-dev.codfw.wmnet
  • 10:40 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw2001-dev.codfw.wmnet
  • 10:36 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1005.eqiad.wmnet
  • 10:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:30 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: dbff998: dewiki: Set wgGEHomepageDefaultVariant to control (T294712) (duration: 00m 55s)
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1118 with weight 0 T293964', diff saved to https://phabricator.wikimedia.org/P17652 and previous config saved to /var/cache/conftool/dbconfig/20211102-100348-root.json
  • 09:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:40 legoktm: restarted apache2 on lists1001
  • 09:39 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b259434: QuickSurveys: Show Growth IP editors survey to 0.1% of users (T294568) (duration: 00m 57s)
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchanges replicas from s6 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P17651 and previous config saved to /var/cache/conftool/dbconfig/20211102-090306-marostegui.json
  • 08:29 moritzm: installing sdl2 security updates
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchangeslinked replicas from s6 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P17650 and previous config saved to /var/cache/conftool/dbconfig/20211102-072320-marostegui.json
  • 07:13 elukey: `apt-get purge dkms` (rc state) on stat100[5,8]
  • 06:45 marostegui: Rename oauth2_access_tokens oauth_accepted_consumer oauth_registered_consumer tables on db1123 T294595
  • 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 01:56 cstone: civicrm revision changed from 403be9ce05 to 93caef68ef
  • 01:21 ejegg: updated SmashPig standalone deploy from dd3a81c7c2 to be68299b92
  • 01:18 ejegg: updated payments-wiki from 5b9fdd0fe1 to 73de4731bd
  • 00:45 mutante: upgraded php-fpm on cloudweb2001-dev - https://labtestwikitech.wikimedia.org/wiki/Main_Page
  • 00:24 mutante: parsoid-canary (scandium, wtp1025, wtp1026, parse2001, parse2002) - upgrading php-fpm and php-* packages
  • 00:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:07 mutante: scandium - installing package upgrades, incl. apache, php7.2- packages
  • 00:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:02 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add event stream config for discussiontools (T286076) (duration: 00m 55s)
  • 00:00 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable ArticlePlaceholder for kswiki (T294632) (duration: 00m 55s)
  • 00:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-11-01

  • 21:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:30 urbanecm: Deploy a security patch for T290808
  • 21:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8f5008d: votewiki: Grant election admins securepoll-view-voter-pii (T290808) (duration: 00m 55s)
  • 20:59 mutante: mwmaint1002:/# systemctl start mediawiki_job_growthexperiments-purgeExpiredMentorStatus (T280307)
  • 20:56 legoktm: upgrading PHP 7.2 on A:mw-canary servers
  • 20:44 legoktm: upgrading PHP 7.2 on mwdebug* servers
  • 20:34 mutante: mwmaint* - new timer/service mediawiki_job_growthexperiments-purgeExpiredMentorStatus created by puppet - T280307
  • 20:33 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
  • 20:32 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
  • 20:30 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
  • 20:24 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
  • 20:22 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
  • 20:18 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
  • 20:14 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
  • 20:12 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
  • 20:10 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
  • 20:08 mutante: planet1002 - systemctl start update-en-planet after merging config change btw. legoktm: it should be included in a sec
  • 19:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cba805c: Prepare a QuickSurvey for Growth IP research (T294568) (duration: 00m 55s)
  • 19:26 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 19:23 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 19:19 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 18:49 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 18:37 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 18:26 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: fb433d6: Amend wordmark for the Meetei (Manipuri) Wikipedia (T294189; 2/2) (duration: 00m 55s)
  • 18:09 urbanecm: Purge https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-mni.svg (T294189)
  • 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:08 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-mni.svg: fb433d6: Amend wordmark for the Meetei (Manipuri) Wikipedia (T294189; 1/2) (duration: 00m 55s)
  • 18:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:52 topranks: force-resetting FPC 0 on cr2-codfw as it appears hard down.
  • 17:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:46 mutante: removing mediawiki font packages from the 8 canary API servers, in addition to 11 canary appservers T294378
  • 17:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:06 mutante: removing font packages from canary appservers (T294378, gerrit:735685)
  • 16:53 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 16:53 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 15:52 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 15:52 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 15:50 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 15:49 moritzm: installing opencv security updates on stretch
  • 15:28 moritzm: rolling restart of mw canaries to pick up tiff security updates
  • 15:12 moritzm: installing tiff security updates
  • 14:54 moritzm: uploaded PHP 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf3 to apt.wikimedia.org (buster-wikimedia/component/php72) T294317
  • 14:37 moritzm: updating PHP on mwdebug1001
  • 13:31 moritzm: installing jbig2dec security updates
  • 12:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1101.eqiad.wmnet
  • 12:18 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1101.eqiad.wmnet
  • 12:08 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1100.eqiad.wmnet
  • 12:08 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.6/extensions/GrowthExperiments/includes/Mentorship/QuitMentorship.php: 4671528: QuitMentorship: Pass a logger (T294665; 2/2) (duration: 00m 55s)
  • 12:07 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.6/extensions/GrowthExperiments/includes/Mentorship/QuitMentorshipFactory.php: 4671528: QuitMentorship: Pass a logger (T294665; 1/2) (duration: 00m 56s)
  • 11:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1100.eqiad.wmnet
  • 11:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1099.eqiad.wmnet
  • 11:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1099.eqiad.wmnet
  • 11:48 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1098.eqiad.wmnet
  • 11:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:41 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1098.eqiad.wmnet
  • 11:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1097.eqiad.wmnet
  • 11:22 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1097.eqiad.wmnet
  • 11:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1096.eqiad.wmnet
  • 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:01 urbanecm: 11:01:21 Synchronized wmf-config/CommonSettings.php: b9aa3d2: Add edit-legal to editprotected grant (duration: 00m 54s)
  • 11:00 urbanecm: 10:59:03 Synchronized wmf-config/InitialiseSettings.php: c236232: foundationwiki: Disable direct account creation (T205347) (duration: 00m 56s)
  • 10:46 moritzm: installing libdatetime-timezone-perl updates (updates for latest tz changes)
  • 10:17 urbanecm: Deploy a security patch for T294686
  • 09:03 dcausse: restarting blazegraph on wdqs2003 (JVM stuck for the last 22 hours)
  • 02:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:24 reedy@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 49s)
  • 02:22 reedy@deploy1002: Synchronized langlist: Add ami to langlist T294717 T292414 (duration: 00m 55s)
