Server Admin Log/Archive 55

From Wikitech

2022-07-31

  • 23:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:20 krinkle@deploy1002: Synchronized dblists-index.php: I814ee93b5c (duration: 03m 20s)
  • 23:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:19 vgutierrez@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=cp5001.eqsin.wmnet
  • 18:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cp5001.eqsin.wmnet with reason: depooled: faulty DIMM: T314256
  • 18:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on cp5001.eqsin.wmnet with reason: depooled: faulty DIMM: T314256
  • 18:12 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5001.eqsin.wmnet,service=ats-tls
  • 18:12 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5001.eqsin.wmnet,service=varnish-fe
  • 18:12 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5001.eqsin.wmnet,service=ats-be

2022-07-29

  • 22:43 Krinkle: krinkle@mwmaint1002$ mwscript findBadBlobs.php nlwiktionary; mark 2371 blobs from May 2004 as "Invalid gzip, T265989"
  • 22:37 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2041.codfw.wmnet with OS bullseye
  • 22:20 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2041.codfw.wmnet with reason: host reimage
  • 22:17 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2041.codfw.wmnet with reason: host reimage
  • 22:09 Krinkle: findBadBlobs.php nlwiktionary --revisions 22 --mark 'Invalid gzip, T265989'
  • 22:01 mutante: phab1001 - rsync -avp --bwlimit=1000 /srv/repos/ rsync://phab1004.eqiad.wmnet/phabricator-srv-repos (running slowly inside a screen session as root) (T313360, T280597)
  • 21:57 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2041.codfw.wmnet with OS bullseye
  • 21:06 mutante: phab1004 - mkdir /srv/repos ; mkdir /srv/dumps
  • 20:46 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host elastic2029.codfw.wmnet with OS bullseye
  • 20:29 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2029.codfw.wmnet with reason: host reimage
  • 20:26 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2029.codfw.wmnet with reason: host reimage
  • 20:18 mutante: authdns-update - adding gerrit-replica-new.wikimedia.org
  • 20:13 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2029.codfw.wmnet with OS bullseye
  • 18:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2057.codfw.wmnet with OS bullseye
  • 18:06 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2057.codfw.wmnet with reason: host reimage
  • 18:02 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2057.codfw.wmnet with reason: host reimage
  • 17:47 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2057.codfw.wmnet with OS bullseye
  • 17:41 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@85585b0]: (no justification provided) (duration: 00m 09s)
  • 17:41 ebysans@deploy1002: Started deploy [airflow-dags/analytics@85585b0]: (no justification provided)
  • 17:10 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2042.codfw.wmnet with OS bullseye
  • 16:53 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2042.codfw.wmnet with reason: host reimage
  • 16:50 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2042.codfw.wmnet with reason: host reimage
  • 16:30 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2042.codfw.wmnet with OS bullseye
  • 16:21 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host elastic2058.codfw.wmnet with OS bullseye
  • 15:58 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2058.codfw.wmnet with reason: host reimage
  • 15:55 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2058.codfw.wmnet with reason: host reimage
  • 15:40 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2058.codfw.wmnet with OS bullseye
  • 15:37 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2030.codfw.wmnet with OS bullseye
  • 15:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2030.codfw.wmnet with reason: host reimage
  • 15:17 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2030.codfw.wmnet with reason: host reimage
  • 15:03 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2030.codfw.wmnet with OS bullseye
  • 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P32112 and previous config saved to /var/cache/conftool/dbconfig/20220729-150256-root.json
  • 15:00 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2043.codfw.wmnet with OS bullseye
  • 14:59 marostegui: dbmaint s7@eqiad T314140
  • 14:39 marostegui: dbmaint s3@eqiad T314140
  • 14:37 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2043.codfw.wmnet with reason: host reimage
  • 14:34 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2043.codfw.wmnet with reason: host reimage
  • 14:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1189.eqiad.wmnet with OS bullseye
  • 14:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet with OS bullseye
  • 14:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1186.eqiad.wmnet with OS bullseye
  • 14:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1187.eqiad.wmnet with OS bullseye
  • 14:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage
  • 14:15 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2043.codfw.wmnet with OS bullseye
  • 14:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage
  • 14:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1186.eqiad.wmnet with reason: host reimage
  • 14:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage
  • 14:10 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1187.eqiad.wmnet with reason: host reimage
  • 14:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage
  • 14:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1187.eqiad.wmnet with reason: host reimage
  • 14:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS bullseye
  • 14:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS bullseye
  • 14:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1187.eqiad.wmnet with OS bullseye
  • 14:07 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1186.eqiad.wmnet with reason: host reimage
  • 14:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1186.eqiad.wmnet with OS bullseye
  • 14:03 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1185.eqiad.wmnet with OS bullseye
  • 14:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1185.eqiad.wmnet with OS bullseye
  • 13:59 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2047.codfw.wmnet with OS bullseye
  • 13:36 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2047.codfw.wmnet with reason: host reimage
  • 13:33 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2047.codfw.wmnet with reason: host reimage
  • 13:12 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2047.codfw.wmnet with OS bullseye
  • 13:11 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 13:07 marostegui: dbmaint s8@eqiad T314140
  • 13:07 marostegui: dbmaint s4@eqiad T314140
  • 13:07 marostegui: dbmaint s4@eqiad T314141 T314140
  • 13:06 marostegui: dbmaint s3@eqiad T314141
  • 12:11 marostegui: dbmaint s3@eqiad T314087
  • 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2088 from dbctl T313797', diff saved to https://phabricator.wikimedia.org/P32111 and previous config saved to /var/cache/conftool/dbconfig/20220729-114203-marostegui.json
  • 11:37 vgutierrez: update ATS to version 9.1.2 in cp4032 - T309651
  • 11:04 vgutierrez: reenable puppet on cp nodes
  • 11:03 vgutierrez: repool ats-be@cp4026 - T309651
  • 10:33 vgutierrez: disable puppet on cp nodes to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/818436
  • 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2173 into s1 T311493', diff saved to https://phabricator.wikimedia.org/P32110 and previous config saved to /var/cache/conftool/dbconfig/20220729-101507-marostegui.json
  • 08:12 vgutierrez: depool ats-be on cp4026 for debugging purposes
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32109 and previous config saved to /var/cache/conftool/dbconfig/20220729-080528-root.json
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32108 and previous config saved to /var/cache/conftool/dbconfig/20220729-075023-root.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32107 and previous config saved to /var/cache/conftool/dbconfig/20220729-073518-root.json
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32106 and previous config saved to /var/cache/conftool/dbconfig/20220729-072013-root.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32105 and previous config saved to /var/cache/conftool/dbconfig/20220729-070509-root.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32104 and previous config saved to /var/cache/conftool/dbconfig/20220729-065004-root.json
  • 05:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 16 hosts with reason: codfw s8 sanitarium master switch
  • 05:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 16 hosts with reason: codfw s8 sanitarium master switch
  • 00:48 TimStarling: slowly restarting (with batch 1 sleep 5) trafficserver on text caches to fully deploy g 817086 T313578

2022-07-28

  • 22:22 mforns@deploy1002: Finished deploy [airflow-dags/analytics@9ea9cd1]: (no justification provided) (duration: 00m 09s)
  • 22:21 mforns@deploy1002: Started deploy [airflow-dags/analytics@9ea9cd1]: (no justification provided)
  • 21:51 mforns@deploy1002: Finished deploy [airflow-dags/analytics@e8d4704]: (no justification provided) (duration: 00m 09s)
  • 21:51 mforns@deploy1002: Started deploy [airflow-dags/analytics@e8d4704]: (no justification provided)
  • 21:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T312990)', diff saved to https://phabricator.wikimedia.org/P32102 and previous config saved to /var/cache/conftool/dbconfig/20220728-212227-marostegui.json
  • 21:18 mforns@deploy1002: Finished deploy [airflow-dags/analytics@5ec2435]: (no justification provided) (duration: 00m 09s)
  • 21:18 mforns@deploy1002: Started deploy [airflow-dags/analytics@5ec2435]: (no justification provided)
  • 21:07 brennen@deploy1002: Finished deploy [phabricator/deployment@a0f0699]: test deploy to phab2001 (take 2) (duration: 00m 27s)
  • 21:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P32100 and previous config saved to /var/cache/conftool/dbconfig/20220728-210721-marostegui.json
  • 21:06 brennen@deploy1002: Started deploy [phabricator/deployment@a0f0699]: test deploy to phab2001 (take 2)
  • 21:04 brennen@deploy1002: Finished deploy [phabricator/deployment@a21dea9]: test deploy to phab2001 (duration: 00m 27s)
  • 21:03 brennen@deploy1002: Started deploy [phabricator/deployment@a21dea9]: test deploy to phab2001
  • 20:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P32099 and previous config saved to /var/cache/conftool/dbconfig/20220728-205215-marostegui.json
  • 20:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T312990)', diff saved to https://phabricator.wikimedia.org/P32098 and previous config saved to /var/cache/conftool/dbconfig/20220728-203709-marostegui.json
  • 20:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T312990)', diff saved to https://phabricator.wikimedia.org/P32097 and previous config saved to /var/cache/conftool/dbconfig/20220728-203446-marostegui.json
  • 20:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 20:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 20:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 20:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 20:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 16 hosts with reason: Maintenance
  • 20:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 16 hosts with reason: Maintenance
  • 20:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 20:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 20:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 20:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 20:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T312990)', diff saved to https://phabricator.wikimedia.org/P32096 and previous config saved to /var/cache/conftool/dbconfig/20220728-203212-marostegui.json
  • 20:18 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Register Wikistories streams (T313633) (duration: 03m 24s)
  • 20:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P32095 and previous config saved to /var/cache/conftool/dbconfig/20220728-201706-marostegui.json
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P32094 and previous config saved to /var/cache/conftool/dbconfig/20220728-200200-marostegui.json
  • 19:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T312990)', diff saved to https://phabricator.wikimedia.org/P32093 and previous config saved to /var/cache/conftool/dbconfig/20220728-194654-marostegui.json
  • 19:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T312990)', diff saved to https://phabricator.wikimedia.org/P32092 and previous config saved to /var/cache/conftool/dbconfig/20220728-194426-marostegui.json
  • 19:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 19:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 19:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T312990)', diff saved to https://phabricator.wikimedia.org/P32091 and previous config saved to /var/cache/conftool/dbconfig/20220728-194405-marostegui.json
  • 19:44 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.22 refs T308075
  • 19:35 brennen: 1.39.0-wmf.22 train (T308075): blocker resolved, rolling to all wikis
  • 19:34 brennen@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/Flow: Backport: Update CheckUser hook for pagination (T314058 T314069) (duration: 03m 16s)
  • 19:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P32090 and previous config saved to /var/cache/conftool/dbconfig/20220728-192859-marostegui.json
  • 19:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P32089 and previous config saved to /var/cache/conftool/dbconfig/20220728-191353-marostegui.json
  • 19:08 wfan: civicrm upgraded from 3143dda9 to 497bddf7
  • 19:00 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@82e0383]: (no justification provided) (duration: 00m 17s)
  • 19:00 ebysans@deploy1002: Started deploy [airflow-dags/analytics@82e0383]: (no justification provided)
  • 18:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T312990)', diff saved to https://phabricator.wikimedia.org/P32088 and previous config saved to /var/cache/conftool/dbconfig/20220728-185847-marostegui.json
  • 18:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T312990)', diff saved to https://phabricator.wikimedia.org/P32087 and previous config saved to /var/cache/conftool/dbconfig/20220728-185624-marostegui.json
  • 18:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 18:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 18:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T312990)', diff saved to https://phabricator.wikimedia.org/P32086 and previous config saved to /var/cache/conftool/dbconfig/20220728-185603-marostegui.json
  • 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P32085 and previous config saved to /var/cache/conftool/dbconfig/20220728-184056-marostegui.json
  • 18:28 mutante: gerrit: rsyncing /home from prod gerrit1001 to /srv/home-gerrit1001.wikimedia.org on gerrit2002 new replica T243027 T313250
  • 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P32084 and previous config saved to /var/cache/conftool/dbconfig/20220728-182550-marostegui.json
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T312990)', diff saved to https://phabricator.wikimedia.org/P32083 and previous config saved to /var/cache/conftool/dbconfig/20220728-181044-marostegui.json
  • 18:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T312990)', diff saved to https://phabricator.wikimedia.org/P32082 and previous config saved to /var/cache/conftool/dbconfig/20220728-180815-marostegui.json
  • 18:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 18:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 18:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T312990)', diff saved to https://phabricator.wikimedia.org/P32081 and previous config saved to /var/cache/conftool/dbconfig/20220728-180754-marostegui.json
  • 18:06 ryankemper: [Elastic] Finished re-running `delete`s and `update`s from `2022-07-28T15:00:00Z` until `2022-07-28T17:30:00Z`
  • 18:06 damilare: SmashPig updated from ffe5066d to 8e8f0017
  • 17:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P32080 and previous config saved to /var/cache/conftool/dbconfig/20220728-175248-marostegui.json
  • 17:41 ryankemper: [Elastic] Re-running `delete`s and `update`s from `2022-07-28T15:00:00Z` until `2022-07-28T17:30:00Z` on `ryankemper@mwmaint1002` tmux `mlr_outage`
  • 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P32079 and previous config saved to /var/cache/conftool/dbconfig/20220728-173742-marostegui.json
  • 17:23 ryankemper: [Elastic] Restarting `elastic1072` after halting mjolnir bulk daemons: `ryankemper@elastic1072:~$ sudo depool && sleep 30 && sudo systemctl restart elasticsearch_6* && sleep 30 && sudo pool`
  • 17:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T312990)', diff saved to https://phabricator.wikimedia.org/P32078 and previous config saved to /var/cache/conftool/dbconfig/20220728-172235-marostegui.json
  • 17:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T312990)', diff saved to https://phabricator.wikimedia.org/P32077 and previous config saved to /var/cache/conftool/dbconfig/20220728-172008-marostegui.json
  • 17:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:19 ryankemper: [Elastic] `ryankemper@search-loader2001:~$ sudo disable-puppet "production issue" && sudo systemctl stop mjolnir-kafka-bulk-daemon.service` just to be safe (we prob only needed to halt eqiad)
  • 17:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 17:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T312990)', diff saved to https://phabricator.wikimedia.org/P32076 and previous config saved to /var/cache/conftool/dbconfig/20220728-171930-marostegui.json
  • 17:18 ryankemper: [Elastic] `sudo disable-puppet "production issue"` && `sudo systemctl stop mjolnir-kafka-bulk-daemon.service` on `ryankemper@search-loader1001`
  • 17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P32075 and previous config saved to /var/cache/conftool/dbconfig/20220728-170424-marostegui.json
  • 16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P32074 and previous config saved to /var/cache/conftool/dbconfig/20220728-164918-marostegui.json
  • 16:45 vgutierrez: pooling ats-be@cp4026 running ATS 9.1.2 - T309651
  • 16:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:42 mutante: disabling puppet on gerrit servers for a change in gerrit puppet code
  • 16:38 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T312990)', diff saved to https://phabricator.wikimedia.org/P32073 and previous config saved to /var/cache/conftool/dbconfig/20220728-163412-marostegui.json
  • 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T312990)', diff saved to https://phabricator.wikimedia.org/P32072 and previous config saved to /var/cache/conftool/dbconfig/20220728-163149-marostegui.json
  • 16:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 16:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T312990)', diff saved to https://phabricator.wikimedia.org/P32071 and previous config saved to /var/cache/conftool/dbconfig/20220728-163127-marostegui.json
  • 16:24 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1056.eqiad.wmnet
  • 16:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts conf[1004-1006].eqiad.wmnet
  • 16:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:22 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 16:21 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
  • 16:21 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 16:21 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: sync on main
  • 16:21 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 16:21 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
  • 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P32070 and previous config saved to /var/cache/conftool/dbconfig/20220728-161621-marostegui.json
  • 16:15 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic1056.eqiad.wmnet
  • 16:12 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync
  • 16:11 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: sync
  • 16:11 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P32069 and previous config saved to /var/cache/conftool/dbconfig/20220728-160113-marostegui.json
  • 15:52 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync
  • 15:52 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: sync
  • 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T312990)', diff saved to https://phabricator.wikimedia.org/P32068 and previous config saved to /var/cache/conftool/dbconfig/20220728-154607-marostegui.json
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T312990)', diff saved to https://phabricator.wikimedia.org/P32067 and previous config saved to /var/cache/conftool/dbconfig/20220728-154344-marostegui.json
  • 15:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 15:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T312990)', diff saved to https://phabricator.wikimedia.org/P32066 and previous config saved to /var/cache/conftool/dbconfig/20220728-154323-marostegui.json
  • 15:38 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4026.ulsfo.wmnet,service=ats-be
  • 15:37 sukhe: depool ats-be on cp4026 for ATS9 testing
  • 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P32063 and previous config saved to /var/cache/conftool/dbconfig/20220728-152817-marostegui.json
  • 15:22 mvernon@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: upgrade to 3.11.13 T309896 - mvernon@cumin2002
  • 15:17 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts conf[1004-1006].eqiad.wmnet
  • 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P32062 and previous config saved to /var/cache/conftool/dbconfig/20220728-151311-marostegui.json
  • 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T312990)', diff saved to https://phabricator.wikimedia.org/P32061 and previous config saved to /var/cache/conftool/dbconfig/20220728-145805-marostegui.json
  • 14:46 mvernon@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: upgrade to 3.11.13 T309896 - mvernon@cumin2002
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T312990)', diff saved to https://phabricator.wikimedia.org/P32057 and previous config saved to /var/cache/conftool/dbconfig/20220728-141736-marostegui.json
  • 14:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 14:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T312990)', diff saved to https://phabricator.wikimedia.org/P32056 and previous config saved to /var/cache/conftool/dbconfig/20220728-141715-marostegui.json
  • 14:02 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@137a4ff]: (no justification provided) (duration: 02m 03s)
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P32055 and previous config saved to /var/cache/conftool/dbconfig/20220728-140209-marostegui.json
  • 14:00 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@137a4ff]: (no justification provided)
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32054 and previous config saved to /var/cache/conftool/dbconfig/20220728-134828-root.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P32053 and previous config saved to /var/cache/conftool/dbconfig/20220728-134703-marostegui.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32052 and previous config saved to /var/cache/conftool/dbconfig/20220728-133323-root.json
  • 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T312990)', diff saved to https://phabricator.wikimedia.org/P32051 and previous config saved to /var/cache/conftool/dbconfig/20220728-133157-marostegui.json
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T312990)', diff saved to https://phabricator.wikimedia.org/P32050 and previous config saved to /var/cache/conftool/dbconfig/20220728-132929-marostegui.json
  • 13:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 13:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 13:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 13:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T312990)', diff saved to https://phabricator.wikimedia.org/P32049 and previous config saved to /var/cache/conftool/dbconfig/20220728-132835-marostegui.json
  • 13:27 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:26 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: testwiki: Add mediawiki.web_ui.interactions stream (T311268) (2/2) (duration: 03m 19s)
  • 13:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: testwiki: Add mediawiki.web_ui.interactions stream (T311268) (1/2) (duration: 03m 24s)
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32048 and previous config saved to /var/cache/conftool/dbconfig/20220728-131818-root.json
  • 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P32047 and previous config saved to /var/cache/conftool/dbconfig/20220728-131329-marostegui.json
  • 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Configure wbsearchentities profile parameter on Wikidata (T307869) (duration: 03m 25s)
  • 13:09 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 13:07 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 13:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32045 and previous config saved to /var/cache/conftool/dbconfig/20220728-130314-root.json
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P32044 and previous config saved to /var/cache/conftool/dbconfig/20220728-125823-marostegui.json
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2174 to dbctl T311493', diff saved to https://phabricator.wikimedia.org/P32043 and previous config saved to /var/cache/conftool/dbconfig/20220728-125253-marostegui.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32042 and previous config saved to /var/cache/conftool/dbconfig/20220728-124809-root.json
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T312990)', diff saved to https://phabricator.wikimedia.org/P32041 and previous config saved to /var/cache/conftool/dbconfig/20220728-124317-marostegui.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T312990)', diff saved to https://phabricator.wikimedia.org/P32040 and previous config saved to /var/cache/conftool/dbconfig/20220728-123854-marostegui.json
  • 12:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 12:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 12:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 12:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32039 and previous config saved to /var/cache/conftool/dbconfig/20220728-123304-root.json
  • 11:50 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "test 818085 - jbond@cumin2002"
  • 11:50 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test 818085 - jbond@cumin2002"
  • 11:41 akosiaris: slow (10-minute interval) rolling restart of all pybals to pick up new conf hosts config. T311407
  • 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32038 and previous config saved to /var/cache/conftool/dbconfig/20220728-113615-root.json
  • 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32037 and previous config saved to /var/cache/conftool/dbconfig/20220728-112109-root.json
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32036 and previous config saved to /var/cache/conftool/dbconfig/20220728-110604-root.json
  • 10:53 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32035 and previous config saved to /var/cache/conftool/dbconfig/20220728-105100-root.json
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32034 and previous config saved to /var/cache/conftool/dbconfig/20220728-103555-root.json
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32032 and previous config saved to /var/cache/conftool/dbconfig/20220728-102051-root.json
  • 10:19 jbond@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin2002"
  • 10:19 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin2002"
  • 10:13 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync data - jbond@cumin2002"
  • 10:12 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin2002"
  • 10:05 jelto: update gitlab1004 to 15.0.4-ce.0
  • 09:55 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:48 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:40 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:33 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:33 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:24 Emperor: rolling restart of swift proxies to apply wmf/rewrite update T313102
  • 09:17 Emperor: set thanos ring replicas to 3.95 T311690
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2142', diff saved to https://phabricator.wikimedia.org/P32030 and previous config saved to /var/cache/conftool/dbconfig/20220728-085737-marostegui.json
  • 08:57 kart_: Updated cxserver to 2022-07-27-220330-production (T308248)
  • 08:56 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 08:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 08:53 vgutierrez: disable puppet on cp hosts to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/816206
  • 08:48 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 08:48 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 08:44 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 08:43 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 08:36 vgutierrez: update HAProxy to version 2.4.18 in cp4021 and cp4027
  • 08:28 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2172 to dbctl T311493', diff saved to https://phabricator.wikimedia.org/P32028 and previous config saved to /var/cache/conftool/dbconfig/20220728-081252-marostegui.json
  • 08:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:02 jnuche: UTC morning backport and config training done
  • 08:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:01 jnuche: UTC morning backport and config training
  • 08:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:44 vgutierrez: update HAProxy to version 2.4.18 on apt.wm.o thirdparty/haproxy24
  • 07:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:21 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable SectionTranslation on 10 more WPs where ContentTranslation is available by default (T313300) (duration: 03m 16s)
  • 07:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2142 T313811', diff saved to https://phabricator.wikimedia.org/P32026 and previous config saved to /var/cache/conftool/dbconfig/20220728-060757-root.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2144 to x2 primary T313811', diff saved to https://phabricator.wikimedia.org/P32025 and previous config saved to /var/cache/conftool/dbconfig/20220728-060057-marostegui.json
  • 06:00 marostegui: Starting x2 codfw failover from db2142 to db2144 - T313811
  • 05:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover x2 T313811
  • 05:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover x2 T313811
  • 03:28 ejegg: updated fundraising CiviCRM from e0962be6 to 3143dda9
  • 01:28 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: move OAuth token storage T313578 (duration: 03m 04s)
  • 01:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:18 tstarling@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/OAuth: New config var for T313578, not yet used (duration: 03m 23s)
  • 01:11 tstarling@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/OAuth: New config var for T313578, not yet used (duration: 03m 39s)

2022-07-27

  • 23:59 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: sync again now that scap proxy list is fixed T313730 T313496 (duration: 03m 25s)
  • 23:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:45 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: move CentralAuth sessions to Kask T313496 (duration: 05m 34s)
  • 23:45 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[2251-2255,2257-2258].codfw.wmnet
  • 23:45 rzl@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:38 rzl@cumin2002: START - Cookbook sre.dns.netbox
  • 23:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:29 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: increase wgObjectCacheSessionExpiry to 86400 (duration: 03m 30s)
  • 23:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:26 rzl@cumin2002: START - Cookbook sre.hosts.decommission for hosts mw[2251-2255,2257-2258].codfw.wmnet
  • 23:18 rzl@cumin2002: conftool action : set/pooled=inactive; selector: name=mw225[1-57-8].codfw.wmnet
  • 23:17 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 7 hosts with reason: Decom
  • 23:17 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on 7 hosts with reason: Decom
  • 23:14 tstarling@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 23:14 tstarling@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 23:14 tstarling@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 23:13 tstarling@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 23:13 rzl@cumin2002: conftool action : set/pooled=no; selector: name=mw225[1-57-8].codfw.wmnet
  • 23:08 tstarling@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 23:08 tstarling@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 22:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:08 brennen@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.22 refs T308075 (duration: 03m 08s)
  • 22:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.22 refs T308075
  • 22:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:59 brennen@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/Translate/src/TtmServer: Backport: SearchTranslationsApi: Change the way we fetch TTM services (T313836) (duration: 03m 19s)
  • 21:33 cjming: end of UTC late backport window
  • 21:32 cjming@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/TemplateWizard/resources/ext.TemplateWizard.Dialog.js: Backport: Delay template insertion until after closing the dialog (T33780) (duration: 03m 36s)
  • 21:28 cjming@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/TemplateWizard/resources/ext.TemplateWizard.Dialog.js: Backport: Delay template insertion until after closing the dialog (T33780) (duration: 03m 27s)
  • 21:27 urandom: Removing reserved space on sessionstore storage volumes -- T313991
  • 21:25 cjming@deploy1002: Synchronized php-1.39.0-wmf.22/resources/src/jquery/jquery.textSelection.js: Backport: jquery.textSelection: Use non-execCommand when we can't focus the field (T33780) (duration: 03m 22s)
  • 21:21 cjming@deploy1002: Synchronized php-1.39.0-wmf.21/resources/src/jquery/jquery.textSelection.js: Backport: jquery.textSelection: Use non-execCommand when we can't focus the field (T33780) (duration: 03m 09s)
  • 21:17 cjming@deploy1002: Synchronized php-1.39.0-wmf.22/resources/src/jquery/jquery.textSelection.js: Backport: jquery.textSelection: Support more edge cases of document.execCommand (T33780) (duration: 03m 10s)
  • 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:55 sukhe@cumin1001: dbctl commit (dc=all): 'depool db1111', diff saved to https://phabricator.wikimedia.org/P32018 and previous config saved to /var/cache/conftool/dbconfig/20220727-205536-sukhe.json
  • 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:48 sukhe@cumin1001: dbctl commit (dc=all): 'depool db1132', diff saved to https://phabricator.wikimedia.org/P32017 and previous config saved to /var/cache/conftool/dbconfig/20220727-204806-sukhe.json
  • 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:20 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: VisualEditor: Allow external link paste on mediawikiwiki, metawiki (T129546) (duration: 03m 37s)
  • 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:13 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: ptwiki: Restrict "move" permission (T313802) (duration: 03m 19s)
  • 19:34 denisse@deploy1002: Finished deploy [librenms/librenms@f049593]: Provision LibreNMS on netmon1003 (duration: 00m 05s)
  • 19:34 denisse@deploy1002: Started deploy [librenms/librenms@f049593]: Provision LibreNMS on netmon1003
  • 19:16 ejegg: updated Fundraising CiviCRM from b4a7154a to e0962be6
  • 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T312990)', diff saved to https://phabricator.wikimedia.org/P32015 and previous config saved to /var/cache/conftool/dbconfig/20220727-175414-marostegui.json
  • 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P32014 and previous config saved to /var/cache/conftool/dbconfig/20220727-173908-marostegui.json
  • 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P32013 and previous config saved to /var/cache/conftool/dbconfig/20220727-172402-marostegui.json
  • 17:23 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@1a72195]: switch image_suggestions_manual from _delta to _full (duration: 02m 01s)
  • 17:21 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@1a72195]: switch image_suggestions_manual from _delta to _full
  • 17:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T312990)', diff saved to https://phabricator.wikimedia.org/P32012 and previous config saved to /var/cache/conftool/dbconfig/20220727-170856-marostegui.json
  • 16:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T312990)', diff saved to https://phabricator.wikimedia.org/P32011 and previous config saved to /var/cache/conftool/dbconfig/20220727-164425-marostegui.json
  • 16:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 16:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 16:42 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 16:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 13 hosts with reason: Maintenance
  • 16:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 13 hosts with reason: Maintenance
  • 16:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 16:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T312990)', diff saved to https://phabricator.wikimedia.org/P32010 and previous config saved to /var/cache/conftool/dbconfig/20220727-163935-marostegui.json
  • 16:34 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 16:32 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 16:32 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 16:31 urandom: rolling Cassandra restart, aqs1010-1015, to restore on-disk logging -- T309896
  • 16:31 andrewbogott: this is a sample log, demonstrating to dhinus
  • 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32009 and previous config saved to /var/cache/conftool/dbconfig/20220727-162429-marostegui.json
  • 16:22 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 16:10 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32008 and previous config saved to /var/cache/conftool/dbconfig/20220727-160923-marostegui.json
  • 16:07 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 16:07 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T312990)', diff saved to https://phabricator.wikimedia.org/P32007 and previous config saved to /var/cache/conftool/dbconfig/20220727-155417-marostegui.json
  • 15:51 urandom: rolling Cassandra restart, aqs2001-2012, to restore on-disk logging -- T309896
  • 15:48 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 15:46 urandom: restarting Cassandra, sessionstore2001, to restore on-disk logging -- T309896
  • 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T312990)', diff saved to https://phabricator.wikimedia.org/P32006 and previous config saved to /var/cache/conftool/dbconfig/20220727-145646-marostegui.json
  • 14:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 14:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T312990)', diff saved to https://phabricator.wikimedia.org/P32005 and previous config saved to /var/cache/conftool/dbconfig/20220727-145626-marostegui.json
  • 14:51 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:51 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32003 and previous config saved to /var/cache/conftool/dbconfig/20220727-144120-marostegui.json
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32002 and previous config saved to /var/cache/conftool/dbconfig/20220727-142614-marostegui.json
  • 14:23 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 14:22 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 14:16 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
  • 14:16 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/recommendation-api: apply
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T312990)', diff saved to https://phabricator.wikimedia.org/P32001 and previous config saved to /var/cache/conftool/dbconfig/20220727-141108-marostegui.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T312990)', diff saved to https://phabricator.wikimedia.org/P32000 and previous config saved to /var/cache/conftool/dbconfig/20220727-140544-marostegui.json
  • 14:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 14:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T312990)', diff saved to https://phabricator.wikimedia.org/P31999 and previous config saved to /var/cache/conftool/dbconfig/20220727-140523-marostegui.json
  • 13:51 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/SearchSettingsForWikidata.php: Config: Tune the wikidata "language" profile for wbsearchentities (T307869) (2/2) (duration: 03m 21s)
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P31998 and previous config saved to /var/cache/conftool/dbconfig/20220727-135017-marostegui.json
  • 13:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:46 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Tune the wikidata "language" profile for wbsearchentities (T307869) (1/2) (duration: 03m 29s)
  • 13:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:39 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:36 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P31997 and previous config saved to /var/cache/conftool/dbconfig/20220727-133511-marostegui.json
  • 13:34 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:34 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:34 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:34 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:34 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:32 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:32 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:30 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:27 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 13:23 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T312990)', diff saved to https://phabricator.wikimedia.org/P31995 and previous config saved to /var/cache/conftool/dbconfig/20220727-132005-marostegui.json
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T312990)', diff saved to https://phabricator.wikimedia.org/P31994 and previous config saved to /var/cache/conftool/dbconfig/20220727-131500-marostegui.json
  • 13:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 13:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T312990)', diff saved to https://phabricator.wikimedia.org/P31993 and previous config saved to /var/cache/conftool/dbconfig/20220727-131439-marostegui.json
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P31992 and previous config saved to /var/cache/conftool/dbconfig/20220727-125933-marostegui.json
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P31991 and previous config saved to /var/cache/conftool/dbconfig/20220727-124426-marostegui.json
  • 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T312990)', diff saved to https://phabricator.wikimedia.org/P31990 and previous config saved to /var/cache/conftool/dbconfig/20220727-122920-marostegui.json
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T312990)', diff saved to https://phabricator.wikimedia.org/P31989 and previous config saved to /var/cache/conftool/dbconfig/20220727-122147-marostegui.json
  • 12:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 12:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T312990)', diff saved to https://phabricator.wikimedia.org/P31988 and previous config saved to /var/cache/conftool/dbconfig/20220727-122115-marostegui.json
  • 12:17 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 12:17 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P31987 and previous config saved to /var/cache/conftool/dbconfig/20220727-120609-marostegui.json
  • 12:00 kart_: Updated cxserver to 2022-07-27-070728-production (T313300, T309577, T310873, T310880)
  • 11:57 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 11:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 11:54 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 11:53 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P31986 and previous config saved to /var/cache/conftool/dbconfig/20220727-115103-marostegui.json
  • 11:48 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 11:47 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T312990)', diff saved to https://phabricator.wikimedia.org/P31985 and previous config saved to /var/cache/conftool/dbconfig/20220727-113557-marostegui.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T312990)', diff saved to https://phabricator.wikimedia.org/P31984 and previous config saved to /var/cache/conftool/dbconfig/20220727-113136-marostegui.json
  • 11:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 11:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 11:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T312990)', diff saved to https://phabricator.wikimedia.org/P31983 and previous config saved to /var/cache/conftool/dbconfig/20220727-112722-marostegui.json
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P31982 and previous config saved to /var/cache/conftool/dbconfig/20220727-111216-marostegui.json
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P31981 and previous config saved to /var/cache/conftool/dbconfig/20220727-105710-marostegui.json
  • 10:46 Emperor: update cassandradev packages for stretch to 3.11.13 T313742
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T312990)', diff saved to https://phabricator.wikimedia.org/P31980 and previous config saved to /var/cache/conftool/dbconfig/20220727-104204-marostegui.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T312990)', diff saved to https://phabricator.wikimedia.org/P31979 and previous config saved to /var/cache/conftool/dbconfig/20220727-103640-marostegui.json
  • 10:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 10:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T312990)', diff saved to https://phabricator.wikimedia.org/P31978 and previous config saved to /var/cache/conftool/dbconfig/20220727-103619-marostegui.json
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P31976 and previous config saved to /var/cache/conftool/dbconfig/20220727-102113-marostegui.json
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P31974 and previous config saved to /var/cache/conftool/dbconfig/20220727-100607-marostegui.json
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T312990)', diff saved to https://phabricator.wikimedia.org/P31972 and previous config saved to /var/cache/conftool/dbconfig/20220727-095101-marostegui.json
  • 09:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 09:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T312990)', diff saved to https://phabricator.wikimedia.org/P31971 and previous config saved to /var/cache/conftool/dbconfig/20220727-094452-marostegui.json
  • 09:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 09:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T312990)', diff saved to https://phabricator.wikimedia.org/P31970 and previous config saved to /var/cache/conftool/dbconfig/20220727-094430-marostegui.json
  • 09:35 ladsgroup@deploy1002: Synchronized portals: Fixing favicon of wikiquote and wikibooks, take III (duration: 03m 36s)
  • 09:32 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ml-serve2001.codfw.wmnet with reason: memtest86+ run
  • 09:32 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ml-serve2001.codfw.wmnet with reason: memtest86+ run
  • 09:31 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Fixing favicon of wikiquote and wikibooks, take III (duration: 03m 19s)
  • 09:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2087.codfw.wmnet
  • 09:29 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P31969 and previous config saved to /var/cache/conftool/dbconfig/20220727-092924-marostegui.json
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2087 from dbctl T313483', diff saved to https://phabricator.wikimedia.org/P31968 and previous config saved to /var/cache/conftool/dbconfig/20220727-092917-marostegui.json
  • 09:25 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 09:21 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2087.codfw.wmnet
  • 09:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
  • 09:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:09 ladsgroup@deploy1002: Synchronized portals: Fixing favicon of wikiquote and wikibooks, take II (duration: 03m 24s)
  • 09:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:05 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Fixing favicon of wikiquote and wikibooks, take II (duration: 03m 49s)
  • 09:02 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P31967 and previous config saved to /var/cache/conftool/dbconfig/20220727-090221-marostegui.json
  • 09:01 elukey: reboot ml-serve2001 - T313822
  • 08:57 elukey: restart burrow-* on kafkamon1002 to pick up zookeeper changes
  • 08:57 elukey: manually create /var/run/burrow on kafkamon1002 to allow a clean restart of Burrow daemons (after zookeeper config change)
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T312990)', diff saved to https://phabricator.wikimedia.org/P31966 and previous config saved to /var/cache/conftool/dbconfig/20220727-084715-marostegui.json
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T312990)', diff saved to https://phabricator.wikimedia.org/P31965 and previous config saved to /var/cache/conftool/dbconfig/20220727-084120-marostegui.json
  • 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 08:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T312990)', diff saved to https://phabricator.wikimedia.org/P31964 and previous config saved to /var/cache/conftool/dbconfig/20220727-084042-marostegui.json
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2171 (s5, s6) to dbctl T311493', diff saved to https://phabricator.wikimedia.org/P31962 and previous config saved to /var/cache/conftool/dbconfig/20220727-082817-marostegui.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P31961 and previous config saved to /var/cache/conftool/dbconfig/20220727-082535-marostegui.json
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P31960 and previous config saved to /var/cache/conftool/dbconfig/20220727-081029-marostegui.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2170 (s1, s2) to dbctl T311493', diff saved to https://phabricator.wikimedia.org/P31959 and previous config saved to /var/cache/conftool/dbconfig/20220727-080029-marostegui.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T312990)', diff saved to https://phabricator.wikimedia.org/P31958 and previous config saved to /var/cache/conftool/dbconfig/20220727-075523-marostegui.json
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T312990)', diff saved to https://phabricator.wikimedia.org/P31957 and previous config saved to /var/cache/conftool/dbconfig/20220727-074546-marostegui.json
  • 07:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 07:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 07:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 07:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 07:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 07:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2079 T313798', diff saved to https://phabricator.wikimedia.org/P31956 and previous config saved to /var/cache/conftool/dbconfig/20220727-073442-marostegui.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2161 to s8 codfw primary T313798', diff saved to https://phabricator.wikimedia.org/P31955 and previous config saved to /var/cache/conftool/dbconfig/20220727-073214-marostegui.json
  • 07:30 volans: restarted ferm on ms-be1065 (had failed for a timed out query)
  • 07:18 volans: restarted ferm on ms-be2065 (had failed for a timed out query)
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2161 with weight 0 T313798', diff saved to https://phabricator.wikimedia.org/P31954 and previous config saved to /var/cache/conftool/dbconfig/20220727-070901-marostegui.json
  • 07:05 marostegui: Restart db2161 to change its binlog format
  • 07:03 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: codfw s8 master switch
  • 07:03 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: codfw s8 master switch
  • 05:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2086.codfw.wmnet
  • 05:19 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:15 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 05:10 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2086.codfw.wmnet
  • 01:44 AndyRussG: update payments-wiki 4487bd31 -> 589bb64

2022-07-26

  • 23:59 tzatziki: removing one file for legal compliance
  • 22:06 brennen@deploy1002: Finished deploy [phabricator/deployment@0950b61]: test deploy to phab2001 (duration: 00m 27s)
  • 22:06 brennen@deploy1002: Started deploy [phabricator/deployment@0950b61]: test deploy to phab2001
  • 22:03 brennen@deploy1002: Finished deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001 (duration: 00m 05s)
  • 22:02 brennen@deploy1002: Started deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001
  • 21:54 brennen@deploy1002: Finished deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001 (duration: 00m 05s)
  • 21:54 brennen@deploy1002: Started deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001
  • 21:53 brennen@deploy1002: Finished deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001 (duration: 00m 05s)
  • 21:53 brennen@deploy1002: Started deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001
  • 21:51 brennen@deploy1002: Finished deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001 (duration: 00m 05s)
  • 21:51 brennen@deploy1002: Started deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001
  • 21:33 brennen@deploy1002: Finished deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001 (duration: 00m 51s)
  • 21:32 brennen@deploy1002: Started deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001
  • 21:30 brennen@deploy1002: Finished deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001 (duration: 00m 11s)
  • 21:30 brennen@deploy1002: Started deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001
  • 21:28 brennen@deploy1002: Finished deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001 (duration: 00m 19s)
  • 21:28 brennen@deploy1002: Started deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001
  • 21:25 brennen@deploy1002: Finished deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001 (duration: 00m 05s)
  • 21:25 brennen@deploy1002: Started deploy [phabricator/deployment@8a7d4bf]: test deploy to phab2001
  • 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:27 inflatador: bking@wdqs1004 restarted blazegraph services that were (are?) alerting for 503
  • 20:21 ebernhardson: depool wdqs1004
  • 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:20 cjming: end of UTC late backport window
  • 20:19 cjming@deploy1002: Synchronized logos/config.yaml: Config: etwikiquote: Change logo for 10k articles (T313698) (duration: 03m 07s)
  • 20:16 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: etwikiquote: Change logo for 10k articles (T313698) (duration: 03m 15s)
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:12 cjming@deploy1002: Synchronized static/images/project-logos/: Config: etwikiquote: Change logo for 10k articles (T313698) (duration: 03m 28s)
  • 19:03 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts elastic2049.codfw.wmnet
  • 19:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:59 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
  • 18:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:53 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic2049.codfw.wmnet
  • 18:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts elastic2049.codfw.wmnet
  • 18:41 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic2049.codfw.wmnet
  • 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:08 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.22 refs T308075
  • 18:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:04 mutante: [doc1002:~] $ sudo systemctl start rsync-doc-doc2001.codfw.wmnet.service
  • 17:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:40 bking@cumin1001: conftool action : set/pooled=inactive; selector: name=elastic2049
  • 17:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:28 brennen@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.22 refs T308075 (duration: 35m 50s)
  • 17:12 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 17:11 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 17:10 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 17:09 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 17:09 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 17:09 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 17:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:52 brennen@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.22 refs T308075
  • 16:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:33 brennen@deploy1002: Finished deploy [phabricator/deployment@8a7d4bf]: no-op demonstration deploy to phab2001 (duration: 00m 26s)
  • 16:32 brennen@deploy1002: Started deploy [phabricator/deployment@8a7d4bf]: no-op demonstration deploy to phab2001
  • 15:58 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 15:58 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 15:56 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 15:56 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 15:56 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 15:56 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 15:52 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:48 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:36 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "Add WikibaseTerms temporary debug log channel" (T313039) (grep confirms wmf.21+ code has no mentions of this channel) (duration: 03m 19s)
  • 15:30 _joe_: restarting pybal on lvs1020 to check php 7.4 too
  • 15:25 _joe_: restarting pybal on lvs1019 to check php 7.4 too
  • 15:23 _joe_: restarting pybal on lvs2009 to check php 7.4 too
  • 15:18 _joe_: restarting pybal on lvs2010 to check php 7.4 too
  • 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2144 with weight 0 and db2143 back with 100 T313811', diff saved to https://phabricator.wikimedia.org/P31952 and previous config saved to /var/cache/conftool/dbconfig/20220726-145412-root.json
  • 14:52 sukhe: upload trafficserver_9.1.2-1wm1_amd64 to apt.wm.o (buster) - T309651
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2143 with weight 0 T313811', diff saved to https://phabricator.wikimedia.org/P31951 and previous config saved to /var/cache/conftool/dbconfig/20220726-145116-root.json
  • 14:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 14:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 14:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 14:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 14:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 9 hosts with reason: Maintenance
  • 14:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 9 hosts with reason: Maintenance
  • 14:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 14:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312990)', diff saved to https://phabricator.wikimedia.org/P31950 and previous config saved to /var/cache/conftool/dbconfig/20220726-141540-marostegui.json
  • 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P31949 and previous config saved to /var/cache/conftool/dbconfig/20220726-140034-marostegui.json
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P31947 and previous config saved to /var/cache/conftool/dbconfig/20220726-134529-marostegui.json
  • 13:38 taavi: UTC afternoon deploys done
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:35 taavi@deploy1002: Synchronized php-1.39.0-wmf.21/resources/src/jquery/jquery.textSelection.js: backporting gerrit r817231 r817232 for wmf.21, T33780 (duration: 03m 02s)
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312990)', diff saved to https://phabricator.wikimedia.org/P31946 and previous config saved to /var/cache/conftool/dbconfig/20220726-133023-marostegui.json
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T312990)', diff saved to https://phabricator.wikimedia.org/P31945 and previous config saved to /var/cache/conftool/dbconfig/20220726-132650-marostegui.json
  • 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T312990)', diff saved to https://phabricator.wikimedia.org/P31944 and previous config saved to /var/cache/conftool/dbconfig/20220726-132628-marostegui.json
  • 13:25 jbond: uploaded spicerack_3.1.1 to apt.wikimedia.org bullseye-wikimedia
  • 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P31943 and previous config saved to /var/cache/conftool/dbconfig/20220726-131122-marostegui.json
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P31942 and previous config saved to /var/cache/conftool/dbconfig/20220726-125617-marostegui.json
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T312990)', diff saved to https://phabricator.wikimedia.org/P31941 and previous config saved to /var/cache/conftool/dbconfig/20220726-124112-marostegui.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T312990)', diff saved to https://phabricator.wikimedia.org/P31940 and previous config saved to /var/cache/conftool/dbconfig/20220726-123745-marostegui.json
  • 12:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 12:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312990)', diff saved to https://phabricator.wikimedia.org/P31939 and previous config saved to /var/cache/conftool/dbconfig/20220726-123719-marostegui.json
  • 12:32 jnuche@deploy1002: Synchronized README: Verifying fix for T313770 (duration: 03m 14s)
  • 12:24 jnuche@deploy1002: Installation of scap version "4.11.4" completed for 559 hosts
  • 12:24 jnuche@deploy1002: Installing scap version "4.11.4" for 559 hosts
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31938 and previous config saved to /var/cache/conftool/dbconfig/20220726-122214-marostegui.json
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31937 and previous config saved to /var/cache/conftool/dbconfig/20220726-120709-marostegui.json
  • 12:02 oblivian@deploy1002: Synchronized README: testing fix for php restarts T313770 (duration: 03m 15s)
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312990)', diff saved to https://phabricator.wikimedia.org/P31936 and previous config saved to /var/cache/conftool/dbconfig/20220726-115204-marostegui.json
  • 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T312990)', diff saved to https://phabricator.wikimedia.org/P31935 and previous config saved to /var/cache/conftool/dbconfig/20220726-114833-marostegui.json
  • 11:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 11:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312990)', diff saved to https://phabricator.wikimedia.org/P31934 and previous config saved to /var/cache/conftool/dbconfig/20220726-114813-marostegui.json
  • 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31933 and previous config saved to /var/cache/conftool/dbconfig/20220726-113308-marostegui.json
  • 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31932 and previous config saved to /var/cache/conftool/dbconfig/20220726-111803-marostegui.json
  • 11:12 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312990)', diff saved to https://phabricator.wikimedia.org/P31931 and previous config saved to /var/cache/conftool/dbconfig/20220726-110258-marostegui.json
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T312990)', diff saved to https://phabricator.wikimedia.org/P31930 and previous config saved to /var/cache/conftool/dbconfig/20220726-110022-marostegui.json
  • 11:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 11:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T312990)', diff saved to https://phabricator.wikimedia.org/P31929 and previous config saved to /var/cache/conftool/dbconfig/20220726-110002-marostegui.json
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31928 and previous config saved to /var/cache/conftool/dbconfig/20220726-104456-marostegui.json
  • 10:39 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update hieradata from Netbox - volans@cumin2002"
  • 10:38 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update hieradata from Netbox - volans@cumin2002"
  • 10:34 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31925 and previous config saved to /var/cache/conftool/dbconfig/20220726-102951-marostegui.json
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T312990)', diff saved to https://phabricator.wikimedia.org/P31924 and previous config saved to /var/cache/conftool/dbconfig/20220726-101446-marostegui.json
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T312990)', diff saved to https://phabricator.wikimedia.org/P31923 and previous config saved to /var/cache/conftool/dbconfig/20220726-101130-marostegui.json
  • 10:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 10:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312990)', diff saved to https://phabricator.wikimedia.org/P31922 and previous config saved to /var/cache/conftool/dbconfig/20220726-101110-marostegui.json
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31921 and previous config saved to /var/cache/conftool/dbconfig/20220726-095605-marostegui.json
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31920 and previous config saved to /var/cache/conftool/dbconfig/20220726-094100-marostegui.json
  • 09:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2085.codfw.wmnet
  • 09:40 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:40 oblivian@deploy1002: Synchronized README: testing fix for php restarts (duration: 02m 54s)
  • 09:36 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 09:32 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2085.codfw.wmnet
  • 09:31 _joe_: running puppet on the mw-canary hosts T313770
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312990)', diff saved to https://phabricator.wikimedia.org/P31918 and previous config saved to /var/cache/conftool/dbconfig/20220726-092555-marostegui.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T312990)', diff saved to https://phabricator.wikimedia.org/P31917 and previous config saved to /var/cache/conftool/dbconfig/20220726-092217-marostegui.json
  • 09:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 09:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 09:21 jnuche@deploy1002: Installation of scap version "4.11.3" completed for 1 hosts
  • 09:21 jnuche@deploy1002: Installing scap version "4.11.3" for 1 hosts
  • 09:13 volans: manually restarting php on MW canaries: cumin 'A:mw-canary' 'restart-php-fpm-all'
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31916 and previous config saved to /var/cache/conftool/dbconfig/20220726-090241-root.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31915 and previous config saved to /var/cache/conftool/dbconfig/20220726-090237-root.json
  • 08:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 08:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31914 and previous config saved to /var/cache/conftool/dbconfig/20220726-084737-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31913 and previous config saved to /var/cache/conftool/dbconfig/20220726-084733-root.json
  • 08:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1020
  • 08:40 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1020
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31912 and previous config saved to /var/cache/conftool/dbconfig/20220726-083233-root.json
  • 08:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31911 and previous config saved to /var/cache/conftool/dbconfig/20220726-083229-root.json
  • 08:33 marostegui: Promote pc1014 to pc3 master T313401
  • 08:33 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1014 to pc3 master (duration: 03m 13s)
  • 08:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:33 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:26 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:26 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:19 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:19 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31909 and previous config saved to /var/cache/conftool/dbconfig/20220726-081729-root.json
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31908 and previous config saved to /var/cache/conftool/dbconfig/20220726-081725-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31907 and previous config saved to /var/cache/conftool/dbconfig/20220726-080225-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31906 and previous config saved to /var/cache/conftool/dbconfig/20220726-080221-root.json
  • 07:48 _joe_: deploy python3-poolcounter everywhere T310835
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31903 and previous config saved to /var/cache/conftool/dbconfig/20220726-074721-root.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31902 and previous config saved to /var/cache/conftool/dbconfig/20220726-074717-root.json
  • 07:41 vgutierrez: rolling restart of ats-be on cp[1080,1083,1085,1087,5006,6001,6006,6009,6011,6015]
  • 07:30 _joe_: running a restart-all for php-fpm on appservers in codfw to test python-poolcounter 0.0.3 T310835
  • 06:58 _joe_: upgrade all of codfw to python3-poolcounter 0.0.3 T310835
  • 06:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 06:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 06:36 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 06:24 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 06:21 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS bullseye
  • 06:07 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:11 TimStarling: restarted php7.2-fpm on the 9 canary hosts in eqiad T313770

2022-07-25

  • 22:54 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:50 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312863)', diff saved to https://phabricator.wikimedia.org/P31900 and previous config saved to /var/cache/conftool/dbconfig/20220725-224153-ladsgroup.json
  • 22:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31899 and previous config saved to /var/cache/conftool/dbconfig/20220725-222648-ladsgroup.json
  • 22:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31898 and previous config saved to /var/cache/conftool/dbconfig/20220725-221143-ladsgroup.json
  • 21:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312863)', diff saved to https://phabricator.wikimedia.org/P31897 and previous config saved to /var/cache/conftool/dbconfig/20220725-215637-ladsgroup.json
  • 21:27 brennen@deploy1002: Finished scap: no-op deploy to get wmf.21 on all boxen (T313770) (duration: 03m 33s)
  • 21:24 brennen@deploy1002: Started scap: no-op deploy to get wmf.21 on all boxen (T313770)
  • 21:20 brennen: running a no-op sync-world for T313770 to hopefully get 1.39.0-wmf.21 (T308074) to all servers.
  • 20:28 cjming: end of UTC late backport window
  • 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:10 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [cirrus] Increase shard count for ruwikinews (duration: 03m 15s)
  • 20:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:06 cjming@deploy1002: Synchronized wmf-config: Config: Remove Table of Contents config (T310527) (duration: 03m 13s)
  • 19:24 mutante: after new wikis have been created apparently they need an "initSiteStats.php" run to make statistics work but this only runs in a timer on mwmaint once weekly or so
  • 19:23 mutante: [mwmaint1002:~] $ sudo systemctl start mediawiki_job_initsitestats.service
  • 17:07 jbond: enable puppet fleet wide
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T312863)', diff saved to https://phabricator.wikimedia.org/P31895 and previous config saved to /var/cache/conftool/dbconfig/20220725-165931-ladsgroup.json
  • 16:49 jbond: disable puppet fleet wide
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31894 and previous config saved to /var/cache/conftool/dbconfig/20220725-164426-ladsgroup.json
  • 16:31 ejegg: updated payments-wiki from f56e9391 to 4487bd31
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31893 and previous config saved to /var/cache/conftool/dbconfig/20220725-162921-ladsgroup.json
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T312863)', diff saved to https://phabricator.wikimedia.org/P31892 and previous config saved to /var/cache/conftool/dbconfig/20220725-161416-ladsgroup.json
  • 16:14 bblack: cp*: re-enable puppet for normal staggered rollout (cp4027 tested all the esitest stuff without incident)
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T312863)', diff saved to https://phabricator.wikimedia.org/P31891 and previous config saved to /var/cache/conftool/dbconfig/20220725-160532-ladsgroup.json
  • 16:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 16:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T312863)', diff saved to https://phabricator.wikimedia.org/P31890 and previous config saved to /var/cache/conftool/dbconfig/20220725-160512-ladsgroup.json
  • 15:59 bblack: cp*: temporarily disable puppet to test esitest service rollout
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P31888 and previous config saved to /var/cache/conftool/dbconfig/20220725-155007-ladsgroup.json
  • 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P31887 and previous config saved to /var/cache/conftool/dbconfig/20220725-153502-ladsgroup.json
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T312863)', diff saved to https://phabricator.wikimedia.org/P31886 and previous config saved to /var/cache/conftool/dbconfig/20220725-151957-ladsgroup.json
  • 15:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T312863)', diff saved to https://phabricator.wikimedia.org/P31885 and previous config saved to /var/cache/conftool/dbconfig/20220725-150212-ladsgroup.json
  • 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 15:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 15:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 15:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T312863)', diff saved to https://phabricator.wikimedia.org/P31884 and previous config saved to /var/cache/conftool/dbconfig/20220725-150039-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T312863)', diff saved to https://phabricator.wikimedia.org/P31883 and previous config saved to /var/cache/conftool/dbconfig/20220725-144827-ladsgroup.json
  • 14:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31882 and previous config saved to /var/cache/conftool/dbconfig/20220725-144534-ladsgroup.json
  • 14:44 mvernon@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2001.codfw.wmnet: restart cassandra on 3.11.13 canary T309896 - mvernon@cumin2002
  • 14:38 mvernon@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2001.codfw.wmnet: restart cassandra on 3.11.13 canary T309896 - mvernon@cumin2002
  • 14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31881 and previous config saved to /var/cache/conftool/dbconfig/20220725-143321-ladsgroup.json
  • 14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31880 and previous config saved to /var/cache/conftool/dbconfig/20220725-143029-ladsgroup.json
  • 14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31879 and previous config saved to /var/cache/conftool/dbconfig/20220725-141816-ladsgroup.json
  • 14:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T312863)', diff saved to https://phabricator.wikimedia.org/P31878 and previous config saved to /var/cache/conftool/dbconfig/20220725-141523-ladsgroup.json
  • 14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T312863)', diff saved to https://phabricator.wikimedia.org/P31877 and previous config saved to /var/cache/conftool/dbconfig/20220725-141236-ladsgroup.json
  • 14:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 14:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T312863)', diff saved to https://phabricator.wikimedia.org/P31876 and previous config saved to /var/cache/conftool/dbconfig/20220725-141215-ladsgroup.json
  • 14:12 andrewbogott: updating wikitech-static to MediaWiki 1.38.2
  • 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T312863)', diff saved to https://phabricator.wikimedia.org/P31875 and previous config saved to /var/cache/conftool/dbconfig/20220725-140311-ladsgroup.json
  • 14:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:01 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:01 Lucas_WMDE: lucaswerkmeister-wmde@mw1320:~$ sudo -i /usr/local/sbin/restart-php7.2-fpm # T310847 just in case
  • 14:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:59 Lucas_WMDE: lucaswerkmeister-wmde@mw1320:~$ scap pull # T310847 (repeat failed host from earlier sync)
  • 13:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add sampling to android.breadcrumbs event stream. (T310847) (duration: 02m 56s)
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31874 and previous config saved to /var/cache/conftool/dbconfig/20220725-135710-ladsgroup.json
  • 13:42 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: ptwikinews: Install WikiLove extension (T313173) (duration: 03m 19s)
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31873 and previous config saved to /var/cache/conftool/dbconfig/20220725-134205-ladsgroup.json
  • 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:31 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php ptwikinews wikilove # T313173
  • 13:28 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: ruwikivoyage: Add "suppressredirect" right to "filemover" group (T313614) (duration: 03m 17s)
  • 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T312863)', diff saved to https://phabricator.wikimedia.org/P31872 and previous config saved to /var/cache/conftool/dbconfig/20220725-132700-ladsgroup.json
  • 13:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:21 Emperor: set min_part_hours to 12 for eqiad swift on ms-fe1009 T312643
  • 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T312863)', diff saved to https://phabricator.wikimedia.org/P31871 and previous config saved to /var/cache/conftool/dbconfig/20220725-132012-ladsgroup.json
  • 13:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T312863)', diff saved to https://phabricator.wikimedia.org/P31870 and previous config saved to /var/cache/conftool/dbconfig/20220725-131952-ladsgroup.json
  • 13:16 Emperor: set min_part_hours to 12 for codfw swift on ms-fe2009 T312643
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31864 and previous config saved to /var/cache/conftool/dbconfig/20220725-130447-ladsgroup.json
  • 13:02 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:02 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31863 and previous config saved to /var/cache/conftool/dbconfig/20220725-124942-ladsgroup.json
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T312863)', diff saved to https://phabricator.wikimedia.org/P31862 and previous config saved to /var/cache/conftool/dbconfig/20220725-123436-ladsgroup.json
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T312863)', diff saved to https://phabricator.wikimedia.org/P31861 and previous config saved to /var/cache/conftool/dbconfig/20220725-122953-ladsgroup.json
  • 12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 12:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 12:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T312863)', diff saved to https://phabricator.wikimedia.org/P31860 and previous config saved to /var/cache/conftool/dbconfig/20220725-122839-ladsgroup.json
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31859 and previous config saved to /var/cache/conftool/dbconfig/20220725-121334-ladsgroup.json
  • 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31858 and previous config saved to /var/cache/conftool/dbconfig/20220725-115829-ladsgroup.json
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T312863)', diff saved to https://phabricator.wikimedia.org/P31857 and previous config saved to /var/cache/conftool/dbconfig/20220725-114324-ladsgroup.json
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T312863)', diff saved to https://phabricator.wikimedia.org/P31856 and previous config saved to /var/cache/conftool/dbconfig/20220725-113939-ladsgroup.json
  • 11:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T312863)', diff saved to https://phabricator.wikimedia.org/P31855 and previous config saved to /var/cache/conftool/dbconfig/20220725-113919-ladsgroup.json
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T312863)', diff saved to https://phabricator.wikimedia.org/P31854 and previous config saved to /var/cache/conftool/dbconfig/20220725-112528-ladsgroup.json
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P31853 and previous config saved to /var/cache/conftool/dbconfig/20220725-112413-ladsgroup.json
  • 11:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31852 and previous config saved to /var/cache/conftool/dbconfig/20220725-111023-ladsgroup.json
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P31851 and previous config saved to /var/cache/conftool/dbconfig/20220725-110908-ladsgroup.json
  • 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31850 and previous config saved to /var/cache/conftool/dbconfig/20220725-105518-ladsgroup.json
  • 10:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T312863)', diff saved to https://phabricator.wikimedia.org/P31848 and previous config saved to /var/cache/conftool/dbconfig/20220725-105403-ladsgroup.json
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T312863)', diff saved to https://phabricator.wikimedia.org/P31846 and previous config saved to /var/cache/conftool/dbconfig/20220725-105114-ladsgroup.json
  • 10:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T312863)', diff saved to https://phabricator.wikimedia.org/P31845 and previous config saved to /var/cache/conftool/dbconfig/20220725-105054-ladsgroup.json
  • 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T312863)', diff saved to https://phabricator.wikimedia.org/P31841 and previous config saved to /var/cache/conftool/dbconfig/20220725-104013-ladsgroup.json
  • 10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P31837 and previous config saved to /var/cache/conftool/dbconfig/20220725-103549-ladsgroup.json
  • 10:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:26 ladsgroup@deploy1002: Synchronized portals: Wikimedia Portals Update: Fixing favicon of wikiquote and wikibooks (duration: 02m 55s)
  • 10:24 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 10:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:24 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:23 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Fixing favicon of wikiquote and wikibooks (duration: 03m 03s)
  • 10:23 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:22 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:22 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:21 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P31834 and previous config saved to /var/cache/conftool/dbconfig/20220725-102043-ladsgroup.json
  • 10:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T312863)', diff saved to https://phabricator.wikimedia.org/P31833 and previous config saved to /var/cache/conftool/dbconfig/20220725-100538-ladsgroup.json
  • 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1109 (T312863)', diff saved to https://phabricator.wikimedia.org/P31832 and previous config saved to /var/cache/conftool/dbconfig/20220725-100254-ladsgroup.json
  • 10:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 10:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T312863)', diff saved to https://phabricator.wikimedia.org/P31831 and previous config saved to /var/cache/conftool/dbconfig/20220725-100234-ladsgroup.json
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P31826 and previous config saved to /var/cache/conftool/dbconfig/20220725-094729-ladsgroup.json
  • 09:34 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P31825 and previous config saved to /var/cache/conftool/dbconfig/20220725-093222-ladsgroup.json
  • 09:30 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:26 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T312863)', diff saved to https://phabricator.wikimedia.org/P31824 and previous config saved to /var/cache/conftool/dbconfig/20220725-091740-ladsgroup.json
  • 09:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 09:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T312863)', diff saved to https://phabricator.wikimedia.org/P31823 and previous config saved to /var/cache/conftool/dbconfig/20220725-091717-ladsgroup.json
  • 09:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T312863)', diff saved to https://phabricator.wikimedia.org/P31822 and previous config saved to /var/cache/conftool/dbconfig/20220725-091435-ladsgroup.json
  • 09:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 09:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 09:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 15 hosts with reason: Maintenance
  • 09:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 15 hosts with reason: Maintenance
  • 09:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 09:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 09:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 09:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 09:10 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 09:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 09:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T312863)', diff saved to https://phabricator.wikimedia.org/P31821 and previous config saved to /var/cache/conftool/dbconfig/20220725-090906-ladsgroup.json
  • 09:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 09:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T312863)', diff saved to https://phabricator.wikimedia.org/P31820 and previous config saved to /var/cache/conftool/dbconfig/20220725-090604-ladsgroup.json
  • 09:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 09:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 09:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P31819 and previous config saved to /var/cache/conftool/dbconfig/20220725-090113-ladsgroup.json
  • 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P31818 and previous config saved to /var/cache/conftool/dbconfig/20220725-084609-ladsgroup.json
  • 08:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P31817 and previous config saved to /var/cache/conftool/dbconfig/20220725-083105-ladsgroup.json
  • 08:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 5%: Maint done', diff saved to https://phabricator.wikimedia.org/P31816 and previous config saved to /var/cache/conftool/dbconfig/20220725-081601-ladsgroup.json
  • 08:15 kartik@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/Translate: Backport: ReviewTranslationActionApi: Move to namespace and add strict types (T312008 T313608) (duration: 03m 09s)
  • 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:37 kartik@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Explicitly set math rendering modes (T309686) (duration: 03m 11s)
  • 07:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:31 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:23 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 07:16 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Section Translation in Uzbek Wikipedia (T310116) (duration: 03m 04s)
  • 07:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:30 XioNoX: power off asw2-d5-eqiad for decommissioning - T313115

2022-07-24

  • 20:54 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM archiva1002.wikimedia.org
  • 20:37 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM archiva1002.wikimedia.org
  • 14:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 14:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T312863)', diff saved to https://phabricator.wikimedia.org/P31815 and previous config saved to /var/cache/conftool/dbconfig/20220724-100221-ladsgroup.json
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31814 and previous config saved to /var/cache/conftool/dbconfig/20220724-094716-ladsgroup.json
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31813 and previous config saved to /var/cache/conftool/dbconfig/20220724-093211-ladsgroup.json
  • 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T312863)', diff saved to https://phabricator.wikimedia.org/P31812 and previous config saved to /var/cache/conftool/dbconfig/20220724-091706-ladsgroup.json
  • 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T312863)', diff saved to https://phabricator.wikimedia.org/P31811 and previous config saved to /var/cache/conftool/dbconfig/20220724-041542-ladsgroup.json
  • 04:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31810 and previous config saved to /var/cache/conftool/dbconfig/20220724-040037-ladsgroup.json
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31809 and previous config saved to /var/cache/conftool/dbconfig/20220724-034532-ladsgroup.json
  • 03:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T312863)', diff saved to https://phabricator.wikimedia.org/P31808 and previous config saved to /var/cache/conftool/dbconfig/20220724-034356-ladsgroup.json
  • 03:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 03:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 03:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T312863)', diff saved to https://phabricator.wikimedia.org/P31807 and previous config saved to /var/cache/conftool/dbconfig/20220724-034336-ladsgroup.json
  • 03:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T312863)', diff saved to https://phabricator.wikimedia.org/P31806 and previous config saved to /var/cache/conftool/dbconfig/20220724-033027-ladsgroup.json
  • 03:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31805 and previous config saved to /var/cache/conftool/dbconfig/20220724-032831-ladsgroup.json
  • 03:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31804 and previous config saved to /var/cache/conftool/dbconfig/20220724-031326-ladsgroup.json
  • 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T312863)', diff saved to https://phabricator.wikimedia.org/P31803 and previous config saved to /var/cache/conftool/dbconfig/20220724-025820-ladsgroup.json
  • 00:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T312863)', diff saved to https://phabricator.wikimedia.org/P31802 and previous config saved to /var/cache/conftool/dbconfig/20220724-003718-ladsgroup.json
  • 00:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 00:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 00:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 00:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 00:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T312863)', diff saved to https://phabricator.wikimedia.org/P31801 and previous config saved to /var/cache/conftool/dbconfig/20220724-003652-ladsgroup.json
  • 00:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P31800 and previous config saved to /var/cache/conftool/dbconfig/20220724-002147-ladsgroup.json
  • 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P31799 and previous config saved to /var/cache/conftool/dbconfig/20220724-000641-ladsgroup.json

2022-07-23

  • 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T312863)', diff saved to https://phabricator.wikimedia.org/P31798 and previous config saved to /var/cache/conftool/dbconfig/20220723-235136-ladsgroup.json
  • 23:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T312863)', diff saved to https://phabricator.wikimedia.org/P31797 and previous config saved to /var/cache/conftool/dbconfig/20220723-232948-ladsgroup.json
  • 23:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 23:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 23:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T312863)', diff saved to https://phabricator.wikimedia.org/P31796 and previous config saved to /var/cache/conftool/dbconfig/20220723-232927-ladsgroup.json
  • 23:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31795 and previous config saved to /var/cache/conftool/dbconfig/20220723-231422-ladsgroup.json
  • 22:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31794 and previous config saved to /var/cache/conftool/dbconfig/20220723-225917-ladsgroup.json
  • 22:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T312863)', diff saved to https://phabricator.wikimedia.org/P31793 and previous config saved to /var/cache/conftool/dbconfig/20220723-224412-ladsgroup.json
  • 22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T312863)', diff saved to https://phabricator.wikimedia.org/P31792 and previous config saved to /var/cache/conftool/dbconfig/20220723-220740-ladsgroup.json
  • 22:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 22:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T312863)', diff saved to https://phabricator.wikimedia.org/P31791 and previous config saved to /var/cache/conftool/dbconfig/20220723-220720-ladsgroup.json
  • 21:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31790 and previous config saved to /var/cache/conftool/dbconfig/20220723-215215-ladsgroup.json
  • 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31789 and previous config saved to /var/cache/conftool/dbconfig/20220723-213710-ladsgroup.json
  • 21:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 21:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T312863)', diff saved to https://phabricator.wikimedia.org/P31788 and previous config saved to /var/cache/conftool/dbconfig/20220723-213610-ladsgroup.json
  • 21:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T312863)', diff saved to https://phabricator.wikimedia.org/P31787 and previous config saved to /var/cache/conftool/dbconfig/20220723-212204-ladsgroup.json
  • 21:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31786 and previous config saved to /var/cache/conftool/dbconfig/20220723-212105-ladsgroup.json
  • 21:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31785 and previous config saved to /var/cache/conftool/dbconfig/20220723-210559-ladsgroup.json
  • 20:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T312863)', diff saved to https://phabricator.wikimedia.org/P31784 and previous config saved to /var/cache/conftool/dbconfig/20220723-205054-ladsgroup.json
  • 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T312863)', diff saved to https://phabricator.wikimedia.org/P31783 and previous config saved to /var/cache/conftool/dbconfig/20220723-204049-ladsgroup.json
  • 20:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 20:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1122 (T312863)', diff saved to https://phabricator.wikimedia.org/P31782 and previous config saved to /var/cache/conftool/dbconfig/20220723-164105-ladsgroup.json
  • 16:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 16:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T312863)', diff saved to https://phabricator.wikimedia.org/P31781 and previous config saved to /var/cache/conftool/dbconfig/20220723-164045-ladsgroup.json
  • 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31780 and previous config saved to /var/cache/conftool/dbconfig/20220723-162540-ladsgroup.json
  • 16:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31779 and previous config saved to /var/cache/conftool/dbconfig/20220723-161035-ladsgroup.json
  • 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T312863)', diff saved to https://phabricator.wikimedia.org/P31778 and previous config saved to /var/cache/conftool/dbconfig/20220723-155530-ladsgroup.json
  • 15:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 15:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T312863)', diff saved to https://phabricator.wikimedia.org/P31777 and previous config saved to /var/cache/conftool/dbconfig/20220723-155311-ladsgroup.json
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31776 and previous config saved to /var/cache/conftool/dbconfig/20220723-153805-ladsgroup.json
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31775 and previous config saved to /var/cache/conftool/dbconfig/20220723-152300-ladsgroup.json
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T312863)', diff saved to https://phabricator.wikimedia.org/P31774 and previous config saved to /var/cache/conftool/dbconfig/20220723-151951-ladsgroup.json
  • 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T312863)', diff saved to https://phabricator.wikimedia.org/P31773 and previous config saved to /var/cache/conftool/dbconfig/20220723-151930-ladsgroup.json
  • 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T312863)', diff saved to https://phabricator.wikimedia.org/P31772 and previous config saved to /var/cache/conftool/dbconfig/20220723-150754-ladsgroup.json
  • 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31771 and previous config saved to /var/cache/conftool/dbconfig/20220723-150425-ladsgroup.json
  • 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31770 and previous config saved to /var/cache/conftool/dbconfig/20220723-144920-ladsgroup.json
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T312863)', diff saved to https://phabricator.wikimedia.org/P31769 and previous config saved to /var/cache/conftool/dbconfig/20220723-143414-ladsgroup.json
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T312863)', diff saved to https://phabricator.wikimedia.org/P31768 and previous config saved to /var/cache/conftool/dbconfig/20220723-105825-ladsgroup.json
  • 10:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 10:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312863)', diff saved to https://phabricator.wikimedia.org/P31767 and previous config saved to /var/cache/conftool/dbconfig/20220723-105805-ladsgroup.json
  • 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T312863)', diff saved to https://phabricator.wikimedia.org/P31766 and previous config saved to /var/cache/conftool/dbconfig/20220723-105257-ladsgroup.json
  • 10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 10:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T312863)', diff saved to https://phabricator.wikimedia.org/P31765 and previous config saved to /var/cache/conftool/dbconfig/20220723-105238-ladsgroup.json
  • 10:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312863)', diff saved to https://phabricator.wikimedia.org/P31764 and previous config saved to /var/cache/conftool/dbconfig/20220723-105228-ladsgroup.json
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31763 and previous config saved to /var/cache/conftool/dbconfig/20220723-104300-ladsgroup.json
  • 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31762 and previous config saved to /var/cache/conftool/dbconfig/20220723-103733-ladsgroup.json
  • 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31761 and previous config saved to /var/cache/conftool/dbconfig/20220723-103723-ladsgroup.json
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31760 and previous config saved to /var/cache/conftool/dbconfig/20220723-102755-ladsgroup.json
  • 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31759 and previous config saved to /var/cache/conftool/dbconfig/20220723-102227-ladsgroup.json
  • 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31758 and previous config saved to /var/cache/conftool/dbconfig/20220723-102218-ladsgroup.json
  • 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312863)', diff saved to https://phabricator.wikimedia.org/P31757 and previous config saved to /var/cache/conftool/dbconfig/20220723-101250-ladsgroup.json
  • 10:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T312863)', diff saved to https://phabricator.wikimedia.org/P31756 and previous config saved to /var/cache/conftool/dbconfig/20220723-100722-ladsgroup.json
  • 10:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312863)', diff saved to https://phabricator.wikimedia.org/P31755 and previous config saved to /var/cache/conftool/dbconfig/20220723-100713-ladsgroup.json
  • 09:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T312863)', diff saved to https://phabricator.wikimedia.org/P31754 and previous config saved to /var/cache/conftool/dbconfig/20220723-095241-ladsgroup.json
  • 09:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 05:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T312863)', diff saved to https://phabricator.wikimedia.org/P31753 and previous config saved to /var/cache/conftool/dbconfig/20220723-053604-ladsgroup.json
  • 05:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 05:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 05:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T312863)', diff saved to https://phabricator.wikimedia.org/P31752 and previous config saved to /var/cache/conftool/dbconfig/20220723-052925-ladsgroup.json
  • 05:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 05:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 05:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 05:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 01:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 01:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T312863)', diff saved to https://phabricator.wikimedia.org/P31751 and previous config saved to /var/cache/conftool/dbconfig/20220723-015300-ladsgroup.json
  • 01:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31750 and previous config saved to /var/cache/conftool/dbconfig/20220723-013755-ladsgroup.json
  • 01:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31749 and previous config saved to /var/cache/conftool/dbconfig/20220723-012250-ladsgroup.json
  • 01:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T312863)', diff saved to https://phabricator.wikimedia.org/P31748 and previous config saved to /var/cache/conftool/dbconfig/20220723-010745-ladsgroup.json
  • 00:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 6 hosts with reason: Maintenance
  • 00:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: Maintenance
  • 00:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 00:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 00:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T312863)', diff saved to https://phabricator.wikimedia.org/P31747 and previous config saved to /var/cache/conftool/dbconfig/20220723-001125-ladsgroup.json

2022-07-22

  • 23:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31746 and previous config saved to /var/cache/conftool/dbconfig/20220722-235619-ladsgroup.json
  • 23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31745 and previous config saved to /var/cache/conftool/dbconfig/20220722-234114-ladsgroup.json
  • 23:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T312863)', diff saved to https://phabricator.wikimedia.org/P31744 and previous config saved to /var/cache/conftool/dbconfig/20220722-232609-ladsgroup.json
  • 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T312863)', diff saved to https://phabricator.wikimedia.org/P31743 and previous config saved to /var/cache/conftool/dbconfig/20220722-215349-ladsgroup.json
  • 21:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312863)', diff saved to https://phabricator.wikimedia.org/P31742 and previous config saved to /var/cache/conftool/dbconfig/20220722-215329-ladsgroup.json
  • 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31741 and previous config saved to /var/cache/conftool/dbconfig/20220722-213824-ladsgroup.json
  • 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31740 and previous config saved to /var/cache/conftool/dbconfig/20220722-212319-ladsgroup.json
  • 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T312863)', diff saved to https://phabricator.wikimedia.org/P31739 and previous config saved to /var/cache/conftool/dbconfig/20220722-211308-ladsgroup.json
  • 21:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 21:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T312863)', diff saved to https://phabricator.wikimedia.org/P31738 and previous config saved to /var/cache/conftool/dbconfig/20220722-211259-ladsgroup.json
  • 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312863)', diff saved to https://phabricator.wikimedia.org/P31737 and previous config saved to /var/cache/conftool/dbconfig/20220722-210813-ladsgroup.json
  • 21:05 brennen@deploy1002: Finished deploy [phabricator/deployment@f962d0e]: (no justification provided) (duration: 00m 29s)
  • 21:04 brennen@deploy1002: Started deploy [phabricator/deployment@f962d0e]: (no justification provided)
  • 20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31736 and previous config saved to /var/cache/conftool/dbconfig/20220722-205754-ladsgroup.json
  • 20:44 brennen@deploy1002: Finished deploy [phabricator/deployment@f962d0e]: (no justification provided) (duration: 00m 07s)
  • 20:44 brennen@deploy1002: Started deploy [phabricator/deployment@f962d0e]: (no justification provided)
  • 20:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31735 and previous config saved to /var/cache/conftool/dbconfig/20220722-204248-ladsgroup.json
  • 20:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
  • 20:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
  • 20:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 20:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 20:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T312863)', diff saved to https://phabricator.wikimedia.org/P31734 and previous config saved to /var/cache/conftool/dbconfig/20220722-203708-ladsgroup.json
  • 20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T312863)', diff saved to https://phabricator.wikimedia.org/P31733 and previous config saved to /var/cache/conftool/dbconfig/20220722-202743-ladsgroup.json
  • 20:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31732 and previous config saved to /var/cache/conftool/dbconfig/20220722-202203-ladsgroup.json
  • 20:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31731 and previous config saved to /var/cache/conftool/dbconfig/20220722-200658-ladsgroup.json
  • 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T312863)', diff saved to https://phabricator.wikimedia.org/P31730 and previous config saved to /var/cache/conftool/dbconfig/20220722-195153-ladsgroup.json
  • 19:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T312863)', diff saved to https://phabricator.wikimedia.org/P31729 and previous config saved to /var/cache/conftool/dbconfig/20220722-194428-ladsgroup.json
  • 19:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 19:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T312863)', diff saved to https://phabricator.wikimedia.org/P31727 and previous config saved to /var/cache/conftool/dbconfig/20220722-173218-ladsgroup.json
  • 17:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 16:54 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: no-op deploy to sync up new cloudweb hosts (duration: 08m 47s)
  • 16:45 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: no-op deploy to sync up new cloudweb hosts
  • 16:19 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 16:02 jbond: puppet-agent to puppet7 component
  • 15:57 jbond: ruby-semantic-puppet to puppet7 component
  • 15:49 jbond: ruby-sorted-set to puppet7 component
  • 15:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:21 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2046.codfw.wmnet with OS bullseye
  • 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T312863)', diff saved to https://phabricator.wikimedia.org/P31725 and previous config saved to /var/cache/conftool/dbconfig/20220722-150727-ladsgroup.json
  • 15:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 15:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T312863)', diff saved to https://phabricator.wikimedia.org/P31724 and previous config saved to /var/cache/conftool/dbconfig/20220722-150707-ladsgroup.json
  • 15:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2046.codfw.wmnet with reason: host reimage
  • 15:03 jbond: ruby-rbtree to puppet7 component
  • 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 15:01 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2046.codfw.wmnet with reason: host reimage
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31722 and previous config saved to /var/cache/conftool/dbconfig/20220722-145201-ladsgroup.json
  • 14:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31721 and previous config saved to /var/cache/conftool/dbconfig/20220722-144734-ladsgroup.json
  • 14:41 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2046.codfw.wmnet with OS bullseye
  • 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31720 and previous config saved to /var/cache/conftool/dbconfig/20220722-143655-ladsgroup.json
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31719 and previous config saved to /var/cache/conftool/dbconfig/20220722-143229-ladsgroup.json
  • 14:29 moritzm: restarting tomcat on idp-test.w.o
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T312863)', diff saved to https://phabricator.wikimedia.org/P31718 and previous config saved to /var/cache/conftool/dbconfig/20220722-142150-ladsgroup.json
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T312863)', diff saved to https://phabricator.wikimedia.org/P31717 and previous config saved to /var/cache/conftool/dbconfig/20220722-141724-ladsgroup.json
  • 13:45 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2033.codfw.wmnet with OS bullseye
  • 13:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 13:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2033.codfw.wmnet with reason: host reimage
  • 13:26 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2033.codfw.wmnet with reason: host reimage
  • 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T312863)', diff saved to https://phabricator.wikimedia.org/P31713 and previous config saved to /var/cache/conftool/dbconfig/20220722-131710-ladsgroup.json
  • 13:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 13:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 13:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 13:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T312863)', diff saved to https://phabricator.wikimedia.org/P31712 and previous config saved to /var/cache/conftool/dbconfig/20220722-131650-ladsgroup.json
  • 13:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31711 and previous config saved to /var/cache/conftool/dbconfig/20220722-130145-ladsgroup.json
  • 12:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 12:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 12:55 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31710 and previous config saved to /var/cache/conftool/dbconfig/20220722-124640-ladsgroup.json
  • 10:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2014.codfw.wmnet to cluster codfw and group C
  • 10:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2014.codfw.wmnet to cluster codfw and group C
  • 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
  • 10:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
  • 10:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31708 and previous config saved to /var/cache/conftool/dbconfig/20220722-102452-ladsgroup.json
  • 10:22 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2021.codfw.wmnet to cluster codfw and group B
  • 10:21 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2021.codfw.wmnet to cluster codfw and group B
  • 10:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2021.codfw.wmnet
  • 10:10 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2021.codfw.wmnet to cluster codfw and group B
  • 10:10 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2021.codfw.wmnet to cluster codfw and group B
  • 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31707 and previous config saved to /var/cache/conftool/dbconfig/20220722-100948-ladsgroup.json
  • 10:06 XioNoX: push pfw policies - T313522
  • 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31706 and previous config saved to /var/cache/conftool/dbconfig/20220722-095444-ladsgroup.json
  • 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31705 and previous config saved to /var/cache/conftool/dbconfig/20220722-093940-ladsgroup.json
  • 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T312984)', diff saved to https://phabricator.wikimedia.org/P31704 and previous config saved to /var/cache/conftool/dbconfig/20220722-093754-ladsgroup.json
  • 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T312863)', diff saved to https://phabricator.wikimedia.org/P31702 and previous config saved to /var/cache/conftool/dbconfig/20220722-093453-ladsgroup.json
  • 09:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 09:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T312863)', diff saved to https://phabricator.wikimedia.org/P31701 and previous config saved to /var/cache/conftool/dbconfig/20220722-084647-ladsgroup.json
  • 08:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T312863)', diff saved to https://phabricator.wikimedia.org/P31700 and previous config saved to /var/cache/conftool/dbconfig/20220722-084627-ladsgroup.json
  • 08:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 12 hosts with reason: Maintenance
  • 08:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 12 hosts with reason: Maintenance
  • 08:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 08:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 08:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T312863)', diff saved to https://phabricator.wikimedia.org/P31697 and previous config saved to /var/cache/conftool/dbconfig/20220722-080112-ladsgroup.json
  • 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T312863)', diff saved to https://phabricator.wikimedia.org/P31696 and previous config saved to /var/cache/conftool/dbconfig/20220722-074844-ladsgroup.json
  • 07:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 07:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 06:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2014.codfw.wmnet with OS bullseye
  • 06:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2014.codfw.wmnet with reason: host reimage
  • 05:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2014.codfw.wmnet with OS bullseye
  • 05:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 05:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 05:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 05:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2021.codfw.wmnet with reason: host reimage
  • 05:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 05:35 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2021.codfw.wmnet with reason: host reimage
  • 05:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 05:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 05:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 05:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2021.codfw.wmnet with OS bullseye
  • 05:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2021.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 05:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2021.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 04:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 04:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 04:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T312863)', diff saved to https://phabricator.wikimedia.org/P31694 and previous config saved to /var/cache/conftool/dbconfig/20220722-045543-ladsgroup.json
  • 04:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 04:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 04:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 04:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31693 and previous config saved to /var/cache/conftool/dbconfig/20220722-044038-ladsgroup.json
  • 04:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31692 and previous config saved to /var/cache/conftool/dbconfig/20220722-042533-ladsgroup.json
  • 04:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T312863)', diff saved to https://phabricator.wikimedia.org/P31691 and previous config saved to /var/cache/conftool/dbconfig/20220722-041028-ladsgroup.json
  • 04:05 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: disable debug log on test2wiki (cleanup) (duration: 03m 05s)
  • 04:01 krinkle@deploy1002: Synchronized wmf-config/: I9051d20cd1 (duration: 03m 02s)
  • 03:58 krinkle@deploy1002: Synchronized multiversion/: I9051d20cd1 (duration: 03m 10s)
  • 03:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T312863)', diff saved to https://phabricator.wikimedia.org/P31690 and previous config saved to /var/cache/conftool/dbconfig/20220722-031014-ladsgroup.json
  • 03:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 03:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 03:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T312863)', diff saved to https://phabricator.wikimedia.org/P31689 and previous config saved to /var/cache/conftool/dbconfig/20220722-030954-ladsgroup.json
  • 03:09 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: disable debug log on test2wiki (duration: 02m 47s)
  • 03:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P31688 and previous config saved to /var/cache/conftool/dbconfig/20220722-025449-ladsgroup.json
  • 02:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P31687 and previous config saved to /var/cache/conftool/dbconfig/20220722-023943-ladsgroup.json
  • 00:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T312863)', diff saved to https://phabricator.wikimedia.org/P31685 and previous config saved to /var/cache/conftool/dbconfig/20220722-002622-ladsgroup.json
  • 00:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T312863)', diff saved to https://phabricator.wikimedia.org/P31684 and previous config saved to /var/cache/conftool/dbconfig/20220722-002601-ladsgroup.json
  • 00:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P31683 and previous config saved to /var/cache/conftool/dbconfig/20220722-001056-ladsgroup.json

2022-07-21

  • 23:53 mutante: https://policy.wikimedia.org moved from WordPress DNS back to WMF DNS; it now redirects to https://wikimediafoundation.org/advocacy/ as requested on T310738. This may or may not also resolve T132104, because wikimediafoundation.org is also on WordPress VIP
  • 23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T312863)', diff saved to https://phabricator.wikimedia.org/P31680 and previous config saved to /var/cache/conftool/dbconfig/20220721-234045-ladsgroup.json
  • 23:22 mutante: [cumin2002:~] $ sudo cumin 'C:profile::httpbb' "rm /srv/deployment/httpbb-tests/appserver/test_search.yaml"
  • 23:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2045.codfw.wmnet with OS bullseye
  • 22:55 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2045.codfw.wmnet with reason: host reimage
  • 22:52 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2045.codfw.wmnet with reason: host reimage
  • 22:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 22:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T312984)', diff saved to https://phabricator.wikimedia.org/P31678 and previous config saved to /var/cache/conftool/dbconfig/20220721-223048-ladsgroup.json
  • 22:30 mutante: re-enabling puppet on all remaining 'C:profile::mediawiki::httpd'
  • 22:26 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2045.codfw.wmnet with OS bullseye
  • 22:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31677 and previous config saved to /var/cache/conftool/dbconfig/20220721-221543-ladsgroup.json
  • 22:09 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2045.codfw.wmnet with OS bullseye
  • 22:05 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2045.codfw.wmnet with OS bullseye
  • 22:02 dancy@deploy1002: Installation of scap version "4.11.3" completed for 559 hosts
  • 22:02 dancy@deploy1002: Installing scap version "4.11.3" for 559 hosts
  • 22:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31676 and previous config saved to /var/cache/conftool/dbconfig/20220721-220038-ladsgroup.json
  • 21:56 mutante: re-enabling puppet on mw2 in groups (codfw)
  • 21:48 mutante: re-enabling puppet on parsoid (wtp*)
  • 21:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T312984)', diff saved to https://phabricator.wikimedia.org/P31675 and previous config saved to /var/cache/conftool/dbconfig/20220721-214532-ladsgroup.json
  • 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T312863)', diff saved to https://phabricator.wikimedia.org/P31674 and previous config saved to /var/cache/conftool/dbconfig/20220721-213246-ladsgroup.json
  • 21:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 21:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T312863)', diff saved to https://phabricator.wikimedia.org/P31673 and previous config saved to /var/cache/conftool/dbconfig/20220721-213237-ladsgroup.json
  • 21:17 mutante: puppet re-enabled on mw-api-canary and parsoid-canary
  • 21:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P31672 and previous config saved to /var/cache/conftool/dbconfig/20220721-211732-ladsgroup.json
  • 20:52 mutante: deploying Apache config change on the cluster, slowly: puppet disabled on C:profile::mediawiki::httpd, then re-enabling it starting with mwdebug, testing with httpbb, then re-enabling puppet on more hosts. https://gerrit.wikimedia.org/r/c/operations/puppet/+/809324 Bug: T310738
  • 20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T312984)', diff saved to https://phabricator.wikimedia.org/P31669 and previous config saved to /var/cache/conftool/dbconfig/20220721-204518-ladsgroup.json
  • 20:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 20:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 20:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 20:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 20:39 mutante: disabling puppet on mw appservers to deploy gerrit:809324 - T310738
  • 20:34 cjming: end of UTC late backport window
  • 20:34 bd808: Proof of life for stashbot processing !logs
  • 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:28 andrewbogott: testing the log by logging a test
  • 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T312863)', diff saved to https://phabricator.wikimedia.org/P31668 and previous config saved to /var/cache/conftool/dbconfig/20220721-202348-ladsgroup.json
  • 20:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 20:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 20:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 20:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312863)', diff saved to https://phabricator.wikimedia.org/P31667 and previous config saved to /var/cache/conftool/dbconfig/20220721-202311-ladsgroup.json
  • 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P31666 and previous config saved to /var/cache/conftool/dbconfig/20220721-200806-ladsgroup.json
  • 19:56 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 19:54 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P31665 and previous config saved to /var/cache/conftool/dbconfig/20220721-195301-ladsgroup.json
  • 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312863)', diff saved to https://phabricator.wikimedia.org/P31664 and previous config saved to /var/cache/conftool/dbconfig/20220721-193756-ladsgroup.json
  • 19:35 brennen@deploy1002: Finished deploy [phabricator/deployment@f962d0e]: (no justification provided) (duration: 00m 05s)
  • 19:35 brennen@deploy1002: Started deploy [phabricator/deployment@f962d0e]: (no justification provided)
  • 19:34 brennen@deploy1002: Finished deploy [phabricator/deployment@f962d0e]: (no justification provided) (duration: 00m 05s)
  • 19:34 brennen@deploy1002: Started deploy [phabricator/deployment@f962d0e]: (no justification provided)
  • 19:31 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2066.codfw.wmnet with OS bullseye
  • 19:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31662 and previous config saved to /var/cache/conftool/dbconfig/20220721-191136-ladsgroup.json
  • 19:09 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2066.codfw.wmnet with reason: host reimage
  • 18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31661 and previous config saved to /var/cache/conftool/dbconfig/20220721-185631-ladsgroup.json
  • 18:50 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
  • 18:42 tzatziki: running extensions/SecurePoll/cli/wm-scripts/bv2022/populateEditCount.php on all 8 sections
  • 18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T312984)', diff saved to https://phabricator.wikimedia.org/P31660 and previous config saved to /var/cache/conftool/dbconfig/20220721-184126-ladsgroup.json
  • 18:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T312863)', diff saved to https://phabricator.wikimedia.org/P31659 and previous config saved to /var/cache/conftool/dbconfig/20220721-183723-ladsgroup.json
  • 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 18:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T312863)', diff saved to https://phabricator.wikimedia.org/P31658 and previous config saved to /var/cache/conftool/dbconfig/20220721-183703-ladsgroup.json
  • 18:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:34 dancy@deploy1002: Finished scap: Backport for gerrit:816022 MWConfigCacheGenerator.php: Use grace period of 3 minutes (duration: 03m 39s)
  • 18:31 dancy@deploy1002: Started scap: Backport for gerrit:816022 MWConfigCacheGenerator.php: Use grace period of 3 minutes
  • 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T312984)', diff saved to https://phabricator.wikimedia.org/P31656 and previous config saved to /var/cache/conftool/dbconfig/20220721-182033-ladsgroup.json
  • 18:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31655 and previous config saved to /var/cache/conftool/dbconfig/20220721-182013-ladsgroup.json
  • 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:14 brennen: testing scap deployment to phab2001, this is a no-op for production services
  • 18:12 brennen@deploy1002: Finished deploy [phabricator/deployment@358bb3a]: (no justification provided) (duration: 01m 17s)
  • 18:11 brennen@deploy1002: Started deploy [phabricator/deployment@358bb3a]: (no justification provided)
  • 18:10 tzatziki: creating tables for board election with bv2022_tables.sql
  • 18:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:07 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.21 refs T308074
  • 18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P31654 and previous config saved to /var/cache/conftool/dbconfig/20220721-180653-ladsgroup.json
  • 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31653 and previous config saved to /var/cache/conftool/dbconfig/20220721-180508-ladsgroup.json
  • 17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T312863)', diff saved to https://phabricator.wikimedia.org/P31652 and previous config saved to /var/cache/conftool/dbconfig/20220721-175147-ladsgroup.json
  • 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31651 and previous config saved to /var/cache/conftool/dbconfig/20220721-175003-ladsgroup.json
  • 17:42 dwisehaupt: reclone of frdb2003 from frdb1003 is complete. all services back in service.
  • 17:41 ryankemper@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=elastic207[0-2].*
  • 17:41 ryankemper@cumin1001: conftool action : set/weight=10:pooled=no; selector: name=elastic2066.codfw.wmnet
  • 17:41 ryankemper@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=elastic206[1-9].*
  • 17:36 dancy@deploy1002: Synchronized README: Gathering timing info (duration: 03m 09s)
  • 17:35 ryankemper@cumin1001: conftool action : GET; selector: name=elastic6*
  • 17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31650 and previous config saved to /var/cache/conftool/dbconfig/20220721-173458-ladsgroup.json
  • 17:30 ryankemper@cumin1001: conftool action : set/weight=10,pooled=yes; selector: name=elastic6*
  • 17:21 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:20 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:20 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:19 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:17 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:17 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:00 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2066.codfw.wmnet with OS bullseye
  • 16:58 ryankemper: T300943 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/816017 to get conftool-data entries for new elastic2* hosts
  • 16:58 mvernon@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: merging upstream config changes T309896 - mvernon@cumin1001
  • 16:44 ryankemper@cumin1001: conftool action : set/weight=10; selector: name=elastic6*
  • 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31649 and previous config saved to /var/cache/conftool/dbconfig/20220721-163859-ladsgroup.json
  • 16:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 16:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 16:38 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
  • 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T312863)', diff saved to https://phabricator.wikimedia.org/P31648 and previous config saved to /var/cache/conftool/dbconfig/20220721-162458-ladsgroup.json
  • 16:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 16:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T312863)', diff saved to https://phabricator.wikimedia.org/P31647 and previous config saved to /var/cache/conftool/dbconfig/20220721-162419-ladsgroup.json
  • 16:12 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudweb1004.wikimedia.org with OS buster
  • 16:09 ryankemper: T300943 Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/816008 and running puppet twice on elastic20[64-72]
  • 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P31646 and previous config saved to /var/cache/conftool/dbconfig/20220721-160914-ladsgroup.json
  • 16:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudweb1003.wikimedia.org with OS buster
  • 16:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 16:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31645 and previous config saved to /var/cache/conftool/dbconfig/20220721-160522-ladsgroup.json
  • 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P31644 and previous config saved to /var/cache/conftool/dbconfig/20220721-155409-ladsgroup.json
  • 15:50 ryankemper: T300943 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/815823 and running puppet across elastic2* in preparation for adding new codfw hosts into service
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31643 and previous config saved to /var/cache/conftool/dbconfig/20220721-155017-ladsgroup.json
  • 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T312863)', diff saved to https://phabricator.wikimedia.org/P31642 and previous config saved to /var/cache/conftool/dbconfig/20220721-153904-ladsgroup.json
  • 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31641 and previous config saved to /var/cache/conftool/dbconfig/20220721-153512-ladsgroup.json
  • 15:34 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb1004.wikimedia.org with reason: host reimage
  • 15:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb1003.wikimedia.org with reason: host reimage
  • 15:29 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb1004.wikimedia.org with reason: host reimage
  • 15:29 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb1003.wikimedia.org with reason: host reimage
  • 15:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/SearchSettingsForWikibase.php: Config: Configure wbsearchentities profile parameter on Test Wikidata (take 2) (T307869) (2/2) (duration: 03m 13s)
  • 15:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:21 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Configure wbsearchentities profile parameter on Test Wikidata (take 2) (T307869) (1/2) (duration: 02m 59s)
  • 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31640 and previous config saved to /var/cache/conftool/dbconfig/20220721-152007-ladsgroup.json
  • 15:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:16 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudweb1004.wikimedia.org with OS buster
  • 15:16 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudweb1003.wikimedia.org with OS buster
  • 15:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:14 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/Wikibase/repo/: Backport: Fix profile in wbsearchentities and wbsearch (T307869) (duration: 03m 07s)
  • 15:13 moritzm: draining ganeti2021 T310483
  • 15:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2014.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 15:11 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2014.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 14:45 moritzm: upgrading ganeti/eqsin to 3.0.2 T312637
  • 14:39 mvernon@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: merging upstream config changes T309896 - mvernon@cumin1001
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31639 and previous config saved to /var/cache/conftool/dbconfig/20220721-143544-ladsgroup.json
  • 14:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 14:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31638 and previous config saved to /var/cache/conftool/dbconfig/20220721-143524-ladsgroup.json
  • 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T312990)', diff saved to https://phabricator.wikimedia.org/P31637 and previous config saved to /var/cache/conftool/dbconfig/20220721-142523-marostegui.json
  • 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1192.eqiad.wmnet with OS bullseye
  • 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1187.eqiad.wmnet with OS bullseye
  • 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1195.eqiad.wmnet with OS bullseye
  • 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1188.eqiad.wmnet with OS bullseye
  • 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1191.eqiad.wmnet with OS bullseye
  • 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1185.eqiad.wmnet with OS bullseye
  • 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1194.eqiad.wmnet with OS bullseye
  • 14:23 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1186.eqiad.wmnet with OS bullseye
  • 14:22 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1190.eqiad.wmnet with OS bullseye
  • 14:22 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1193.eqiad.wmnet with OS bullseye
  • 14:22 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1189.eqiad.wmnet with OS bullseye
  • 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31636 and previous config saved to /var/cache/conftool/dbconfig/20220721-142019-ladsgroup.json
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31635 and previous config saved to /var/cache/conftool/dbconfig/20220721-141938-root.json
  • 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1195.eqiad.wmnet with OS bullseye
  • 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS bullseye
  • 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1185.eqiad.wmnet with OS bullseye
  • 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1187.eqiad.wmnet with OS bullseye
  • 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1192.eqiad.wmnet with OS bullseye
  • 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1190.eqiad.wmnet with OS bullseye
  • 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS bullseye
  • 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1193.eqiad.wmnet with OS bullseye
  • 14:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1186.eqiad.wmnet with OS bullseye
  • 14:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS bullseye
  • 14:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS bullseye
  • 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31634 and previous config saved to /var/cache/conftool/dbconfig/20220721-141018-marostegui.json
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31633 and previous config saved to /var/cache/conftool/dbconfig/20220721-140513-ladsgroup.json
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31632 and previous config saved to /var/cache/conftool/dbconfig/20220721-140434-root.json
  • 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T312863)', diff saved to https://phabricator.wikimedia.org/P31631 and previous config saved to /var/cache/conftool/dbconfig/20220721-140004-ladsgroup.json
  • 13:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 13:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1185.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1192.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1187.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1191.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1190.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1188.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1186.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1193.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1194.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1189.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1195.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31630 and previous config saved to /var/cache/conftool/dbconfig/20220721-135513-marostegui.json
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31629 and previous config saved to /var/cache/conftool/dbconfig/20220721-135008-ladsgroup.json
  • 13:45 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31628 and previous config saved to /var/cache/conftool/dbconfig/20220721-134250-root.json
  • 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1192.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1193.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1190.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1185.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1187.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1188.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1186.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1191.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host db1189.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T312990)', diff saved to https://phabricator.wikimedia.org/P31627 and previous config saved to /var/cache/conftool/dbconfig/20220721-134008-marostegui.json
  • 13:31 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T312990)', diff saved to https://phabricator.wikimedia.org/P31626 and previous config saved to /var/cache/conftool/dbconfig/20220721-132824-marostegui.json
  • 13:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 13:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31625 and previous config saved to /var/cache/conftool/dbconfig/20220721-132746-root.json
  • 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:26 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T312990)', diff saved to https://phabricator.wikimedia.org/P31624 and previous config saved to /var/cache/conftool/dbconfig/20220721-132639-marostegui.json
  • 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:21 moritzm: installing paramiko security updates
  • 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:19 Lucas_WMDE: pulled config change Iee6de25983 to mwdebug1001, then reverted in I9248270621 and pulled that too; neither was synced to other hosts
  • 13:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:14 moritzm: installing xen security updates
  • 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31623 and previous config saved to /var/cache/conftool/dbconfig/20220721-131040-ladsgroup.json
  • 13:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 13:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T312990)', diff saved to https://phabricator.wikimedia.org/P31622 and previous config saved to /var/cache/conftool/dbconfig/20220721-125108-marostegui.json
  • 12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31621 and previous config saved to /var/cache/conftool/dbconfig/20220721-123603-marostegui.json
  • 12:21 dwisehaupt: started reclone of frdb2003 from frdb1003
  • 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31620 and previous config saved to /var/cache/conftool/dbconfig/20220721-122058-marostegui.json
  • 12:07 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T312990)', diff saved to https://phabricator.wikimedia.org/P31619 and previous config saved to /var/cache/conftool/dbconfig/20220721-120553-marostegui.json
  • 12:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 18:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 12:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 18:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 9 hosts with reason: Maintenance
  • 11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 9 hosts with reason: Maintenance
  • 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 11:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T312990)', diff saved to https://phabricator.wikimedia.org/P31618 and previous config saved to /var/cache/conftool/dbconfig/20220721-115607-marostegui.json
  • 11:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 11:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 11:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 11:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T312990)', diff saved to https://phabricator.wikimedia.org/P31617 and previous config saved to /var/cache/conftool/dbconfig/20220721-114641-marostegui.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31616 and previous config saved to /var/cache/conftool/dbconfig/20220721-113136-marostegui.json
  • 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2078.codfw.wmnet
  • 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31615 and previous config saved to /var/cache/conftool/dbconfig/20220721-111631-marostegui.json
  • 11:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2006.codfw.wmnet with reason: Switch instance to plain disk storage, T311686
  • 11:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2006.codfw.wmnet with reason: Switch instance to plain disk storage, T311686
  • 11:10 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 11:09 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 11:08 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 11:08 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 11:07 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:07 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 11:07 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:03 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2078.codfw.wmnet
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T312990)', diff saved to https://phabricator.wikimedia.org/P31614 and previous config saved to /var/cache/conftool/dbconfig/20220721-110126-marostegui.json
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T312984)', diff saved to https://phabricator.wikimedia.org/P31613 and previous config saved to /var/cache/conftool/dbconfig/20220721-105856-ladsgroup.json
  • 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kubetcd2006.codfw.wmnet with reason: Switch to DRBD, T311686
  • 10:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on kubetcd2006.codfw.wmnet with reason: Switch to DRBD, T311686
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P31612 and previous config saved to /var/cache/conftool/dbconfig/20220721-104351-ladsgroup.json
  • 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T312990)', diff saved to https://phabricator.wikimedia.org/P31611 and previous config saved to /var/cache/conftool/dbconfig/20220721-104039-marostegui.json
  • 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T312990)', diff saved to https://phabricator.wikimedia.org/P31610 and previous config saved to /var/cache/conftool/dbconfig/20220721-104002-marostegui.json
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2026.codfw.wmnet to cluster codfw and group D
  • 10:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to cluster codfw and group D
  • 10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P31609 and previous config saved to /var/cache/conftool/dbconfig/20220721-102846-ladsgroup.json
  • 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31608 and previous config saved to /var/cache/conftool/dbconfig/20220721-102457-marostegui.json
  • 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
  • 10:18 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 10:17 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 10:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
  • 10:15 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 10:15 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 10:14 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2026.codfw.wmnet to cluster codfw and group D
  • 10:14 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to cluster codfw and group D
  • 10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T312984)', diff saved to https://phabricator.wikimedia.org/P31607 and previous config saved to /var/cache/conftool/dbconfig/20220721-101341-ladsgroup.json
  • 10:11 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 10:10 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31606 and previous config saved to /var/cache/conftool/dbconfig/20220721-100951-marostegui.json
  • 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2009.codfw.wmnet to cluster codfw and group C
  • 10:05 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2009.codfw.wmnet to cluster codfw and group C
  • 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd2001.codfw.wmnet with reason: Switch instance to plain disk storage, T311686
  • 09:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd2001.codfw.wmnet with reason: Switch instance to plain disk storage, T311686
  • 09:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T312863)', diff saved to https://phabricator.wikimedia.org/P31605 and previous config saved to /var/cache/conftool/dbconfig/20220721-095454-ladsgroup.json
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T312990)', diff saved to https://phabricator.wikimedia.org/P31604 and previous config saved to /var/cache/conftool/dbconfig/20220721-095446-marostegui.json
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2085 and db2086 from dbctl', diff saved to https://phabricator.wikimedia.org/P31603 and previous config saved to /var/cache/conftool/dbconfig/20220721-095439-marostegui.json
  • 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T312984)', diff saved to https://phabricator.wikimedia.org/P31602 and previous config saved to /var/cache/conftool/dbconfig/20220721-093755-ladsgroup.json
  • 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 09:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 09:32 jbond: enable puppet on A:cp post gerrit:815728
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P31601 and previous config saved to /var/cache/conftool/dbconfig/20220721-093032-ladsgroup.json
  • 09:21 moritzm: installing containerd security updates in Kubernetes eqiad masters
  • 09:18 jbond: disable puppet on A:cp for gerrit:815728
  • 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P31599 and previous config saved to /var/cache/conftool/dbconfig/20220721-091527-ladsgroup.json
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T312863)', diff saved to https://phabricator.wikimedia.org/P31598 and previous config saved to /var/cache/conftool/dbconfig/20220721-090022-ladsgroup.json
  • 08:59 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:59 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:57 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:55 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:54 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:54 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:54 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 08:54 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 08:54 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:54 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:53 klausman@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T312990)', diff saved to https://phabricator.wikimedia.org/P31597 and previous config saved to /var/cache/conftool/dbconfig/20220721-084935-marostegui.json
  • 08:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 08:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 08:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
  • 08:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2169 to s6 and s7 T311493', diff saved to https://phabricator.wikimedia.org/P31595 and previous config saved to /var/cache/conftool/dbconfig/20220721-083147-marostegui.json
  • 08:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
  • 08:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
  • 08:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 08:18 moritzm: installing containerd security updates in Kubernetes eqiad workers
  • 08:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 08:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T312990)', diff saved to https://phabricator.wikimedia.org/P31594 and previous config saved to /var/cache/conftool/dbconfig/20220721-081449-marostegui.json
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31593 and previous config saved to /var/cache/conftool/dbconfig/20220721-075944-marostegui.json
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: After restart', diff saved to https://phabricator.wikimedia.org/P31592 and previous config saved to /var/cache/conftool/dbconfig/20220721-075757-root.json
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31591 and previous config saved to /var/cache/conftool/dbconfig/20220721-075745-root.json
  • 07:46 ladsgroup@deploy1002: Synchronized portals: Wikimedia Portals Update: Adding Wikiquote to the new portals (T273179) (duration: 03m 10s)
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31590 and previous config saved to /var/cache/conftool/dbconfig/20220721-074439-marostegui.json
  • 07:43 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Adding Wikiquote to the new portals (T273179) (duration: 03m 08s)
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: After restart', diff saved to https://phabricator.wikimedia.org/P31589 and previous config saved to /var/cache/conftool/dbconfig/20220721-074253-root.json
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31588 and previous config saved to /var/cache/conftool/dbconfig/20220721-074242-root.json
  • 07:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T312863)', diff saved to https://phabricator.wikimedia.org/P31587 and previous config saved to /var/cache/conftool/dbconfig/20220721-073502-ladsgroup.json
  • 07:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 07:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 07:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T312863)', diff saved to https://phabricator.wikimedia.org/P31586 and previous config saved to /var/cache/conftool/dbconfig/20220721-073251-ladsgroup.json
  • 07:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 07:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 07:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T312863)', diff saved to https://phabricator.wikimedia.org/P31585 and previous config saved to /var/cache/conftool/dbconfig/20220721-073217-ladsgroup.json
  • 07:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 07:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2026.codfw.wmnet with OS bullseye
  • 07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 07:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
  • 07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
  • 07:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 07:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2009.codfw.wmnet
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T312990)', diff saved to https://phabricator.wikimedia.org/P31584 and previous config saved to /var/cache/conftool/dbconfig/20220721-072934-marostegui.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: After restart', diff saved to https://phabricator.wikimedia.org/P31583 and previous config saved to /var/cache/conftool/dbconfig/20220721-072749-root.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31582 and previous config saved to /var/cache/conftool/dbconfig/20220721-072738-root.json
  • 07:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2009.codfw.wmnet
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T312990)', diff saved to https://phabricator.wikimedia.org/P31581 and previous config saved to /var/cache/conftool/dbconfig/20220721-071953-marostegui.json
  • 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 07:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T312990)', diff saved to https://phabricator.wikimedia.org/P31580 and previous config saved to /var/cache/conftool/dbconfig/20220721-071932-marostegui.json
  • 07:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2026.codfw.wmnet with reason: host reimage
  • 07:13 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2026.codfw.wmnet with reason: host reimage
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: After restart', diff saved to https://phabricator.wikimedia.org/P31579 and previous config saved to /var/cache/conftool/dbconfig/20220721-071245-root.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31578 and previous config saved to /var/cache/conftool/dbconfig/20220721-071234-root.json
  • 07:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2020.codfw.wmnet to cluster codfw and group B
  • 07:10 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2020.codfw.wmnet to cluster codfw and group B
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31577 and previous config saved to /var/cache/conftool/dbconfig/20220721-070427-marostegui.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: After restart', diff saved to https://phabricator.wikimedia.org/P31576 and previous config saved to /var/cache/conftool/dbconfig/20220721-065741-root.json
  • 06:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2026.codfw.wmnet with OS bullseye
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31575 and previous config saved to /var/cache/conftool/dbconfig/20220721-065730-root.json
  • 06:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2009.codfw.wmnet with OS bullseye
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31574 and previous config saved to /var/cache/conftool/dbconfig/20220721-064922-marostegui.json
  • 06:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd2003.codfw.wmnet with reason: Switch instance to plain disks, T311686
  • 06:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd2003.codfw.wmnet with reason: Switch instance to plain disks, T311686
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 5%: After restart', diff saved to https://phabricator.wikimedia.org/P31573 and previous config saved to /var/cache/conftool/dbconfig/20220721-064237-root.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31572 and previous config saved to /var/cache/conftool/dbconfig/20220721-064226-root.json
  • 06:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2009.codfw.wmnet with reason: host reimage
  • 06:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2003.codfw.wmnet with reason: Switch instance to DRBD, T311686
  • 06:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2003.codfw.wmnet with reason: Switch instance to DRBD, T311686
  • 06:34 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2009.codfw.wmnet with reason: host reimage
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T312990)', diff saved to https://phabricator.wikimedia.org/P31571 and previous config saved to /var/cache/conftool/dbconfig/20220721-063417-marostegui.json
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 2%: After restart', diff saved to https://phabricator.wikimedia.org/P31570 and previous config saved to /var/cache/conftool/dbconfig/20220721-062733-root.json
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31569 and previous config saved to /var/cache/conftool/dbconfig/20220721-062722-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T312990)', diff saved to https://phabricator.wikimedia.org/P31568 and previous config saved to /var/cache/conftool/dbconfig/20220721-062431-marostegui.json
  • 06:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 06:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 06:18 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2009.codfw.wmnet with OS bullseye
  • 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2026.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2026.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 1%: After restart', diff saved to https://phabricator.wikimedia.org/P31567 and previous config saved to /var/cache/conftool/dbconfig/20220721-061228-root.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31566 and previous config saved to /var/cache/conftool/dbconfig/20220721-061217-root.json
  • 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 T313398', diff saved to https://phabricator.wikimedia.org/P31565 and previous config saved to /var/cache/conftool/dbconfig/20220721-061145-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1103 to x1 primary and set section read-write T313398', diff saved to https://phabricator.wikimedia.org/P31564 and previous config saved to /var/cache/conftool/dbconfig/20220721-061001-root.json
  • 06:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
  • 06:08 marostegui: Starting x1 eqiad failover from db1120 to db1103 - T313398
  • 06:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P31563 and previous config saved to /var/cache/conftool/dbconfig/20220721-060427-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1136 to s7 primary and set section read-write T313383', diff saved to https://phabricator.wikimedia.org/P31562 and previous config saved to /var/cache/conftool/dbconfig/20220721-060112-root.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s7 eqiad as read-only for maintenance - T313383', diff saved to https://phabricator.wikimedia.org/P31561 and previous config saved to /var/cache/conftool/dbconfig/20220721-060037-marostegui.json
  • 06:00 marostegui: Starting s7 eqiad failover from db1181 to db1136 - T313383
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1103 with weight 0 T313398', diff saved to https://phabricator.wikimedia.org/P31560 and previous config saved to /var/cache/conftool/dbconfig/20220721-051752-root.json
  • 05:15 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 T313398
  • 05:14 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 T313398
  • 05:14 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T313383
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1136 with weight 0 T313383', diff saved to https://phabricator.wikimedia.org/P31559 and previous config saved to /var/cache/conftool/dbconfig/20220721-051358-root.json
  • 05:13 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 T313383
  • 00:44 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135

2022-07-20

  • 23:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2072.codfw.wmnet with OS bullseye
  • 23:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2067.codfw.wmnet with OS bullseye
  • 23:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2070.codfw.wmnet with OS bullseye
  • 23:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2071.codfw.wmnet with OS bullseye
  • 23:38 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2068.codfw.wmnet with OS bullseye
  • 23:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2072.codfw.wmnet with reason: host reimage
  • 23:29 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2070.codfw.wmnet with reason: host reimage
  • 23:29 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic2067.codfw.wmnet with reason: host reimage
  • 23:28 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2071.codfw.wmnet with reason: host reimage
  • 23:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2068.codfw.wmnet with reason: host reimage
  • 23:24 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2071.codfw.wmnet with reason: host reimage
  • 23:24 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2072.codfw.wmnet with reason: host reimage
  • 23:24 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2070.codfw.wmnet with reason: host reimage
  • 23:22 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2067.codfw.wmnet with reason: host reimage
  • 23:22 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2068.codfw.wmnet with reason: host reimage
  • 23:11 ryankemper: T300943 Fixed IPMI passwords for elastic `20[67,68,70,71,72]`, reimaging them to bullseye (these hosts are not in service, thus the batch operation)
  • 23:10 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2072.codfw.wmnet with OS bullseye
  • 23:10 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2071.codfw.wmnet with OS bullseye
  • 23:10 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2070.codfw.wmnet with OS bullseye
  • 23:07 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2068.codfw.wmnet with OS bullseye
  • 23:07 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2067.codfw.wmnet with OS bullseye
  • 21:53 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 21:45 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 20:45 cjming: end of UTC late backport window
  • 20:43 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Deploy the new grid layout to group 1 (T312241) (duration: 03m 16s)
  • 20:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:38 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Deploy the new grid layout to group 1 (T312241) (duration: 03m 14s)
  • 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:32 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2032.codfw.wmnet with OS bullseye
  • 20:27 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable DiscussionTools visualenhancements as beta feature on partner wikis (T312670) (duration: 03m 26s)
  • 20:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T312990)', diff saved to https://phabricator.wikimedia.org/P31555 and previous config saved to /var/cache/conftool/dbconfig/20220720-201240-marostegui.json
  • 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:11 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2032.codfw.wmnet with reason: host reimage
  • 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:11 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable DiscussionTools visualenhancements as beta feature on partner wikis (T312670) (duration: 03m 10s)
  • 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:08 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2032.codfw.wmnet with reason: host reimage
  • 19:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31554 and previous config saved to /var/cache/conftool/dbconfig/20220720-195734-marostegui.json
  • 19:54 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2032.codfw.wmnet with OS bullseye
  • 19:53 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 19:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:45 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.21 refs T308074 (duration: 02m 53s)
  • 19:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31553 and previous config saved to /var/cache/conftool/dbconfig/20220720-194229-marostegui.json
  • 19:42 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.21 refs T308074
  • 19:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:33 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/3D/src/PatentFormField.php: Backport: PatentFormField: pass on $this->mParent to HTMLRadioField constructor (T313432) (duration: 03m 08s)
  • 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T312990)', diff saved to https://phabricator.wikimedia.org/P31552 and previous config saved to /var/cache/conftool/dbconfig/20220720-192724-marostegui.json
  • 19:17 jeena: that should read: revert group1 wikis to 1.39.0-wmf.19
  • 19:13 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group[0|1] wikis to [VERSION]"
  • 18:37 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 18:35 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2045.codfw.wmnet with OS bullseye
  • 18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T312990)', diff saved to https://phabricator.wikimedia.org/P31551 and previous config saved to /var/cache/conftool/dbconfig/20220720-182710-marostegui.json
  • 18:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 18:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 18:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 18:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 18:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on 15 hosts with reason: Maintenance
  • 18:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on 15 hosts with reason: Maintenance
  • 18:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 18:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 18:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T312990)', diff saved to https://phabricator.wikimedia.org/P31550 and previous config saved to /var/cache/conftool/dbconfig/20220720-182339-marostegui.json
  • 18:17 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2045.codfw.wmnet with OS bullseye
  • 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:16 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.21 refs T308074 (duration: 03m 07s)
  • 18:15 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:12 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.21 refs T308074
  • 18:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P31549 and previous config saved to /var/cache/conftool/dbconfig/20220720-180834-marostegui.json
  • 17:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P31548 and previous config saved to /var/cache/conftool/dbconfig/20220720-175328-marostegui.json
  • 17:51 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 17:50 bking@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 17:38 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 17:38 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2048.codfw.wmnet with OS bullseye
  • 17:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T312990)', diff saved to https://phabricator.wikimedia.org/P31547 and previous config saved to /var/cache/conftool/dbconfig/20220720-173823-marostegui.json
  • 17:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T312990)', diff saved to https://phabricator.wikimedia.org/P31546 and previous config saved to /var/cache/conftool/dbconfig/20220720-173522-marostegui.json
  • 17:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 17:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 17:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T312990)', diff saved to https://phabricator.wikimedia.org/P31545 and previous config saved to /var/cache/conftool/dbconfig/20220720-173502-marostegui.json
  • 17:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2048.codfw.wmnet with reason: host reimage
  • 17:25 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2048.codfw.wmnet with reason: host reimage
  • 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P31544 and previous config saved to /var/cache/conftool/dbconfig/20220720-171956-marostegui.json
  • 17:12 rzl: rzl@cumin2002:~$ sudo cumin A:mw 'enable-puppet 815759'
  • 17:05 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2048.codfw.wmnet with OS bullseye
  • 17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P31543 and previous config saved to /var/cache/conftool/dbconfig/20220720-170451-marostegui.json
  • 16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T312990)', diff saved to https://phabricator.wikimedia.org/P31542 and previous config saved to /var/cache/conftool/dbconfig/20220720-164946-marostegui.json
  • 16:49 rzl: rzl@cumin2002:~$ sudo cumin A:mw 'disable-puppet 815759'
  • 16:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1109 (T312990)', diff saved to https://phabricator.wikimedia.org/P31541 and previous config saved to /var/cache/conftool/dbconfig/20220720-164638-marostegui.json
  • 16:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 16:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 16:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T312990)', diff saved to https://phabricator.wikimedia.org/P31540 and previous config saved to /var/cache/conftool/dbconfig/20220720-164618-marostegui.json
  • 16:40 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P31539 and previous config saved to /var/cache/conftool/dbconfig/20220720-163113-marostegui.json
  • 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P31538 and previous config saved to /var/cache/conftool/dbconfig/20220720-161608-marostegui.json
  • 16:05 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T312990)', diff saved to https://phabricator.wikimedia.org/P31537 and previous config saved to /var/cache/conftool/dbconfig/20220720-160103-marostegui.json
  • 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T312990)', diff saved to https://phabricator.wikimedia.org/P31536 and previous config saved to /var/cache/conftool/dbconfig/20220720-155752-marostegui.json
  • 15:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 15:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T312990)', diff saved to https://phabricator.wikimedia.org/P31535 and previous config saved to /var/cache/conftool/dbconfig/20220720-155732-marostegui.json
  • 15:57 dancy@deploy1002: Installation of scap version "4.11.2" completed for 557 hosts
  • 15:56 dancy@deploy1002: Installing scap version "4.11.2" for 557 hosts
  • 15:50 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-codfw
  • 15:46 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2036.codfw.wmnet with OS bullseye
  • 15:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31534 and previous config saved to /var/cache/conftool/dbconfig/20220720-154227-marostegui.json
  • 15:39 dancy@deploy1002: rebuilt and synchronized wikiversions files: testing
  • 15:35 dancy@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
  • 15:28 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
  • 15:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31532 and previous config saved to /var/cache/conftool/dbconfig/20220720-152721-marostegui.json
  • 15:26 jayme@cumin2002: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw
  • 15:26 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
  • 15:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2036.codfw.wmnet with reason: host reimage
  • 15:20 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2036.codfw.wmnet with reason: host reimage
  • 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'Fix db2167:3318', diff saved to https://phabricator.wikimedia.org/P31531 and previous config saved to /var/cache/conftool/dbconfig/20220720-151711-marostegui.json
  • 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T312990)', diff saved to https://phabricator.wikimedia.org/P31530 and previous config saved to /var/cache/conftool/dbconfig/20220720-151216-marostegui.json
  • 15:10 jayme@cumin2002: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw
  • 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T312990)', diff saved to https://phabricator.wikimedia.org/P31529 and previous config saved to /var/cache/conftool/dbconfig/20220720-150908-marostegui.json
  • 15:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 15:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 15:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 15:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 15:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T312990)', diff saved to https://phabricator.wikimedia.org/P31528 and previous config saved to /var/cache/conftool/dbconfig/20220720-150730-marostegui.json
  • 15:04 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2036.codfw.wmnet with OS bullseye
  • 14:59 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
  • 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31527 and previous config saved to /var/cache/conftool/dbconfig/20220720-145224-marostegui.json
  • 14:44 volans: installing spicerack 3.1.0 on cumin2002
  • 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31524 and previous config saved to /var/cache/conftool/dbconfig/20220720-143719-marostegui.json
  • 14:36 volans: uploaded spicerack_3.1.0 to apt.wikimedia.org bullseye-wikimedia
  • 14:26 moritzm: installing containerd security updates in Kubernetes codfw masters
  • 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T312990)', diff saved to https://phabricator.wikimedia.org/P31523 and previous config saved to /var/cache/conftool/dbconfig/20220720-142214-marostegui.json
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T312990)', diff saved to https://phabricator.wikimedia.org/P31522 and previous config saved to /var/cache/conftool/dbconfig/20220720-141912-marostegui.json
  • 14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 14:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T312990)', diff saved to https://phabricator.wikimedia.org/P31521 and previous config saved to /var/cache/conftool/dbconfig/20220720-141851-marostegui.json
  • 14:04 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31520 and previous config saved to /var/cache/conftool/dbconfig/20220720-140346-marostegui.json
  • 14:03 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/WikibaseLexeme/WikibaseLexeme.resources.php: Backport: Load Special:NewLexemeAlpha RL modules on mobile (T313116) (2/2) (duration: 03m 02s)
  • 14:02 jbond: disable puppet on A:cp to deploy Gerrit:768766
  • 13:59 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/WikibaseLexeme/src/MediaWiki/Config/LexemeLanguageCodePropertyIdConfig.php: Backport: Load Special:NewLexemeAlpha RL modules on mobile (T313116) (1/2) (duration: 02m 56s)
  • 13:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:54 Lucas_WMDE: lucaswerkmeister-wmde@deploy1002 /srv/mediawiki-staging (master $ u=) $ git -C php-1.39.0-wmf.19/extensions/WikibaseLexeme am --skip # T308659 backport already applied
  • 13:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:48 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2034.codfw.wmnet with OS bullseye
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31519 and previous config saved to /var/cache/conftool/dbconfig/20220720-134841-marostegui.json
  • 13:45 moritzm: installing containerd security updates in Kubernetes codfw cluster
  • 13:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/WikibaseLexeme/WikibaseLexeme.resources.php: Backport: Load Special:NewLexemeAlpha RL modules on mobile (T313116) (2/2) (duration: 03m 08s)
  • 13:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 13:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:35 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/WikibaseLexeme/src/MediaWiki/Config/LexemeLanguageCodePropertyIdConfig.php: Backport: Load Special:NewLexemeAlpha RL modules on mobile (T313116) (1/2) (duration: 03m 34s)
  • 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:35 moritzm: installing request-tracker4 security updates
  • 13:33 XioNoX: cr2-eqiad# deactivate interfaces xe-3/3/0 - T313337
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T312990)', diff saved to https://phabricator.wikimedia.org/P31518 and previous config saved to /var/cache/conftool/dbconfig/20220720-133336-marostegui.json
  • 13:33 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2034.codfw.wmnet with reason: host reimage
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T312990)', diff saved to https://phabricator.wikimedia.org/P31517 and previous config saved to /var/cache/conftool/dbconfig/20220720-133030-marostegui.json
  • 13:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 13:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T312990)', diff saved to https://phabricator.wikimedia.org/P31516 and previous config saved to /var/cache/conftool/dbconfig/20220720-133010-marostegui.json
  • 13:29 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2034.codfw.wmnet with reason: host reimage
  • 13:15 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2034.codfw.wmnet with OS bullseye
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31515 and previous config saved to /var/cache/conftool/dbconfig/20220720-131505-marostegui.json
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31514 and previous config saved to /var/cache/conftool/dbconfig/20220720-130000-marostegui.json
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T312990)', diff saved to https://phabricator.wikimedia.org/P31513 and previous config saved to /var/cache/conftool/dbconfig/20220720-124453-marostegui.json
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T312990)', diff saved to https://phabricator.wikimedia.org/P31512 and previous config saved to /var/cache/conftool/dbconfig/20220720-124042-marostegui.json
  • 12:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 12:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 12:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 12:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 12:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 12:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T312990)', diff saved to https://phabricator.wikimedia.org/P31511 and previous config saved to /var/cache/conftool/dbconfig/20220720-123751-marostegui.json
  • 12:29 marostegui: Move pc1014 from pc2 to pc3 T313401
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P31510 and previous config saved to /var/cache/conftool/dbconfig/20220720-122246-marostegui.json
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P31509 and previous config saved to /var/cache/conftool/dbconfig/20220720-120738-marostegui.json
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T312990)', diff saved to https://phabricator.wikimedia.org/P31507 and previous config saved to /var/cache/conftool/dbconfig/20220720-115233-marostegui.json
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T312990)', diff saved to https://phabricator.wikimedia.org/P31506 and previous config saved to /var/cache/conftool/dbconfig/20220720-113424-marostegui.json
  • 11:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 11:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2009.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 11:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2009.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
  • 11:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.5.1 - ayounsi@cumin1001
  • 11:05 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.5.1 - ayounsi@cumin1001
  • 11:03 moritzm: draining ganeti2014 T310483
  • 10:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
  • 10:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2020.codfw.wmnet with OS bullseye
  • 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on 12 hosts with reason: Maintenance
  • 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on 12 hosts with reason: Maintenance
  • 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 10:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 10:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 10:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 10:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T312990)', diff saved to https://phabricator.wikimedia.org/P31504 and previous config saved to /var/cache/conftool/dbconfig/20220720-103825-marostegui.json
  • 10:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2003.codfw.wmnet with reason: Switch instance to DRBD, T311686
  • 10:30 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2003.codfw.wmnet with reason: Switch instance to DRBD, T311686
  • 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2020.codfw.wmnet with reason: host reimage
  • 10:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2020.codfw.wmnet with reason: host reimage
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31503 and previous config saved to /var/cache/conftool/dbconfig/20220720-102320-marostegui.json
  • 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd2003.codfw.wmnet with reason: Switch instance to DRBD, T311686
  • 10:13 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd2003.codfw.wmnet with reason: Switch instance to DRBD, T311686
  • 10:09 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2020.codfw.wmnet with OS bullseye
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31502 and previous config saved to /var/cache/conftool/dbconfig/20220720-100815-marostegui.json
  • 09:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2029.codfw.wmnet to cluster codfw and group A
  • 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2029.codfw.wmnet to cluster codfw and group A
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T312990)', diff saved to https://phabricator.wikimedia.org/P31501 and previous config saved to /var/cache/conftool/dbconfig/20220720-095310-marostegui.json
  • 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2020.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 09:52 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2020.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T312990)', diff saved to https://phabricator.wikimedia.org/P31499 and previous config saved to /var/cache/conftool/dbconfig/20220720-085256-marostegui.json
  • 08:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 08:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T312990)', diff saved to https://phabricator.wikimedia.org/P31498 and previous config saved to /var/cache/conftool/dbconfig/20220720-085236-marostegui.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31497 and previous config saved to /var/cache/conftool/dbconfig/20220720-083731-marostegui.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31496 and previous config saved to /var/cache/conftool/dbconfig/20220720-082226-marostegui.json
  • 08:14 elukey: apt-get clean on archiva1002 to free some space
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T312990)', diff saved to https://phabricator.wikimedia.org/P31495 and previous config saved to /var/cache/conftool/dbconfig/20220720-080721-marostegui.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T312990)', diff saved to https://phabricator.wikimedia.org/P31494 and previous config saved to /var/cache/conftool/dbconfig/20220720-080509-marostegui.json
  • 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 08:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T312990)', diff saved to https://phabricator.wikimedia.org/P31493 and previous config saved to /var/cache/conftool/dbconfig/20220720-080442-marostegui.json
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31492 and previous config saved to /var/cache/conftool/dbconfig/20220720-074937-marostegui.json
  • 07:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2029.codfw.wmnet
  • 07:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2029.codfw.wmnet
  • 07:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31491 and previous config saved to /var/cache/conftool/dbconfig/20220720-073432-marostegui.json
  • 07:31 jayme: ml-serve1002.eqiad.wmnet,ml-serve1004.eqiad.wmnet 'systemctl restart rsyslog'
  • 07:30 taavi@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/SecurePoll/cli/wm-scripts/bv2022/populateEditCount.php: T309753 backports (duration: 02m 54s)
  • 07:30 jayme: kubernetes1010.eqiad.wmnet,kubernetes1020.eqiad.wmnet 'systemctl restart rsyslog'
  • 07:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:26 taavi@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/SecurePoll/cli/wm-scripts/bv2022/: T309753 backports (duration: 02m 57s)
  • 07:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T312990)', diff saved to https://phabricator.wikimedia.org/P31490 and previous config saved to /var/cache/conftool/dbconfig/20220720-071927-marostegui.json
  • 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2029.codfw.wmnet with OS bullseye
  • 07:14 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable ContentTranslation out of Beta for sswiki (T309384) (duration: 03m 24s)
  • 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T312990)', diff saved to https://phabricator.wikimedia.org/P31489 and previous config saved to /var/cache/conftool/dbconfig/20220720-071114-marostegui.json
  • 07:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 07:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T312990)', diff saved to https://phabricator.wikimedia.org/P31488 and previous config saved to /var/cache/conftool/dbconfig/20220720-071054-marostegui.json
  • 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2029.codfw.wmnet with reason: host reimage
  • 06:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2029.codfw.wmnet with reason: host reimage
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31487 and previous config saved to /var/cache/conftool/dbconfig/20220720-065549-marostegui.json
  • 06:43 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2029.codfw.wmnet with OS bullseye
  • 06:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2020.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 06:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2020.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31486 and previous config saved to /var/cache/conftool/dbconfig/20220720-064044-marostegui.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T312990)', diff saved to https://phabricator.wikimedia.org/P31485 and previous config saved to /var/cache/conftool/dbconfig/20220720-062539-marostegui.json
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T312990)', diff saved to https://phabricator.wikimedia.org/P31484 and previous config saved to /var/cache/conftool/dbconfig/20220720-062327-marostegui.json
  • 06:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 06:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T312990)', diff saved to https://phabricator.wikimedia.org/P31483 and previous config saved to /var/cache/conftool/dbconfig/20220720-062307-marostegui.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31482 and previous config saved to /var/cache/conftool/dbconfig/20220720-060802-marostegui.json
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31481 and previous config saved to /var/cache/conftool/dbconfig/20220720-055256-marostegui.json
  • 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T312990)', diff saved to https://phabricator.wikimedia.org/P31480 and previous config saved to /var/cache/conftool/dbconfig/20220720-053751-marostegui.json
  • 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T312990)', diff saved to https://phabricator.wikimedia.org/P31479 and previous config saved to /var/cache/conftool/dbconfig/20220720-053620-marostegui.json
  • 05:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 05:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 05:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312990)', diff saved to https://phabricator.wikimedia.org/P31478 and previous config saved to /var/cache/conftool/dbconfig/20220720-053520-marostegui.json
  • 05:26 marostegui: Stop mysql on db2087 (s6 and s7) to clone db2169 T311493
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31475 and previous config saved to /var/cache/conftool/dbconfig/20220720-052014-marostegui.json
  • 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31474 and previous config saved to /var/cache/conftool/dbconfig/20220720-050509-marostegui.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2168 to dbctl in s7 and s8 T311493', diff saved to https://phabricator.wikimedia.org/P31473 and previous config saved to /var/cache/conftool/dbconfig/20220720-045918-marostegui.json
  • 04:57 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 04:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312990)', diff saved to https://phabricator.wikimedia.org/P31472 and previous config saved to /var/cache/conftool/dbconfig/20220720-045004-marostegui.json
  • 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T312990)', diff saved to https://phabricator.wikimedia.org/P31471 and previous config saved to /var/cache/conftool/dbconfig/20220720-044729-marostegui.json
  • 04:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 04:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 04:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 04:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 04:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
  • 04:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
  • 04:10 rzl: rzl@kubemaster1001:~$ sudo systemctl restart kube-apiserver
  • 04:08 rzl: rzl@kubemaster1002:~$ sudo systemctl restart kube-apiserver
  • 03:48 rzl: rzl@cumin2002:~$ sudo cumin dbproxy[1019,1020,1021].eqiad.wmnet 'systemctl reload haproxy'
  • 03:37 rzl: rzl@dbproxy1018:~$ sudo systemctl reload haproxy
  • 03:30 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 03:19 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host elastic2060.codfw.wmnet with OS bullseye
  • 03:19 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2060.codfw.wmnet with OS bullseye
  • 03:10 tstarling@deploy1002: Finished scap: revert yue -> zh fallback, needs LC rebuild in both branches T296188 (duration: 19m 41s)
  • 02:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:51 tstarling@deploy1002: Started scap: revert yue -> zh fallback, needs LC rebuild in both branches T296188
  • 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:49 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2052.codfw.wmnet with OS bullseye
  • 01:27 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2052.codfw.wmnet with reason: host reimage
  • 01:24 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2052.codfw.wmnet with reason: host reimage
  • 01:04 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2052.codfw.wmnet with OS bullseye
  • 01:00 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2051.codfw.wmnet with OS bullseye
  • 00:43 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2051.codfw.wmnet with reason: host reimage
  • 00:39 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2051.codfw.wmnet with reason: host reimage
  • 00:22 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS bullseye

2022-07-19

  • 22:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on 8 hosts with reason: Maintenance
  • 22:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on 8 hosts with reason: Maintenance
  • 22:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 22:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 22:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T312990)', diff saved to https://phabricator.wikimedia.org/P31470 and previous config saved to /var/cache/conftool/dbconfig/20220719-225828-marostegui.json
  • 22:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2050.codfw.wmnet with OS bullseye
  • 22:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31469 and previous config saved to /var/cache/conftool/dbconfig/20220719-224323-marostegui.json
  • 22:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2050.codfw.wmnet with reason: host reimage
  • 22:31 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2050.codfw.wmnet with reason: host reimage
  • 22:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31468 and previous config saved to /var/cache/conftool/dbconfig/20220719-222818-marostegui.json
  • 22:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T312990)', diff saved to https://phabricator.wikimedia.org/P31467 and previous config saved to /var/cache/conftool/dbconfig/20220719-221312-marostegui.json
  • 22:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T312990)', diff saved to https://phabricator.wikimedia.org/P31466 and previous config saved to /var/cache/conftool/dbconfig/20220719-221035-marostegui.json
  • 22:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:10 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2050.codfw.wmnet with OS bullseye
  • 22:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 22:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 22:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T312990)', diff saved to https://phabricator.wikimedia.org/P31465 and previous config saved to /var/cache/conftool/dbconfig/20220719-220946-marostegui.json
  • 22:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31464 and previous config saved to /var/cache/conftool/dbconfig/20220719-215441-marostegui.json
  • 21:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:45 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.21 refs T308074
  • 21:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31463 and previous config saved to /var/cache/conftool/dbconfig/20220719-213936-marostegui.json
  • 21:38 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2026.codfw.wmnet with OS bullseye
  • 21:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:36 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.21 refs T308074 (duration: 04m 02s)
  • 21:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:32 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.21 refs T308074
  • 21:26 dancy@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: Config: MWConfigCacheGenerator: If opcache.revalidate_freq is 0, use grace period of 10 seconds (T311788) (duration: 02m 59s)
  • 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T312990)', diff saved to https://phabricator.wikimedia.org/P31462 and previous config saved to /var/cache/conftool/dbconfig/20220719-212431-marostegui.json
  • 21:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T312990)', diff saved to https://phabricator.wikimedia.org/P31461 and previous config saved to /var/cache/conftool/dbconfig/20220719-212149-marostegui.json
  • 21:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 21:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 21:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T312990)', diff saved to https://phabricator.wikimedia.org/P31460 and previous config saved to /var/cache/conftool/dbconfig/20220719-212128-marostegui.json
  • 21:17 jforrester@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/Scribunto/includes/Hooks.php: Train unblocker: Hooks: Bump scribunto-stats cache version (T313341) (duration: 03m 14s)
  • 21:16 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2026.codfw.wmnet with reason: host reimage
  • 21:14 cjming: end of UTC late backport window
  • 21:14 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2026.codfw.wmnet with reason: host reimage
  • 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:13 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: uzwiki: Create "eliminator" group (T302670) (duration: 03m 13s)
  • 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:07 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: uzwiki: Create "eliminator" group (T302670) (duration: 03m 19s)
  • 21:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31459 and previous config saved to /var/cache/conftool/dbconfig/20220719-210623-marostegui.json
  • 21:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:01 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2026.codfw.wmnet with OS bullseye
  • 21:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:00 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add "uploader" user group for kswiki. (T305320) (duration: 02m 58s)
  • 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:56 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add file mover user group for azwiki (T304968) (duration: 02m 52s)
  • 20:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31458 and previous config saved to /var/cache/conftool/dbconfig/20220719-205118-marostegui.json
  • 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:43 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add file mover user group for azwiki (T304968) (duration: 03m 15s)
  • 20:42 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2055.codfw.wmnet with OS bullseye
  • 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:36 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 02m 53s)
  • 20:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T312990)', diff saved to https://phabricator.wikimedia.org/P31457 and previous config saved to /var/cache/conftool/dbconfig/20220719-203613-marostegui.json
  • 20:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T312990)', diff saved to https://phabricator.wikimedia.org/P31456 and previous config saved to /var/cache/conftool/dbconfig/20220719-203327-marostegui.json
  • 20:33 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 03m 09s)
  • 20:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 20:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 20:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T312990)', diff saved to https://phabricator.wikimedia.org/P31455 and previous config saved to /var/cache/conftool/dbconfig/20220719-203307-marostegui.json
  • 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:29 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [wmf-config]: Undeploy GDI Survey Wave 2 (T312866) (duration: 03m 12s)
  • 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:24 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2055.codfw.wmnet with reason: host reimage
  • 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:23 cjming@deploy1002: Synchronized wmf-config: Config: Deploy the new grid layout to group 0 wikis (T312241) (duration: 03m 05s)
  • 20:21 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2055.codfw.wmnet with reason: host reimage
  • 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31454 and previous config saved to /var/cache/conftool/dbconfig/20220719-201802-marostegui.json
  • 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:17 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: cirrus: Dont recycle completion suggester indices (duration: 03m 12s)
  • 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:09 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "testwikis to 1.39.0-wmf.19"
  • 20:06 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2055.codfw.wmnet with OS bullseye
  • 20:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31453 and previous config saved to /var/cache/conftool/dbconfig/20220719-200257-marostegui.json
  • 20:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:51 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.39.0-wmf.19"
  • 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T312990)', diff saved to https://phabricator.wikimedia.org/P31452 and previous config saved to /var/cache/conftool/dbconfig/20220719-194752-marostegui.json
  • 19:29 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2056.codfw.wmnet with OS bullseye
  • 19:27 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2066.codfw.wmnet with OS bullseye
  • 19:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2069.codfw.wmnet with OS bullseye
  • 19:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1122 (T312990)', diff saved to https://phabricator.wikimedia.org/P31451 and previous config saved to /var/cache/conftool/dbconfig/20220719-192207-marostegui.json
  • 19:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 19:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 19:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T312990)', diff saved to https://phabricator.wikimedia.org/P31450 and previous config saved to /var/cache/conftool/dbconfig/20220719-192147-marostegui.json
  • 19:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2069.codfw.wmnet with reason: host reimage
  • 19:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31449 and previous config saved to /var/cache/conftool/dbconfig/20220719-190642-marostegui.json
  • 19:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2056.codfw.wmnet with reason: host reimage
  • 19:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:04 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2069.codfw.wmnet with reason: host reimage
  • 19:02 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
  • 19:02 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2056.codfw.wmnet with reason: host reimage
  • 18:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31448 and previous config saved to /var/cache/conftool/dbconfig/20220719-185137-marostegui.json
  • 18:50 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2069.codfw.wmnet with OS bullseye
  • 18:49 dancy@deploy1002: Pruned MediaWiki: 1.39.0-wmf.17, 1.39.0-wmf.18 (duration: 02m 09s)
  • 18:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:44 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2056.codfw.wmnet with OS bullseye
  • 18:42 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 18:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T312990)', diff saved to https://phabricator.wikimedia.org/P31447 and previous config saved to /var/cache/conftool/dbconfig/20220719-183632-marostegui.json
  • 18:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T312990)', diff saved to https://phabricator.wikimedia.org/P31446 and previous config saved to /var/cache/conftool/dbconfig/20220719-183351-marostegui.json
  • 18:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 18:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 18:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T312990)', diff saved to https://phabricator.wikimedia.org/P31445 and previous config saved to /var/cache/conftool/dbconfig/20220719-183330-marostegui.json
  • 18:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:18 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
  • 18:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31444 and previous config saved to /var/cache/conftool/dbconfig/20220719-181825-marostegui.json
  • 18:08 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.21 refs T308074
  • 18:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31443 and previous config saved to /var/cache/conftool/dbconfig/20220719-180320-marostegui.json
  • 17:51 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.21 refs T308074 (duration: 04m 24s)
  • 17:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T312990)', diff saved to https://phabricator.wikimedia.org/P31442 and previous config saved to /var/cache/conftool/dbconfig/20220719-174815-marostegui.json
  • 17:46 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.21 refs T308074
  • 17:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T312990)', diff saved to https://phabricator.wikimedia.org/P31441 and previous config saved to /var/cache/conftool/dbconfig/20220719-174537-marostegui.json
  • 17:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 17:45 jhuneidi@deploy1002: Installation of scap version "4.11.1" completed for 557 hosts
  • 17:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 17:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T312990)', diff saved to https://phabricator.wikimedia.org/P31440 and previous config saved to /var/cache/conftool/dbconfig/20220719-174517-marostegui.json
  • 17:45 jhuneidi@deploy1002: Installing scap version "4.11.1" for 557 hosts
  • 17:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31439 and previous config saved to /var/cache/conftool/dbconfig/20220719-173012-marostegui.json
  • 17:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31438 and previous config saved to /var/cache/conftool/dbconfig/20220719-171507-marostegui.json
  • 17:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:06 jhuneidi@deploy1002: scap failed: ValueError php_fpm expected targets, 0 given (duration: 37m 54s)
  • 17:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T312990)', diff saved to https://phabricator.wikimedia.org/P31437 and previous config saved to /var/cache/conftool/dbconfig/20220719-170002-marostegui.json
  • 16:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T312990)', diff saved to https://phabricator.wikimedia.org/P31436 and previous config saved to /var/cache/conftool/dbconfig/20220719-165747-marostegui.json
  • 16:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 16:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 16:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 16:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 16:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 16:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 16:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1028.eqiad.wmnet
  • 16:43 XioNoX: cr2-eqiad# run request chassis fpc slot 3 offline
  • 16:42 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1028.eqiad.wmnet
  • 16:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:28 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.21 refs T308074
  • 16:23 jhuneidi@deploy1002: scap failed: PermissionError [Errno 13] Permission denied: '/srv/mediawiki-staging/php-1.39.0-wmf.19/cache/gitinfo/info-extensions-FileImporter.json' (duration: 00m 00s)
  • 16:23 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.21 refs T308074
  • 16:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 16:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 16:18 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS bullseye
  • 16:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on 8 hosts with reason: Maintenance
  • 16:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on 8 hosts with reason: Maintenance
  • 16:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 16:18 XioNoX: drain traffic away from cr2-eqiad:fpc3 - T312745
  • 16:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T312990)', diff saved to https://phabricator.wikimedia.org/P31435 and previous config saved to /var/cache/conftool/dbconfig/20220719-161803-marostegui.json
  • 16:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:14 jhuneidi@deploy1002: scap failed: PermissionError [Errno 13] Permission denied: '/srv/mediawiki-staging/php-1.39.0-wmf.19/cache/gitinfo/info-extensions-GrowthExperiments.json' (duration: 00m 00s)
  • 16:14 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.21 refs T308074
  • 16:04 moritzm: installing node-minimist security updates
  • 16:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P31434 and previous config saved to /var/cache/conftool/dbconfig/20220719-160258-marostegui.json
  • 15:58 moritzm: draining ganeti2020 T310483
  • 15:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ganeti2029.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 15:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ganeti2029.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 15:56 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 15:55 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS bullseye
  • 15:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P31433 and previous config saved to /var/cache/conftool/dbconfig/20220719-154753-marostegui.json
  • 15:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T312990)', diff saved to https://phabricator.wikimedia.org/P31432 and previous config saved to /var/cache/conftool/dbconfig/20220719-153248-marostegui.json
  • 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T312990)', diff saved to https://phabricator.wikimedia.org/P31431 and previous config saved to /var/cache/conftool/dbconfig/20220719-153040-marostegui.json
  • 15:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 15:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T312990)', diff saved to https://phabricator.wikimedia.org/P31430 and previous config saved to /var/cache/conftool/dbconfig/20220719-153009-marostegui.json
  • 15:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1007.wikimedia.org
  • 15:22 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 15:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:17 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 15:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P31429 and previous config saved to /var/cache/conftool/dbconfig/20220719-151503-marostegui.json
  • 15:14 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.wikimedia.org
  • 15:13 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 15:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1001
  • 15:12 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest1001
  • 15:03 moritzm: installing nghttp2 security updates
  • 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P31427 and previous config saved to /var/cache/conftool/dbconfig/20220719-145958-marostegui.json
  • 14:50 moritzm: installing python-urllib3 security updates
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T312990)', diff saved to https://phabricator.wikimedia.org/P31426 and previous config saved to /var/cache/conftool/dbconfig/20220719-144453-marostegui.json
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T312990)', diff saved to https://phabricator.wikimedia.org/P31425 and previous config saved to /var/cache/conftool/dbconfig/20220719-144245-marostegui.json
  • 14:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 14:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T312990)', diff saved to https://phabricator.wikimedia.org/P31424 and previous config saved to /var/cache/conftool/dbconfig/20220719-144208-marostegui.json
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P31423 and previous config saved to /var/cache/conftool/dbconfig/20220719-142703-marostegui.json
  • 14:23 dancy@deploy1002: Installation of scap version "4.11.0" completed for 557 hosts
  • 14:22 dancy@deploy1002: Installing scap version "4.11.0" for 557 hosts
  • 14:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts sretest1001.eqiad.wmnet
  • 14:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:16 moritzm: installing glib2.0 security updates
  • 14:15 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P31422 and previous config saved to /var/cache/conftool/dbconfig/20220719-141158-marostegui.json
  • 14:11 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts sretest1001.eqiad.wmnet
  • 13:58 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts sretest1001.eqiad.wmnet
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T312990)', diff saved to https://phabricator.wikimedia.org/P31421 and previous config saved to /var/cache/conftool/dbconfig/20220719-135652-marostegui.json
  • 13:55 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts sretest1001.eqiad.wmnet
  • 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T312990)', diff saved to https://phabricator.wikimedia.org/P31420 and previous config saved to /var/cache/conftool/dbconfig/20220719-135532-marostegui.json
  • 13:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 13:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T312990)', diff saved to https://phabricator.wikimedia.org/P31419 and previous config saved to /var/cache/conftool/dbconfig/20220719-135511-marostegui.json
  • 13:45 moritzm: installing cron security updates
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P31418 and previous config saved to /var/cache/conftool/dbconfig/20220719-134006-marostegui.json
  • 13:37 marostegui: Stop mysql on db1132 to upgrade package
  • 13:34 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 13:33 hashar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Resync after touching (duration: 02m 38s)
  • 13:32 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 13:30 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 13:28 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 13:28 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 13:27 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 13:26 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 13:25 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P31417 and previous config saved to /var/cache/conftool/dbconfig/20220719-132501-marostegui.json
  • 13:24 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 13:23 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 13:22 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 13:22 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 13:21 hashar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: brwikimedia: Use logo and wordmark in vector-2022 and minerva (T313194) (duration: 02m 48s)
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:16 hashar@deploy1002: Synchronized static/images/mobile/copyright: Config: brwikimedia: Add logo and wordmark for vector-2022 and minerva (T313194) (duration: 02m 57s)
  • 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T312990)', diff saved to https://phabricator.wikimedia.org/P31416 and previous config saved to /var/cache/conftool/dbconfig/20220719-130956-marostegui.json
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T312990)', diff saved to https://phabricator.wikimedia.org/P31415 and previous config saved to /var/cache/conftool/dbconfig/20220719-130736-marostegui.json
  • 13:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 13:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T312990)', diff saved to https://phabricator.wikimedia.org/P31414 and previous config saved to /var/cache/conftool/dbconfig/20220719-130716-marostegui.json
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P31413 and previous config saved to /var/cache/conftool/dbconfig/20220719-125211-marostegui.json
  • 12:50 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts netboxdb1001.eqiad.wmnet
  • 12:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:45 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 12:45 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netboxdb1001.eqiad.wmnet
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P31412 and previous config saved to /var/cache/conftool/dbconfig/20220719-123706-marostegui.json
  • 12:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netboxdb1001.eqiad.wmnet
  • 12:30 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:26 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 12:25 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netboxdb1001.eqiad.wmnet
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T312990)', diff saved to https://phabricator.wikimedia.org/P31411 and previous config saved to /var/cache/conftool/dbconfig/20220719-122201-marostegui.json
  • 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T312990)', diff saved to https://phabricator.wikimedia.org/P31409 and previous config saved to /var/cache/conftool/dbconfig/20220719-121941-marostegui.json
  • 12:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 12:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T312990)', diff saved to https://phabricator.wikimedia.org/P31408 and previous config saved to /var/cache/conftool/dbconfig/20220719-121921-marostegui.json
  • 12:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P31407 and previous config saved to /var/cache/conftool/dbconfig/20220719-120416-marostegui.json
  • 12:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:01 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (T310777) (duration: 02m 49s)
  • 12:00 moritzm: upgrading ganeti/ulsfo to 3.0.2 T312637
  • 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T312984)', diff saved to https://phabricator.wikimedia.org/P31406 and previous config saved to /var/cache/conftool/dbconfig/20220719-115719-ladsgroup.json
  • 11:52 urbanecm@deploy1002: Synchronized langlist: Creating blkwiki (T310777) (duration: 02m 42s)
  • 11:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating blkwiki (T310777) (duration: 02m 35s)
  • 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P31405 and previous config saved to /var/cache/conftool/dbconfig/20220719-114911-marostegui.json
  • 11:46 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating blkwiki (T310777) (duration: 02m 49s)
  • 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd2001.codfw.wmnet with reason: Switch instance to DRBD, T311686
  • 11:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd2001.codfw.wmnet with reason: Switch instance to DRBD, T311686
  • 11:43 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating blkwiki (T310777) (duration: 02m 56s)
  • 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31404 and previous config saved to /var/cache/conftool/dbconfig/20220719-114214-ladsgroup.json
  • 11:41 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating blkwiki (T310777)
  • 11:37 urbanecm@deploy1002: Synchronized dblists: Creating blkwiki (T310777) (duration: 02m 52s)
  • 11:34 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating blkwiki (T310777) (duration: 02m 47s)
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T312990)', diff saved to https://phabricator.wikimedia.org/P31403 and previous config saved to /var/cache/conftool/dbconfig/20220719-113406-marostegui.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T312990)', diff saved to https://phabricator.wikimedia.org/P31401 and previous config saved to /var/cache/conftool/dbconfig/20220719-113158-marostegui.json
  • 11:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 11:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T312990)', diff saved to https://phabricator.wikimedia.org/P31400 and previous config saved to /var/cache/conftool/dbconfig/20220719-113137-marostegui.json
  • 11:27 moritzm: remove ganeti 3.0.1-2+deb11u0 from buster-wikimedia, superseded by ganeti 3.0.2-1~deb11u1 from Bullseye 11.4 point release T312637
  • 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31399 and previous config saved to /var/cache/conftool/dbconfig/20220719-112708-ladsgroup.json
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31398 and previous config saved to /var/cache/conftool/dbconfig/20220719-111632-marostegui.json
  • 11:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T312984)', diff saved to https://phabricator.wikimedia.org/P31397 and previous config saved to /var/cache/conftool/dbconfig/20220719-111203-ladsgroup.json
  • 11:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd2002.codfw.wmnet with reason: Switch instance to plain, T311686
  • 11:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd2002.codfw.wmnet with reason: Switch instance to plain, T311686
  • 11:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31396 and previous config saved to /var/cache/conftool/dbconfig/20220719-110127-marostegui.json
  • 11:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netboxdb2001.codfw.wmnet
  • 11:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:59 moritzm: draining ganeti2020 T310483
  • 10:56 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T312990)', diff saved to https://phabricator.wikimedia.org/P31395 and previous config saved to /var/cache/conftool/dbconfig/20220719-104622-marostegui.json
  • 10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T312984)', diff saved to https://phabricator.wikimedia.org/P31394 and previous config saved to /var/cache/conftool/dbconfig/20220719-104559-ladsgroup.json
  • 10:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 10:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T312990)', diff saved to https://phabricator.wikimedia.org/P31393 and previous config saved to /var/cache/conftool/dbconfig/20220719-104414-marostegui.json
  • 10:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 10:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 10:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 10:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127', diff saved to https://phabricator.wikimedia.org/P31392 and previous config saved to /var/cache/conftool/dbconfig/20220719-103341-root.json
  • 10:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd2002.codfw.wmnet with reason: Switch instance to DRBD, T311686
  • 10:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd2002.codfw.wmnet with reason: Switch instance to DRBD, T311686
  • 10:05 elukey: reboot an-worker1127 - hdfs datanode caused CPU stalls
  • 10:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 10:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 09:50 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netboxdb2001.codfw.wmnet
  • 09:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netbox1001.wikimedia.org
  • 09:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:46 moritzm: draining ganeti2029 T310483
  • 09:44 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 09:40 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netbox1001.wikimedia.org
  • 09:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netbox2001.wikimedia.org
  • 09:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 09:34 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 09:29 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netbox2001.wikimedia.org
  • 09:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:00 urbanecm: Deployed patch for T313205
  • 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2018.codfw.wmnet to cluster codfw and group D
  • 08:22 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2018.codfw.wmnet to cluster codfw and group D
  • 08:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2018.codfw.wmnet
  • 08:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2018.codfw.wmnet
  • 07:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 07:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 07:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
  • 07:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust db2167:3311 and db2167:3318 weight T311493', diff saved to https://phabricator.wikimedia.org/P31390 and previous config saved to /var/cache/conftool/dbconfig/20220719-071836-marostegui.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2167:3311 and db2167:3318 to dbctl T311493', diff saved to https://phabricator.wikimedia.org/P31389 and previous config saved to /var/cache/conftool/dbconfig/20220719-071656-marostegui.json
  • 06:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2084.codfw.wmnet
  • 06:56 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:51 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:47 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2084.codfw.wmnet
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2084 from dbctl T313121', diff saved to https://phabricator.wikimedia.org/P31386 and previous config saved to /var/cache/conftool/dbconfig/20220719-051725-marostegui.json
  • 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-07-18

  • 23:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1050.eqiad.wmnet
  • 23:46 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1050.eqiad.wmnet
  • 23:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt1049.eqiad.wmnet
  • 23:07 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1049.eqiad.wmnet
  • 21:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:36 sbassett: Deployed security fix for T309894
  • 20:58 ebernhardson: start reindex of all wikis except commonswiki and wikidatawiki in eqiad and codfw cirrus clusters
  • 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:45 urbanecm: UTC late B&C window finished
  • 20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:45 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/CirrusSearch/: 930ecb7: reindex: Detect index type from live mappings (duration: 02m 55s)
  • 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8d1663c: Turn off fixed width in main namespace on Wikisource (T311607) (duration: 02m 41s)
  • 20:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1c258b2: Enable language switching button for logged-out users on non-pilot wikis (T312861) (duration: 02m 43s)
  • 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f99c533: Pin cu_log actor migration to old schema (T233004) (duration: 02m 41s)
  • 20:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 415c4ef: Collapse sidebar by default for anonymous users (T287609) (duration: 02m 41s)
  • 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:13 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.19/resources/src/moment/moment-locale-overrides.js: c4d8a21: Ensure custom locales for Moment.js overrides, dont change en (T313188) (duration: 02m 44s)
  • 20:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 76b7cd6: Mentorship: enable the Vue version of the dashboard in test (T300532) (duration: 03m 00s)
  • 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:45 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2066.codfw.wmnet with OS bullseye
  • 19:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:04 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
  • 19:02 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2066.codfw.wmnet with OS bullseye
  • 18:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31385 and previous config saved to /var/cache/conftool/dbconfig/20220718-184146-root.json
  • 18:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
  • 18:35 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2066.codfw.wmnet with OS bullseye
  • 18:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31384 and previous config saved to /var/cache/conftool/dbconfig/20220718-182642-root.json
  • 18:17 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
  • 18:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2065.codfw.wmnet with OS bullseye
  • 18:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31382 and previous config saved to /var/cache/conftool/dbconfig/20220718-181138-root.json
  • 18:02 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2065.codfw.wmnet with reason: host reimage
  • 17:57 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2065.codfw.wmnet with reason: host reimage
  • 17:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31381 and previous config saved to /var/cache/conftool/dbconfig/20220718-175634-root.json
  • 17:43 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2065.codfw.wmnet with OS bullseye
  • 17:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31380 and previous config saved to /var/cache/conftool/dbconfig/20220718-174130-root.json
  • 17:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31379 and previous config saved to /var/cache/conftool/dbconfig/20220718-172626-root.json
  • 17:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31378 and previous config saved to /var/cache/conftool/dbconfig/20220718-171122-root.json
  • 16:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31377 and previous config saved to /var/cache/conftool/dbconfig/20220718-165617-root.json
  • 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T313070)', diff saved to https://phabricator.wikimedia.org/P31376 and previous config saved to /var/cache/conftool/dbconfig/20220718-165455-marostegui.json
  • 16:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T313070)', diff saved to https://phabricator.wikimedia.org/P31375 and previous config saved to /var/cache/conftool/dbconfig/20220718-165349-marostegui.json
  • 16:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 16:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 16:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T313070)', diff saved to https://phabricator.wikimedia.org/P31374 and previous config saved to /var/cache/conftool/dbconfig/20220718-165329-marostegui.json
  • 16:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31373 and previous config saved to /var/cache/conftool/dbconfig/20220718-163824-marostegui.json
  • 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31372 and previous config saved to /var/cache/conftool/dbconfig/20220718-162319-marostegui.json
  • 16:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T313070)', diff saved to https://phabricator.wikimedia.org/P31371 and previous config saved to /var/cache/conftool/dbconfig/20220718-160813-marostegui.json
  • 16:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T313070)', diff saved to https://phabricator.wikimedia.org/P31370 and previous config saved to /var/cache/conftool/dbconfig/20220718-160708-marostegui.json
  • 16:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 16:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 16:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T313070)', diff saved to https://phabricator.wikimedia.org/P31369 and previous config saved to /var/cache/conftool/dbconfig/20220718-160648-marostegui.json
  • 15:52 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 02m 59s)
  • 15:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31368 and previous config saved to /var/cache/conftool/dbconfig/20220718-155143-marostegui.json
  • 15:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:49 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 03m 03s)
  • 15:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:40 ejegg: updated fundraising CiviCRM from 55bc690b to b4a7154a
  • 15:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31367 and previous config saved to /var/cache/conftool/dbconfig/20220718-153637-marostegui.json
  • 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T313070)', diff saved to https://phabricator.wikimedia.org/P31366 and previous config saved to /var/cache/conftool/dbconfig/20220718-152132-marostegui.json
  • 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T313070)', diff saved to https://phabricator.wikimedia.org/P31365 and previous config saved to /var/cache/conftool/dbconfig/20220718-152026-marostegui.json
  • 15:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 15:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T313070)', diff saved to https://phabricator.wikimedia.org/P31364 and previous config saved to /var/cache/conftool/dbconfig/20220718-151944-marostegui.json
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31363 and previous config saved to /var/cache/conftool/dbconfig/20220718-150439-marostegui.json
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T312984)', diff saved to https://phabricator.wikimedia.org/P31362 and previous config saved to /var/cache/conftool/dbconfig/20220718-145909-ladsgroup.json
  • 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2012.codfw.wmnet to cluster codfw and group C
  • 14:55 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2012.codfw.wmnet to cluster codfw and group C
  • 14:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2012.codfw.wmnet
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31361 and previous config saved to /var/cache/conftool/dbconfig/20220718-144934-marostegui.json
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P31360 and previous config saved to /var/cache/conftool/dbconfig/20220718-144404-ladsgroup.json
  • 14:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2012.codfw.wmnet
  • 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T313070)', diff saved to https://phabricator.wikimedia.org/P31359 and previous config saved to /var/cache/conftool/dbconfig/20220718-143428-marostegui.json
  • 14:29 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:29 lucaswerkmeister-wmde@deploy1002: Finished scap: refresh everything after adding CampaignEvents to extension-list (T311752, only enabled in Beta so far), just in case (duration: 14m 40s)
  • 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P31358 and previous config saved to /var/cache/conftool/dbconfig/20220718-142859-ladsgroup.json
  • 14:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:14 lucaswerkmeister-wmde@deploy1002: Started scap: refresh everything after adding CampaignEvents to extension-list (T311752, only enabled in Beta so far), just in case
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T312984)', diff saved to https://phabricator.wikimedia.org/P31357 and previous config saved to /var/cache/conftool/dbconfig/20220718-141354-ladsgroup.json
  • 14:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: Load and configure the CampaignEvents extension where enabled (T311752) (2/2: should be prod no-op) (duration: 02m 40s)
  • 14:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T312984)', diff saved to https://phabricator.wikimedia.org/P31356 and previous config saved to /var/cache/conftool/dbconfig/20220718-140947-ladsgroup.json
  • 14:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 14:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T312984)', diff saved to https://phabricator.wikimedia.org/P31355 and previous config saved to /var/cache/conftool/dbconfig/20220718-140926-ladsgroup.json
  • 14:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Load and configure the CampaignEvents extension where enabled (T311752) (1/2: should be no-op) (duration: 02m 51s)
  • 14:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Enable the CampaignEvents extension on beta (T311752) (no-op) (duration: 02m 43s)
  • 13:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P31354 and previous config saved to /var/cache/conftool/dbconfig/20220718-135421-ladsgroup.json
  • 13:53 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add config variable for the CampaignEvents extension (T311752) (no-op) (duration: 02m 55s)
  • 13:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:48 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/extension-list: Config: Add CampaignEvents to extension-list (T311752) (duration: 03m 08s)
  • 13:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2028.codfw.wmnet to cluster codfw and group A
  • 13:45 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2028.codfw.wmnet to cluster codfw and group A
  • 13:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2018.codfw.wmnet with OS bullseye
  • 13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P31353 and previous config saved to /var/cache/conftool/dbconfig/20220718-133916-ladsgroup.json
  • 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2028.codfw.wmnet
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T313070)', diff saved to https://phabricator.wikimedia.org/P31352 and previous config saved to /var/cache/conftool/dbconfig/20220718-133414-marostegui.json
  • 13:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 13:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T313070)', diff saved to https://phabricator.wikimedia.org/P31351 and previous config saved to /var/cache/conftool/dbconfig/20220718-133354-marostegui.json
  • 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
  • 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:30 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2028.codfw.wmnet
  • 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Make weighted_tags search default for commonswiki (duration: 02m 54s)
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T312984)', diff saved to https://phabricator.wikimedia.org/P31350 and previous config saved to /var/cache/conftool/dbconfig/20220718-132411-ladsgroup.json
  • 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2018.codfw.wmnet with reason: host reimage
  • 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T312984)', diff saved to https://phabricator.wikimedia.org/P31349 and previous config saved to /var/cache/conftool/dbconfig/20220718-132009-ladsgroup.json
  • 13:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T312984)', diff saved to https://phabricator.wikimedia.org/P31348 and previous config saved to /var/cache/conftool/dbconfig/20220718-131949-ladsgroup.json
  • 13:19 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/ImageSuggestions/maintenance/SendNotificationsForUnillustratedWatchedTitles.php: Backport: Use getOption to detect user preferences (T313209) (duration: 02m 50s)
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31347 and previous config saved to /var/cache/conftool/dbconfig/20220718-131848-marostegui.json
  • 13:18 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2018.codfw.wmnet with reason: host reimage
  • 13:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
  • 13:15 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Update config for commons custommatch search (duration: 02m 55s)
  • 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P31346 and previous config saved to /var/cache/conftool/dbconfig/20220718-130443-ladsgroup.json
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31345 and previous config saved to /var/cache/conftool/dbconfig/20220718-130343-marostegui.json
  • 13:00 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2018.codfw.wmnet with OS bullseye
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P31344 and previous config saved to /var/cache/conftool/dbconfig/20220718-124938-ladsgroup.json
  • 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2012.codfw.wmnet with OS bullseye
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T313070)', diff saved to https://phabricator.wikimedia.org/P31343 and previous config saved to /var/cache/conftool/dbconfig/20220718-124838-marostegui.json
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T313070)', diff saved to https://phabricator.wikimedia.org/P31342 and previous config saved to /var/cache/conftool/dbconfig/20220718-124732-marostegui.json
  • 12:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 12:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T313070)', diff saved to https://phabricator.wikimedia.org/P31341 and previous config saved to /var/cache/conftool/dbconfig/20220718-124712-marostegui.json
  • 12:35 godog: update grafana to 8.5.9
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T312984)', diff saved to https://phabricator.wikimedia.org/P31340 and previous config saved to /var/cache/conftool/dbconfig/20220718-123433-ladsgroup.json
  • 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2012.codfw.wmnet with reason: host reimage
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31339 and previous config saved to /var/cache/conftool/dbconfig/20220718-123207-marostegui.json
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T312984)', diff saved to https://phabricator.wikimedia.org/P31338 and previous config saved to /var/cache/conftool/dbconfig/20220718-123029-ladsgroup.json
  • 12:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 12:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T312984)', diff saved to https://phabricator.wikimedia.org/P31337 and previous config saved to /var/cache/conftool/dbconfig/20220718-123009-ladsgroup.json
  • 12:29 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2012.codfw.wmnet with reason: host reimage
  • 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31336 and previous config saved to /var/cache/conftool/dbconfig/20220718-121702-marostegui.json
  • 12:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P31335 and previous config saved to /var/cache/conftool/dbconfig/20220718-121504-ladsgroup.json
  • 12:13 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2012.codfw.wmnet with OS bullseye
  • 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2028.codfw.wmnet with OS bullseye
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T313070)', diff saved to https://phabricator.wikimedia.org/P31334 and previous config saved to /var/cache/conftool/dbconfig/20220718-120157-marostegui.json
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T313070)', diff saved to https://phabricator.wikimedia.org/P31333 and previous config saved to /var/cache/conftool/dbconfig/20220718-120051-marostegui.json
  • 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T313070)', diff saved to https://phabricator.wikimedia.org/P31332 and previous config saved to /var/cache/conftool/dbconfig/20220718-120030-marostegui.json
  • 12:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P31331 and previous config saved to /var/cache/conftool/dbconfig/20220718-115959-ladsgroup.json
  • 11:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2028.codfw.wmnet with reason: host reimage
  • 11:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2028.codfw.wmnet with reason: host reimage
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31330 and previous config saved to /var/cache/conftool/dbconfig/20220718-114525-marostegui.json
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T312984)', diff saved to https://phabricator.wikimedia.org/P31329 and previous config saved to /var/cache/conftool/dbconfig/20220718-114454-ladsgroup.json
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T312984)', diff saved to https://phabricator.wikimedia.org/P31328 and previous config saved to /var/cache/conftool/dbconfig/20220718-113947-ladsgroup.json
  • 11:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T312984)', diff saved to https://phabricator.wikimedia.org/P31327 and previous config saved to /var/cache/conftool/dbconfig/20220718-113927-ladsgroup.json
  • 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2028.codfw.wmnet with OS bullseye
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31326 and previous config saved to /var/cache/conftool/dbconfig/20220718-113020-marostegui.json
  • 11:25 jbond: re-enable puppet post postgresql re-sync
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P31325 and previous config saved to /var/cache/conftool/dbconfig/20220718-112422-ladsgroup.json
  • 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T313070)', diff saved to https://phabricator.wikimedia.org/P31324 and previous config saved to /var/cache/conftool/dbconfig/20220718-111515-marostegui.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T313070)', diff saved to https://phabricator.wikimedia.org/P31323 and previous config saved to /var/cache/conftool/dbconfig/20220718-111409-marostegui.json
  • 11:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T313070)', diff saved to https://phabricator.wikimedia.org/P31322 and previous config saved to /var/cache/conftool/dbconfig/20220718-111348-marostegui.json
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P31319 and previous config saved to /var/cache/conftool/dbconfig/20220718-110916-ladsgroup.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31318 and previous config saved to /var/cache/conftool/dbconfig/20220718-105843-marostegui.json
  • 10:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T312984)', diff saved to https://phabricator.wikimedia.org/P31317 and previous config saved to /var/cache/conftool/dbconfig/20220718-105411-ladsgroup.json
  • 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T312984)', diff saved to https://phabricator.wikimedia.org/P31316 and previous config saved to /var/cache/conftool/dbconfig/20220718-104921-ladsgroup.json
  • 10:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 10:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T312984)', diff saved to https://phabricator.wikimedia.org/P31315 and previous config saved to /var/cache/conftool/dbconfig/20220718-104844-ladsgroup.json
  • 10:48 jbond: disable puppet fleet wide to resync db
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31314 and previous config saved to /var/cache/conftool/dbconfig/20220718-104337-marostegui.json
  • 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P31313 and previous config saved to /var/cache/conftool/dbconfig/20220718-103339-ladsgroup.json
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T313070)', diff saved to https://phabricator.wikimedia.org/P31312 and previous config saved to /var/cache/conftool/dbconfig/20220718-102832-marostegui.json
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T313070)', diff saved to https://phabricator.wikimedia.org/P31311 and previous config saved to /var/cache/conftool/dbconfig/20220718-102726-marostegui.json
  • 10:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 10:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T313070)', diff saved to https://phabricator.wikimedia.org/P31310 and previous config saved to /var/cache/conftool/dbconfig/20220718-102706-marostegui.json
  • 10:26 Amir1: dbmaint on s5@eqiad (T312863)
  • 10:26 Amir1: dbmaint on s5@codfw (T312863)
  • 10:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
  • 10:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
  • 10:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 10:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 10:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P31308 and previous config saved to /var/cache/conftool/dbconfig/20220718-101834-ladsgroup.json
  • 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P31307 and previous config saved to /var/cache/conftool/dbconfig/20220718-101201-marostegui.json
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T312984)', diff saved to https://phabricator.wikimedia.org/P31306 and previous config saved to /var/cache/conftool/dbconfig/20220718-100329-ladsgroup.json
  • 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T312984)', diff saved to https://phabricator.wikimedia.org/P31305 and previous config saved to /var/cache/conftool/dbconfig/20220718-095916-ladsgroup.json
  • 09:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 09:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 09:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T312984)', diff saved to https://phabricator.wikimedia.org/P31304 and previous config saved to /var/cache/conftool/dbconfig/20220718-095856-ladsgroup.json
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P31303 and previous config saved to /var/cache/conftool/dbconfig/20220718-095656-marostegui.json
  • 09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P31302 and previous config saved to /var/cache/conftool/dbconfig/20220718-094351-ladsgroup.json
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T313070)', diff saved to https://phabricator.wikimedia.org/P31301 and previous config saved to /var/cache/conftool/dbconfig/20220718-094150-marostegui.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T313070)', diff saved to https://phabricator.wikimedia.org/P31300 and previous config saved to /var/cache/conftool/dbconfig/20220718-094043-marostegui.json
  • 09:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 09:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T313070)', diff saved to https://phabricator.wikimedia.org/P31299 and previous config saved to /var/cache/conftool/dbconfig/20220718-094033-marostegui.json
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P31298 and previous config saved to /var/cache/conftool/dbconfig/20220718-092845-ladsgroup.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P31297 and previous config saved to /var/cache/conftool/dbconfig/20220718-092528-marostegui.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 T311106', diff saved to https://phabricator.wikimedia.org/P31295 and previous config saved to /var/cache/conftool/dbconfig/20220718-091957-root.json
  • 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T312984)', diff saved to https://phabricator.wikimedia.org/P31293 and previous config saved to /var/cache/conftool/dbconfig/20220718-091340-ladsgroup.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P31292 and previous config saved to /var/cache/conftool/dbconfig/20220718-091023-marostegui.json
  • 09:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T312984)', diff saved to https://phabricator.wikimedia.org/P31291 and previous config saved to /var/cache/conftool/dbconfig/20220718-090919-ladsgroup.json
  • 09:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 09:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 09:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T312984)', diff saved to https://phabricator.wikimedia.org/P31290 and previous config saved to /var/cache/conftool/dbconfig/20220718-090857-ladsgroup.json
  • 09:05 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2028.codfw.wmnet
  • 09:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
  • 08:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T313070)', diff saved to https://phabricator.wikimedia.org/P31289 and previous config saved to /var/cache/conftool/dbconfig/20220718-085518-marostegui.json
  • 08:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P31288 and previous config saved to /var/cache/conftool/dbconfig/20220718-085352-ladsgroup.json
  • 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1109 (T313070)', diff saved to https://phabricator.wikimedia.org/P31287 and previous config saved to /var/cache/conftool/dbconfig/20220718-085312-marostegui.json
  • 08:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 08:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T313070)', diff saved to https://phabricator.wikimedia.org/P31286 and previous config saved to /var/cache/conftool/dbconfig/20220718-085251-marostegui.json
  • 08:42 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2012.codfw.wmnet with OS bullseye
  • 08:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P31285 and previous config saved to /var/cache/conftool/dbconfig/20220718-083847-ladsgroup.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P31284 and previous config saved to /var/cache/conftool/dbconfig/20220718-083746-marostegui.json
  • 08:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2012.codfw.wmnet with reason: host reimage
  • 08:29 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2012.codfw.wmnet with reason: host reimage
  • 08:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T312984)', diff saved to https://phabricator.wikimedia.org/P31283 and previous config saved to /var/cache/conftool/dbconfig/20220718-082342-ladsgroup.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P31282 and previous config saved to /var/cache/conftool/dbconfig/20220718-082241-marostegui.json
  • 08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T312984)', diff saved to https://phabricator.wikimedia.org/P31281 and previous config saved to /var/cache/conftool/dbconfig/20220718-081934-ladsgroup.json
  • 08:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 08:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T312984)', diff saved to https://phabricator.wikimedia.org/P31280 and previous config saved to /var/cache/conftool/dbconfig/20220718-081914-ladsgroup.json
  • 08:13 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2012.codfw.wmnet with OS bullseye
  • 08:12 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2028.codfw.wmnet
  • 08:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
  • 08:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2028.codfw.wmnet
  • 08:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
  • 08:10 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2028.codfw.wmnet
  • 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T313070)', diff saved to https://phabricator.wikimedia.org/P31279 and previous config saved to /var/cache/conftool/dbconfig/20220718-080735-marostegui.json
  • 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P31278 and previous config saved to /var/cache/conftool/dbconfig/20220718-080409-ladsgroup.json
  • 08:00 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2028.codfw.wmnet
  • 08:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
  • 07:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T313070)', diff saved to https://phabricator.wikimedia.org/P31277 and previous config saved to /var/cache/conftool/dbconfig/20220718-075527-marostegui.json
  • 07:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 07:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T313070)', diff saved to https://phabricator.wikimedia.org/P31276 and previous config saved to /var/cache/conftool/dbconfig/20220718-075501-marostegui.json
  • 07:54 kharlan@deploy1002: Synchronized wmf-config: Config: Structured task: Disable free text for "other" rejection reason (T304099) (duration: 02m 41s)
  • 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P31275 and previous config saved to /var/cache/conftool/dbconfig/20220718-074904-ladsgroup.json
  • 07:47 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2028.codfw.wmnet with OS bullseye
  • 07:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:40 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable ContentTranslation out of Beta for ay, ilo, kg, ln, nso, and tn Wikipedias (T309384) (duration: 02m 51s)
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P31274 and previous config saved to /var/cache/conftool/dbconfig/20220718-073956-marostegui.json
  • 07:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2028.codfw.wmnet with reason: host reimage
  • 07:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:35 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2028.codfw.wmnet with reason: host reimage
  • 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T312984)', diff saved to https://phabricator.wikimedia.org/P31273 and previous config saved to /var/cache/conftool/dbconfig/20220718-073359-ladsgroup.json
  • 07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T312984)', diff saved to https://phabricator.wikimedia.org/P31272 and previous config saved to /var/cache/conftool/dbconfig/20220718-072953-ladsgroup.json
  • 07:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 07:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 07:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P31271 and previous config saved to /var/cache/conftool/dbconfig/20220718-072451-marostegui.json
  • 07:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: Maintenance
  • 07:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 13 hosts with reason: Maintenance
  • 07:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 07:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 07:21 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2028.codfw.wmnet with OS bullseye
  • 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 07:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 07:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 07:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T312984)', diff saved to https://phabricator.wikimedia.org/P31270 and previous config saved to /var/cache/conftool/dbconfig/20220718-071711-ladsgroup.json
  • 07:10 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Content and Section translation on WPs with NLLB-200 MT support (T309384) (duration: 02m 53s)
  • 07:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T313070)', diff saved to https://phabricator.wikimedia.org/P31269 and previous config saved to /var/cache/conftool/dbconfig/20220718-070946-marostegui.json
  • 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T313070)', diff saved to https://phabricator.wikimedia.org/P31268 and previous config saved to /var/cache/conftool/dbconfig/20220718-070840-marostegui.json
  • 07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 07:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T313070)', diff saved to https://phabricator.wikimedia.org/P31267 and previous config saved to /var/cache/conftool/dbconfig/20220718-070820-marostegui.json
  • 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P31266 and previous config saved to /var/cache/conftool/dbconfig/20220718-070205-ladsgroup.json
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P31265 and previous config saved to /var/cache/conftool/dbconfig/20220718-065315-marostegui.json
  • 06:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P31264 and previous config saved to /var/cache/conftool/dbconfig/20220718-064700-ladsgroup.json
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P31263 and previous config saved to /var/cache/conftool/dbconfig/20220718-063809-marostegui.json
  • 06:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T312984)', diff saved to https://phabricator.wikimedia.org/P31262 and previous config saved to /var/cache/conftool/dbconfig/20220718-063155-ladsgroup.json
  • 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T312984)', diff saved to https://phabricator.wikimedia.org/P31261 and previous config saved to /var/cache/conftool/dbconfig/20220718-062648-ladsgroup.json
  • 06:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 06:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 06:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 06:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T313070)', diff saved to https://phabricator.wikimedia.org/P31260 and previous config saved to /var/cache/conftool/dbconfig/20220718-062304-marostegui.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2166 to dbctl T311493', diff saved to https://phabricator.wikimedia.org/P31259 and previous config saved to /var/cache/conftool/dbconfig/20220718-055051-marostegui.json
  • 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2082.codfw.wmnet
  • 05:43 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:39 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 05:36 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2082.codfw.wmnet
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2082 T313003', diff saved to https://phabricator.wikimedia.org/P31258 and previous config saved to /var/cache/conftool/dbconfig/20220718-052605-marostegui.json
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T313070)', diff saved to https://phabricator.wikimedia.org/P31257 and previous config saved to /var/cache/conftool/dbconfig/20220718-052250-marostegui.json
  • 05:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 05:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 05:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 05:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 05:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 15 hosts with reason: Maintenance
  • 05:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 15 hosts with reason: Maintenance
  • 05:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 05:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 05:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 05:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1171.eqiad.wmnet with reason: Maintenance

2022-07-17

  • 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T312984)', diff saved to https://phabricator.wikimedia.org/P31256 and previous config saved to /var/cache/conftool/dbconfig/20220717-180539-ladsgroup.json
  • 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P31255 and previous config saved to /var/cache/conftool/dbconfig/20220717-175034-ladsgroup.json
  • 17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P31254 and previous config saved to /var/cache/conftool/dbconfig/20220717-173528-ladsgroup.json
  • 17:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T312984)', diff saved to https://phabricator.wikimedia.org/P31253 and previous config saved to /var/cache/conftool/dbconfig/20220717-172023-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T312984)', diff saved to https://phabricator.wikimedia.org/P31252 and previous config saved to /var/cache/conftool/dbconfig/20220717-155102-ladsgroup.json
  • 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31251 and previous config saved to /var/cache/conftool/dbconfig/20220717-155025-ladsgroup.json
  • 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31250 and previous config saved to /var/cache/conftool/dbconfig/20220717-153520-ladsgroup.json
  • 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31249 and previous config saved to /var/cache/conftool/dbconfig/20220717-152015-ladsgroup.json
  • 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31248 and previous config saved to /var/cache/conftool/dbconfig/20220717-150510-ladsgroup.json
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31247 and previous config saved to /var/cache/conftool/dbconfig/20220717-132751-ladsgroup.json
  • 13:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 13:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312984)', diff saved to https://phabricator.wikimedia.org/P31246 and previous config saved to /var/cache/conftool/dbconfig/20220717-132731-ladsgroup.json
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P31245 and previous config saved to /var/cache/conftool/dbconfig/20220717-131226-ladsgroup.json
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P31244 and previous config saved to /var/cache/conftool/dbconfig/20220717-125720-ladsgroup.json
  • 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312984)', diff saved to https://phabricator.wikimedia.org/P31243 and previous config saved to /var/cache/conftool/dbconfig/20220717-124215-ladsgroup.json
  • 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T312984)', diff saved to https://phabricator.wikimedia.org/P31242 and previous config saved to /var/cache/conftool/dbconfig/20220717-110523-ladsgroup.json
  • 11:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 11:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31241 and previous config saved to /var/cache/conftool/dbconfig/20220717-110503-ladsgroup.json
  • 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31240 and previous config saved to /var/cache/conftool/dbconfig/20220717-104958-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31239 and previous config saved to /var/cache/conftool/dbconfig/20220717-103453-ladsgroup.json
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31238 and previous config saved to /var/cache/conftool/dbconfig/20220717-101948-ladsgroup.json
  • 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31237 and previous config saved to /var/cache/conftool/dbconfig/20220717-084432-ladsgroup.json
  • 08:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 08:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T312984)', diff saved to https://phabricator.wikimedia.org/P31236 and previous config saved to /var/cache/conftool/dbconfig/20220717-084411-ladsgroup.json
  • 08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31235 and previous config saved to /var/cache/conftool/dbconfig/20220717-082906-ladsgroup.json
  • 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31234 and previous config saved to /var/cache/conftool/dbconfig/20220717-081401-ladsgroup.json
  • 07:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T312984)', diff saved to https://phabricator.wikimedia.org/P31233 and previous config saved to /var/cache/conftool/dbconfig/20220717-075856-ladsgroup.json
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T312984)', diff saved to https://phabricator.wikimedia.org/P31232 and previous config saved to /var/cache/conftool/dbconfig/20220717-071149-ladsgroup.json
  • 07:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 07:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31231 and previous config saved to /var/cache/conftool/dbconfig/20220717-071129-ladsgroup.json
  • 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31230 and previous config saved to /var/cache/conftool/dbconfig/20220717-065624-ladsgroup.json
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31229 and previous config saved to /var/cache/conftool/dbconfig/20220717-064119-ladsgroup.json
  • 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31228 and previous config saved to /var/cache/conftool/dbconfig/20220717-062614-ladsgroup.json
  • 04:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31227 and previous config saved to /var/cache/conftool/dbconfig/20220717-044802-ladsgroup.json
  • 04:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 04:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 04:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 8 hosts with reason: Maintenance
  • 04:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 8 hosts with reason: Maintenance
  • 04:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 04:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 02:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 02:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 01:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 01:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 01:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31226 and previous config saved to /var/cache/conftool/dbconfig/20220717-010309-ladsgroup.json
  • 00:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31225 and previous config saved to /var/cache/conftool/dbconfig/20220717-004804-ladsgroup.json
  • 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31224 and previous config saved to /var/cache/conftool/dbconfig/20220717-003259-ladsgroup.json
  • 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31223 and previous config saved to /var/cache/conftool/dbconfig/20220717-001754-ladsgroup.json
  • 00:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31222 and previous config saved to /var/cache/conftool/dbconfig/20220717-000143-ladsgroup.json
  • 00:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 00:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance

2022-07-16

  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31221 and previous config saved to /var/cache/conftool/dbconfig/20220716-221808-ladsgroup.json
  • 22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31220 and previous config saved to /var/cache/conftool/dbconfig/20220716-220303-ladsgroup.json
  • 21:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31219 and previous config saved to /var/cache/conftool/dbconfig/20220716-214758-ladsgroup.json
  • 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31218 and previous config saved to /var/cache/conftool/dbconfig/20220716-213253-ladsgroup.json
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31217 and previous config saved to /var/cache/conftool/dbconfig/20220716-203238-ladsgroup.json
  • 20:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 20:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 10 hosts with reason: Maintenance
  • 20:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 10 hosts with reason: Maintenance
  • 20:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 20:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 20:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 20:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T312984)', diff saved to https://phabricator.wikimedia.org/P31216 and previous config saved to /var/cache/conftool/dbconfig/20220716-200803-ladsgroup.json
  • 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P31215 and previous config saved to /var/cache/conftool/dbconfig/20220716-195258-ladsgroup.json
  • 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P31214 and previous config saved to /var/cache/conftool/dbconfig/20220716-193753-ladsgroup.json
  • 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T312984)', diff saved to https://phabricator.wikimedia.org/P31213 and previous config saved to /var/cache/conftool/dbconfig/20220716-192248-ladsgroup.json
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T312984)', diff saved to https://phabricator.wikimedia.org/P31212 and previous config saved to /var/cache/conftool/dbconfig/20220716-184459-ladsgroup.json
  • 18:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 18:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T312984)', diff saved to https://phabricator.wikimedia.org/P31211 and previous config saved to /var/cache/conftool/dbconfig/20220716-184428-ladsgroup.json
  • 18:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31210 and previous config saved to /var/cache/conftool/dbconfig/20220716-182922-ladsgroup.json
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31209 and previous config saved to /var/cache/conftool/dbconfig/20220716-181417-ladsgroup.json
  • 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T312984)', diff saved to https://phabricator.wikimedia.org/P31208 and previous config saved to /var/cache/conftool/dbconfig/20220716-175912-ladsgroup.json
  • 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T312984)', diff saved to https://phabricator.wikimedia.org/P31207 and previous config saved to /var/cache/conftool/dbconfig/20220716-174959-ladsgroup.json
  • 17:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 17:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 17:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 17:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T312984)', diff saved to https://phabricator.wikimedia.org/P31205 and previous config saved to /var/cache/conftool/dbconfig/20220716-173811-ladsgroup.json
  • 17:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31204 and previous config saved to /var/cache/conftool/dbconfig/20220716-172305-ladsgroup.json
  • 17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31203 and previous config saved to /var/cache/conftool/dbconfig/20220716-170800-ladsgroup.json
  • 16:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T312984)', diff saved to https://phabricator.wikimedia.org/P31202 and previous config saved to /var/cache/conftool/dbconfig/20220716-165255-ladsgroup.json
  • 16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T312984)', diff saved to https://phabricator.wikimedia.org/P31201 and previous config saved to /var/cache/conftool/dbconfig/20220716-163449-ladsgroup.json
  • 16:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 16:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31200 and previous config saved to /var/cache/conftool/dbconfig/20220716-163418-ladsgroup.json
  • 16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31199 and previous config saved to /var/cache/conftool/dbconfig/20220716-161913-ladsgroup.json
  • 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31198 and previous config saved to /var/cache/conftool/dbconfig/20220716-160408-ladsgroup.json
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31197 and previous config saved to /var/cache/conftool/dbconfig/20220716-154903-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31196 and previous config saved to /var/cache/conftool/dbconfig/20220716-153647-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 15:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31195 and previous config saved to /var/cache/conftool/dbconfig/20220716-153627-ladsgroup.json
  • 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31194 and previous config saved to /var/cache/conftool/dbconfig/20220716-152122-ladsgroup.json
  • 15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31193 and previous config saved to /var/cache/conftool/dbconfig/20220716-150616-ladsgroup.json
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31192 and previous config saved to /var/cache/conftool/dbconfig/20220716-145111-ladsgroup.json
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31191 and previous config saved to /var/cache/conftool/dbconfig/20220716-143705-ladsgroup.json
  • 14:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 14:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T312984)', diff saved to https://phabricator.wikimedia.org/P31190 and previous config saved to /var/cache/conftool/dbconfig/20220716-143645-ladsgroup.json
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31189 and previous config saved to /var/cache/conftool/dbconfig/20220716-142140-ladsgroup.json
  • 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31188 and previous config saved to /var/cache/conftool/dbconfig/20220716-140634-ladsgroup.json
  • 13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T312984)', diff saved to https://phabricator.wikimedia.org/P31187 and previous config saved to /var/cache/conftool/dbconfig/20220716-135129-ladsgroup.json
  • 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T312984)', diff saved to https://phabricator.wikimedia.org/P31186 and previous config saved to /var/cache/conftool/dbconfig/20220716-134429-ladsgroup.json
  • 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 13:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 00:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2064.codfw.wmnet with OS bullseye
  • 00:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2064.codfw.wmnet with reason: host reimage
  • 00:27 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2064.codfw.wmnet with reason: host reimage
  • 00:13 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2064.codfw.wmnet with OS bullseye

2022-07-15

  • 23:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 23:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 23:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 23:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 23:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T312984)', diff saved to https://phabricator.wikimedia.org/P31185 and previous config saved to /var/cache/conftool/dbconfig/20220715-231400-ladsgroup.json
  • 22:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31184 and previous config saved to /var/cache/conftool/dbconfig/20220715-225855-ladsgroup.json
  • 22:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31183 and previous config saved to /var/cache/conftool/dbconfig/20220715-224350-ladsgroup.json
  • 22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T312984)', diff saved to https://phabricator.wikimedia.org/P31182 and previous config saved to /var/cache/conftool/dbconfig/20220715-222845-ladsgroup.json
  • 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T312984)', diff saved to https://phabricator.wikimedia.org/P31181 and previous config saved to /var/cache/conftool/dbconfig/20220715-222427-ladsgroup.json
  • 22:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 22:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T312984)', diff saved to https://phabricator.wikimedia.org/P31180 and previous config saved to /var/cache/conftool/dbconfig/20220715-222407-ladsgroup.json
  • 22:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31179 and previous config saved to /var/cache/conftool/dbconfig/20220715-220902-ladsgroup.json
  • 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31178 and previous config saved to /var/cache/conftool/dbconfig/20220715-215357-ladsgroup.json
  • 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T312984)', diff saved to https://phabricator.wikimedia.org/P31177 and previous config saved to /var/cache/conftool/dbconfig/20220715-213852-ladsgroup.json
  • 21:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T312984)', diff saved to https://phabricator.wikimedia.org/P31176 and previous config saved to /var/cache/conftool/dbconfig/20220715-213153-ladsgroup.json
  • 21:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 21:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 21:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31175 and previous config saved to /var/cache/conftool/dbconfig/20220715-213133-ladsgroup.json
  • 21:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31174 and previous config saved to /var/cache/conftool/dbconfig/20220715-211628-ladsgroup.json
  • 21:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2063.codfw.wmnet with OS bullseye
  • 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31173 and previous config saved to /var/cache/conftool/dbconfig/20220715-210122-ladsgroup.json
  • 20:55 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2063.codfw.wmnet with reason: host reimage
  • 20:52 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2063.codfw.wmnet with reason: host reimage
  • 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31172 and previous config saved to /var/cache/conftool/dbconfig/20220715-204617-ladsgroup.json
  • 20:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31171 and previous config saved to /var/cache/conftool/dbconfig/20220715-203909-ladsgroup.json
  • 20:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 20:38 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2063.codfw.wmnet with OS bullseye
  • 20:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 20:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T312984)', diff saved to https://phabricator.wikimedia.org/P31170 and previous config saved to /var/cache/conftool/dbconfig/20220715-203849-ladsgroup.json
  • 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31169 and previous config saved to /var/cache/conftool/dbconfig/20220715-202344-ladsgroup.json
  • 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31168 and previous config saved to /var/cache/conftool/dbconfig/20220715-200839-ladsgroup.json
  • 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T312984)', diff saved to https://phabricator.wikimedia.org/P31167 and previous config saved to /var/cache/conftool/dbconfig/20220715-195334-ladsgroup.json
  • 19:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1122 (T312984)', diff saved to https://phabricator.wikimedia.org/P31166 and previous config saved to /var/cache/conftool/dbconfig/20220715-194418-ladsgroup.json
  • 19:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 19:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31165 and previous config saved to /var/cache/conftool/dbconfig/20220715-194358-ladsgroup.json
  • 19:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2062.codfw.wmnet with OS bullseye
  • 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31164 and previous config saved to /var/cache/conftool/dbconfig/20220715-192852-ladsgroup.json
  • 19:18 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2062.codfw.wmnet with reason: host reimage
  • 19:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2062.codfw.wmnet with reason: host reimage
  • 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31163 and previous config saved to /var/cache/conftool/dbconfig/20220715-191347-ladsgroup.json
  • 19:01 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2062.codfw.wmnet with OS bullseye
  • 19:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2061.codfw.wmnet with OS bullseye
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31162 and previous config saved to /var/cache/conftool/dbconfig/20220715-185842-ladsgroup.json
  • 18:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31161 and previous config saved to /var/cache/conftool/dbconfig/20220715-185107-ladsgroup.json
  • 18:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 18:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31160 and previous config saved to /var/cache/conftool/dbconfig/20220715-185047-ladsgroup.json
  • 18:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2061.codfw.wmnet with reason: host reimage
  • 18:44 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2061.codfw.wmnet with reason: host reimage
  • 18:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31159 and previous config saved to /var/cache/conftool/dbconfig/20220715-183542-ladsgroup.json
  • 18:31 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2061.codfw.wmnet with OS bullseye
  • 18:30 ryankemper: T300943 Re-imaging `elastic20[61-72]` from buster -> bullseye, one host at a time. These hosts are not in service currently so re-imaging is safe.
  • 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31158 and previous config saved to /var/cache/conftool/dbconfig/20220715-182037-ladsgroup.json
  • 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31157 and previous config saved to /var/cache/conftool/dbconfig/20220715-180532-ladsgroup.json
  • 18:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudweb1004.wikimedia.org with OS bullseye
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31156 and previous config saved to /var/cache/conftool/dbconfig/20220715-175822-ladsgroup.json
  • 17:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T312984)', diff saved to https://phabricator.wikimedia.org/P31155 and previous config saved to /var/cache/conftool/dbconfig/20220715-175801-ladsgroup.json
  • 17:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudweb1003.wikimedia.org with OS bullseye
  • 17:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb1004.wikimedia.org with reason: host reimage
  • 17:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb1004.wikimedia.org with reason: host reimage
  • 17:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31154 and previous config saved to /var/cache/conftool/dbconfig/20220715-174256-ladsgroup.json
  • 17:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb1003.wikimedia.org with reason: host reimage
  • 17:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudweb1004.wikimedia.org with OS bullseye
  • 17:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb1003.wikimedia.org with reason: host reimage
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31152 and previous config saved to /var/cache/conftool/dbconfig/20220715-172751-ladsgroup.json
  • 17:20 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudweb1003.wikimedia.org with OS bullseye
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T312984)', diff saved to https://phabricator.wikimedia.org/P31151 and previous config saved to /var/cache/conftool/dbconfig/20220715-171246-ladsgroup.json
  • 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T312984)', diff saved to https://phabricator.wikimedia.org/P31150 and previous config saved to /var/cache/conftool/dbconfig/20220715-170545-ladsgroup.json
  • 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 17:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 17:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 8 hosts with reason: Maintenance
  • 16:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 8 hosts with reason: Maintenance
  • 16:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 16:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 16:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 6 hosts with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 6 hosts with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T312984)', diff saved to https://phabricator.wikimedia.org/P31149 and previous config saved to /var/cache/conftool/dbconfig/20220715-155021-ladsgroup.json
  • 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31148 and previous config saved to /var/cache/conftool/dbconfig/20220715-153515-ladsgroup.json
  • 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31147 and previous config saved to /var/cache/conftool/dbconfig/20220715-152010-ladsgroup.json
  • 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T312984)', diff saved to https://phabricator.wikimedia.org/P31146 and previous config saved to /var/cache/conftool/dbconfig/20220715-150505-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T312984)', diff saved to https://phabricator.wikimedia.org/P31144 and previous config saved to /var/cache/conftool/dbconfig/20220715-140451-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 14:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T312984)', diff saved to https://phabricator.wikimedia.org/P31143 and previous config saved to /var/cache/conftool/dbconfig/20220715-140431-ladsgroup.json
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31141 and previous config saved to /var/cache/conftool/dbconfig/20220715-134926-ladsgroup.json
  • 13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31140 and previous config saved to /var/cache/conftool/dbconfig/20220715-133421-ladsgroup.json
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T312984)', diff saved to https://phabricator.wikimedia.org/P31139 and previous config saved to /var/cache/conftool/dbconfig/20220715-131916-ladsgroup.json
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T312984)', diff saved to https://phabricator.wikimedia.org/P31138 and previous config saved to /var/cache/conftool/dbconfig/20220715-130706-ladsgroup.json
  • 13:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 13:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T312984)', diff saved to https://phabricator.wikimedia.org/P31137 and previous config saved to /var/cache/conftool/dbconfig/20220715-130634-ladsgroup.json
  • 13:05 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:05 bking@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31136 and previous config saved to /var/cache/conftool/dbconfig/20220715-125129-ladsgroup.json
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31135 and previous config saved to /var/cache/conftool/dbconfig/20220715-123624-ladsgroup.json
  • 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T312984)', diff saved to https://phabricator.wikimedia.org/P31134 and previous config saved to /var/cache/conftool/dbconfig/20220715-122119-ladsgroup.json
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T312984)', diff saved to https://phabricator.wikimedia.org/P31133 and previous config saved to /var/cache/conftool/dbconfig/20220715-120750-ladsgroup.json
  • 12:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 12:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T312984)', diff saved to https://phabricator.wikimedia.org/P31132 and previous config saved to /var/cache/conftool/dbconfig/20220715-120713-ladsgroup.json
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31131 and previous config saved to /var/cache/conftool/dbconfig/20220715-115207-ladsgroup.json
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31130 and previous config saved to /var/cache/conftool/dbconfig/20220715-113702-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T312984)', diff saved to https://phabricator.wikimedia.org/P31129 and previous config saved to /var/cache/conftool/dbconfig/20220715-112157-ladsgroup.json
  • 10:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T312984)', diff saved to https://phabricator.wikimedia.org/P31128 and previous config saved to /var/cache/conftool/dbconfig/20220715-105748-ladsgroup.json
  • 10:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:56 hashar@deploy1002: Finished deploy [integration/docroot@e563641]: Add banan-i18n library (duration: 00m 08s)
  • 10:56 hashar@deploy1002: Started deploy [integration/docroot@e563641]: Add banan-i18n library
  • 10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T312984)', diff saved to https://phabricator.wikimedia.org/P31127 and previous config saved to /var/cache/conftool/dbconfig/20220715-103513-ladsgroup.json
  • 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31126 and previous config saved to /var/cache/conftool/dbconfig/20220715-102008-ladsgroup.json
  • 10:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31125 and previous config saved to /var/cache/conftool/dbconfig/20220715-100503-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T312984)', diff saved to https://phabricator.wikimedia.org/P31124 and previous config saved to /var/cache/conftool/dbconfig/20220715-094958-ladsgroup.json
  • 09:38 Amir1: killed refreshLinkRecommendations.php in testwiki (T299021)
  • 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T312984)', diff saved to https://phabricator.wikimedia.org/P31123 and previous config saved to /var/cache/conftool/dbconfig/20220715-093449-ladsgroup.json
  • 09:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 09:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 09:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 09:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 07:26 moritzm: update thirdparty/node16 to Node 16.16.0
  • 07:26 moritzm: update thirdparty/node14 to Node 14.20.0
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31121 and previous config saved to /var/cache/conftool/dbconfig/20220715-064928-root.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31120 and previous config saved to /var/cache/conftool/dbconfig/20220715-063424-root.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31119 and previous config saved to /var/cache/conftool/dbconfig/20220715-061920-root.json
  • 06:08 ryankemper: T311939 Updated list of masters for psi-codfw search to `elastic2027.codfw.wmnet:9700,elastic2029.codfw.wmnet:9700,elastic2054.codfw.wmnet:9700`
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31118 and previous config saved to /var/cache/conftool/dbconfig/20220715-060416-root.json
  • 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31117 and previous config saved to /var/cache/conftool/dbconfig/20220715-054912-root.json
  • 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31116 and previous config saved to /var/cache/conftool/dbconfig/20220715-053408-root.json
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31115 and previous config saved to /var/cache/conftool/dbconfig/20220715-051904-root.json
  • 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31114 and previous config saved to /var/cache/conftool/dbconfig/20220715-050400-root.json
  • 00:30 TimStarling: on ms-fe1010 restarting swift-proxy

2022-07-14

  • 22:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 22:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 22:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T312984)', diff saved to https://phabricator.wikimedia.org/P31112 and previous config saved to /var/cache/conftool/dbconfig/20220714-221112-ladsgroup.json
  • 21:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31111 and previous config saved to /var/cache/conftool/dbconfig/20220714-215606-ladsgroup.json
  • 21:41 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31110 and previous config saved to /var/cache/conftool/dbconfig/20220714-214101-ladsgroup.json
  • 21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T312984)', diff saved to https://phabricator.wikimedia.org/P31109 and previous config saved to /var/cache/conftool/dbconfig/20220714-212556-ladsgroup.json
  • 21:15 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 21:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T312984)', diff saved to https://phabricator.wikimedia.org/P31108 and previous config saved to /var/cache/conftool/dbconfig/20220714-210347-ladsgroup.json
  • 21:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 21:03 ryankemper: T289135 First host reimage done, manually killed rolling-operation cookbook before the next host reimage so that we can test out https://gerrit.wikimedia.org/r/813979
  • 21:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 21:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T312984)', diff saved to https://phabricator.wikimedia.org/P31107 and previous config saved to /var/cache/conftool/dbconfig/20220714-210327-ladsgroup.json
  • 21:02 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 20:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2027.codfw.wmnet with OS bullseye
  • 20:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P31106 and previous config saved to /var/cache/conftool/dbconfig/20220714-204822-ladsgroup.json
  • 20:45 thcipriani: utc-late backport window complete
  • 20:45 thcipriani@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/CampaignEvents: Backport: CampaignEvents: backport extension for Jul 18 beta deploy (T311752) (duration: 02m 49s)
  • 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:36 ryankemper: Restarting elastic services `ryankemper@elastic2054:~$ sudo systemctl restart elasticsearch_6@production*`
  • 20:34 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic2027.codfw.wmnet with reason: host reimage
  • 20:34 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2027.codfw.wmnet with reason: host reimage
  • 20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P31105 and previous config saved to /var/cache/conftool/dbconfig/20220714-203317-ladsgroup.json
  • 20:33 ryankemper: [Elastic] `ryankemper@elastic2054:~$ sudo run-puppet-agent` to add 2054 as an eligible master for codfw-psi
  • 20:30 ryankemper: [Elastic] We're working on promoting `elastic2054` to a master to replace `elastic2049` which is in hw failure
  • 20:24 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudweb1004.wikimedia.org with OS bullseye
  • 20:18 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2027.codfw.wmnet with OS bullseye
  • 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T312984)', diff saved to https://phabricator.wikimedia.org/P31104 and previous config saved to /var/cache/conftool/dbconfig/20220714-201812-ladsgroup.json
  • 20:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 19:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T312984)', diff saved to https://phabricator.wikimedia.org/P31103 and previous config saved to /var/cache/conftool/dbconfig/20220714-195715-ladsgroup.json
  • 19:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 19:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T312984)', diff saved to https://phabricator.wikimedia.org/P31102 and previous config saved to /var/cache/conftool/dbconfig/20220714-195655-ladsgroup.json
  • 19:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P31100 and previous config saved to /var/cache/conftool/dbconfig/20220714-194150-ladsgroup.json
  • 19:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P31098 and previous config saved to /var/cache/conftool/dbconfig/20220714-192645-ladsgroup.json
  • 19:24 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudweb1003.wikimedia.org with OS bullseye
  • 19:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudweb1004.wikimedia.org with OS bullseye
  • 19:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T312984)', diff saved to https://phabricator.wikimedia.org/P31097 and previous config saved to /var/cache/conftool/dbconfig/20220714-191140-ladsgroup.json
  • 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T312984)', diff saved to https://phabricator.wikimedia.org/P31096 and previous config saved to /var/cache/conftool/dbconfig/20220714-182328-ladsgroup.json
  • 18:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 18:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T312984)', diff saved to https://phabricator.wikimedia.org/P31095 and previous config saved to /var/cache/conftool/dbconfig/20220714-182308-ladsgroup.json
  • 18:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudweb1003.wikimedia.org with OS bullseye
  • 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P31094 and previous config saved to /var/cache/conftool/dbconfig/20220714-180803-ladsgroup.json
  • 18:02 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudweb1003.wikimedia.org with OS bullseye
  • 17:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudweb1003.wikimedia.org with OS bullseye
  • 17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P31093 and previous config saved to /var/cache/conftool/dbconfig/20220714-175258-ladsgroup.json
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T312984)', diff saved to https://phabricator.wikimedia.org/P31092 and previous config saved to /var/cache/conftool/dbconfig/20220714-173753-ladsgroup.json
  • 17:17 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:17 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:15 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:15 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:14 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:14 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T312984)', diff saved to https://phabricator.wikimedia.org/P31091 and previous config saved to /var/cache/conftool/dbconfig/20220714-163953-ladsgroup.json
  • 16:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 16:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T312984)', diff saved to https://phabricator.wikimedia.org/P31090 and previous config saved to /var/cache/conftool/dbconfig/20220714-163933-ladsgroup.json
  • 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P31089 and previous config saved to /var/cache/conftool/dbconfig/20220714-162428-ladsgroup.json
  • 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P31088 and previous config saved to /var/cache/conftool/dbconfig/20220714-160923-ladsgroup.json
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T312977)', diff saved to https://phabricator.wikimedia.org/P31087 and previous config saved to /var/cache/conftool/dbconfig/20220714-160846-marostegui.json
  • 16:03 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 16:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T312984)', diff saved to https://phabricator.wikimedia.org/P31086 and previous config saved to /var/cache/conftool/dbconfig/20220714-155418-ladsgroup.json
  • 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31085 and previous config saved to /var/cache/conftool/dbconfig/20220714-155341-marostegui.json
  • 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31084 and previous config saved to /var/cache/conftool/dbconfig/20220714-153836-marostegui.json
  • 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T312977)', diff saved to https://phabricator.wikimedia.org/P31083 and previous config saved to /var/cache/conftool/dbconfig/20220714-152331-marostegui.json
  • 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T312977)', diff saved to https://phabricator.wikimedia.org/P31082 and previous config saved to /var/cache/conftool/dbconfig/20220714-152118-marostegui.json
  • 15:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312977)', diff saved to https://phabricator.wikimedia.org/P31081 and previous config saved to /var/cache/conftool/dbconfig/20220714-152040-marostegui.json
  • 15:15 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: sync
  • 15:15 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: sync
  • 15:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: sync
  • 15:14 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: sync
  • 15:13 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: sync
  • 15:13 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: sync
  • 15:12 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@b8f66e9]: (no justification provided) (duration: 00m 10s)
  • 15:11 ebysans@deploy1002: Started deploy [airflow-dags/analytics@b8f66e9]: (no justification provided)
  • 15:10 ejegg: updated payments-wiki from 6a8aa302 to be11fac2
  • 15:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31080 and previous config saved to /var/cache/conftool/dbconfig/20220714-150535-marostegui.json
  • 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T312984)', diff saved to https://phabricator.wikimedia.org/P31079 and previous config saved to /var/cache/conftool/dbconfig/20220714-145736-ladsgroup.json
  • 14:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 14:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T312984)', diff saved to https://phabricator.wikimedia.org/P31078 and previous config saved to /var/cache/conftool/dbconfig/20220714-145716-ladsgroup.json
  • 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31077 and previous config saved to /var/cache/conftool/dbconfig/20220714-145030-marostegui.json
  • 14:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P31076 and previous config saved to /var/cache/conftool/dbconfig/20220714-144211-ladsgroup.json
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312977)', diff saved to https://phabricator.wikimedia.org/P31075 and previous config saved to /var/cache/conftool/dbconfig/20220714-143525-marostegui.json
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P31074 and previous config saved to /var/cache/conftool/dbconfig/20220714-142706-ladsgroup.json
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T312977)', diff saved to https://phabricator.wikimedia.org/P31073 and previous config saved to /var/cache/conftool/dbconfig/20220714-141917-marostegui.json
  • 14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 14:19 papaul: ongoing PDU maintenance in rack A6 codfw
  • 14:19 papaul: ongoing PDU maintenance in rack A6 codfw
  • 14:18 papaul: ongoing PDU maintenance in rack A6 codfw
  • 14:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T312977)', diff saved to https://phabricator.wikimedia.org/P31072 and previous config saved to /var/cache/conftool/dbconfig/20220714-141846-marostegui.json
  • 14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T312984)', diff saved to https://phabricator.wikimedia.org/P31071 and previous config saved to /var/cache/conftool/dbconfig/20220714-141201-ladsgroup.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31070 and previous config saved to /var/cache/conftool/dbconfig/20220714-140341-marostegui.json
  • 14:02 matthiasmullie: UTC afternoon backport window done
  • 13:53 mlitn@deploy1002: Finished scap: Backport: Improve maint script output & update i18n messages (duration: 16m 05s)
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T312984)', diff saved to https://phabricator.wikimedia.org/P31069 and previous config saved to /var/cache/conftool/dbconfig/20220714-135038-ladsgroup.json
  • 13:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T312984)', diff saved to https://phabricator.wikimedia.org/P31068 and previous config saved to /var/cache/conftool/dbconfig/20220714-135000-ladsgroup.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31067 and previous config saved to /var/cache/conftool/dbconfig/20220714-134836-marostegui.json
  • 13:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:37 mlitn@deploy1002: Started scap: Backport: Improve maint script output & update i18n messages
  • 13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P31065 and previous config saved to /var/cache/conftool/dbconfig/20220714-133455-ladsgroup.json
  • 13:34 mlitn@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Update boosts for weighted_tags (duration: 02m 45s)
  • 13:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T312977)', diff saved to https://phabricator.wikimedia.org/P31064 and previous config saved to /var/cache/conftool/dbconfig/20220714-133331-marostegui.json
  • 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T312977)', diff saved to https://phabricator.wikimedia.org/P31063 and previous config saved to /var/cache/conftool/dbconfig/20220714-133051-marostegui.json
  • 13:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 13:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T312977)', diff saved to https://phabricator.wikimedia.org/P31062 and previous config saved to /var/cache/conftool/dbconfig/20220714-133031-marostegui.json
  • 13:30 mlitn@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add custommatch search feature config for commons (duration: 02m 58s)
  • 13:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Special:NewLexemeAlpha on Wikidata and TestWikidata (T306016) (re-sync, config change seemingly not consistently picked up) (duration: 02m 45s)
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P31061 and previous config saved to /var/cache/conftool/dbconfig/20220714-131950-ladsgroup.json
  • 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:15 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Special:NewLexemeAlpha on Wikidata and TestWikidata (T306016) (duration: 02m 57s)
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31060 and previous config saved to /var/cache/conftool/dbconfig/20220714-131525-marostegui.json
  • 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T312984)', diff saved to https://phabricator.wikimedia.org/P31059 and previous config saved to /var/cache/conftool/dbconfig/20220714-130445-ladsgroup.json
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31058 and previous config saved to /var/cache/conftool/dbconfig/20220714-130020-marostegui.json
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T312977)', diff saved to https://phabricator.wikimedia.org/P31057 and previous config saved to /var/cache/conftool/dbconfig/20220714-124515-marostegui.json
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T312984)', diff saved to https://phabricator.wikimedia.org/P31056 and previous config saved to /var/cache/conftool/dbconfig/20220714-124321-ladsgroup.json
  • 12:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 12:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T312977)', diff saved to https://phabricator.wikimedia.org/P31055 and previous config saved to /var/cache/conftool/dbconfig/20220714-124239-marostegui.json
  • 12:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 12:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T312977)', diff saved to https://phabricator.wikimedia.org/P31054 and previous config saved to /var/cache/conftool/dbconfig/20220714-124219-marostegui.json
  • 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31053 and previous config saved to /var/cache/conftool/dbconfig/20220714-122714-marostegui.json
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31052 and previous config saved to /var/cache/conftool/dbconfig/20220714-121209-marostegui.json
  • 12:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 8 hosts with reason: Maintenance
  • 12:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 8 hosts with reason: Maintenance
  • 12:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T312977)', diff saved to https://phabricator.wikimedia.org/P31051 and previous config saved to /var/cache/conftool/dbconfig/20220714-115701-marostegui.json
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T312977)', diff saved to https://phabricator.wikimedia.org/P31050 and previous config saved to /var/cache/conftool/dbconfig/20220714-115448-marostegui.json
  • 11:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T312977)', diff saved to https://phabricator.wikimedia.org/P31049 and previous config saved to /var/cache/conftool/dbconfig/20220714-115316-marostegui.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31048 and previous config saved to /var/cache/conftool/dbconfig/20220714-113811-marostegui.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31047 and previous config saved to /var/cache/conftool/dbconfig/20220714-112304-marostegui.json
  • 11:22 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 11:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 11:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T312977)', diff saved to https://phabricator.wikimedia.org/P31046 and previous config saved to /var/cache/conftool/dbconfig/20220714-110759-marostegui.json
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2164 to dbctl T311493', diff saved to https://phabricator.wikimedia.org/P31038 and previous config saved to /var/cache/conftool/dbconfig/20220714-052056-marostegui.json
  • 05:07 AndyRussG: update payments-wiki-staging 10304f69 -> be11fac2
  • 04:32 oblivian@puppetmaster1001: conftool action : edit; selector: name=ReadOnly,scope=codfw
  • 04:25 tstarling@puppetmaster1001: conftool action : edit; selector: name=ReadOnly,scope=codfw
  • 04:23 tstarling@puppetmaster1001: conftool action : get/ReadOnly; selector: name=ReadOnly,scope=codfw
  • 01:12 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I73fbfee8248c (duration: 02m 56s)
  • 01:09 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I73fbfee8248c (duration: 02m 45s)
  • 01:03 krinkle@deploy1002: Synchronized php-1.39.0-wmf.19/includes/ResourceLoader/: Ie11bdf (duration: 02m 55s)
  • 01:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:44 krinkle@deploy1002: Synchronized php-1.39.0-wmf.19/includes/ResourceLoader/: Ie11bdf (duration: 02m 55s)
  • 00:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:29 krinkle@deploy1002: Synchronized wmf-config/wikitech.php: Ib539da0c0953 (duration: 02m 47s)
  • 00:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-07-13

  • 22:17 inflatador: bking@elastic2055 successfully staged NIC firmware updates for elastic2055-2060
  • 22:09 inflatador: bking@elastic2055 staging NIC firmware updates for elastic2055-2060
  • 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:09 Lucas_WMDE: UTC late backport+config window done
  • 21:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable DiscussionTools beta feature at mediawikiwiki (T310960) (duration: 02m 47s)
  • 21:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:02 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: QuickSurveys: Undeploy 'research-incentive' (T311015) (2/2, beta) (duration: 02m 58s)
  • 20:59 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: QuickSurveys: Undeploy 'research-incentive' (T311015) (1/2, prod) (duration: 02m 48s)
  • 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:48 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/DiscussionTools/modules/CommentItem.js: Backport: Avoid localized digits in internal timestamps in JS (T312828) (duration: 02m 49s)
  • 20:44 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2040.codfw.wmnet with OS bullseye
  • 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:36 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/extension-list: Config: Undeploy CongressLookup (part 3) (T312894) (duration: 03m 00s)
  • 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:28 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Undeploy CongressLookup (part 2) (T312894) (duration: 02m 53s)
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Undeploy CongressLookup (part 1) (T312894) (duration: 03m 04s)
  • 20:22 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2040.codfw.wmnet with reason: host reimage
  • 20:19 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2040.codfw.wmnet with reason: host reimage
  • 19:59 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2040.codfw.wmnet with OS bullseye
  • 18:20 sukhe: upload pdns-recursor_4.6.2-1+wmf11u1 to apt.wm.org (bullseye) - T305589
  • 17:54 sukhe: upload dnsdist_1.7.2-1+wmf11u1 to apt.wm.org (bullseye) - T305589
  • 17:48 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 17:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 16:17 milimetric@deploy1002: Finished deploy [airflow-dags/analytics@e58e61d]: (no justification provided) (duration: 00m 10s)
  • 16:17 milimetric@deploy1002: Started deploy [airflow-dags/analytics@e58e61d]: (no justification provided)
  • 15:59 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2040.codfw.wmnet with OS bullseye
  • 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:56 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2040.codfw.wmnet with OS bullseye
  • 15:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:12 aqu@deploy1002: Finished deploy [airflow-dags/analytics@9edd1ab]: Deploy [airflow-dags/analytics@9edd1ab] (duration: 00m 10s)
  • 15:12 aqu@deploy1002: Started deploy [airflow-dags/analytics@9edd1ab]: Deploy [airflow-dags/analytics@9edd1ab]
  • 15:10 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@9edd1ab]: Deploy [airflow-dags/analytics_test@9edd1ab] (duration: 00m 08s)
  • 15:10 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@9edd1ab]: Deploy [airflow-dags/analytics_test@9edd1ab]
  • 14:52 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2049.codfw.wmnet with OS bullseye
  • 14:38 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2049.codfw.wmnet with OS bullseye
  • 14:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@03c1a05]: Deploy [airflow-dags/analytics_test@03c1a05] (duration: 00m 12s)
  • 14:34 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@03c1a05]: Deploy [airflow-dags/analytics_test@03c1a05]
  • 14:19 aqu: Deployed refinery using scap, then deployed onto hdfs
  • 14:11 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2049.codfw.wmnet with OS bullseye
  • 14:08 aqu@deploy1002: Finished deploy [analytics/refinery@bd39e67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@bd39e67] (duration: 07m 42s)
  • 14:04 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2049.codfw.wmnet with OS bullseye
  • 14:01 aqu@deploy1002: Started deploy [analytics/refinery@bd39e67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@bd39e67]
  • 14:00 aqu@deploy1002: Finished deploy [analytics/refinery@bd39e67] (thin): Regular analytics weekly train THIN [analytics/refinery@bd39e67] (duration: 00m 07s)
  • 14:00 aqu@deploy1002: Started deploy [analytics/refinery@bd39e67] (thin): Regular analytics weekly train THIN [analytics/refinery@bd39e67]
  • 13:47 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2049.codfw.wmnet with OS bullseye
  • 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from x1 master', diff saved to https://phabricator.wikimedia.org/P31037 and previous config saved to /var/cache/conftool/dbconfig/20220713-134413-marostegui.json
  • 13:37 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2049.codfw.wmnet with OS bullseye
  • 13:20 Lucas_WMDE: UTC afternoon backport window done
  • 13:20 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host elastic2049.codfw.wmnet
  • 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:17 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Configure wgLexemeLexicalCategoryItemIds on Wikidata (T307441) (duration: 02m 45s)
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Configure $wgBabelCategoryNames on Test Wikidata (T312920) (duration: 02m 51s)
  • 13:05 inflatador: bking@elastic2049 rebooting for read-only fs
  • 13:04 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic2049.codfw.wmnet
  • 12:49 damilare: payments-wiki upgraded from 2f95d8b4 to 6a8aa302
  • 12:12 moritzm: draining ganeti2028 T311686
  • 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ganeti2018.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 12:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ganeti2018.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: codfw s8 sanitarium master switch
  • 11:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: codfw s8 sanitarium master switch
  • 10:42 aqu@deploy1002: Finished deploy [analytics/refinery@bd39e67]: Regular analytics weekly train (2nd try. --force) [analytics/refinery@bd39e67] (duration: 04m 52s)
  • 10:38 aqu@deploy1002: Started deploy [analytics/refinery@bd39e67]: Regular analytics weekly train (2nd try. --force) [analytics/refinery@bd39e67]
  • 10:27 moritzm: draining ganeti1028 T311686
  • 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ganeti2012.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 10:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ganeti2012.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 09:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31035 and previous config saved to /var/cache/conftool/dbconfig/20220713-090748-ladsgroup.json
  • 08:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31034 and previous config saved to /var/cache/conftool/dbconfig/20220713-085244-ladsgroup.json
  • 08:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31033 and previous config saved to /var/cache/conftool/dbconfig/20220713-083740-ladsgroup.json
  • 08:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 10%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31032 and previous config saved to /var/cache/conftool/dbconfig/20220713-082236-ladsgroup.json
  • 08:05 jayme: 'systemctl restart rsyslog' on kubernetes2007.codfw.wmnet,kubernetes2010.codfw.wmnet,kubernetes2014.codfw.wmnet,kubernetes2020.codfw.wmnet,kubernetes2009.codfw.wmnet
  • 07:52 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 07:52 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 07:51 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 07:50 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31031 and previous config saved to /var/cache/conftool/dbconfig/20220713-070229-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31030 and previous config saved to /var/cache/conftool/dbconfig/20220713-064725-root.json
  • 06:45 aqu: analytics/refinery deploy aborted, no more space to deploy in /srv on an-launcher1002 eqiad
  • 06:44 aqu@deploy1002: Finished deploy [analytics/refinery@bd39e67]: Regular analytics weekly train [analytics/refinery@bd39e67] (duration: 27m 02s)
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31029 and previous config saved to /var/cache/conftool/dbconfig/20220713-063221-root.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31028 and previous config saved to /var/cache/conftool/dbconfig/20220713-061717-root.json
  • 06:16 aqu@deploy1002: Started deploy [analytics/refinery@bd39e67]: Regular analytics weekly train [analytics/refinery@bd39e67]
  • 06:16 aqu: analytics/refinery deployment
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31027 and previous config saved to /var/cache/conftool/dbconfig/20220713-060213-root.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31026 and previous config saved to /var/cache/conftool/dbconfig/20220713-054709-root.json
  • 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31025 and previous config saved to /var/cache/conftool/dbconfig/20220713-053205-root.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31024 and previous config saved to /var/cache/conftool/dbconfig/20220713-051701-root.json
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2162 in s8 T311493', diff saved to https://phabricator.wikimedia.org/P31023 and previous config saved to /var/cache/conftool/dbconfig/20220713-051239-marostegui.json

2022-07-12

  • 22:32 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2039.codfw.wmnet with OS bullseye
  • 22:19 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@45ae36d]: subgraph_and_query_metrics: Drop wiki from sparql event partition spec (duration: 02m 04s)
  • 22:17 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@45ae36d]: subgraph_and_query_metrics: Drop wiki from sparql event partition spec
  • 22:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2039.codfw.wmnet with reason: host reimage
  • 22:11 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2039.codfw.wmnet with reason: host reimage
  • 21:50 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2039.codfw.wmnet with OS bullseye
  • 20:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2038.codfw.wmnet with OS bullseye
  • 20:11 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2038.codfw.wmnet with reason: host reimage
  • 20:07 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2038.codfw.wmnet with reason: host reimage
  • 19:49 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye
  • 19:38 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2038.codfw.wmnet with OS bullseye
  • 19:35 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye
  • 19:34 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2038.codfw.wmnet with OS bullseye
  • 19:31 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye
  • 19:31 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2038.codfw.wmnet with OS bullseye
  • 19:31 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye
  • 19:30 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2038.codfw.wmnet with OS bullseye
  • 19:27 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye
  • 19:26 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I3071c009c (2) (duration: 02m 45s)
  • 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:20 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I3071c009c (duration: 03m 09s)
  • 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:20 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic2038.codfw.wmnet with reason: firmware update T312298
  • 19:19 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on elastic2038.codfw.wmnet with reason: firmware update T312298
  • 19:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:13 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic1065.eqiad.wmnet
  • 19:13 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for elastic1065.eqiad.wmnet
  • 18:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:18 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2037.codfw.wmnet with OS bullseye
  • 16:59 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2037.codfw.wmnet with reason: host reimage
  • 16:55 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2037.codfw.wmnet with reason: host reimage
  • 16:55 bblack: codfw dns repooled for front edge traffic
  • 16:50 herron: ran failed codfw puppet agents
  • 16:47 mutante: doc1002 - systemctl reset-failed
  • 16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1026.eqiad.wmnet
  • 16:36 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
  • 16:19 mutante: rebooting mwdebug2001 via ganeti2022
  • 16:15 cwhite: repair networking on people2002
  • 16:11 cwhite: repair networking on puppetdb2002
  • 16:10 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1026.eqiad.wmnet
  • 16:05 mutante: parse200[1-3] - restarted ferm
  • 16:03 mutante: mw2401 through mw2410 - performing ferm restarts (without cumin, has its own issue)
  • 15:57 mutante: mw2405 - restarted ferm
  • 15:50 bblack: codfw dns depooled for front edge traffic
  • 15:49 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic1065.eqiad.wmnet with reason: firmware update T312298
  • 15:48 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic1065.eqiad.wmnet with reason: firmware update T312298
  • 15:30 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2037.codfw.wmnet with OS bullseye
  • 15:06 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
  • 15:06 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2037.codfw.wmnet with OS bullseye
  • 15:06 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
  • 15:06 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2037.codfw.wmnet with OS bullseye
  • 15:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:02 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:01 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
  • 14:57 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2037.codfw.wmnet with OS bullseye
  • 14:56 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
  • 14:52 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2037.codfw.wmnet with OS bullseye
  • 14:52 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
  • 14:48 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2037.codfw.wmnet with OS bullseye
  • 14:48 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
  • 14:47 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2037.codfw.wmnet with OS bullseye
  • 14:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on druid1008.eqiad.wmnet with reason: T308331 btullis
  • 14:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on druid1008.eqiad.wmnet with reason: T308331 btullis
  • 14:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 14:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 14:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
  • 14:30 papaul: ongoing PDU maintenance in rack A5
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 14:08 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic2037.codfw.wmnet
  • 13:59 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic2037.codfw.wmnet
  • 13:41 Lucas_WMDE: UTC afternoon backport window done
  • 13:40 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/DiscussionTools/modules/CommentItem.js: Backport: Parse 'DiscussionToolsTimestampFormatSwitchTime' config value as UTC (T312828) (duration: 02m 50s)
  • 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 12:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 12:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 12:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1020.eqiad.wmnet with reason: Rack move, T308331
  • 12:01 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti1020.eqiad.wmnet with reason: Rack move, T308331
  • 10:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 10:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to x1 master until the replica is back from maintenance', diff saved to https://phabricator.wikimedia.org/P31018 and previous config saved to /var/cache/conftool/dbconfig/20220712-101246-marostegui.json
  • 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 for onsite maintenance T308331', diff saved to https://phabricator.wikimedia.org/P31017 and previous config saved to /var/cache/conftool/dbconfig/20220712-101211-root.json
  • 09:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 09:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 09:12 hashar: Restarted Zuul T309371
  • 08:58 hashar: Restarted Gerrit T309371
  • 08:25 hashar@deploy1002: Finished deploy [integration/docroot@c2cceaf]: Fix NPM URL for Wikimedia language-data library (duration: 00m 08s)
  • 08:25 hashar@deploy1002: Started deploy [integration/docroot@c2cceaf]: Fix NPM URL for Wikimedia language-data library
  • 07:10 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@89cb17d]: subgraph_and_query_mapping: Increase executor memory to 12g, use repartition (duration: 02m 02s)
  • 07:08 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@89cb17d]: subgraph_and_query_mapping: Increase executor memory to 12g, use repartition
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123', diff saved to https://phabricator.wikimedia.org/P31014 and previous config saved to /var/cache/conftool/dbconfig/20220712-070240-root.json
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31013 and previous config saved to /var/cache/conftool/dbconfig/20220712-065352-root.json
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31012 and previous config saved to /var/cache/conftool/dbconfig/20220712-063848-root.json
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31011 and previous config saved to /var/cache/conftool/dbconfig/20220712-062344-root.json
  • 06:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 06:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 06:12 marostegui: dbmaint s3@eqiad T310011
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123 T311610', diff saved to https://phabricator.wikimedia.org/P31010 and previous config saved to /var/cache/conftool/dbconfig/20220712-060407-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1157 to s3 primary and set section read-write T311610', diff saved to https://phabricator.wikimedia.org/P31009 and previous config saved to /var/cache/conftool/dbconfig/20220712-060058-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T311610', diff saved to https://phabricator.wikimedia.org/P31008 and previous config saved to /var/cache/conftool/dbconfig/20220712-060031-marostegui.json
  • 06:00 marostegui: Starting s3 eqiad failover from db1123 to db1157 - T311610
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1157 with weight 0 T311610', diff saved to https://phabricator.wikimedia.org/P31007 and previous config saved to /var/cache/conftool/dbconfig/20220712-051927-root.json
  • 05:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: Primary switchover s3 T311610
  • 05:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: Primary switchover s3 T311610
  • 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:10 ejegg: updated payments-wiki from 53a7b7bd to 2f95d8b4

2022-07-11

  • 21:49 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@3ba1d4c]: subgraph_query_mapping_daily: Increase partitioning to 2048 (duration: 02m 02s)
  • 21:47 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@3ba1d4c]: subgraph_query_mapping_daily: Increase partitioning to 2048
  • 20:36 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@a559f82]: subgraph: Use HivePartitionRangeSensor to wait for sparql queries (duration: 02m 00s)
  • 20:36 TheresNoTime: UTC late deploys done
  • 20:34 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@a559f82]: subgraph: Use HivePartitionRangeSensor to wait for sparql queries
  • 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:28 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Migrate WikibaseTermboxInteraction from EventLogging to EventGate on all wikis (T290303) (duration: 02m 53s)
  • 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:12 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I82262e try again ref T311788 (duration: 03m 07s)
  • 19:41 hashar@deploy1002: Finished deploy [integration/docroot@fc5d65a]: Add language-data library (duration: 00m 08s)
  • 19:41 hashar@deploy1002: Started deploy [integration/docroot@fc5d65a]: Add language-data library
  • 19:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P31005 and previous config saved to /var/cache/conftool/dbconfig/20220711-193315-marostegui.json
  • 18:32 otto@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 17:10 otto@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 16:36 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@02ab1c2]: use mode=reschedule on all airflow sensors (duration: 02m 02s)
  • 16:34 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@02ab1c2]: use mode=reschedule on all airflow sensors
  • 16:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 16:11 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I82262e (duration: 02m 55s)
  • 16:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:56 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 15:56 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 15:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2175.codfw.wmnet with OS bullseye
  • 15:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:49 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1005.wikimedia.org with reason: host reimage
  • 15:45 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1005.wikimedia.org with reason: host reimage
  • 15:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:42 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 02m 51s)
  • 15:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2175.codfw.wmnet with reason: host reimage
  • 15:39 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 02m 58s)
  • 15:38 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2175.codfw.wmnet with reason: host reimage
  • 15:36 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:32 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:28 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 15:27 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 15:27 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 15:23 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 15:23 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 15:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2175.codfw.wmnet with OS bullseye
  • 15:08 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 14:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:34 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 14:34 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 14:34 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 14:34 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 14:11 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:09 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:54 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 13:53 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 13:53 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 13:53 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 13:50 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:48 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:05 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2163 to s8 T311493', diff saved to https://phabricator.wikimedia.org/P31002 and previous config saved to /var/cache/conftool/dbconfig/20220711-130441-marostegui.json
  • 12:05 moritzm: updated bullseye netboot image for Bullseye 11.4 point release T312637
  • 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AniketArs out of all services on: 1292 hosts
  • 10:08 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AniketArs out of all services on: 1292 hosts
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AniketArs out of all services on: 663 hosts
  • 10:06 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AniketArs out of all services on: 663 hosts
  • 08:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2027.codfw.wmnet to cluster codfw and group A
  • 08:06 godog: trim thanos raw samples retention to 54w - T311690
  • 08:04 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2027.codfw.wmnet to cluster codfw and group A
  • 07:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2027.codfw.wmnet
  • 07:52 godog: roll-restart swift-account swift-container across swift/thanos bullseye hosts - T297959
  • 07:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2027.codfw.wmnet
  • 07:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:43 taavi@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/PageTriage/includes/HookHandlers/UndeleteHookHandler.php: Backport: UndeleteHookHandler: fix namespace conditional (T311347) (duration: 02m 54s)
  • 07:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2027.codfw.wmnet with OS bullseye
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2080 from dbtcl T312618', diff saved to https://phabricator.wikimedia.org/P30999 and previous config saved to /var/cache/conftool/dbconfig/20220711-073346-marostegui.json
  • 07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2080.codfw.wmnet
  • 07:30 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2027.codfw.wmnet with reason: host reimage
  • 07:26 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 07:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2027.codfw.wmnet with reason: host reimage
  • 07:22 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2080.codfw.wmnet
  • 07:09 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2027.codfw.wmnet with OS bullseye
  • 07:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2077.codfw.wmnet
  • 06:58 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:54 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:50 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2077.codfw.wmnet
  • 06:28 _joe_: repool thumbor1005
  • 06:28 _joe_: depooled thumbor1005, downgraded firejail, restarted units
  • 00:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply

2022-07-10

  • 13:48 godog: silence ProbeDown pages for thumbor:8800 until wed

2022-07-09

  • 13:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:48 krinkle@deploy1002: Synchronized php-1.39.0-wmf.19/includes/ResourceLoader/: I3e43b1 (duration: 03m 37s)
  • 01:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:35 krinkle@deploy1002: Synchronized wmf-config/: I1bb97d1d601 (duration: 03m 24s)
  • 01:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-07-08

  • 21:44 ryankemper: [Elastic] Reshuffled shards on eqiad to get cluster back into green status (from yellow): https://phabricator.wikimedia.org/P30995#130117
  • 21:32 ori: apt1001: reprepro -C main include buster-wikimedia libvmod-querysort_0.2_amd64.changes
  • 19:58 thcipriani: quick phab downtime for deploy to fix T312614
  • 19:57 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab.wmfusercontent.org with reason: bug fix
  • 19:57 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab.wmfusercontent.org with reason: bug fix
  • 19:57 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: bug fix
  • 19:56 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: bug fix
  • 19:56 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix
  • 19:56 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix
  • 19:49 tzatziki: removing 2 files for legal compliance
  • 18:42 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1001.wikimedia.org with OS bullseye
  • 18:26 urandom: changing Cassandra superuser password, AQS cluster -- T311652
  • 18:21 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1001.wikimedia.org with reason: host reimage
  • 18:18 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1001.wikimedia.org with reason: host reimage
  • 18:03 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1001.wikimedia.org with OS bullseye
  • 16:25 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 15:29 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 15:27 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 15:27 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 15:15 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 15:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 14:59 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 14:49 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 14:46 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1004.wikimedia.org with OS bullseye
  • 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30990 and previous config saved to /var/cache/conftool/dbconfig/20220708-143411-root.json
  • 14:26 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1004.wikimedia.org with reason: host reimage
  • 14:22 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1004.wikimedia.org with reason: host reimage
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30983 and previous config saved to /var/cache/conftool/dbconfig/20220708-141907-root.json
  • 14:11 hashar@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/ServiceImageRecommendationProvider.php: AddImage: Only process metadata for a single valid suggestion - T312544 (duration: 03m 25s)
  • 14:09 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1004.wikimedia.org with OS bullseye
  • 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30978 and previous config saved to /var/cache/conftool/dbconfig/20220708-140404-root.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30975 and previous config saved to /var/cache/conftool/dbconfig/20220708-134900-root.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30974 and previous config saved to /var/cache/conftool/dbconfig/20220708-133356-root.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30973 and previous config saved to /var/cache/conftool/dbconfig/20220708-131852-root.json
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30971 and previous config saved to /var/cache/conftool/dbconfig/20220708-130348-root.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30970 and previous config saved to /var/cache/conftool/dbconfig/20220708-124844-root.json
  • 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts deneb.codfw.wmnet
  • 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:16 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts deneb.codfw.wmnet
  • 09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti2027.codfw.wmnet with reason: Temporarily remove from Ganeti cluster for reimage
  • 09:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti2027.codfw.wmnet with reason: Temporarily remove from Ganeti cluster for reimage
  • 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2016.codfw.wmnet to cluster codfw and group D
  • 07:33 akosiaris: reboot rdb1009 for kernel upgrades
  • 07:29 vgutierrez: restart pybal on lvs6002
  • 07:22 akosiaris: reboot rdb1010 for kernel upgrades
  • 06:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2016.codfw.wmnet to cluster codfw and group D
  • 06:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2016.codfw.wmnet
  • 06:47 TimStarling: on mwmaint2002: using iptables to simulate cross-DC memcached traffic loss
  • 06:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2016.codfw.wmnet
  • 06:05 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Switch $wgCentralAuthTokenCacheType to mcrouter-primary-dc (duration: 03m 18s)
  • 06:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2016.codfw.wmnet with OS bullseye
  • 06:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 06:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 06:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2077 from dbctl T312191', diff saved to https://phabricator.wikimedia.org/P30963 and previous config saved to /var/cache/conftool/dbconfig/20220708-055334-marostegui.json
  • 05:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2016.codfw.wmnet with reason: host reimage
  • 05:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2016.codfw.wmnet with reason: host reimage
  • 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2076.codfw.wmnet
  • 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:38 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 05:34 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2076.codfw.wmnet
  • 05:31 moritzm: draining ganeti2027 T311686
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2076 from dbctl T312190', diff saved to https://phabricator.wikimedia.org/P30962 and previous config saved to /var/cache/conftool/dbconfig/20220708-052926-marostegui.json
  • 05:26 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2016.codfw.wmnet with OS bullseye
  • 05:23 marostegui: dbmaint s3@eqiad T312574
  • 04:08 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@b5d49fe]: use mode=reschedule on all airflow sensors (duration: 02m 03s)
  • 04:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@b5d49fe]: use mode=reschedule on all airflow sensors
  • 03:33 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - T309343
  • 03:22 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1004.wikimedia.org with OS bullseye
  • 02:27 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@c271774]: Update rdf-spark-tools to 0.3.112 (duration: 02m 13s)
  • 02:26 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1004.wikimedia.org with OS bullseye
  • 02:25 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - T309343
  • 02:25 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@c271774]: Update rdf-spark-tools to 0.3.112
  • 02:12 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: RL use MainStash on dewiki I1c120d64d226 (duration: 03m 21s)
  • 01:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2182.codfw.wmnet with OS bullseye
  • 01:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage
  • 01:32 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage
  • 01:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS bullseye
  • 01:12 mutante: gitlab1004 - _still_ icinga alerts about rsync to decom'ed host. 'systemctl daemon-reload' to teach it about deleted units, then 'systemctl reset-failed' ..then RECOVERY T307142
  • 00:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2181.codfw.wmnet with OS bullseye

2022-07-07

  • 23:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2181.codfw.wmnet with reason: host reimage
  • 23:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2181.codfw.wmnet with reason: host reimage
  • 23:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2180.codfw.wmnet with OS bullseye
  • 23:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2180.codfw.wmnet with reason: host reimage
  • 23:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2181.codfw.wmnet with OS bullseye
  • 23:26 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I9b97f79618 (duration: 03m 23s)
  • 23:25 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2180.codfw.wmnet with reason: host reimage
  • 23:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2179.codfw.wmnet with OS bullseye
  • 23:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2179.codfw.wmnet with reason: host reimage
  • 22:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2179.codfw.wmnet with reason: host reimage
  • 22:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:56 krinkle@deploy1002: Synchronized multiversion/: I1f2daab316 (duration: 03m 43s)
  • 22:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2178.codfw.wmnet with reason: host reimage
  • 22:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2178.codfw.wmnet with reason: host reimage
  • 22:42 krinkle@deploy1002: Synchronized wmf-config/missing.php: I13a4ba0e307a (duration: 03m 33s)
  • 22:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2179.codfw.wmnet with OS bullseye
  • 22:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2177.codfw.wmnet with OS bullseye
  • 22:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2178.codfw.wmnet with OS bullseye
  • 22:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2176.codfw.wmnet with OS bullseye
  • 22:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2177.codfw.wmnet with reason: host reimage
  • 22:17 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2177.codfw.wmnet with reason: host reimage
  • 22:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2176.codfw.wmnet with reason: host reimage
  • 21:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2177.codfw.wmnet with OS bullseye
  • 21:33 krinkle@deploy1002: Synchronized multiversion/: Ice5302 (duration: 03m 18s)
  • 21:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:28 krinkle@deploy1002: Synchronized multiversion/MWMultiVersion.php: Ice5302 (duration: 03m 18s)
  • 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2177.codfw.wmnet with OS bullseye
  • 20:55 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Migrate WikibaseTermboxInteraction from EventLogging to EventGate on testwiki (T290303) (duration: 03m 12s)
  • 20:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@e0a8f03]: tune subgraph_mapping_weekly based on first prod run (duration: 02m 05s)
  • 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:49 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@e0a8f03]: tune subgraph_mapping_weekly based on first prod run
  • 20:49 thcipriani@deploy1002: Synchronized php-1.39.0-wmf.19/includes/parser/ParserOutput.php: Backport: ParserOutput::mergeMapStrategy: don't crash if merging non-array values (T312242) (duration: 03m 05s)
  • 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:38 thcipriani@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: Config: Enable VisualEditor on thwikibooks by default (T308379) (duration: 03m 13s)
  • 20:38 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2176.codfw.wmnet with OS bullseye
  • 20:34 thcipriani@deploy1002: Synchronized wmf-config/config/thwikibooks.yaml: Config: Enable VisualEditor on thwikibooks by default (T308379) (duration: 03m 25s)
  • 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1012.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1013.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1011.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1014.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1015.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1010.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2181.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2182.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudweb1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudweb1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1015.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1012.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1014.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1010.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1013.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1011.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudweb1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudweb1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doc1001.eqiad.wmnet
  • 20:11 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:08 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 20:05 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:03 mutante: destroying former stretch backend of doc.wikimedia.org, replaced by doc1002 on buster (T247653)
  • 20:03 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts doc1001.eqiad.wmnet
  • 20:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1006.wikimedia.org with OS bullseye
  • 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2182.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2181.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2180.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2179.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1006.wikimedia.org with reason: host reimage
  • 19:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 19:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 19:27 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1006.wikimedia.org with reason: host reimage
  • 19:18 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 19:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 19:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1007.wikimedia.org with OS bullseye
  • 19:16 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 19:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 19:15 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 19:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 19:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1006.wikimedia.org with OS bullseye
  • 19:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:10 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:07 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:05 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 19:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1007.wikimedia.org with reason: host reimage
  • 19:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2180.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1007.wikimedia.org with reason: host reimage
  • 18:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2179.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2178.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:54 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
  • 18:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1007.wikimedia.org with OS bullseye
  • 18:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
  • 18:46 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1006.wikimedia.org with OS bullseye
  • 18:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2177.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1006.wikimedia.org with OS bullseye
  • 18:36 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - T309343
  • 18:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 18:26 brett@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:23 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2178.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:22 brett@cumin1001: START - Cookbook sre.dns.netbox
  • 18:22 brett@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:22 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2176.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:18 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:16 brett@cumin1001: START - Cookbook sre.dns.netbox
  • 18:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:06 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2177.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:05 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:02 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:01 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:00 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:59 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2176.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:58 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2177.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2176.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:56 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:53 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:51 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:51 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:39 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:37 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2177.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:33 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2176.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:22 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1002.wikimedia.org with OS bullseye
  • 17:22 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1006.wikimedia.org with OS bullseye
  • 17:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1006.wikimedia.org with OS bullseye
  • 17:12 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:12 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:10 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:10 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1002.wikimedia.org with reason: host reimage
  • 17:01 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1002.wikimedia.org with reason: host reimage
  • 16:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1005.wikimedia.org with OS bullseye
  • 16:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 16:52 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 16:49 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1002.wikimedia.org with OS bullseye
  • 16:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1003.wikimedia.org with OS bullseye
  • 16:48 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - T309343
  • 16:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1002.wikimedia.org with OS bullseye
  • 16:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1005.wikimedia.org with reason: host reimage
  • 16:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1005.wikimedia.org with reason: host reimage
  • 16:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1001.wikimedia.org with OS bullseye
  • 16:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
  • 16:33 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1003.wikimedia.org with OS bullseye
  • 16:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1002.wikimedia.org with reason: host reimage
  • 16:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
  • 16:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1002.wikimedia.org with reason: host reimage
  • 16:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1005.wikimedia.org with OS bullseye
  • 16:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1001.wikimedia.org with reason: host reimage
  • 16:18 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1003.wikimedia.org with reason: host reimage
  • 16:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1001.wikimedia.org with reason: host reimage
  • 16:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 16:14 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1003.wikimedia.org with reason: host reimage
  • 16:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bullseye
  • 16:13 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1002.wikimedia.org with OS bullseye
  • 16:04 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30959 and previous config saved to /var/cache/conftool/dbconfig/20220707-160308-root.json
  • 16:02 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
  • 16:01 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 16:01 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 15:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1001.wikimedia.org with OS bullseye
  • 15:59 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30958 and previous config saved to /var/cache/conftool/dbconfig/20220707-154804-root.json
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30957 and previous config saved to /var/cache/conftool/dbconfig/20220707-153300-root.json
  • 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30956 and previous config saved to /var/cache/conftool/dbconfig/20220707-151756-root.json
  • 15:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti2016.codfw.wmnet with reason: Drop from ganeti cluster for eventual reimage
  • 15:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti2016.codfw.wmnet with reason: Drop from ganeti cluster for eventual reimage
  • 15:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2010.codfw.wmnet to cluster codfw and group C
  • 15:09 moritzm: installing containerd security updates
  • 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30955 and previous config saved to /var/cache/conftool/dbconfig/20220707-150252-root.json
  • 14:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:55 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2010.codfw.wmnet to cluster codfw and group C
  • 14:54 reedy@deploy1002: Synchronized composer.json: Cleanup (duration: 03m 19s)
  • 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
  • 14:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30953 and previous config saved to /var/cache/conftool/dbconfig/20220707-144748-root.json
  • 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
  • 14:41 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2010.codfw.wmnet to cluster codfw and group C
  • 14:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2010.codfw.wmnet to cluster codfw and group C
  • 14:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
  • 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30952 and previous config saved to /var/cache/conftool/dbconfig/20220707-143244-root.json
  • 14:28 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1003.wikimedia.org with OS bullseye
  • 14:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
  • 14:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:23 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
  • 14:23 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1003.wikimedia.org with OS bullseye
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30951 and previous config saved to /var/cache/conftool/dbconfig/20220707-141740-root.json
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1132', diff saved to https://phabricator.wikimedia.org/P30950 and previous config saved to /var/cache/conftool/dbconfig/20220707-141724-marostegui.json
  • 14:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 14:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 14:01 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
  • 13:49 moritzm: draining ganeti2016 T311686
  • 13:44 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/ServiceImageRecommendationProvider.php: 95c38bd: ServiceImageRecommendationProvider: Don't fail on first validation error (T312521) (duration: 03m 24s)
  • 13:41 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/Translate/tag/PageTranslationHooks.php: af51745: Translation unit deletion: Skip translation update if it doesn't exist (T312293) (duration: 03m 32s)
  • 13:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:31 urbanecm@deploy1002: Synchronized wmf-config/: aa1d8c8: GrowthExperiments: Set GEImageRecommendationApiHandler (T306032; 2/2) (duration: 03m 37s)
  • 13:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:27 urbanecm@deploy1002: Synchronized wmf-config/ProductionServices.php: aa1d8c8: GrowthExperiments: Set GEImageRecommendationApiHandler (T306032; 1/2) (duration: 03m 20s)
  • 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:24 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/ServiceImageRecommendationProvider.php: df1393f: ServiceImageRecommendationProvider: Don't fail on first validation error (T312521) (duration: 03m 30s)
  • 13:21 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host ganeti2010.codfw.wmnet with OS bullseye
  • 13:21 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 13:20 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:20 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 13:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:19 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:19 elukey: roll restart eventgate-main pods to add a new stream - T301878
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2165 to dbctl T311493', diff saved to https://phabricator.wikimedia.org/P30948 and previous config saved to /var/cache/conftool/dbconfig/20220707-131852-marostegui.json
  • 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2010.codfw.wmnet with reason: host reimage
  • 13:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:01 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2010.codfw.wmnet with reason: host reimage
  • 12:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 12:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 12:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 12:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 12:44 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2010.codfw.wmnet with OS bullseye
  • 12:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2004.codfw.wmnet
  • 12:37 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 12:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 12:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 12:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2004.codfw.wmnet
  • 12:22 moritzm: draining ganeti2015 T311686
  • 11:53 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 11:49 jayme: rolling back helm release eventstreams-internal/main to revision 3 on eqiad and codfw clusters because it's pending-upgrade since Mon Mar 21 21:36:56 2022 / Mon Mar 21 16:05:54 2022
  • 11:42 jayme@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 11:42 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 11:41 jayme@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 11:40 jayme: rolling back helm release tegola-vector-tiles/main to revision 11 on staging-eqiad because it's pending-upgrade since Mon Jun 27 09:45:56 2022
  • 11:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 11:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 11:00 moritzm: installing intel-microcode security updates
  • 10:59 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 10:59 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 10:57 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 10:56 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 10:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:49 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 10:48 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 10:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 10:32 moritzm: draining ganeti2010 T311686
  • 10:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:47 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 09:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:44 moritzm: installing 5.10.120-1~bpo10+1 kernels on buster hosts running Linux 5.10
  • 09:43 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8599f39: Declare mediawiki.editgrowthconfig schema (T312148) (duration: 03m 37s)
  • 09:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:38 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:37 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:35 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:33 marostegui: dbmaint s3@eqiad T312285
  • 09:33 marostegui: dbmaint s7@eqiad T312285
  • 09:33 marostegui: dbmaint s2@eqiad T312285
  • 09:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dbstore1007.eqiad.wmnet
  • 09:31 marostegui: dbmaint s6@eqiad T312285
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2161 to dbctl T311493', diff saved to https://phabricator.wikimedia.org/P30940 and previous config saved to /var/cache/conftool/dbconfig/20220707-092424-marostegui.json
  • 09:22 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dbstore1007.eqiad.wmnet
  • 09:21 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dbstore1005.eqiad.wmnet
  • 09:17 moritzm: draining ganeti2009 T311686
  • 09:14 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dbstore1005.eqiad.wmnet
  • 09:11 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host dbstore1003.eqiad.wmnet
  • 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2004.codfw.wmnet with reason: Switch disk type back to plain
  • 09:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2004.codfw.wmnet with reason: Switch disk type back to plain
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2074', diff saved to https://phabricator.wikimedia.org/P30938 and previous config saved to /var/cache/conftool/dbconfig/20220707-090700-marostegui.json
  • 09:02 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dbstore1003.eqiad.wmnet
  • 08:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2074.codfw.wmnet
  • 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 08:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 08:47 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2074.codfw.wmnet
  • 08:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:08 jnuche@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.19 refs T308072
  • 07:31 marostegui: dbmaint s3@eqiad T312286
  • 07:29 marostegui: dbmaint s7@eqiad T312286
  • 07:29 marostegui: dbmaint s2@eqiad T312286
  • 07:28 marostegui: dbmaint s6@eqiad T312286
  • 07:27 apergos: UTC morning backport and config training window closed
  • 07:23 marostegui: dbmaint s3@eqiad T312287
  • 07:20 marostegui: dbmaint s6@eqiad T312287
  • 07:19 marostegui: dbmaint s7@eqiad T312287
  • 07:19 marostegui: dbmaint s2@eqiad T312287
  • 07:14 kartik@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/ContentTranslation/modules/mw.cx.MachineTranslationManager.js: Backport: Update MT label for Flores (T311411) (duration: 03m 20s)
  • 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:07 kartik@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/ContentTranslation/modules/mw.cx.MachineTranslationManager.js: Backport: Update MT label for Flores (T311411) (duration: 03m 41s)
  • 07:07 moritzm: drain ganeti1020 T308331
  • 07:07 marostegui: dbmaint s3@eqiad T312288
  • 07:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:03 marostegui: dbmaint s6@eqiad T312288
  • 07:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:00 marostegui: dbmaint s2@eqiad T312288
  • 06:56 marostegui: dbmaint s7@eqiad T312288
  • 06:31 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1003.wikimedia.org with OS bullseye
  • 06:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 06:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 06:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1160 T311611', diff saved to https://phabricator.wikimedia.org/P30937 and previous config saved to /var/cache/conftool/dbconfig/20220707-060743-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1138 to s4 primary and set section read-write T311611', diff saved to https://phabricator.wikimedia.org/P30936 and previous config saved to /var/cache/conftool/dbconfig/20220707-060112-ladsgroup.json
  • 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T311611', diff saved to https://phabricator.wikimedia.org/P30935 and previous config saved to /var/cache/conftool/dbconfig/20220707-060037-ladsgroup.json
  • 06:00 Amir1: Starting s4 eqiad failover from db1160 to db1138 - T311611
  • 05:35 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
  • 05:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1138 with weight 0 T311611', diff saved to https://phabricator.wikimedia.org/P30933 and previous config saved to /var/cache/conftool/dbconfig/20220707-051406-ladsgroup.json
  • 05:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 31 hosts with reason: Primary switchover s4 T311611
  • 05:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 31 hosts with reason: Primary switchover s4 T311611
  • 01:09 mutante: gitlab1004 - systemctl reset-failed, clear icinga alerts about rsync to decom'ed machine
  • 00:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:25 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab1001.wikimedia.org
  • 00:25 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)

2022-07-06

  • 23:50 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@5082f17]: increase subgraph_mapping_weekly executor memory (duration: 02m 05s)
  • 23:48 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@5082f17]: increase subgraph_mapping_weekly executor memory
  • 23:30 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 23:25 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab1001.wikimedia.org
  • 23:00 mutante: gitlab1004 - rm /lib/systemd/system/rsync-config-backup-gitlab1001* T307142
  • 22:52 mutante: etherpad - deleted 2 pads that had leaked information
  • 22:52 ebernhardson: restart airflow-webserver and airflow-scheduler for plugins update on an-airflow1001
  • 22:37 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@debd402]: airflow dags to generate subgraph and query mapping along with their metrics (duration: 02m 01s)
  • 22:35 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@debd402]: airflow dags to generate subgraph and query mapping along with their metrics
  • 21:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1005.wikimedia.org with OS bullseye
  • 21:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 21:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 21:39 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudrabbit1003.wikimedia.org with OS bullseye
  • 21:39 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudrabbit1002.wikimedia.org with OS bullseye
  • 20:59 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudrabbit1001.wikimedia.org with OS bullseye
  • 20:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1005.wikimedia.org with OS bullseye
  • 20:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 20:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 20:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bullseye
  • 20:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1002.wikimedia.org with OS bullseye
  • 20:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1001.wikimedia.org with OS bullseye
  • 20:36 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudrabbit1001.wikimedia.org with OS bullseye
  • 20:35 cjming: end of UTC late backport window
  • 20:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1001.wikimedia.org with OS bullseye
  • 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:11 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable sticky header edit A/B test for pilot wikis excluding idwiki/viwiki (T311144) (duration: 03m 25s)
  • 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:54 bd808@mwmaint1002: Testing stashbot following deploy of gerrit:809732. This should be logged in SAL, but stashbot should not say that was done on irc.
  • 19:13 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1003.wikimedia.org with OS bullseye
  • 19:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
  • 18:48 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1003.wikimedia.org with OS bullseye
  • 18:48 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
  • 18:47 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1003.wikimedia.org with OS bullseye
  • 18:47 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
  • 18:45 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - T309343
  • 18:45 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1003.wikimedia.org with OS bullseye
  • 18:10 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
  • 18:02 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - T309343
  • 17:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:55 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudcephmon1002.eqiad.wmnet with reason: Moving racks
  • 17:55 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudcephmon1002.eqiad.wmnet with reason: Moving racks
  • 17:53 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:06 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 03m 38s)
  • 17:06 inflatador: bking@cloudelastic1006 "restarting elastic services in preparation for cloudelastic reimage T309343"
  • 16:07 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-ctrl1002.eqiad.wmnet
  • 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl1002.eqiad.wmnet on all recursors
  • 15:57 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl1002.eqiad.wmnet on all recursors
  • 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:53 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 15:53 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host dse-k8s-ctrl1002.eqiad.wmnet
  • 15:51 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-ctrl1001.eqiad.wmnet
  • 15:41 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl1001.eqiad.wmnet on all recursors
  • 15:41 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl1001.eqiad.wmnet on all recursors
  • 15:41 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:37 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 15:37 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host dse-k8s-ctrl1001.eqiad.wmnet
  • 15:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 15:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 15:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 15:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 15:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:09 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 03m 41s)
  • 15:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:05 moritzm: installing intel-microcode security updates
  • 15:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:00 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 03m 28s)
  • 14:56 cmjohnson1: moving switch ports cloudcephosd1021 from cloudsw1-c to cloudsw2-c T310546
  • 14:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:53 akosiaris: reboot poolcounter1005 for kernel upgrades
  • 14:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:49 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 03m 33s)
  • 14:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:39 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudstore1009.wikimedia.org
  • 14:37 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:34 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:32 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 14:32 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 14:30 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 14:30 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 14:27 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 14:26 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 14:22 akosiaris: depool eqiad kartotherian T305845
  • 14:22 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 14:17 akosiaris: pool codfw for kartotherian T305845
  • 14:16 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 14:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudstore1009.wikimedia.org
  • 14:15 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudstore1008.wikimedia.org
  • 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:10 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:05 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudstore1008.wikimedia.org
  • 13:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:54 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.addnode (exit_code=97) for new host ganeti2024.codfw.wmnet to cluster codfw and group A
  • 13:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add a new Eventgate stream for revision-score events (T301878) (duration: 03m 46s)
  • 13:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2024.codfw.wmnet to cluster codfw and group A
  • 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
  • 13:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
  • 13:35 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2039.codfw.wmnet
  • 13:30 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-etcd1003.eqiad.wmnet
  • 13:28 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2039.codfw.wmnet
  • 13:28 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2038.codfw.wmnet
  • 13:28 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1039.eqiad.wmnet
  • 13:20 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2038.codfw.wmnet
  • 13:19 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2037.codfw.wmnet
  • 13:19 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1039.eqiad.wmnet
  • 13:19 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1038.eqiad.wmnet
  • 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1132 (T311106)', diff saved to https://phabricator.wikimedia.org/P30930 and previous config saved to /var/cache/conftool/dbconfig/20220706-131715-ladsgroup.json
  • 13:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1038.eqiad.wmnet
  • 13:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2037.codfw.wmnet
  • 13:09 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1037.eqiad.wmnet
  • 13:08 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2036.codfw.wmnet
  • 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cloudstore1009.wikimedia.org
  • 13:04 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cloudstore1009.wikimedia.org
  • 13:03 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cloudstore1008.wikimedia.org
  • 13:03 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cloudstore1008.wikimedia.org
  • 13:00 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1037.eqiad.wmnet
  • 13:00 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2036.codfw.wmnet
  • 12:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2035.codfw.wmnet
  • 12:56 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1036.eqiad.wmnet
  • 12:51 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudstore1008.wikimedia.org
  • 12:51 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:49 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1036.eqiad.wmnet
  • 12:49 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2035.codfw.wmnet
  • 12:48 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 12:43 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudstore1008.wikimedia.org
  • 12:41 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-etcd1003.eqiad.wmnet on all recursors
  • 12:41 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-etcd1003.eqiad.wmnet on all recursors
  • 12:41 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:40 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw
  • 12:28 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
  • 12:16 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2034.codfw.wmnet
  • 12:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1035.eqiad.wmnet
  • 12:06 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1035.eqiad.wmnet
  • 12:06 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2034.codfw.wmnet
  • 12:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1033.eqiad.wmnet
  • 12:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2033.codfw.wmnet
  • 11:57 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1033.eqiad.wmnet
  • 11:57 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2033.codfw.wmnet
  • 11:52 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2032.codfw.wmnet
  • 11:51 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1032.eqiad.wmnet
  • 11:42 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1032.eqiad.wmnet
  • 11:42 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2032.codfw.wmnet
  • 11:39 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2031.codfw.wmnet
  • 11:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1031.eqiad.wmnet
  • 11:29 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1031.eqiad.wmnet
  • 11:28 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2031.codfw.wmnet
  • 11:18 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1030.eqiad.wmnet
  • 11:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2030.codfw.wmnet
  • 11:09 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1030.eqiad.wmnet
  • 11:08 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2030.codfw.wmnet
  • 11:07 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1029.eqiad.wmnet
  • 11:07 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2029.codfw.wmnet
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After restart', diff saved to https://phabricator.wikimedia.org/P30927 and previous config saved to /var/cache/conftool/dbconfig/20220706-110658-root.json
  • 10:58 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1029.eqiad.wmnet
  • 10:58 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2029.codfw.wmnet
  • 10:55 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1028.eqiad.wmnet
  • 10:54 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 10:54 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host dse-k8s-etcd1003.eqiad.wmnet
  • 10:53 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2028.codfw.wmnet
  • 10:52 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-etcd1002.eqiad.wmnet
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After restart', diff saved to https://phabricator.wikimedia.org/P30925 and previous config saved to /var/cache/conftool/dbconfig/20220706-105154-root.json
  • 10:46 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1028.eqiad.wmnet
  • 10:44 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2028.codfw.wmnet
  • 10:42 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-etcd1002.eqiad.wmnet on all recursors
  • 10:42 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-etcd1002.eqiad.wmnet on all recursors
  • 10:42 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:39 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet
  • 10:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1009.eqiad.wmnet
  • 10:37 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 10:37 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host dse-k8s-etcd1002.eqiad.wmnet
  • 10:37 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-etcd1001.eqiad.wmnet
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After restart', diff saved to https://phabricator.wikimedia.org/P30923 and previous config saved to /var/cache/conftool/dbconfig/20220706-103650-root.json
  • 10:31 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet
  • 10:30 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe1009.eqiad.wmnet
  • 10:27 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-etcd1001.eqiad.wmnet on all recursors
  • 10:27 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-etcd1001.eqiad.wmnet on all recursors
  • 10:27 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:22 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 10:22 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host dse-k8s-etcd1001.eqiad.wmnet
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After restart', diff saved to https://phabricator.wikimedia.org/P30921 and previous config saved to /var/cache/conftool/dbconfig/20220706-102146-root.json
  • 10:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:19 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2024.codfw.wmnet
  • 10:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After restart', diff saved to https://phabricator.wikimedia.org/P30920 and previous config saved to /var/cache/conftool/dbconfig/20220706-100642-root.json
  • 10:02 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 09:59 volans: restarted wikibugs
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After restart', diff saved to https://phabricator.wikimedia.org/P30919 and previous config saved to /var/cache/conftool/dbconfig/20220706-095138-root.json
  • 09:50 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30918 and previous config saved to /var/cache/conftool/dbconfig/20220706-093752-root.json
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30917 and previous config saved to /var/cache/conftool/dbconfig/20220706-093741-root.json
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30916 and previous config saved to /var/cache/conftool/dbconfig/20220706-093733-root.json
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: After restart', diff saved to https://phabricator.wikimedia.org/P30915 and previous config saved to /var/cache/conftool/dbconfig/20220706-093634-root.json
  • 09:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:23 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2024.codfw.wmnet
  • 09:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30914 and previous config saved to /var/cache/conftool/dbconfig/20220706-092248-root.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30913 and previous config saved to /var/cache/conftool/dbconfig/20220706-092237-root.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30912 and previous config saved to /var/cache/conftool/dbconfig/20220706-092229-root.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After restart', diff saved to https://phabricator.wikimedia.org/P30911 and previous config saved to /var/cache/conftool/dbconfig/20220706-092130-root.json
  • 09:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P30908 and previous config saved to /var/cache/conftool/dbconfig/20220706-091717-root.json
  • 09:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:15 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1039.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:14 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1039.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:14 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1038.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:13 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1038.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:13 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1037.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:11 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1037.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:11 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1036.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:10 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1036.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:10 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1035.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
  • 09:09 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1035.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30907 and previous config saved to /var/cache/conftool/dbconfig/20220706-090744-root.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30906 and previous config saved to /var/cache/conftool/dbconfig/20220706-090731-root.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30905 and previous config saved to /var/cache/conftool/dbconfig/20220706-090725-root.json
  • 09:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:04 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2039.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:02 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2039.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:02 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2038.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:01 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2038.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:01 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2037.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:00 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2037.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:00 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2036.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:58 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2036.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:55 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2034.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:54 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2034.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:54 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2033.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:53 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2033.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:53 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2032.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30904 and previous config saved to /var/cache/conftool/dbconfig/20220706-085240-root.json
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30903 and previous config saved to /var/cache/conftool/dbconfig/20220706-085227-root.json
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30902 and previous config saved to /var/cache/conftool/dbconfig/20220706-085221-root.json
  • 08:51 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2032.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:51 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2031.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:50 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2031.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:50 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2030.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:48 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2030.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:48 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2029.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:47 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2029.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:47 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2028.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:46 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2028.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:43 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1033.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:41 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1033.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:41 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1032.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:40 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1032.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:40 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1031.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:39 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1031.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:39 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1030.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:37 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1030.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:37 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1029.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30901 and previous config saved to /var/cache/conftool/dbconfig/20220706-083736-root.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30900 and previous config saved to /var/cache/conftool/dbconfig/20220706-083723-root.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30899 and previous config saved to /var/cache/conftool/dbconfig/20220706-083718-root.json
  • 08:36 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1029.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:26 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1033.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: Test done (T311106)', diff saved to https://phabricator.wikimedia.org/P30898 and previous config saved to /var/cache/conftool/dbconfig/20220706-082603-ladsgroup.json
  • 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Test done (T311106)', diff saved to https://phabricator.wikimedia.org/P30897 and previous config saved to /var/cache/conftool/dbconfig/20220706-082540-ladsgroup.json
  • 08:25 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1033.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:25 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1032.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:23 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1032.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30896 and previous config saved to /var/cache/conftool/dbconfig/20220706-082232-root.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30895 and previous config saved to /var/cache/conftool/dbconfig/20220706-082219-root.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30894 and previous config saved to /var/cache/conftool/dbconfig/20220706-082214-root.json
  • 08:21 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1031.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:20 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1031.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2024.codfw.wmnet with OS bullseye
  • 08:16 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1030.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:14 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1030.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:12 jnuche@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.19 refs T308072 (duration: 03m 39s)
  • 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: Test done (T311106)', diff saved to https://phabricator.wikimedia.org/P30893 and previous config saved to /var/cache/conftool/dbconfig/20220706-081059-ladsgroup.json
  • 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Test done (T311106)', diff saved to https://phabricator.wikimedia.org/P30892 and previous config saved to /var/cache/conftool/dbconfig/20220706-081036-ladsgroup.json
  • 08:09 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1029.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:08 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.19 refs T308072
  • 08:07 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1029.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30891 and previous config saved to /var/cache/conftool/dbconfig/20220706-080728-root.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30890 and previous config saved to /var/cache/conftool/dbconfig/20220706-080715-root.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30889 and previous config saved to /var/cache/conftool/dbconfig/20220706-080710-root.json
  • 08:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2024.codfw.wmnet with reason: host reimage
  • 08:02 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1028.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:01 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1028.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 07:58 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2024.codfw.wmnet with reason: host reimage
  • 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: Test done (T311106)', diff saved to https://phabricator.wikimedia.org/P30888 and previous config saved to /var/cache/conftool/dbconfig/20220706-075555-ladsgroup.json
  • 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Test done (T311106)', diff saved to https://phabricator.wikimedia.org/P30887 and previous config saved to /var/cache/conftool/dbconfig/20220706-075532-ladsgroup.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30886 and previous config saved to /var/cache/conftool/dbconfig/20220706-075224-root.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30885 and previous config saved to /var/cache/conftool/dbconfig/20220706-075211-root.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30884 and previous config saved to /var/cache/conftool/dbconfig/20220706-075206-root.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P30883 and previous config saved to /var/cache/conftool/dbconfig/20220706-074721-root.json
  • 07:42 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2024.codfw.wmnet with OS bullseye
  • 07:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: Test done (T311106)', diff saved to https://phabricator.wikimedia.org/P30882 and previous config saved to /var/cache/conftool/dbconfig/20220706-074051-ladsgroup.json
  • 07:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 10%: Test done (T311106)', diff saved to https://phabricator.wikimedia.org/P30881 and previous config saved to /var/cache/conftool/dbconfig/20220706-074028-ladsgroup.json
  • 07:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1135, if anything breaks, it's marostegui's fault (T311106)', diff saved to https://phabricator.wikimedia.org/P30880 and previous config saved to /var/cache/conftool/dbconfig/20220706-073052-ladsgroup.json
  • 07:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2024.codfw.wmnet with reason: Remove node for reimage
  • 07:28 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2024.codfw.wmnet with reason: Remove node for reimage
  • 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:20 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/DiscussionTools/modules/dt.init.less: Backport: Revert "Hide the lede section on mobile when DiscussionTools is enabled" (T312177) (duration: 03m 37s)
  • 07:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P30879 and previous config saved to /var/cache/conftool/dbconfig/20220706-071157-ladsgroup.json
  • 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P30878 and previous config saved to /var/cache/conftool/dbconfig/20220706-070835-ladsgroup.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After restart', diff saved to https://phabricator.wikimedia.org/P30876 and previous config saved to /var/cache/conftool/dbconfig/20220706-065143-root.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After restart', diff saved to https://phabricator.wikimedia.org/P30875 and previous config saved to /var/cache/conftool/dbconfig/20220706-063639-root.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After restart', diff saved to https://phabricator.wikimedia.org/P30874 and previous config saved to /var/cache/conftool/dbconfig/20220706-062135-root.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After restart', diff saved to https://phabricator.wikimedia.org/P30873 and previous config saved to /var/cache/conftool/dbconfig/20220706-060631-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After restart', diff saved to https://phabricator.wikimedia.org/P30872 and previous config saved to /var/cache/conftool/dbconfig/20220706-055127-root.json
  • 05:48 marostegui: dbmaint x1@eqiad T312162
  • 05:48 marostegui: dbmaint s3@eqiad T312162
  • 05:46 marostegui: dbmaint s3@eqiad T312161
  • 05:45 marostegui: dbmaint x1@eqiad T312161
  • 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After restart', diff saved to https://phabricator.wikimedia.org/P30871 and previous config saved to /var/cache/conftool/dbconfig/20220706-053623-root.json
  • 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 12 hosts with reason: codfw s7 sanitarium master switch
  • 05:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 12 hosts with reason: codfw s7 sanitarium master switch
  • 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: After restart', diff saved to https://phabricator.wikimedia.org/P30870 and previous config saved to /var/cache/conftool/dbconfig/20220706-052119-root.json
  • 05:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: codfw s6 sanitarium master switch
  • 05:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: codfw s6 sanitarium master switch
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2159 to dbctl T311493', diff saved to https://phabricator.wikimedia.org/P30869 and previous config saved to /var/cache/conftool/dbconfig/20220706-051046-marostegui.json
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After restart', diff saved to https://phabricator.wikimedia.org/P30868 and previous config saved to /var/cache/conftool/dbconfig/20220706-050615-root.json
  • 04:18 tstarling@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/AbuseFilter: T310662 deployment with possible post-send error spike due to ServiceWiring/FilterProfiler interdependency (duration: 03m 33s)
  • 04:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 04:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 04:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 04:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:34 tstarling@deploy1002: Finished scap: WRStats core prereq T310662 g811407 (duration: 17m 20s)
  • 03:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:17 tstarling@deploy1002: Started scap: WRStats core prereq T310662 g811407
  • 02:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:30 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T310662 g 811394 harmless prerequisite (duration: 03m 39s)
  • 02:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:28 mutante: gitlab1004 - rm /lib/systemd/system/rsync-config-backup-gitlab2001.wikimedia.org.*
  • 01:21 mutante: gitlab1004 rm /lib/systemd/system/rsync-data-backup-gitlab2001.wikimedia.org.* ; systemctl reset-failed (T274463, T307142) - fix icinga alert after gitlab2001 was decom'ed, we didn't have puppet remove the timer/service

2022-07-05

  • 23:30 ebernhardson: start restore of commonswiki_file from thanos-swift to cloudelastic
  • 23:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - ryankemper@cumin1001 - T309648
  • 22:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - ryankemper@cumin1001 - T309648
  • 22:28 ryankemper: T309648 Manually restarting `cloudelastic1006` before proceeding to a normal rolling restart of cloudelastic
  • 21:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:55 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable title above tabs everywhere (T311773) (duration: 03m 23s)
  • 21:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:35 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert: cirrus: Disable commonswiki writes to cloudelastic (T309648) (duration: 03m 42s)
  • 21:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:27 ebernhardson@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/CirrusSearch/includes/Job/ElasticaWrite.php: Backport: job queue: Squelch errors related to unwritable cloudelastic (T309648) (duration: 03m 37s)
  • 21:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:19 ebernhardson@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/CirrusSearch/includes/Job/ElasticaWrite.php: Backport: job queue: Squelch errors related to unwritable cloudelastic (T309648) (duration: 03m 43s)
  • 20:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2174.codfw.wmnet with OS bullseye
  • 20:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2174.codfw.wmnet with reason: host reimage
  • 20:37 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2174.codfw.wmnet with reason: host reimage
  • 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2173.codfw.wmnet with OS bullseye
  • 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:24 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: cirrus: Disable commonswiki writes to cloudelastic (T309648) (duration: 03m 23s)
  • 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2174.codfw.wmnet with OS bullseye
  • 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 66c9730: QuickSurveys: Increase coverage of research-incentive survey (T311015) (duration: 03m 28s)
  • 20:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2173.codfw.wmnet with reason: host reimage
  • 20:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2171.codfw.wmnet with OS bullseye
  • 20:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2173.codfw.wmnet with reason: host reimage
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b1c2171: GrowthExperiments: End mailing list campaign on eswiki (T307985) (duration: 03m 39s)
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
  • 20:00 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
  • 19:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2171.codfw.wmnet with OS bullseye
  • 19:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2173.codfw.wmnet with OS bullseye
  • 19:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2172.codfw.wmnet with OS bullseye
  • 19:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2172.codfw.wmnet with reason: host reimage
  • 19:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2172.codfw.wmnet with reason: host reimage
  • 18:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2172.codfw.wmnet with OS bullseye
  • 18:53 papaul: power down moss-be2002 for NVMe installation
  • 18:52 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab2001.wikimedia.org
  • 18:52 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host db2171.codfw.wmnet with OS bullseye
  • 18:44 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 18:40 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab2001.wikimedia.org
  • 18:39 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts gitlab2001.codfw.wmnet
  • 18:39 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2170.codfw.wmnet with OS bullseye
  • 18:36 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 18:32 papaul: power down moss-be2001 for NVMe installation
  • 18:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
  • 18:32 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab2001.codfw.wmnet
  • 18:27 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
  • 18:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2170.codfw.wmnet with reason: host reimage
  • 18:19 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2170.codfw.wmnet with reason: host reimage
  • 18:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2171.codfw.wmnet with OS bullseye
  • 18:01 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2174
  • 18:01 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2174
  • 18:00 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2173
  • 18:00 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2173
  • 17:59 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2172
  • 17:59 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2172
  • 17:57 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2171
  • 17:57 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2171
  • 17:57 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2170
  • 17:56 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2170
  • 17:54 mutante: disabling puppet on gitlab* - debugging gerrit:811276
  • 17:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2170.codfw.wmnet with OS bullseye
  • 17:38 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2170.codfw.wmnet with OS bullseye
  • 17:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2174.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:33 moritzm: installing haproxy security updates on stretch
  • 17:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2174.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2173.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2172.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:50 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2170.codfw.wmnet with OS bullseye
  • 16:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2169.codfw.wmnet with OS bullseye
  • 16:44 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw
  • 16:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2169.codfw.wmnet with reason: host reimage
  • 16:34 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
  • 16:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2169.codfw.wmnet with reason: host reimage
  • 16:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2173.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2172.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2171.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2170.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2169.codfw.wmnet with OS bullseye
  • 16:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2164.codfw.wmnet with OS bullseye
  • 15:59 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2171.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2164.codfw.wmnet with reason: host reimage
  • 15:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2170.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2164.codfw.wmnet with reason: host reimage
  • 15:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2169.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2164.codfw.wmnet with OS bullseye
  • 15:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2164.codfw.wmnet with OS bullseye
  • 15:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2169.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:09 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2169
  • 15:08 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2169
  • 15:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:05 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db2169
  • 15:05 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db2169
  • 15:05 moritzm: installing firejail updates on stretch
  • 15:03 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2164.codfw.wmnet with OS bullseye
  • 15:02 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:00 moritzm: draining ganeti2024 for eventual reimage T311686
  • 14:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2164.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2004.codfw.wmnet with reason: Switch disk type to DRBD
  • 14:44 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2004.codfw.wmnet with reason: Switch disk type to DRBD
  • 14:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:22 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw
  • 14:22 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
  • 14:09 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:02 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2164.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:34 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:33 urbanecm: UTC afternoon B&C window done
  • 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:26 urbanecm@deploy1002: Synchronized w/static.php: 300ef4a: static.php: Update call to deprecated IContextSource::getStats (duration: 03m 41s)
  • 13:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:15 urbanecm@deploy1002: Synchronized wmf-config/: 1287b96: Drop deprecated feature flags (T310684) (duration: 03m 32s)
  • 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:08 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 891057f: Drop dependent feature flags (T310684) (duration: 03m 37s)
  • 13:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:50 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
  • 12:42 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
  • 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P30861 and previous config saved to /var/cache/conftool/dbconfig/20220705-124101-ladsgroup.json
  • 12:37 btullis@cumin1001: END (ERROR) - Cookbook sre.hadoop.roll-restart-workers (exit_code=97) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 12:36 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 12:31 moritzm: draining ganeti2023 for eventual reimage T311686
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'T311106', diff saved to https://phabricator.wikimedia.org/P30859 and previous config saved to /var/cache/conftool/dbconfig/20220705-122941-ladsgroup.json
  • 11:58 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2158 to dbctl T311493', diff saved to https://phabricator.wikimedia.org/P30848 and previous config saved to /var/cache/conftool/dbconfig/20220705-110432-marostegui.json
  • 11:01 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 10:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:30 _joe_: running benchmarks in codfw for php7.2/7.4 comparison.
  • 10:29 moritzm: sudo gnt-cluster upgrade --to 3.0 for ganeti/codfw T311686
  • 10:05 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.5.0 - volans@cumin1001
  • 10:04 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.5.0 - volans@cumin1001
  • 10:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:00 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.19 refs T308072
  • 09:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:36 jnuche@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.19 refs T308072 (duration: 34m 21s)
  • 09:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1002
  • 09:33 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest1002
  • 09:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:02 jnuche@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.19 refs T308072
  • 08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0)
  • 08:52 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces
  • 08:43 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:30 moritzm: uploaded 7.4.30-3+0~20220627.69+debian10~1.gbpf2b381+wmf1+buster3 to component/php74 (pulling php-common with the socket helper) T311386
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30835 and previous config saved to /var/cache/conftool/dbconfig/20220705-082415-root.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After restart', diff saved to https://phabricator.wikimedia.org/P30834 and previous config saved to /var/cache/conftool/dbconfig/20220705-082058-root.json
  • 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30833 and previous config saved to /var/cache/conftool/dbconfig/20220705-080911-root.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After restart', diff saved to https://phabricator.wikimedia.org/P30832 and previous config saved to /var/cache/conftool/dbconfig/20220705-080554-root.json
  • 07:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 89aef54: MentorDashboard: enable the Vue version of the dashboard in beta (T300532) (duration: 03m 18s)
  • 07:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30831 and previous config saved to /var/cache/conftool/dbconfig/20220705-075408-root.json
  • 07:54 urbanecm@deploy1002: Synchronized logos/config.yaml: c8c092a: trwiki: Change old and new vector logos for 500k articles (T311946; 3/3) (duration: 03m 34s)
  • 07:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After restart', diff saved to https://phabricator.wikimedia.org/P30830 and previous config saved to /var/cache/conftool/dbconfig/20220705-075050-root.json
  • 07:50 urbanecm@deploy1002: Synchronized wmf-config/: c8c092a: trwiki: Change old and new vector logos for 500k articles (T311946; 2/3) (duration: 03m 36s)
  • 07:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:46 urbanecm@deploy1002: Synchronized static/: c8c092a: trwiki: Change old and new vector logos for 500k articles (T311946; 1/3) (duration: 03m 17s)
  • 07:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30829 and previous config saved to /var/cache/conftool/dbconfig/20220705-073904-root.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After restart', diff saved to https://phabricator.wikimedia.org/P30828 and previous config saved to /var/cache/conftool/dbconfig/20220705-073546-root.json
  • 07:33 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: ce64780: SuggestedEdits: Adjust thumbnailSource logic (T311789) (duration: 03m 32s)
  • 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30827 and previous config saved to /var/cache/conftool/dbconfig/20220705-072400-root.json
  • 07:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:21 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/ImageSuggestions/maintenance/SendNotificationsForUnillustratedWatchedTitles.php: d5050b7: Retrieve pages-with-suggestion via Elastic scroll directly (T311476) (duration: 03m 32s)
  • 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After restart', diff saved to https://phabricator.wikimedia.org/P30826 and previous config saved to /var/cache/conftool/dbconfig/20220705-072043-root.json
  • 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:17 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/CentralNotice/includes/specials/CentralNotice.php: 414b7b8: Only add tabs to special pages (T311944) (duration: 03m 30s)
  • 07:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 14df0e2: zh(wikiversity|wiktionary): Disable local upload (T312012) (duration: 03m 47s)
  • 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30824 and previous config saved to /var/cache/conftool/dbconfig/20220705-070856-root.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After restart', diff saved to https://phabricator.wikimedia.org/P30823 and previous config saved to /var/cache/conftool/dbconfig/20220705-070539-root.json
  • 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: codfw s3 sanitarium master switch
  • 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 8 hosts with reason: codfw s3 sanitarium master switch
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Decommission db2073 T311837', diff saved to https://phabricator.wikimedia.org/P30822 and previous config saved to /var/cache/conftool/dbconfig/20220705-070019-marostegui.json
  • 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2073.codfw.wmnet
  • 06:55 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30821 and previous config saved to /var/cache/conftool/dbconfig/20220705-065352-root.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: After restart', diff saved to https://phabricator.wikimedia.org/P30820 and previous config saved to /var/cache/conftool/dbconfig/20220705-065035-root.json
  • 06:50 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2073.codfw.wmnet
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30819 and previous config saved to /var/cache/conftool/dbconfig/20220705-063848-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After restart', diff saved to https://phabricator.wikimedia.org/P30818 and previous config saved to /var/cache/conftool/dbconfig/20220705-063531-root.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P30817 and previous config saved to /var/cache/conftool/dbconfig/20220705-063402-root.json
  • 06:09 marostegui: dbmaint s6@eqiad T298557
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 T311522', diff saved to https://phabricator.wikimedia.org/P30816 and previous config saved to /var/cache/conftool/dbconfig/20220705-060526-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T311522', diff saved to https://phabricator.wikimedia.org/P30814 and previous config saved to /var/cache/conftool/dbconfig/20220705-060111-marostegui.json
  • 06:00 marostegui: Starting s6 eqiad failover from db1131 to db1173 - T311522
  • 05:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 05:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 05:58 TimStarling: deploying multi-DC support g 801621, manual puppet run on cp1080
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1173 with weight 0 T311522', diff saved to https://phabricator.wikimedia.org/P30813 and previous config saved to /var/cache/conftool/dbconfig/20220705-052219-marostegui.json
  • 05:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s6 T311522
  • 05:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Primary switchover s6 T311522
  • 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-07-04

  • 20:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1004.wikimedia.org
  • 19:53 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1004.wikimedia.org
  • 19:40 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2004-dev.wikimedia.org
  • 19:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1005.wikimedia.org
  • 19:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 19:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 19:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 8 hosts with reason: Maintenance
  • 19:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 8 hosts with reason: Maintenance
  • 19:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 19:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30811 and previous config saved to /var/cache/conftool/dbconfig/20220704-192955-ladsgroup.json
  • 19:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2003-dev.wikimedia.org
  • 19:27 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2004-dev.wikimedia.org
  • 19:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1004.wikimedia.org
  • 19:26 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
  • 19:17 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2003-dev.wikimedia.org
  • 19:15 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1004.wikimedia.org
  • 19:15 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol2001-dev.wikimedia.org
  • 19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P30810 and previous config saved to /var/cache/conftool/dbconfig/20220704-191450-ladsgroup.json
  • 19:07 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1003.wikimedia.org
  • 19:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2005-dev.wikimedia.org
  • 19:01 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2001-dev.wikimedia.org
  • 18:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P30809 and previous config saved to /var/cache/conftool/dbconfig/20220704-185945-ladsgroup.json
  • 18:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1004.wikimedia.org
  • 18:53 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices2005-dev.wikimedia.org
  • 18:53 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2004-dev.wikimedia.org
  • 18:52 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
  • 18:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1003.wikimedia.org
  • 18:51 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1003.wikimedia.org
  • 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30808 and previous config saved to /var/cache/conftool/dbconfig/20220704-184440-ladsgroup.json
  • 18:43 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices2004-dev.wikimedia.org
  • 18:43 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1003.wikimedia.org
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30807 and previous config saved to /var/cache/conftool/dbconfig/20220704-184231-ladsgroup.json
  • 18:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T312027)', diff saved to https://phabricator.wikimedia.org/P30806 and previous config saved to /var/cache/conftool/dbconfig/20220704-184211-ladsgroup.json
  • 18:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P30805 and previous config saved to /var/cache/conftool/dbconfig/20220704-182706-ladsgroup.json
  • 18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P30804 and previous config saved to /var/cache/conftool/dbconfig/20220704-181200-ladsgroup.json
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T312027)', diff saved to https://phabricator.wikimedia.org/P30803 and previous config saved to /var/cache/conftool/dbconfig/20220704-175655-ladsgroup.json
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T312027)', diff saved to https://phabricator.wikimedia.org/P30802 and previous config saved to /var/cache/conftool/dbconfig/20220704-175446-ladsgroup.json
  • 17:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 17:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30801 and previous config saved to /var/cache/conftool/dbconfig/20220704-175425-ladsgroup.json
  • 17:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P30800 and previous config saved to /var/cache/conftool/dbconfig/20220704-173920-ladsgroup.json
  • 17:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P30799 and previous config saved to /var/cache/conftool/dbconfig/20220704-172415-ladsgroup.json
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30798 and previous config saved to /var/cache/conftool/dbconfig/20220704-170910-ladsgroup.json
  • 17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30797 and previous config saved to /var/cache/conftool/dbconfig/20220704-170800-ladsgroup.json
  • 17:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 17:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312027)', diff saved to https://phabricator.wikimedia.org/P30796 and previous config saved to /var/cache/conftool/dbconfig/20220704-170740-ladsgroup.json
  • 16:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P30795 and previous config saved to /var/cache/conftool/dbconfig/20220704-165235-ladsgroup.json
  • 16:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P30793 and previous config saved to /var/cache/conftool/dbconfig/20220704-163730-ladsgroup.json
  • 16:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312027)', diff saved to https://phabricator.wikimedia.org/P30792 and previous config saved to /var/cache/conftool/dbconfig/20220704-162225-ladsgroup.json
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T312027)', diff saved to https://phabricator.wikimedia.org/P30791 and previous config saved to /var/cache/conftool/dbconfig/20220704-162015-ladsgroup.json
  • 16:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 16:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30790 and previous config saved to /var/cache/conftool/dbconfig/20220704-161944-ladsgroup.json
  • 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P30789 and previous config saved to /var/cache/conftool/dbconfig/20220704-161817-ladsgroup.json
  • 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P30788 and previous config saved to /var/cache/conftool/dbconfig/20220704-160439-ladsgroup.json
  • 16:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P30787 and previous config saved to /var/cache/conftool/dbconfig/20220704-160314-ladsgroup.json
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P30786 and previous config saved to /var/cache/conftool/dbconfig/20220704-154933-ladsgroup.json
  • 15:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Maint done', diff saved to https://phabricator.wikimedia.org/P30785 and previous config saved to /var/cache/conftool/dbconfig/20220704-154810-ladsgroup.json
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30784 and previous config saved to /var/cache/conftool/dbconfig/20220704-153428-ladsgroup.json
  • 15:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P30783 and previous config saved to /var/cache/conftool/dbconfig/20220704-153306-ladsgroup.json
  • 15:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30782 and previous config saved to /var/cache/conftool/dbconfig/20220704-153218-ladsgroup.json
  • 15:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 15:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T305300)', diff saved to https://phabricator.wikimedia.org/P30781 and previous config saved to /var/cache/conftool/dbconfig/20220704-152931-ladsgroup.json
  • 15:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 14:35 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2069.codfw.wmnet
  • 14:32 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1071.eqiad.wmnet
  • 14:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:27 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Exempt WMCS ranges from globalblocking everywhere (T307648) (duration: 03m 26s)
  • 14:26 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2069.codfw.wmnet
  • 14:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:25 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2068.codfw.wmnet
  • 14:20 oblivian@deploy1002: Synchronized README: testing new php restart script (duration: 03m 23s)
  • 14:19 elukey: roll restart of thanos-fe's proxy to pick up a new account - T311628
  • 14:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1071.eqiad.wmnet
  • 14:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2068.codfw.wmnet
  • 14:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1070.eqiad.wmnet
  • 14:14 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2067.codfw.wmnet
  • 14:10 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set GlobalBlockingAllowedRanges for testwiki (T307648) (duration: 03m 39s)
  • 14:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:05 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1070.eqiad.wmnet
  • 14:05 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
  • 13:54 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1069.eqiad.wmnet
  • 13:49 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
  • 13:27 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1069.eqiad.wmnet
  • 13:25 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
  • 13:24 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1068.eqiad.wmnet
  • 13:22 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2064.codfw.wmnet
  • 13:11 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1068.eqiad.wmnet
  • 13:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
  • 12:38 jynus: running alter table on dbbackups db T283017
  • 12:27 _joe_: updated etcdmirror to 0.0.8 everywhere
  • 12:17 moritzm: installing 4.9.320 on stretch hosts
  • 11:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:55 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/GlobalBlocking/includes/GlobalBlocking.php: Backport: Add statsd metric collection on db calls (T307648) (duration: 03m 26s)
  • 11:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:50 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/GrowthExperiments/modules/ext.growthExperiments.StructuredTask/addimage/AddImageArticleTarget.js: Backport: AddImageArticleTarget: Update to new mediaClass/mediaTag format (T311916) (duration: 03m 33s)
  • 11:36 marostegui@cumin2002: dbctl commit (dc=all): 'Add db2156 to s3 T311493', diff saved to https://phabricator.wikimedia.org/P30774 and previous config saved to /var/cache/conftool/dbconfig/20220704-113640-marostegui.json
  • 11:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:54 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.18/includes: Backport: Revert "Revert "RecentChange: Straight join to actor table when needed"" (T311360) (duration: 03m 49s)
  • 10:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:25 _joe_: rollback etcdmirror to 0.0.6 on conf2005
  • 10:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:25 godog: silence etcd p a g e
  • 10:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:21 _joe_: restarting etcdmirror on conf2005
  • 10:21 moritzm: installing gnupg2 security updates
  • 10:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:17 _joe_: upgraded etcdmirror to 0.0.7 on conf2006, now going with the rest of codfw
  • 10:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:24 marostegui@cumin2002: dbctl commit (dc=all): 'Add db2157 to s5 T311493', diff saved to https://phabricator.wikimedia.org/P30758 and previous config saved to /var/cache/conftool/dbconfig/20220704-082406-marostegui.json
  • 08:07 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging MewOphaswongse out of all services on: 634 hosts
  • 08:07 jmm@cumin2002: START - Cookbook sre.idm.logout Logging MewOphaswongse out of all services on: 634 hosts
  • 08:07 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging MewOphaswongse out of all services on: 1299 hosts
  • 08:06 jmm@cumin2002: START - Cookbook sre.idm.logout Logging MewOphaswongse out of all services on: 1299 hosts
  • 08:04 elukey: kill leftover processes of user `mewoph` on stat100x to allow puppet runs
  • 07:39 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin1001.eqiad.wmnet
  • 07:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1001.eqiad.wmnet
  • 06:49 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2092.codfw.wmnet
  • 06:47 marostegui@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:43 marostegui@cumin2002: START - Cookbook sre.dns.netbox
  • 06:39 marostegui@cumin2002: START - Cookbook sre.hosts.decommission for hosts db2092.codfw.wmnet
  • 06:34 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2091.codfw.wmnet
  • 06:32 marostegui@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:28 marostegui@cumin2002: START - Cookbook sre.dns.netbox
  • 06:24 marostegui@cumin2002: START - Cookbook sre.hosts.decommission for hosts db2091.codfw.wmnet
  • 05:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: codfw s4 sanitarium master switch
  • 05:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: codfw s4 sanitarium master switch

2022-07-03

  • 11:36 _joe_: temporarily raised replicas for shellbox to 24
  • 11:35 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 11:35 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply

2022-07-02

  • 05:36 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 09s)
  • 05:36 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 05:24 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 09s)
  • 05:23 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 05:21 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 05:20 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 05:11 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 09s)
  • 05:11 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 04:49 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 04:49 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 04:48 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 04:48 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 03:59 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 03:59 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 03:57 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 03:57 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 03:56 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 03:56 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 02:49 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 09s)
  • 02:49 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 01:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 01:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 00:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 00:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance

2022-07-01

  • 23:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 23:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 23:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T309311)', diff saved to https://phabricator.wikimedia.org/P30753 and previous config saved to /var/cache/conftool/dbconfig/20220701-235524-ladsgroup.json
  • 23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P30752 and previous config saved to /var/cache/conftool/dbconfig/20220701-234019-ladsgroup.json
  • 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P30751 and previous config saved to /var/cache/conftool/dbconfig/20220701-232514-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T309311)', diff saved to https://phabricator.wikimedia.org/P30750 and previous config saved to /var/cache/conftool/dbconfig/20220701-231009-ladsgroup.json
  • 23:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1012.eqiad.wmnet with OS bullseye
  • 22:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1012.eqiad.wmnet with reason: host reimage
  • 22:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1012.eqiad.wmnet with reason: host reimage
  • 22:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1015.eqiad.wmnet with OS bullseye
  • 22:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1012.eqiad.wmnet with OS bullseye
  • 22:22 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1012.eqiad.wmnet with OS bullseye
  • 22:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1015.eqiad.wmnet with reason: host reimage
  • 22:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T309311)', diff saved to https://phabricator.wikimedia.org/P30749 and previous config saved to /var/cache/conftool/dbconfig/20220701-221438-ladsgroup.json
  • 22:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 22:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1015.eqiad.wmnet with reason: host reimage
  • 22:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 22:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T309311)', diff saved to https://phabricator.wikimedia.org/P30748 and previous config saved to /var/cache/conftool/dbconfig/20220701-221418-ladsgroup.json
  • 22:12 mutante: restbase2018 - attempting power cycle via mgmt - /admin1-> racadm serveraction powercycle (T311890)
  • 22:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1014.eqiad.wmnet with OS bullseye
  • 22:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1013.eqiad.wmnet with OS bullseye
  • 22:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1008.eqiad.wmnet with OS bullseye
  • 22:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1010.eqiad.wmnet with OS bullseye
  • 22:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1012.eqiad.wmnet with OS bullseye
  • 22:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1015.eqiad.wmnet with OS bullseye
  • 21:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P30747 and previous config saved to /var/cache/conftool/dbconfig/20220701-215913-ladsgroup.json
  • 21:57 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1009.eqiad.wmnet with OS bullseye
  • 21:57 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1011.eqiad.wmnet with OS bullseye
  • 21:57 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1007.eqiad.wmnet with OS bullseye
  • 21:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1015.eqiad.wmnet with OS bullseye
  • 21:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1012.eqiad.wmnet with OS bullseye
  • 21:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1011.eqiad.wmnet with reason: host reimage
  • 21:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1014.eqiad.wmnet with reason: host reimage
  • 21:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1010.eqiad.wmnet with reason: host reimage
  • 21:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1008.eqiad.wmnet with reason: host reimage
  • 21:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1013.eqiad.wmnet with reason: host reimage
  • 21:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1007.eqiad.wmnet with reason: host reimage
  • 21:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1009.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1009.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1008.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1013.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1011.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1010.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1007.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1014.eqiad.wmnet with reason: host reimage
  • 21:48 mutante: https://doc.wikimedia.org switched to doc1002 backend on buster T247653
  • 21:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host stat1009.eqiad.wmnet with OS bullseye
  • 21:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P30746 and previous config saved to /var/cache/conftool/dbconfig/20220701-214408-ladsgroup.json
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1015.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1010.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1011.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1008.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1013.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1009.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1012.eqiad.wmnet with OS bullseye
  • 21:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1014.eqiad.wmnet with OS bullseye
  • 21:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1006.eqiad.wmnet with OS bullseye
  • 21:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on stat1009.eqiad.wmnet with reason: host reimage
  • 21:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on stat1009.eqiad.wmnet with reason: host reimage
  • 21:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T309311)', diff saved to https://phabricator.wikimedia.org/P30745 and previous config saved to /var/cache/conftool/dbconfig/20220701-212903-ladsgroup.json
  • 21:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1006.eqiad.wmnet with reason: host reimage
  • 21:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host stat1009.eqiad.wmnet with OS bullseye
  • 21:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1006.eqiad.wmnet with reason: host reimage
  • 21:09 mutante: https://doc.wikimedia.org - scheduled maintenance period - switching to buster backend doc1002 (T247653)
  • 21:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS bullseye
  • 20:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T309311)', diff saved to https://phabricator.wikimedia.org/P30744 and previous config saved to /var/cache/conftool/dbconfig/20220701-203251-ladsgroup.json
  • 20:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30743 and previous config saved to /var/cache/conftool/dbconfig/20220701-203231-ladsgroup.json
  • 20:29 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:22 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P30742 and previous config saved to /var/cache/conftool/dbconfig/20220701-201726-ladsgroup.json
  • 20:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P30741 and previous config saved to /var/cache/conftool/dbconfig/20220701-200221-ladsgroup.json
  • 19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30740 and previous config saved to /var/cache/conftool/dbconfig/20220701-194716-ladsgroup.json
  • 18:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30739 and previous config saved to /var/cache/conftool/dbconfig/20220701-183504-ladsgroup.json
  • 18:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 18:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30738 and previous config saved to /var/cache/conftool/dbconfig/20220701-183444-ladsgroup.json
  • 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P30737 and previous config saved to /var/cache/conftool/dbconfig/20220701-181939-ladsgroup.json
  • 18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P30736 and previous config saved to /var/cache/conftool/dbconfig/20220701-180434-ladsgroup.json
  • 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30735 and previous config saved to /var/cache/conftool/dbconfig/20220701-174929-ladsgroup.json
  • 17:47 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 17:47 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30734 and previous config saved to /var/cache/conftool/dbconfig/20220701-165407-ladsgroup.json
  • 16:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T309311)', diff saved to https://phabricator.wikimedia.org/P30733 and previous config saved to /var/cache/conftool/dbconfig/20220701-165347-ladsgroup.json
  • 16:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P30732 and previous config saved to /var/cache/conftool/dbconfig/20220701-163842-ladsgroup.json
  • 16:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS bullseye
  • 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P30731 and previous config saved to /var/cache/conftool/dbconfig/20220701-162337-ladsgroup.json
  • 16:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage
  • 16:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage
  • 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T309311)', diff saved to https://phabricator.wikimedia.org/P30730 and previous config saved to /var/cache/conftool/dbconfig/20220701-160831-ladsgroup.json
  • 15:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS bullseye
  • 15:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2167.codfw.wmnet with OS bullseye
  • 15:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2166.codfw.wmnet with OS bullseye
  • 15:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2167.codfw.wmnet with reason: host reimage
  • 15:04 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2167.codfw.wmnet with reason: host reimage
  • 15:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2166.codfw.wmnet with reason: host reimage
  • 15:02 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 15:02 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 15:01 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudstore[1008-1009]
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T309311)', diff saved to https://phabricator.wikimedia.org/P30729 and previous config saved to /var/cache/conftool/dbconfig/20220701-145937-ladsgroup.json
  • 14:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 14:59 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2166.codfw.wmnet with reason: host reimage
  • 14:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 14:55 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:48 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 14:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2167.codfw.wmnet with OS bullseye
  • 14:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2166.codfw.wmnet with OS bullseye
  • 14:39 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudstore[1008-1009]
  • 14:05 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 14:04 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T309311)', diff saved to https://phabricator.wikimedia.org/P30728 and previous config saved to /var/cache/conftool/dbconfig/20220701-135831-ladsgroup.json
  • 13:50 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 13:50 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:43 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 13:43 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 07s)
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P30727 and previous config saved to /var/cache/conftool/dbconfig/20220701-134326-ladsgroup.json
  • 13:43 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:36 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 13:36 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P30726 and previous config saved to /var/cache/conftool/dbconfig/20220701-132821-ladsgroup.json
  • 13:23 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 09s)
  • 13:23 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:19 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 13:19 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T309311)', diff saved to https://phabricator.wikimedia.org/P30725 and previous config saved to /var/cache/conftool/dbconfig/20220701-131316-ladsgroup.json
  • 13:12 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 13:12 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:08 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 13:08 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2155 to s4 T311493', diff saved to https://phabricator.wikimedia.org/P30724 and previous config saved to /var/cache/conftool/dbconfig/20220701-130106-marostegui.json
  • 12:38 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 12:38 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 12:37 moritzm: uploaded rsyslog 8.2102.0-2+deb11u1+wmf2 to component/rsyslog-k8s (backport of latest security fixes on top of the rsyslog with mmkubernetes plugin)
  • 12:09 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 12:09 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T309311)', diff saved to https://phabricator.wikimedia.org/P30723 and previous config saved to /var/cache/conftool/dbconfig/20220701-120657-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T309311)', diff saved to https://phabricator.wikimedia.org/P30722 and previous config saved to /var/cache/conftool/dbconfig/20220701-120636-ladsgroup.json
  • 12:02 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 12:02 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T309311)', diff saved to https://phabricator.wikimedia.org/P30721 and previous config saved to /var/cache/conftool/dbconfig/20220701-115414-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P30720 and previous config saved to /var/cache/conftool/dbconfig/20220701-115131-ladsgroup.json
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P30719 and previous config saved to /var/cache/conftool/dbconfig/20220701-113909-ladsgroup.json
  • 11:38 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 11:38 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P30718 and previous config saved to /var/cache/conftool/dbconfig/20220701-113626-ladsgroup.json
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P30717 and previous config saved to /var/cache/conftool/dbconfig/20220701-112404-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T309311)', diff saved to https://phabricator.wikimedia.org/P30716 and previous config saved to /var/cache/conftool/dbconfig/20220701-112121-ladsgroup.json
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T309311)', diff saved to https://phabricator.wikimedia.org/P30715 and previous config saved to /var/cache/conftool/dbconfig/20220701-110859-ladsgroup.json
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T309311)', diff saved to https://phabricator.wikimedia.org/P30714 and previous config saved to /var/cache/conftool/dbconfig/20220701-110204-ladsgroup.json
  • 11:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 11:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T309311)', diff saved to https://phabricator.wikimedia.org/P30713 and previous config saved to /var/cache/conftool/dbconfig/20220701-110117-ladsgroup.json
  • 10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P30712 and previous config saved to /var/cache/conftool/dbconfig/20220701-104612-ladsgroup.json
  • 10:45 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 10:45 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 10:44 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 09s)
  • 10:44 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P30711 and previous config saved to /var/cache/conftool/dbconfig/20220701-103107-ladsgroup.json
  • 10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T309311)', diff saved to https://phabricator.wikimedia.org/P30710 and previous config saved to /var/cache/conftool/dbconfig/20220701-102810-ladsgroup.json
  • 10:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 10:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T309311)', diff saved to https://phabricator.wikimedia.org/P30709 and previous config saved to /var/cache/conftool/dbconfig/20220701-101602-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T309311)', diff saved to https://phabricator.wikimedia.org/P30708 and previous config saved to /var/cache/conftool/dbconfig/20220701-094927-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 13 hosts with reason: Maintenance
  • 09:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 13 hosts with reason: Maintenance
  • 09:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 09:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 08:35 marostegui: Stop mysql on db2073 for cloning db2155
  • 07:47 mmandere: kubemaster2001, restart rsyslog
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2154 to s8 T311493', diff saved to https://phabricator.wikimedia.org/P30705 and previous config saved to /var/cache/conftool/dbconfig/20220701-074607-marostegui.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2153 to s1 T311493', diff saved to https://phabricator.wikimedia.org/P30704 and previous config saved to /var/cache/conftool/dbconfig/20220701-073512-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2091 from dbctl T311803', diff saved to https://phabricator.wikimedia.org/P30703 and previous config saved to /var/cache/conftool/dbconfig/20220701-060000-marostegui.json
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2092 from dbctl T311802', diff saved to https://phabricator.wikimedia.org/P30701 and previous config saved to /var/cache/conftool/dbconfig/20220701-054102-marostegui.json
  • 02:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2165.codfw.wmnet with OS bullseye
  • 02:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2165.codfw.wmnet with reason: host reimage
  • 02:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2165.codfw.wmnet with reason: host reimage
  • 02:06 krinkle@deploy1002: Synchronized wmf-config/: I60edfb0f60 (3/3) (duration: 03m 31s)
  • 02:01 krinkle@deploy1002: Synchronized multiversion/: I60edfb0f60 (2/3) (duration: 03m 34s)
  • 01:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2165.codfw.wmnet with OS bullseye
  • 01:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2163.codfw.wmnet with OS bullseye
  • 01:39 krinkle@deploy1002: Synchronized tests/: I60edfb0f60 (1/3) (duration: 03m 32s)
  • 01:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2163.codfw.wmnet with reason: host reimage
  • 01:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2163.codfw.wmnet with reason: host reimage
  • 01:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:30 krinkle@deploy1002: Synchronized src/: I796f38 (3/3) (duration: 03m 24s)
  • 01:26 krinkle@deploy1002: Synchronized multiversion/: I796f38 (2/3) (duration: 03m 32s)
  • 01:23 krinkle@deploy1002: Synchronized tests/: I796f38 (1/3) (duration: 03m 41s)
  • 01:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2162.codfw.wmnet with OS bullseye
  • 01:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2163.codfw.wmnet with OS bullseye
  • 01:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2162.codfw.wmnet with reason: host reimage
  • 01:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2161.codfw.wmnet with OS bullseye
  • 00:57 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2162.codfw.wmnet with reason: host reimage
  • 00:53 ejegg: updated payments-wiki from ef53c82e to 78dee85e
  • 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2168.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2167.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2161.codfw.wmnet with reason: host reimage
  • 00:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2161.codfw.wmnet with reason: host reimage
  • 00:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2162.codfw.wmnet with OS bullseye
  • 00:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2168.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2167.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2166.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2165.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2161.codfw.wmnet with OS bullseye
  • 00:05 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2166.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2163.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2165.mgmt.codfw.wmnet with reboot policy FORCED