Server Admin Log/Archive 48

From Wikitech

2022-01-31

  • 23:50 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: sync on production
  • 23:50 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply on staging
  • 23:50 dduvall@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply on production
  • 23:49 dduvall@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: sync on production
  • 23:49 dduvall@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply on staging
  • 23:49 dduvall@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply on production
  • 23:44 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: sync on staging
  • 23:44 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply on production
  • 23:44 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply on staging
  • 23:31 inflatador: [WCQS Deploy] Restarted `wcqs-updater` across all hosts: `sudo cumin -b 6 'wcqs*' 'sudo systemctl restart wcqs-updater'`
  • 23:29 bking@deploy1002: Finished deploy [wdqs/wdqs@f0287fb] (wcqs): Deploy 0.3.101 to WCQS (duration: 02m 39s)
  • 23:28 inflatador: [WCQS Deploy] Tests look good following deploy of `0.3.101` to canary `wcqs1002.eqiad.wmnet`, proceeding to rest of fleet
  • 23:26 bking@deploy1002: Started deploy [wdqs/wdqs@f0287fb] (wcqs): Deploy 0.3.101 to WCQS
  • 23:17 inflatador: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 23:16 inflatador: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 23:16 inflatador: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 23:12 bking@deploy1002: Finished deploy [wdqs/wdqs@f0287fb]: 0.3.101 (duration: 08m 18s)
  • 23:06 inflatador: [WDQS Deploy] Tests passing following deploy of 0.3.101 on canary `wdqs1003`; proceeding to rest of fleet
  • 23:04 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet2004-dev.codfw.wmnet with OS bullseye
  • 23:03 bking@deploy1002: Started deploy [wdqs/wdqs@f0287fb]: 0.3.101
  • 22:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 22:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 22:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 22:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 22:38 urbanecm: Deploy security patch for T298312
  • 22:12 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet2004-dev.codfw.wmnet with OS bullseye
  • 21:15 mutante: installed bullseye on new VM etherpad1003, signing puppet certs for etherpad1003.eqiad.wmnet - puppet error expected until we add the role (T300568)
  • 21:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:54 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 20:54 ejegg: updated payments-wiki from a4b21e52 to 933e8669
  • 20:50 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host etherpad1003.eqiad.wmnet
  • 20:39 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host etherpad1003.eqiad.wmnet
  • 20:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-staging2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:24 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ml-staging2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-staging2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:14 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ml-staging2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:13 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-staging2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:12 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ml-staging2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298559)', diff saved to https://phabricator.wikimedia.org/P19707 and previous config saved to /var/cache/conftool/dbconfig/20220131-201118-marostegui.json
  • 20:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-staging2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ml-staging2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:09 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-staging2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:09 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ml-staging2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-staging2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ml-staging2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P19706 and previous config saved to /var/cache/conftool/dbconfig/20220131-195614-marostegui.json
  • 19:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-staging2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P19705 and previous config saved to /var/cache/conftool/dbconfig/20220131-194109-marostegui.json
  • 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:27 cjming: end of UTC evening backport & config window
  • 19:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298559)', diff saved to https://phabricator.wikimedia.org/P19704 and previous config saved to /var/cache/conftool/dbconfig/20220131-192604-marostegui.json
  • 19:24 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable A/B test (T297924) (duration: 00m 49s)
  • 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:17 cjming@deploy1002: Synchronized wmf-config/config: Config: Update config for idwiki: (T299676) (duration: 00m 50s)
  • 19:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T298559)', diff saved to https://phabricator.wikimedia.org/P19703 and previous config saved to /var/cache/conftool/dbconfig/20220131-191356-marostegui.json
  • 19:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 19:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 19:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298559)', diff saved to https://phabricator.wikimedia.org/P19702 and previous config saved to /var/cache/conftool/dbconfig/20220131-191348-marostegui.json
  • 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 411af37: [wmf-config]: Undeploy gdi survey from cawiki in production (T300544) (duration: 00m 50s)
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: a659cb0: Revert "commonswiki: Add leg.journals.isu.ac.ir to the wgCopyUploadsDomains allowlist" (T300217) (duration: 00m 50s)
  • 18:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P19701 and previous config saved to /var/cache/conftool/dbconfig/20220131-185843-marostegui.json
  • 18:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P19700 and previous config saved to /var/cache/conftool/dbconfig/20220131-184339-marostegui.json
  • 18:41 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudbackup1004.eqiad.wmnet with OS buster
  • 18:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1004.eqiad.wmnet with OS buster
  • 18:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudbackup1003.eqiad.wmnet with OS buster
  • 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298558)', diff saved to https://phabricator.wikimedia.org/P19699 and previous config saved to /var/cache/conftool/dbconfig/20220131-184006-marostegui.json
  • 18:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ml-staging2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298559)', diff saved to https://phabricator.wikimedia.org/P19698 and previous config saved to /var/cache/conftool/dbconfig/20220131-182834-marostegui.json
  • 18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T298559)', diff saved to https://phabricator.wikimedia.org/P19697 and previous config saved to /var/cache/conftool/dbconfig/20220131-182728-marostegui.json
  • 18:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 18:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298559)', diff saved to https://phabricator.wikimedia.org/P19696 and previous config saved to /var/cache/conftool/dbconfig/20220131-182719-marostegui.json
  • 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P19695 and previous config saved to /var/cache/conftool/dbconfig/20220131-182501-marostegui.json
  • 18:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P19694 and previous config saved to /var/cache/conftool/dbconfig/20220131-181215-marostegui.json
  • 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P19693 and previous config saved to /var/cache/conftool/dbconfig/20220131-180956-marostegui.json
  • 18:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1003.eqiad.wmnet with OS buster
  • 18:01 moritzm: installing NSS security updates
  • 18:01 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudbackup1003.eqiad.wmnet with OS buster
  • 17:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P19691 and previous config saved to /var/cache/conftool/dbconfig/20220131-175710-marostegui.json
  • 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298558)', diff saved to https://phabricator.wikimedia.org/P19690 and previous config saved to /var/cache/conftool/dbconfig/20220131-175452-marostegui.json
  • 17:54 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2017.codfw.wmnet
  • 17:54 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2017.codfw.wmnet with reason: Firmware upgrades
  • 17:54 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2017.codfw.wmnet with reason: Firmware upgrades
  • 17:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298558)', diff saved to https://phabricator.wikimedia.org/P19689 and previous config saved to /var/cache/conftool/dbconfig/20220131-175333-marostegui.json
  • 17:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298558)', diff saved to https://phabricator.wikimedia.org/P19688 and previous config saved to /var/cache/conftool/dbconfig/20220131-175326-marostegui.json
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T300510)', diff saved to https://phabricator.wikimedia.org/P19687 and previous config saved to /var/cache/conftool/dbconfig/20220131-175304-ladsgroup.json
  • 17:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2148.codfw.wmnet with OS bullseye
  • 17:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298559)', diff saved to https://phabricator.wikimedia.org/P19686 and previous config saved to /var/cache/conftool/dbconfig/20220131-174206-marostegui.json
  • 17:41 sukhe: disable puppet on A:rec-dns for T758063
  • 17:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T298559)', diff saved to https://phabricator.wikimedia.org/P19685 and previous config saved to /var/cache/conftool/dbconfig/20220131-174059-marostegui.json
  • 17:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 17:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298559)', diff saved to https://phabricator.wikimedia.org/P19684 and previous config saved to /var/cache/conftool/dbconfig/20220131-174052-marostegui.json
  • 17:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P19683 and previous config saved to /var/cache/conftool/dbconfig/20220131-173821-marostegui.json
  • 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P19682 and previous config saved to /var/cache/conftool/dbconfig/20220131-172547-marostegui.json
  • 17:24 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudbackup1004.eqiad.wmnet with OS buster
  • 17:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1004.eqiad.wmnet with OS buster
  • 17:23 urandom: restarting Cassandra, aqs1015-{a,b}, to apply upgrade to 3.11.11 -- T298516
  • 17:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P19681 and previous config saved to /var/cache/conftool/dbconfig/20220131-172317-marostegui.json
  • 17:23 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudbackup1004.eqiad.wmnet with OS buster
  • 17:22 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1004.eqiad.wmnet with OS buster
  • 17:15 urandom: restarting Cassandra, aqs1014-{a,b}, to apply upgrade to 3.11.11 -- T298516
  • 17:13 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1003.eqiad.wmnet with OS buster
  • 17:11 urandom: restarting Cassandra, aqs1013-{a,b}, to apply upgrade to 3.11.11 -- T298516
  • 17:11 urandom: restarting Cassandra, aqs1012-{a,b}, to apply upgrade to 3.11.11 -- T298516
  • 17:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2148.codfw.wmnet with OS bullseye
  • 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P19680 and previous config saved to /var/cache/conftool/dbconfig/20220131-171036-marostegui.json
  • 17:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298558)', diff saved to https://phabricator.wikimedia.org/P19679 and previous config saved to /var/cache/conftool/dbconfig/20220131-170812-marostegui.json
  • 17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T300510)', diff saved to https://phabricator.wikimedia.org/P19678 and previous config saved to /var/cache/conftool/dbconfig/20220131-170808-ladsgroup.json
  • 17:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 17:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 17:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298558)', diff saved to https://phabricator.wikimedia.org/P19677 and previous config saved to /var/cache/conftool/dbconfig/20220131-170653-marostegui.json
  • 17:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298558)', diff saved to https://phabricator.wikimedia.org/P19676 and previous config saved to /var/cache/conftool/dbconfig/20220131-170646-marostegui.json
  • 17:03 urandom: restarting Cassandra, aqs1012-{a,b}, to apply upgrade to 3.11.11 -- T298516
  • 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298559)', diff saved to https://phabricator.wikimedia.org/P19675 and previous config saved to /var/cache/conftool/dbconfig/20220131-165531-marostegui.json
  • 16:53 urandom: restarting Cassandra, aqs1011-{a,b}, to apply upgrade to 3.11.11 -- T298516
  • 16:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P19674 and previous config saved to /var/cache/conftool/dbconfig/20220131-165141-marostegui.json
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T300510)', diff saved to https://phabricator.wikimedia.org/P19673 and previous config saved to /var/cache/conftool/dbconfig/20220131-164550-ladsgroup.json
  • 16:40 mmandere@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host durum6001.drmrs.wmnet
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T298559)', diff saved to https://phabricator.wikimedia.org/P19672 and previous config saved to /var/cache/conftool/dbconfig/20220131-163921-marostegui.json
  • 16:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 16:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298559)', diff saved to https://phabricator.wikimedia.org/P19671 and previous config saved to /var/cache/conftool/dbconfig/20220131-163908-marostegui.json
  • 16:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P19670 and previous config saved to /var/cache/conftool/dbconfig/20220131-163637-marostegui.json
  • 16:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2125.codfw.wmnet with OS bullseye
  • 16:25 ariel@deploy1002: Finished deploy [dumps/dumps@8820784]: add dump of siteinfo in format version 2 (duration: 00m 03s)
  • 16:25 ariel@deploy1002: Started deploy [dumps/dumps@8820784]: add dump of siteinfo in format version 2
  • 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P19669 and previous config saved to /var/cache/conftool/dbconfig/20220131-162403-marostegui.json
  • 16:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298558)', diff saved to https://phabricator.wikimedia.org/P19668 and previous config saved to /var/cache/conftool/dbconfig/20220131-162132-marostegui.json
  • 16:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298558)', diff saved to https://phabricator.wikimedia.org/P19667 and previous config saved to /var/cache/conftool/dbconfig/20220131-162014-marostegui.json
  • 16:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 16:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 16:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298558)', diff saved to https://phabricator.wikimedia.org/P19666 and previous config saved to /var/cache/conftool/dbconfig/20220131-162000-marostegui.json
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P19665 and previous config saved to /var/cache/conftool/dbconfig/20220131-160859-marostegui.json
  • 16:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P19664 and previous config saved to /var/cache/conftool/dbconfig/20220131-160456-marostegui.json
  • 16:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2125.codfw.wmnet with OS bullseye
  • 16:01 XioNoX: Move core routers loopback filter to Capirca
  • 16:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T300510)', diff saved to https://phabricator.wikimedia.org/P19663 and previous config saved to /var/cache/conftool/dbconfig/20220131-160054-ladsgroup.json
  • 16:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 16:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T300510)', diff saved to https://phabricator.wikimedia.org/P19662 and previous config saved to /var/cache/conftool/dbconfig/20220131-155905-ladsgroup.json
  • 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298559)', diff saved to https://phabricator.wikimedia.org/P19661 and previous config saved to /var/cache/conftool/dbconfig/20220131-155353-marostegui.json
  • 15:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T298559)', diff saved to https://phabricator.wikimedia.org/P19660 and previous config saved to /var/cache/conftool/dbconfig/20220131-155246-marostegui.json
  • 15:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 15:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 15:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298559)', diff saved to https://phabricator.wikimedia.org/P19659 and previous config saved to /var/cache/conftool/dbconfig/20220131-155239-marostegui.json
  • 15:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P19658 and previous config saved to /var/cache/conftool/dbconfig/20220131-154950-marostegui.json
  • 15:37 jelto@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 04m 34s)
  • 15:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P19657 and previous config saved to /var/cache/conftool/dbconfig/20220131-153734-marostegui.json
  • 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298558)', diff saved to https://phabricator.wikimedia.org/P19656 and previous config saved to /var/cache/conftool/dbconfig/20220131-153446-marostegui.json
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298558)', diff saved to https://phabricator.wikimedia.org/P19655 and previous config saved to /var/cache/conftool/dbconfig/20220131-153328-marostegui.json
  • 15:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 15:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298558)', diff saved to https://phabricator.wikimedia.org/P19654 and previous config saved to /var/cache/conftool/dbconfig/20220131-153320-marostegui.json
  • 15:33 jelto@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
  • 15:24 hnowlan@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 00m 13s)
  • 15:24 hnowlan@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
  • 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P19653 and previous config saved to /var/cache/conftool/dbconfig/20220131-152230-marostegui.json
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P19652 and previous config saved to /var/cache/conftool/dbconfig/20220131-151816-marostegui.json
  • 15:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298559)', diff saved to https://phabricator.wikimedia.org/P19651 and previous config saved to /var/cache/conftool/dbconfig/20220131-150725-marostegui.json
  • 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T298559)', diff saved to https://phabricator.wikimedia.org/P19650 and previous config saved to /var/cache/conftool/dbconfig/20220131-150619-marostegui.json
  • 15:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 15:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298559)', diff saved to https://phabricator.wikimedia.org/P19649 and previous config saved to /var/cache/conftool/dbconfig/20220131-150611-marostegui.json
  • 15:05 jelto: update scap to 4.2.2 on A:restbase-canary - T300392
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P19648 and previous config saved to /var/cache/conftool/dbconfig/20220131-150311-marostegui.json
  • 14:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2107.codfw.wmnet with OS bullseye
  • 14:58 jelto: update scap to 4.2.2 on A:mw-canary or A:parsoid-canary or A:mw-jobrunner-canary - T300392
  • 14:53 lucaswerkmeister-wmde@deploy1002: helmfile [eqiad] DONE helmfile.d/services/termbox: sync on production
  • 14:51 lucaswerkmeister-wmde@deploy1002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply on test
  • 14:51 lucaswerkmeister-wmde@deploy1002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply on staging
  • 14:51 lucaswerkmeister-wmde@deploy1002: helmfile [eqiad] START helmfile.d/services/termbox: apply on production
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P19647 and previous config saved to /var/cache/conftool/dbconfig/20220131-145107-marostegui.json
  • 14:50 lucaswerkmeister-wmde@deploy1002: helmfile [codfw] DONE helmfile.d/services/termbox: sync on production
  • 14:48 lucaswerkmeister-wmde@deploy1002: helmfile [codfw] DONE helmfile.d/services/termbox: apply on staging
  • 14:48 lucaswerkmeister-wmde@deploy1002: helmfile [codfw] DONE helmfile.d/services/termbox: apply on test
  • 14:48 lucaswerkmeister-wmde@deploy1002: helmfile [codfw] START helmfile.d/services/termbox: apply on production
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298558)', diff saved to https://phabricator.wikimedia.org/P19646 and previous config saved to /var/cache/conftool/dbconfig/20220131-144806-marostegui.json
  • 14:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298558)', diff saved to https://phabricator.wikimedia.org/P19645 and previous config saved to /var/cache/conftool/dbconfig/20220131-144650-marostegui.json
  • 14:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 14:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 14:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298558)', diff saved to https://phabricator.wikimedia.org/P19644 and previous config saved to /var/cache/conftool/dbconfig/20220131-144642-marostegui.json
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P19643 and previous config saved to /var/cache/conftool/dbconfig/20220131-143602-marostegui.json
  • 14:34 lucaswerkmeister-wmde@deploy1002: helmfile [codfw] DONE helmfile.d/services/termbox: sync on production
  • 14:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P19642 and previous config saved to /var/cache/conftool/dbconfig/20220131-143138-marostegui.json
  • 14:28 filippo@deploy1002: Finished deploy [librenms/librenms@f049593]: Add custom patches to librenms 21.4.0 (duration: 00m 10s)
  • 14:28 filippo@deploy1002: Started deploy [librenms/librenms@f049593]: Add custom patches to librenms 21.4.0
  • 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2107.codfw.wmnet with OS bullseye
  • 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2107 (T300510)', diff saved to https://phabricator.wikimedia.org/P19641 and previous config saved to /var/cache/conftool/dbconfig/20220131-142550-ladsgroup.json
  • 14:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 14:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 14:24 lucaswerkmeister-wmde@deploy1002: helmfile [codfw] DONE helmfile.d/services/termbox: apply on test
  • 14:24 lucaswerkmeister-wmde@deploy1002: helmfile [codfw] DONE helmfile.d/services/termbox: apply on staging
  • 14:24 lucaswerkmeister-wmde@deploy1002: helmfile [codfw] START helmfile.d/services/termbox: apply on production
  • 14:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1027.eqiad.wmnet
  • 14:22 lucaswerkmeister-wmde@deploy1002: helmfile [staging] DONE helmfile.d/services/termbox: sync on staging
  • 14:22 lucaswerkmeister-wmde@deploy1002: helmfile [staging] DONE helmfile.d/services/termbox: sync on test
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298559)', diff saved to https://phabricator.wikimedia.org/P19640 and previous config saved to /var/cache/conftool/dbconfig/20220131-142057-marostegui.json
  • 14:20 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1027.eqiad.wmnet with OS buster
  • 14:20 lucaswerkmeister-wmde@deploy1002: helmfile [staging] DONE helmfile.d/services/termbox: apply on production
  • 14:20 lucaswerkmeister-wmde@deploy1002: helmfile [staging] START helmfile.d/services/termbox: apply on test
  • 14:20 lucaswerkmeister-wmde@deploy1002: helmfile [staging] START helmfile.d/services/termbox: apply on staging
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T298559)', diff saved to https://phabricator.wikimedia.org/P19639 and previous config saved to /var/cache/conftool/dbconfig/20220131-141951-marostegui.json
  • 14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 14:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298559)', diff saved to https://phabricator.wikimedia.org/P19638 and previous config saved to /var/cache/conftool/dbconfig/20220131-141943-marostegui.json
  • 14:18 mmandere@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum6001.drmrs.wmnet
  • 14:17 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
  • 14:17 moritzm: draining ganeti1008 for eventual reimage
  • 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1015.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 14:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P19637 and previous config saved to /var/cache/conftool/dbconfig/20220131-141633-marostegui.json
  • 14:14 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1015.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
  • 14:10 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
  • 14:10 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host prometheus2006.codfw.wmnet
  • 14:09 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 14:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1015.eqiad.wmnet
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P19636 and previous config saved to /var/cache/conftool/dbconfig/20220131-140439-marostegui.json
  • 14:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298558)', diff saved to https://phabricator.wikimedia.org/P19635 and previous config saved to /var/cache/conftool/dbconfig/20220131-140127-marostegui.json
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298558)', diff saved to https://phabricator.wikimedia.org/P19634 and previous config saved to /var/cache/conftool/dbconfig/20220131-135610-marostegui.json
  • 13:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 13:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 13:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 13:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 13:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 13:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298558)', diff saved to https://phabricator.wikimedia.org/P19633 and previous config saved to /var/cache/conftool/dbconfig/20220131-135525-marostegui.json
  • 13:52 XioNoX: Move sandbox filter to Capirca on all core routers
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P19632 and previous config saved to /var/cache/conftool/dbconfig/20220131-134934-marostegui.json
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P19631 and previous config saved to /var/cache/conftool/dbconfig/20220131-134021-marostegui.json
  • 13:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1154.eqiad.wmnet with OS bullseye
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298559)', diff saved to https://phabricator.wikimedia.org/P19630 and previous config saved to /var/cache/conftool/dbconfig/20220131-133430-marostegui.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T298559)', diff saved to https://phabricator.wikimedia.org/P19629 and previous config saved to /var/cache/conftool/dbconfig/20220131-133323-marostegui.json
  • 13:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 13:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298559)', diff saved to https://phabricator.wikimedia.org/P19628 and previous config saved to /var/cache/conftool/dbconfig/20220131-133316-marostegui.json
  • 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P19627 and previous config saved to /var/cache/conftool/dbconfig/20220131-132516-marostegui.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P19626 and previous config saved to /var/cache/conftool/dbconfig/20220131-131811-marostegui.json
  • 13:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1007.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 13:15 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1007.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1007.eqiad.wmnet
  • 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1007.eqiad.wmnet
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298558)', diff saved to https://phabricator.wikimedia.org/P19625 and previous config saved to /var/cache/conftool/dbconfig/20220131-131011-marostegui.json
  • 13:06 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1154.eqiad.wmnet with OS bullseye
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P19624 and previous config saved to /var/cache/conftool/dbconfig/20220131-130306-marostegui.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298559)', diff saved to https://phabricator.wikimedia.org/P19623 and previous config saved to /var/cache/conftool/dbconfig/20220131-124801-marostegui.json
  • 12:47 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1027.eqiad.wmnet with OS buster
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T298559)', diff saved to https://phabricator.wikimedia.org/P19622 and previous config saved to /var/cache/conftool/dbconfig/20220131-124655-marostegui.json
  • 12:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 12:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 12:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
  • 12:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
  • 12:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1026.eqiad.wmnet
  • 12:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 12:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298559)', diff saved to https://phabricator.wikimedia.org/P19621 and previous config saved to /var/cache/conftool/dbconfig/20220131-124627-marostegui.json
  • 12:41 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1026.eqiad.wmnet with OS buster
  • 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P19620 and previous config saved to /var/cache/conftool/dbconfig/20220131-123123-marostegui.json
  • 12:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:20 Lucas_WMDE: UTC morning backport window done
  • 12:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: commonswiki: Add four domains to the wgCopyUploadsDomains allowlist (T300375, T300360, T300359, T300357) (duration: 00m 50s)
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P19619 and previous config saved to /var/cache/conftool/dbconfig/20220131-121618-marostegui.json
  • 12:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298558)', diff saved to https://phabricator.wikimedia.org/P19618 and previous config saved to /var/cache/conftool/dbconfig/20220131-120952-marostegui.json
  • 12:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 12:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 12:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 12:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 12:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: azwikiquote: Add autopatrolled user group (T300435) (duration: 00m 50s)
  • 12:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:02 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1026.eqiad.wmnet with OS buster
  • 12:02 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1025.eqiad.wmnet
  • 12:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1025.eqiad.wmnet with OS buster
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298559)', diff saved to https://phabricator.wikimedia.org/P19616 and previous config saved to /var/cache/conftool/dbconfig/20220131-120113-marostegui.json
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T298559)', diff saved to https://phabricator.wikimedia.org/P19615 and previous config saved to /var/cache/conftool/dbconfig/20220131-120007-marostegui.json
  • 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 11:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 11:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 11:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 11:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 11:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 11:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 11:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 11:21 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1025.eqiad.wmnet with OS buster
  • 11:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 11:19 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1024.eqiad.wmnet
  • 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298559)', diff saved to https://phabricator.wikimedia.org/P19614 and previous config saved to /var/cache/conftool/dbconfig/20220131-111904-marostegui.json
  • 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: repooling', diff saved to https://phabricator.wikimedia.org/P19613 and previous config saved to /var/cache/conftool/dbconfig/20220131-111147-root.json
  • 11:08 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15 days, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing
  • 11:08 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 15 days, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P19612 and previous config saved to /var/cache/conftool/dbconfig/20220131-110400-marostegui.json
  • 10:58 vgutierrez: pool cp5011 running envoy as TLS terminator - T271421
  • 10:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5011.eqsin.wmnet with OS buster
  • 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: repooling', diff saved to https://phabricator.wikimedia.org/P19611 and previous config saved to /var/cache/conftool/dbconfig/20220131-105643-root.json
  • 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P19610 and previous config saved to /var/cache/conftool/dbconfig/20220131-104855-marostegui.json
  • 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1015.eqiad.wmnet with OS buster
  • 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: repooling', diff saved to https://phabricator.wikimedia.org/P19609 and previous config saved to /var/cache/conftool/dbconfig/20220131-104140-root.json
  • 10:37 mmandere: cp[1087,1089-1090] remove unused libvarnishapi1 T300247
  • 10:36 mmandere: cp[2041-2042] remove unused libvarnishapi1 T300247
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298559)', diff saved to https://phabricator.wikimedia.org/P19608 and previous config saved to /var/cache/conftool/dbconfig/20220131-103350-marostegui.json
  • 10:33 mmandere: cp[3052,3064-3065].esams.wmnet remove unused libvarnishapi1 T300247
  • 10:31 mmandere: cp[4021,4025-4026,4032-4034,4036].ulsfo.wmnet remove unused libvarnishapi1 T300247
  • 10:27 mmandere: cp[5006,5012].eqsin.wmnet remove unused libvarnishapi1 T300247
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: repooling', diff saved to https://phabricator.wikimedia.org/P19607 and previous config saved to /var/cache/conftool/dbconfig/20220131-102636-root.json
  • 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315', diff saved to https://phabricator.wikimedia.org/P19606 and previous config saved to /var/cache/conftool/dbconfig/20220131-102457-marostegui.json
  • 10:21 moritzm: installing apache/apache-modsecurity2 security updates
  • 10:15 mmandere: cp[6001-6016].drmrs.wmnet remove unused libvarnishapi1 T300247
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T298559)', diff saved to https://phabricator.wikimedia.org/P19605 and previous config saved to /var/cache/conftool/dbconfig/20220131-101439-marostegui.json
  • 10:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298559)', diff saved to https://phabricator.wikimedia.org/P19604 and previous config saved to /var/cache/conftool/dbconfig/20220131-101431-marostegui.json
  • 10:01 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1015.eqiad.wmnet with OS buster
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P19603 and previous config saved to /var/cache/conftool/dbconfig/20220131-095926-marostegui.json
  • 09:55 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5011.eqsin.wmnet with OS buster
  • 09:53 mmandere: cp3062: upgrade varnish to 6.0.10-1wm1 T300264
  • 09:53 vgutierrez: depool cp5011 to be reimaged as cache::text_envoy - T271421
  • 09:48 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 09:48 mmandere: cp3061: upgrade varnish to 6.0.10-1wm1 T300264
  • 09:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1007.eqiad.wmnet with OS buster
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P19602 and previous config saved to /var/cache/conftool/dbconfig/20220131-094422-marostegui.json
  • 09:35 dcausse: restart blazegraph on wdqs1012 (JVM stuck for 6 hours)
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19601 and previous config saved to /var/cache/conftool/dbconfig/20220131-093450-root.json
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298559)', diff saved to https://phabricator.wikimedia.org/P19600 and previous config saved to /var/cache/conftool/dbconfig/20220131-092917-marostegui.json
  • 09:21 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1007.eqiad.wmnet with OS buster
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T298559)', diff saved to https://phabricator.wikimedia.org/P19599 and previous config saved to /var/cache/conftool/dbconfig/20220131-091959-marostegui.json
  • 09:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298559)', diff saved to https://phabricator.wikimedia.org/P19598 and previous config saved to /var/cache/conftool/dbconfig/20220131-091952-marostegui.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19597 and previous config saved to /var/cache/conftool/dbconfig/20220131-091943-root.json
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P19596 and previous config saved to /var/cache/conftool/dbconfig/20220131-090441-marostegui.json
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19595 and previous config saved to /var/cache/conftool/dbconfig/20220131-090439-root.json
  • 09:03 jayme: published image docker-registry.discovery.wmnet/echoserver:1.10.0-2
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P19594 and previous config saved to /var/cache/conftool/dbconfig/20220131-084937-marostegui.json
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19593 and previous config saved to /var/cache/conftool/dbconfig/20220131-084936-root.json
  • 08:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2123.codfw.wmnet with OS bullseye
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 T297189', diff saved to https://phabricator.wikimedia.org/P19592 and previous config saved to /var/cache/conftool/dbconfig/20220131-084157-marostegui.json
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298559)', diff saved to https://phabricator.wikimedia.org/P19591 and previous config saved to /var/cache/conftool/dbconfig/20220131-083432-marostegui.json
  • 08:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
  • 08:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T298559)', diff saved to https://phabricator.wikimedia.org/P19590 and previous config saved to /var/cache/conftool/dbconfig/20220131-082534-marostegui.json
  • 08:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 08:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 08:23 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
  • 08:22 marostegui: Set innodb_adaptive_hash_index=OFF on es2020, es2024 T268869
  • 08:22 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
  • 08:21 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
  • 08:21 marostegui: Set innodb_adaptive_hash_index=OFF on es2028, es2029, es2026 T268869
  • 08:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2075.codfw.wmnet with OS bullseye
  • 08:09 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2123.codfw.wmnet with OS bullseye
  • 08:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
  • 08:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
  • 08:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 08:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298559)', diff saved to https://phabricator.wikimedia.org/P19589 and previous config saved to /var/cache/conftool/dbconfig/20220131-080803-marostegui.json
  • 08:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2111.codfw.wmnet with OS bullseye
  • 08:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2113.codfw.wmnet with OS bullseye
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P19588 and previous config saved to /var/cache/conftool/dbconfig/20220131-075258-marostegui.json
  • 07:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1010.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 07:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1010.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 07:39 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2075.codfw.wmnet with OS bullseye
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P19587 and previous config saved to /var/cache/conftool/dbconfig/20220131-073754-marostegui.json
  • 07:33 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2111.codfw.wmnet with OS bullseye
  • 07:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2128.codfw.wmnet with OS bullseye
  • 07:29 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2113.codfw.wmnet with OS bullseye
  • 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2137.codfw.wmnet with OS bullseye
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298559)', diff saved to https://phabricator.wikimedia.org/P19586 and previous config saved to /var/cache/conftool/dbconfig/20220131-072249-marostegui.json
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19585 and previous config saved to /var/cache/conftool/dbconfig/20220131-071959-root.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19584 and previous config saved to /var/cache/conftool/dbconfig/20220131-071948-root.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T298559)', diff saved to https://phabricator.wikimedia.org/P19583 and previous config saved to /var/cache/conftool/dbconfig/20220131-071350-marostegui.json
  • 07:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 07:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 07:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 07:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19582 and previous config saved to /var/cache/conftool/dbconfig/20220131-070456-root.json
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19581 and previous config saved to /var/cache/conftool/dbconfig/20220131-070444-root.json
  • 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 06:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T298559)', diff saved to https://phabricator.wikimedia.org/P19580 and previous config saved to /var/cache/conftool/dbconfig/20220131-065733-marostegui.json
  • 06:55 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2128.codfw.wmnet with OS bullseye
  • 06:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2137.codfw.wmnet with OS bullseye
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19579 and previous config saved to /var/cache/conftool/dbconfig/20220131-064952-root.json
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19578 and previous config saved to /var/cache/conftool/dbconfig/20220131-064941-root.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P19577 and previous config saved to /var/cache/conftool/dbconfig/20220131-064228-marostegui.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19576 and previous config saved to /var/cache/conftool/dbconfig/20220131-063448-root.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19575 and previous config saved to /var/cache/conftool/dbconfig/20220131-063437-root.json
  • 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1113.eqiad.wmnet with OS bullseye
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P19574 and previous config saved to /var/cache/conftool/dbconfig/20220131-062723-marostegui.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T298559)', diff saved to https://phabricator.wikimedia.org/P19573 and previous config saved to /var/cache/conftool/dbconfig/20220131-061219-marostegui.json
  • 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'Remove logpager from s4 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P19572 and previous config saved to /var/cache/conftool/dbconfig/20220131-061121-marostegui.json
  • 06:04 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1113.eqiad.wmnet with OS bullseye
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T298559)', diff saved to https://phabricator.wikimedia.org/P19571 and previous config saved to /var/cache/conftool/dbconfig/20220131-060326-marostegui.json
  • 06:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 06:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113 (s5,s6) T299479', diff saved to https://phabricator.wikimedia.org/P19570 and previous config saved to /var/cache/conftool/dbconfig/20220131-055947-marostegui.json
  • 05:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 05:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance

2022-01-29

  • 21:08 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices2003-dev.wikimedia.org with OS bullseye
  • 18:04 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices2003-dev.wikimedia.org with OS bullseye
  • 17:57 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2004-dev.wikimedia.org with OS bullseye
  • 16:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.wikimedia.org with OS bullseye
  • 13:53 hashar: contint1001 and contint2001 : pruning old reflog from Zuul merger git repositories: `sudo -u zuul find /srv/zuul/git -maxdepth 4 -type d -name .git -print -execdir git reflog expire --all --expire=now \;`
  • 05:25 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2003-dev.wikimedia.org with OS bullseye
  • 04:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2003-dev.wikimedia.org with OS bullseye
  • 00:14 ebernhardson: restart elasticsearch_6@production-search-psi-eqiad on elastic1049 to address CirrusSearchJVMGCOldPoolFlatlined alert

2022-01-28

  • 21:52 mutante: purging font packages from mwdebug* and scandium* T294378
  • 21:47 mutante: purging font packages from remaining appservers in codfw mw23* ranges.. T294378
  • 20:14 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 20:10 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 20:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 17:52 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1024.eqiad.wmnet with OS buster
  • 17:17 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1023.eqiad.wmnet
  • 17:17 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1024.eqiad.wmnet with OS buster
  • 17:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1023.eqiad.wmnet with OS buster
  • 16:41 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1023.eqiad.wmnet with OS buster
  • 16:41 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1022.eqiad.wmnet
  • 16:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1022.eqiad.wmnet with OS buster
  • 15:50 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 15:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 15:47 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1022.eqiad.wmnet with OS buster
  • 15:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1021.eqiad.wmnet
  • 15:47 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1021.eqiad.wmnet with OS buster
  • 15:41 vgutierrez: pool cp4031 using envoy as TLS termination layer - T271421
  • 15:14 Amir1: start of cleaning lint errors caused by content model changes (T298343)
  • 14:50 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1021.eqiad.wmnet with OS buster
  • 14:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1020.eqiad.wmnet
  • 14:47 vgutierrez: update varnish to version 6.0.10-1wm1 on cp4036 - T300264
  • 14:47 Amir1: optimizing dewiki.flaggedtemplates in db2113
  • 14:27 vgutierrez: update varnish to version 6.0.10-1wm1 on cp4034 - T300264
  • 13:20 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1020.eqiad.wmnet with OS buster
  • 13:01 moritzm: installing uriparser security updates
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19562 and previous config saved to /var/cache/conftool/dbconfig/20220128-123210-root.json
  • 12:30 moritzm: installing libseccomp bugfix updates from bullseye point release
  • 12:28 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1020.eqiad.wmnet with OS buster
  • 12:20 vgutierrez: upload varnish 6.0.10-1wm1 to apt.wm.o (buster component/varnish6) - T300264
  • 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19561 and previous config saved to /var/cache/conftool/dbconfig/20220128-121706-root.json
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19560 and previous config saved to /var/cache/conftool/dbconfig/20220128-120201-root.json
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19559 and previous config saved to /var/cache/conftool/dbconfig/20220128-114658-root.json
  • 11:35 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19558 and previous config saved to /var/cache/conftool/dbconfig/20220128-113154-root.json
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19557 and previous config saved to /var/cache/conftool/dbconfig/20220128-111650-root.json
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19556 and previous config saved to /var/cache/conftool/dbconfig/20220128-110147-root.json
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19555 and previous config saved to /var/cache/conftool/dbconfig/20220128-104643-root.json
  • 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19554 and previous config saved to /var/cache/conftool/dbconfig/20220128-103140-root.json
  • 10:29 mdipietro@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2001-dev.wikimedia.org with OS bullseye
  • 10:25 moritzm: draining ganeti1010 for eventual reimage
  • 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1019.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 10:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1019.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 10:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1168.eqiad.wmnet with OS bullseye
  • 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19553 and previous config saved to /var/cache/conftool/dbconfig/20220128-101636-root.json
  • 09:50 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1168.eqiad.wmnet with OS bullseye
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 T299479', diff saved to https://phabricator.wikimedia.org/P19552 and previous config saved to /var/cache/conftool/dbconfig/20220128-094636-marostegui.json
  • 09:46 moritzm: installing brltty bugfix updates from bullseye point release
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19551 and previous config saved to /var/cache/conftool/dbconfig/20220128-094430-root.json
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19550 and previous config saved to /var/cache/conftool/dbconfig/20220128-094422-root.json
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19549 and previous config saved to /var/cache/conftool/dbconfig/20220128-092927-root.json
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19548 and previous config saved to /var/cache/conftool/dbconfig/20220128-092918-root.json
  • 09:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:23 hashar@deploy1002: Synchronized wmf-config/CommonSettings.php: GrowthExperiments: Disable mobile quality gate - T298122 T300336 (duration: 00m 50s)
  • 09:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:17 godog: pool prometheus2005 and depool prometheus2003 - T296199
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19547 and previous config saved to /var/cache/conftool/dbconfig/20220128-091423-root.json
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19546 and previous config saved to /var/cache/conftool/dbconfig/20220128-091415-root.json
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19545 and previous config saved to /var/cache/conftool/dbconfig/20220128-085919-root.json
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19544 and previous config saved to /var/cache/conftool/dbconfig/20220128-085911-root.json
  • 08:44 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1001.eqiad.wmnet
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19543 and previous config saved to /var/cache/conftool/dbconfig/20220128-084416-root.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19542 and previous config saved to /var/cache/conftool/dbconfig/20220128-084407-root.json
  • 08:40 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1001.eqiad.wmnet
  • 08:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1002.eqiad.wmnet
  • 08:32 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1002.eqiad.wmnet
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19541 and previous config saved to /var/cache/conftool/dbconfig/20220128-082912-root.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19540 and previous config saved to /var/cache/conftool/dbconfig/20220128-082904-root.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19539 and previous config saved to /var/cache/conftool/dbconfig/20220128-081408-root.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19538 and previous config saved to /var/cache/conftool/dbconfig/20220128-081400-root.json
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19537 and previous config saved to /var/cache/conftool/dbconfig/20220128-075905-root.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19536 and previous config saved to /var/cache/conftool/dbconfig/20220128-075856-root.json
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19535 and previous config saved to /var/cache/conftool/dbconfig/20220128-074401-root.json
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19534 and previous config saved to /var/cache/conftool/dbconfig/20220128-074353-root.json
  • 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1096.eqiad.wmnet with OS bullseye
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19533 and previous config saved to /var/cache/conftool/dbconfig/20220128-072858-root.json
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19532 and previous config saved to /var/cache/conftool/dbconfig/20220128-072849-root.json
  • 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1096.eqiad.wmnet with OS bullseye
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096 (s5,s6) T299479', diff saved to https://phabricator.wikimedia.org/P19531 and previous config saved to /var/cache/conftool/dbconfig/20220128-070112-marostegui.json
  • 06:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2133.codfw.wmnet with OS bullseye
  • 06:11 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2133.codfw.wmnet with OS bullseye
  • 04:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1068.eqiad.wmnet with OS stretch
  • 04:34 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1068.eqiad.wmnet with OS stretch
  • 04:33 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host elastic1068.eqiad.wmnet with OS stretch
  • 04:33 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1068.eqiad.wmnet with OS stretch
  • 01:47 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2001-dev.wikimedia.org with OS bullseye

2022-01-27

  • 23:22 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.wikimedia.org with OS bullseye
  • 23:21 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2001-dev.wikimedia.org with OS buster
  • 23:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.wikimedia.org with OS buster
  • 23:06 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2001-dev.wikimedia.org with OS buster
  • 22:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.wikimedia.org with OS buster
  • 21:41 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 20:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298559)', diff saved to https://phabricator.wikimedia.org/P19530 and previous config saved to /var/cache/conftool/dbconfig/20220127-205155-marostegui.json
  • 20:49 cstone: updated civicrm revision changed from 6f1eddce to 0513f1b7
  • 20:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P19529 and previous config saved to /var/cache/conftool/dbconfig/20220127-203650-marostegui.json
  • 20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P19528 and previous config saved to /var/cache/conftool/dbconfig/20220127-202145-marostegui.json
  • 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298559)', diff saved to https://phabricator.wikimedia.org/P19527 and previous config saved to /var/cache/conftool/dbconfig/20220127-200641-marostegui.json
  • 20:06 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.19 refs T293960
  • 20:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298559)', diff saved to https://phabricator.wikimedia.org/P19526 and previous config saved to /var/cache/conftool/dbconfig/20220127-200535-marostegui.json
  • 20:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 20:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 20:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298559)', diff saved to https://phabricator.wikimedia.org/P19525 and previous config saved to /var/cache/conftool/dbconfig/20220127-200523-marostegui.json
  • 20:03 brennen: train 1.38.0-wmf.19 (T293960): no current blockers; logs clean-ish, rolling train forward to group2
  • 20:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:01 urbanecm@deploy1002: Synchronized phpcs.xml: 1149860: Remove trusted-xff.php from wmf-config (T298243; 3/3) (duration: 00m 50s)
  • 20:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:59 urbanecm@deploy1002: Synchronized wmf-config/: 1149860: Remove trusted-xff.php from wmf-config (T298243; 2/3) (duration: 00m 51s)
  • 19:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:58 urbanecm@deploy1002: Synchronized docroot/noc/: 1149860: Remove trusted-xff.php from wmf-config (T298243; 1/3) (duration: 00m 50s)
  • 19:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:52 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 6fa62c5: Do not set wgTrustedXffFile (T298243) (duration: 00m 51s)
  • 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P19524 and previous config saved to /var/cache/conftool/dbconfig/20220127-195019-marostegui.json
  • 19:43 mutante: purging font packages from parse* (parsoid codfw)
  • 19:42 mutante: purging font packages from wtp* (parsoid eqiad)
  • 19:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 2c8561c: Launch DiscussionTools new topic tool a/b test (T291308) (duration: 00m 51s)
  • 19:36 mutante: purging font* / xfont* packages from further eqiad appservers (mw14*) for T294378
  • 19:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P19521 and previous config saved to /var/cache/conftool/dbconfig/20220127-193514-marostegui.json
  • 19:27 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Start add image experiment for desktop users (T298122) (duration: 00m 51s)
  • 19:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298559)', diff saved to https://phabricator.wikimedia.org/P19520 and previous config saved to /var/cache/conftool/dbconfig/20220127-192009-marostegui.json
  • 19:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298559)', diff saved to https://phabricator.wikimedia.org/P19519 and previous config saved to /var/cache/conftool/dbconfig/20220127-191902-marostegui.json
  • 19:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 19:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 19:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298559)', diff saved to https://phabricator.wikimedia.org/P19518 and previous config saved to /var/cache/conftool/dbconfig/20220127-191854-marostegui.json
  • 19:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T285149)', diff saved to https://phabricator.wikimedia.org/P19517 and previous config saved to /var/cache/conftool/dbconfig/20220127-191141-marostegui.json
  • 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P19516 and previous config saved to /var/cache/conftool/dbconfig/20220127-190349-marostegui.json
  • 18:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P19515 and previous config saved to /var/cache/conftool/dbconfig/20220127-185637-marostegui.json
  • 18:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P19514 and previous config saved to /var/cache/conftool/dbconfig/20220127-184845-marostegui.json
  • 18:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:45 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync on production
  • 18:43 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync on canary
  • 18:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:43 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply on canary
  • 18:43 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply on production
  • 18:42 brennen@deploy1002: Synchronized php-1.38.0-wmf.19/extensions/WikibaseMediaInfo: Backport: Revert "Escape various messages in WikibaseMediaInfo" (T299289) (duration: 00m 52s)
  • 18:41 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase102[5-7].eqiad.wmnet
  • 18:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P19513 and previous config saved to /var/cache/conftool/dbconfig/20220127-184132-marostegui.json
  • 18:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298559)', diff saved to https://phabricator.wikimedia.org/P19512 and previous config saved to /var/cache/conftool/dbconfig/20220127-183340-marostegui.json
  • 18:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298559)', diff saved to https://phabricator.wikimedia.org/P19511 and previous config saved to /var/cache/conftool/dbconfig/20220127-183234-marostegui.json
  • 18:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 18:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 18:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298559)', diff saved to https://phabricator.wikimedia.org/P19510 and previous config saved to /var/cache/conftool/dbconfig/20220127-183226-marostegui.json
  • 18:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T285149)', diff saved to https://phabricator.wikimedia.org/P19509 and previous config saved to /var/cache/conftool/dbconfig/20220127-182627-marostegui.json
  • 18:25 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: sync on canary
  • 18:25 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply on production
  • 18:25 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply on canary
  • 18:25 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply on production
  • 18:24 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase102[5-7].eqiad.wmnet
  • 18:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on restbase[1025-1027].eqiad.wmnet with reason: Firmware upgrade
  • 18:24 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on restbase[1025-1027].eqiad.wmnet with reason: Firmware upgrade
  • 18:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1024.eqiad.wmnet
  • 18:20 mdipietro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.wikimedia.org with OS bullseye
  • 18:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P19508 and previous config saved to /var/cache/conftool/dbconfig/20220127-181722-marostegui.json
  • 18:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T285149)', diff saved to https://phabricator.wikimedia.org/P19507 and previous config saved to /var/cache/conftool/dbconfig/20220127-181656-marostegui.json
  • 18:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 18:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 18:07 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1024.eqiad.wmnet
  • 18:07 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on restbase1024.eqiad.wmnet with reason: Firmware upgrade
  • 18:07 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on restbase1024.eqiad.wmnet with reason: Firmware upgrade
  • 18:07 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1023.eqiad.wmnet
  • 18:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P19506 and previous config saved to /var/cache/conftool/dbconfig/20220127-180217-marostegui.json
  • 17:53 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1023.eqiad.wmnet
  • 17:52 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on restbase1023.eqiad.wmnet with reason: Firmware upgrade
  • 17:52 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on restbase1023.eqiad.wmnet with reason: Firmware upgrade
  • 17:52 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1022.eqiad.wmnet
  • 17:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298559)', diff saved to https://phabricator.wikimedia.org/P19505 and previous config saved to /var/cache/conftool/dbconfig/20220127-174712-marostegui.json
  • 17:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298559)', diff saved to https://phabricator.wikimedia.org/P19504 and previous config saved to /var/cache/conftool/dbconfig/20220127-174606-marostegui.json
  • 17:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 17:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 17:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 17:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 17:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 17:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 17:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 17:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 17:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298559)', diff saved to https://phabricator.wikimedia.org/P19503 and previous config saved to /var/cache/conftool/dbconfig/20220127-174527-marostegui.json
  • 17:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1022.eqiad.wmnet
  • 17:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on restbase1022.eqiad.wmnet with reason: Firmware upgrade
  • 17:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on restbase1022.eqiad.wmnet with reason: Firmware upgrade
  • 17:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1021.eqiad.wmnet
  • 17:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P19502 and previous config saved to /var/cache/conftool/dbconfig/20220127-173022-marostegui.json
  • 17:22 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 17:21 cmjohnson1: updating firmware restbase1021 T299652
  • 17:17 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1021.eqiad.wmnet
  • 17:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on restbase1021.eqiad.wmnet with reason: Firmware upgrade
  • 17:17 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on restbase1021.eqiad.wmnet with reason: Firmware upgrade
  • 17:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P19501 and previous config saved to /var/cache/conftool/dbconfig/20220127-171518-marostegui.json
  • 17:15 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1020.eqiad.wmnet
  • 17:01 cmjohnson1: updating firmware restbase1020 T299652
  • 17:00 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1020.eqiad.wmnet
  • 17:00 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: sync on canary
  • 17:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298559)', diff saved to https://phabricator.wikimedia.org/P19500 and previous config saved to /var/cache/conftool/dbconfig/20220127-170013-marostegui.json
  • 17:00 cmjohnson1: updating firmware ganeti1007 and ganeti1015 T299527
  • 17:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on restbase1020.eqiad.wmnet with reason: Firmware upgrade
  • 16:59 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on restbase1020.eqiad.wmnet with reason: Firmware upgrade
  • 16:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T298559)', diff saved to https://phabricator.wikimedia.org/P19499 and previous config saved to /var/cache/conftool/dbconfig/20220127-165907-marostegui.json
  • 16:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 16:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 16:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298559)', diff saved to https://phabricator.wikimedia.org/P19498 and previous config saved to /var/cache/conftool/dbconfig/20220127-165859-marostegui.json
  • 16:51 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: sync on production
  • 16:50 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply on canary
  • 16:50 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply on production
  • 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P19497 and previous config saved to /var/cache/conftool/dbconfig/20220127-164354-marostegui.json
  • 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P19496 and previous config saved to /var/cache/conftool/dbconfig/20220127-162849-marostegui.json
  • 16:27 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: sync on production
  • 16:27 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply on canary
  • 16:27 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply on production
  • 16:24 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: sync on main
  • 16:24 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 16:23 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply on canary
  • 16:23 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply on main
  • 16:22 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: sync on main
  • 16:21 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply on canary
  • 16:21 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply on main
  • 16:20 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 16:20 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: sync on main
  • 16:19 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 16:19 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply on canary
  • 16:19 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply on main
  • 16:15 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: sync on production
  • 16:15 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: sync on canary
  • 16:14 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply on production
  • 16:14 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply on canary
  • 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298559)', diff saved to https://phabricator.wikimedia.org/P19495 and previous config saved to /var/cache/conftool/dbconfig/20220127-161344-marostegui.json
  • 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298559)', diff saved to https://phabricator.wikimedia.org/P19494 and previous config saved to /var/cache/conftool/dbconfig/20220127-161239-marostegui.json
  • 16:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 16:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298559)', diff saved to https://phabricator.wikimedia.org/P19493 and previous config saved to /var/cache/conftool/dbconfig/20220127-161231-marostegui.json
  • 16:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T285149)', diff saved to https://phabricator.wikimedia.org/P19492 and previous config saved to /var/cache/conftool/dbconfig/20220127-160749-marostegui.json
  • 16:03 dcausse: restarting blazegraph on wdqs1005 (jvm stuck for 2 hours)
  • 16:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 15:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P19491 and previous config saved to /var/cache/conftool/dbconfig/20220127-155726-marostegui.json
  • 15:57 brennen@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.19 refs T293960 (duration: 00m 51s)
  • 15:56 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.19 refs T293960
  • 15:54 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: sync on canary
  • 15:53 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: sync on production
  • 15:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P19490 and previous config saved to /var/cache/conftool/dbconfig/20220127-155244-marostegui.json
  • 15:52 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply on production
  • 15:52 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply on canary
  • 15:52 brennen: train 1.38.0-wmf.19 (T293960): no current blockers; rolling train forward to group1 before log triage meeting
  • 15:45 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: sync on production
  • 15:45 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: sync on canary
  • 15:45 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply on canary
  • 15:45 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply on production
  • 15:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P19489 and previous config saved to /var/cache/conftool/dbconfig/20220127-154222-marostegui.json
  • 15:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P19488 and previous config saved to /var/cache/conftool/dbconfig/20220127-153739-marostegui.json
  • 15:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298559)', diff saved to https://phabricator.wikimedia.org/P19487 and previous config saved to /var/cache/conftool/dbconfig/20220127-152717-marostegui.json
  • 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T285149)', diff saved to https://phabricator.wikimedia.org/P19486 and previous config saved to /var/cache/conftool/dbconfig/20220127-152235-marostegui.json
  • 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T298559)', diff saved to https://phabricator.wikimedia.org/P19485 and previous config saved to /var/cache/conftool/dbconfig/20220127-151709-marostegui.json
  • 15:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 15:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298559)', diff saved to https://phabricator.wikimedia.org/P19484 and previous config saved to /var/cache/conftool/dbconfig/20220127-151701-marostegui.json
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023 (T300006)', diff saved to https://phabricator.wikimedia.org/P19483 and previous config saved to /var/cache/conftool/dbconfig/20220127-151032-ladsgroup.json
  • 15:09 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync on production
  • 15:08 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync on canary
  • 15:07 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply on production
  • 15:07 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply on canary
  • 15:04 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync on production
  • 15:04 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync on canary
  • 15:03 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply on canary
  • 15:03 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply on production
  • 15:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P19482 and previous config saved to /var/cache/conftool/dbconfig/20220127-150156-marostegui.json
  • 14:59 mmandere@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh6002.wikimedia.org
  • 14:58 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync on production
  • 14:57 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply on canary
  • 14:57 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply on production
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023', diff saved to https://phabricator.wikimedia.org/P19481 and previous config saved to /var/cache/conftool/dbconfig/20220127-145527-ladsgroup.json
  • 14:54 ottomata: continuing deployments of eventgate-main and eventgate-analytics to pick up CA cert changes - T296064 (also deploying eventgate-main for a schema repo bump for search)
  • 14:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P19480 and previous config saved to /var/cache/conftool/dbconfig/20220127-144652-marostegui.json
  • 14:46 mmandere@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh6002.wikimedia.org
  • 14:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1028.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023', diff saved to https://phabricator.wikimedia.org/P19479 and previous config saved to /var/cache/conftool/dbconfig/20220127-144022-ladsgroup.json
  • 14:39 moritzm: added ganeti1028 to Ganeti eqiad cluster T293909
  • 14:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298559)', diff saved to https://phabricator.wikimedia.org/P19478 and previous config saved to /var/cache/conftool/dbconfig/20220127-143147-marostegui.json
  • 14:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T298559)', diff saved to https://phabricator.wikimedia.org/P19477 and previous config saved to /var/cache/conftool/dbconfig/20220127-142841-marostegui.json
  • 14:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 14:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 14:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 14:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 14:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298559)', diff saved to https://phabricator.wikimedia.org/P19476 and previous config saved to /var/cache/conftool/dbconfig/20220127-142829-marostegui.json
  • 14:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1028.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023 (T300006)', diff saved to https://phabricator.wikimedia.org/P19475 and previous config saved to /var/cache/conftool/dbconfig/20220127-142517-ladsgroup.json
  • 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T285149)', diff saved to https://phabricator.wikimedia.org/P19474 and previous config saved to /var/cache/conftool/dbconfig/20220127-142214-marostegui.json
  • 14:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 14:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T285149)', diff saved to https://phabricator.wikimedia.org/P19473 and previous config saved to /var/cache/conftool/dbconfig/20220127-142206-marostegui.json
  • 14:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1023.eqiad.wmnet with OS bullseye
  • 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P19471 and previous config saved to /var/cache/conftool/dbconfig/20220127-141324-marostegui.json
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P19470 and previous config saved to /var/cache/conftool/dbconfig/20220127-140702-marostegui.json
  • 14:05 moritzm: installing apache security updates
  • 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P19469 and previous config saved to /var/cache/conftool/dbconfig/20220127-135820-marostegui.json
  • 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
  • 13:52 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2001.codfw.wmnet
  • 13:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
  • 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P19468 and previous config saved to /var/cache/conftool/dbconfig/20220127-135157-marostegui.json
  • 13:46 moritzm: imported elasticsearch-oss/kibana-oss/logstash-oss 6.8.23 to thirdparty/elastic68 for stretch and bullseye
  • 13:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es1023.eqiad.wmnet with OS bullseye
  • 13:43 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster2001.codfw.wmnet
  • 13:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1023.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 13:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298559)', diff saved to https://phabricator.wikimedia.org/P19467 and previous config saved to /var/cache/conftool/dbconfig/20220127-134315-marostegui.json
  • 13:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298559)', diff saved to https://phabricator.wikimedia.org/P19466 and previous config saved to /var/cache/conftool/dbconfig/20220127-134209-marostegui.json
  • 13:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 13:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 13:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 13:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298559)', diff saved to https://phabricator.wikimedia.org/P19465 and previous config saved to /var/cache/conftool/dbconfig/20220127-134158-marostegui.json
  • 13:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2002.codfw.wmnet
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19464 and previous config saved to /var/cache/conftool/dbconfig/20220127-133715-root.json
  • 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T285149)', diff saved to https://phabricator.wikimedia.org/P19463 and previous config saved to /var/cache/conftool/dbconfig/20220127-133652-marostegui.json
  • 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19462 and previous config saved to /var/cache/conftool/dbconfig/20220127-133631-root.json
  • 13:32 marostegui@cumin1001: START - Cookbook sre.hosts.provision for host es1023.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 13:29 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster2002.codfw.wmnet
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P19461 and previous config saved to /var/cache/conftool/dbconfig/20220127-132653-marostegui.json
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T285149)', diff saved to https://phabricator.wikimedia.org/P19460 and previous config saved to /var/cache/conftool/dbconfig/20220127-132624-marostegui.json
  • 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19459 and previous config saved to /var/cache/conftool/dbconfig/20220127-132212-root.json
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19458 and previous config saved to /var/cache/conftool/dbconfig/20220127-132128-root.json
  • 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P19457 and previous config saved to /var/cache/conftool/dbconfig/20220127-131148-marostegui.json
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19456 and previous config saved to /var/cache/conftool/dbconfig/20220127-130708-root.json
  • 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19455 and previous config saved to /var/cache/conftool/dbconfig/20220127-130624-root.json
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298559)', diff saved to https://phabricator.wikimedia.org/P19454 and previous config saved to /var/cache/conftool/dbconfig/20220127-125644-marostegui.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298559)', diff saved to https://phabricator.wikimedia.org/P19453 and previous config saved to /var/cache/conftool/dbconfig/20220127-125538-marostegui.json
  • 12:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 12:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 12:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19452 and previous config saved to /var/cache/conftool/dbconfig/20220127-125205-root.json
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19451 and previous config saved to /var/cache/conftool/dbconfig/20220127-125120-root.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19450 and previous config saved to /var/cache/conftool/dbconfig/20220127-123701-root.json
  • 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19449 and previous config saved to /var/cache/conftool/dbconfig/20220127-123617-root.json
  • 12:26 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts restbase2011.codfw.wmnet
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298559)', diff saved to https://phabricator.wikimedia.org/P19448 and previous config saved to /var/cache/conftool/dbconfig/20220127-122558-marostegui.json
  • 12:25 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase2011.codfw.wmnet
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19447 and previous config saved to /var/cache/conftool/dbconfig/20220127-122157-root.json
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19446 and previous config saved to /var/cache/conftool/dbconfig/20220127-122113-root.json
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P19445 and previous config saved to /var/cache/conftool/dbconfig/20220127-121053-marostegui.json
  • 12:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4031.ulsfo.wmnet with OS buster
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1023 (T300006)', diff saved to https://phabricator.wikimedia.org/P19444 and previous config saved to /var/cache/conftool/dbconfig/20220127-120648-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1023.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1023.eqiad.wmnet with reason: Maintenance
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19443 and previous config saved to /var/cache/conftool/dbconfig/20220127-120608-root.json
  • 12:01 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P19442 and previous config saved to /var/cache/conftool/dbconfig/20220127-115548-marostegui.json
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19441 and previous config saved to /var/cache/conftool/dbconfig/20220127-115105-root.json
  • 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298559)', diff saved to https://phabricator.wikimedia.org/P19440 and previous config saved to /var/cache/conftool/dbconfig/20220127-114044-marostegui.json
  • 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T298559)', diff saved to https://phabricator.wikimedia.org/P19439 and previous config saved to /var/cache/conftool/dbconfig/20220127-113931-marostegui.json
  • 11:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 11:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298559)', diff saved to https://phabricator.wikimedia.org/P19438 and previous config saved to /var/cache/conftool/dbconfig/20220127-113924-marostegui.json
  • 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19437 and previous config saved to /var/cache/conftool/dbconfig/20220127-113600-root.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T285149)', diff saved to https://phabricator.wikimedia.org/P19436 and previous config saved to /var/cache/conftool/dbconfig/20220127-113140-marostegui.json
  • 11:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 11:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T285149)', diff saved to https://phabricator.wikimedia.org/P19435 and previous config saved to /var/cache/conftool/dbconfig/20220127-113132-marostegui.json
  • 11:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4031.ulsfo.wmnet with OS buster
  • 11:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2023.codfw.wmnet with OS bullseye
  • 11:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1165.eqiad.wmnet with OS bullseye
  • 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P19434 and previous config saved to /var/cache/conftool/dbconfig/20220127-112418-marostegui.json
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19433 and previous config saved to /var/cache/conftool/dbconfig/20220127-112057-root.json
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P19432 and previous config saved to /var/cache/conftool/dbconfig/20220127-111628-marostegui.json
  • 11:12 vgutierrez: depool cp4031 to be reimaged as cache::text_envoy - T271421
  • 11:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1159.eqiad.wmnet with OS bullseye
  • 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P19431 and previous config saved to /var/cache/conftool/dbconfig/20220127-110913-marostegui.json
  • 11:07 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh6001.wikimedia.org
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P19429 and previous config saved to /var/cache/conftool/dbconfig/20220127-110123-marostegui.json
  • 10:56 sukhe@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh6001.wikimedia.org
  • 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1165.eqiad.wmnet with OS bullseye
  • 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298559)', diff saved to https://phabricator.wikimedia.org/P19428 and previous config saved to /var/cache/conftool/dbconfig/20220127-105408-marostegui.json
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 T299479', diff saved to https://phabricator.wikimedia.org/P19427 and previous config saved to /var/cache/conftool/dbconfig/20220127-105223-marostegui.json
  • 10:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es2023.codfw.wmnet with OS bullseye
  • 10:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2023.codfw.wmnet with reason: Maintenance
  • 10:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2023.codfw.wmnet with reason: Maintenance
  • 10:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es[2024-2025].codfw.wmnet with reason: Reimage of the master T300006
  • 10:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es[2024-2025].codfw.wmnet with reason: Reimage of the master T300006
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298559)', diff saved to https://phabricator.wikimedia.org/P19426 and previous config saved to /var/cache/conftool/dbconfig/20220127-104654-marostegui.json
  • 10:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 10:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298559)', diff saved to https://phabricator.wikimedia.org/P19425 and previous config saved to /var/cache/conftool/dbconfig/20220127-104641-marostegui.json
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T285149)', diff saved to https://phabricator.wikimedia.org/P19424 and previous config saved to /var/cache/conftool/dbconfig/20220127-104618-marostegui.json
  • 10:38 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1159.eqiad.wmnet with OS bullseye
  • 10:35 Amir1: creating linktarget table everywhere (T299416)
  • 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P19423 and previous config saved to /var/cache/conftool/dbconfig/20220127-103136-marostegui.json
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T285149)', diff saved to https://phabricator.wikimedia.org/P19422 and previous config saved to /var/cache/conftool/dbconfig/20220127-102049-marostegui.json
  • 10:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:17 jynus: Started Bacula Director Daemon service at backup1001 T299624
  • 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P19421 and previous config saved to /var/cache/conftool/dbconfig/20220127-101631-marostegui.json
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19420 and previous config saved to /var/cache/conftool/dbconfig/20220127-100802-root.json
  • 10:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
  • 10:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
  • 10:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T285149)', diff saved to https://phabricator.wikimedia.org/P19419 and previous config saved to /var/cache/conftool/dbconfig/20220127-100155-marostegui.json
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298559)', diff saved to https://phabricator.wikimedia.org/P19418 and previous config saved to /var/cache/conftool/dbconfig/20220127-100127-marostegui.json
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298559)', diff saved to https://phabricator.wikimedia.org/P19417 and previous config saved to /var/cache/conftool/dbconfig/20220127-100014-marostegui.json
  • 10:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 10:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298559)', diff saved to https://phabricator.wikimedia.org/P19416 and previous config saved to /var/cache/conftool/dbconfig/20220127-100007-marostegui.json
  • 10:00 marostegui: Failover m1 from db1159 to db1128 - T299624
  • 09:57 jynus: Stopped Bacula Director Daemon service at backup1001 T299624
  • 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1027.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 09:53 moritzm: added ganeti1027 to Ganeti eqiad cluster T293909
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19415 and previous config saved to /var/cache/conftool/dbconfig/20220127-095258-root.json
  • 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1027.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 09:50 hnowlan@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 00m 14s)
  • 09:50 hnowlan@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
  • 09:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P19414 and previous config saved to /var/cache/conftool/dbconfig/20220127-094651-marostegui.json
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P19413 and previous config saved to /var/cache/conftool/dbconfig/20220127-094502-marostegui.json
  • 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19412 and previous config saved to /var/cache/conftool/dbconfig/20220127-093755-root.json
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P19411 and previous config saved to /var/cache/conftool/dbconfig/20220127-093146-marostegui.json
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P19410 and previous config saved to /var/cache/conftool/dbconfig/20220127-092957-marostegui.json
  • 09:27 filippo@puppetmaster1001: conftool action : set/weight=10; selector: name=prometheus2005.codfw.wmnet
  • 09:27 filippo@puppetmaster1001: conftool action : set/weight=10; selector: name=prometheus2006.codfw.wmnet
  • 09:23 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2078,2132].codfw.wmnet,db[1117,1128,1159].eqiad.wmnet with reason: Primary switchover m1 T299624
  • 09:23 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2078,2132].codfw.wmnet,db[1117,1128,1159].eqiad.wmnet with reason: Primary switchover m1 T299624
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19409 and previous config saved to /var/cache/conftool/dbconfig/20220127-092251-root.json
  • 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1007.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 09:18 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1007.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T285149)', diff saved to https://phabricator.wikimedia.org/P19408 and previous config saved to /var/cache/conftool/dbconfig/20220127-091641-marostegui.json
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298559)', diff saved to https://phabricator.wikimedia.org/P19407 and previous config saved to /var/cache/conftool/dbconfig/20220127-091453-marostegui.json
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T298559)', diff saved to https://phabricator.wikimedia.org/P19406 and previous config saved to /var/cache/conftool/dbconfig/20220127-091440-marostegui.json
  • 09:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 09:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 09:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 09:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298559)', diff saved to https://phabricator.wikimedia.org/P19405 and previous config saved to /var/cache/conftool/dbconfig/20220127-091401-marostegui.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19404 and previous config saved to /var/cache/conftool/dbconfig/20220127-090747-root.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P19403 and previous config saved to /var/cache/conftool/dbconfig/20220127-085857-marostegui.json
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19402 and previous config saved to /var/cache/conftool/dbconfig/20220127-085244-root.json
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P19401 and previous config saved to /var/cache/conftool/dbconfig/20220127-084352-marostegui.json
  • 08:41 jayme@deploy1002: Finished deploy [restbase/deploy@0848b15]: scap testing (duration: 00m 05s)
  • 08:40 jayme@deploy1002: Started deploy [restbase/deploy@0848b15]: scap testing
  • 08:38 jayme: updated scap to 4.2.1 on A:mw-canary, A:parsoid-canary, A:mw-jobrunner-canary, A:restbase-canary - T300058
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19400 and previous config saved to /var/cache/conftool/dbconfig/20220127-083740-root.json
  • 08:33 jayme: uploaded scap 4.2.1 to apt.wikimedia.org - T300058
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298559)', diff saved to https://phabricator.wikimedia.org/P19399 and previous config saved to /var/cache/conftool/dbconfig/20220127-082847-marostegui.json
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T298559)', diff saved to https://phabricator.wikimedia.org/P19398 and previous config saved to /var/cache/conftool/dbconfig/20220127-082735-marostegui.json
  • 08:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 08:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298559)', diff saved to https://phabricator.wikimedia.org/P19397 and previous config saved to /var/cache/conftool/dbconfig/20220127-082728-marostegui.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19396 and previous config saved to /var/cache/conftool/dbconfig/20220127-082236-root.json
  • 08:21 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 08:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 08:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T285149)', diff saved to https://phabricator.wikimedia.org/P19395 and previous config saved to /var/cache/conftool/dbconfig/20220127-081622-marostegui.json
  • 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 08:13 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.19/includes/libs/rdbms/database/Database.php: Backport: Don't consider lock waits to be write queries (T300194) (duration: 00m 52s)
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P19394 and previous config saved to /var/cache/conftool/dbconfig/20220127-081223-marostegui.json
  • 08:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 08:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19393 and previous config saved to /var/cache/conftool/dbconfig/20220127-080733-root.json
  • 07:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 07:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T285149)', diff saved to https://phabricator.wikimedia.org/P19392 and previous config saved to /var/cache/conftool/dbconfig/20220127-075909-marostegui.json
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P19391 and previous config saved to /var/cache/conftool/dbconfig/20220127-075718-marostegui.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19390 and previous config saved to /var/cache/conftool/dbconfig/20220127-075229-root.json
  • 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1131.eqiad.wmnet with OS bullseye
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P19389 and previous config saved to /var/cache/conftool/dbconfig/20220127-074404-marostegui.json
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298559)', diff saved to https://phabricator.wikimedia.org/P19388 and previous config saved to /var/cache/conftool/dbconfig/20220127-074214-marostegui.json
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T298559)', diff saved to https://phabricator.wikimedia.org/P19387 and previous config saved to /var/cache/conftool/dbconfig/20220127-074101-marostegui.json
  • 07:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 07:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 07:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 07:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 07:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298559)', diff saved to https://phabricator.wikimedia.org/P19386 and previous config saved to /var/cache/conftool/dbconfig/20220127-074033-marostegui.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P19385 and previous config saved to /var/cache/conftool/dbconfig/20220127-072900-marostegui.json
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P19384 and previous config saved to /var/cache/conftool/dbconfig/20220127-072528-marostegui.json
  • 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1131.eqiad.wmnet with OS bullseye
  • 07:17 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1131.eqiad.wmnet with OS bullseye
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T285149)', diff saved to https://phabricator.wikimedia.org/P19383 and previous config saved to /var/cache/conftool/dbconfig/20220127-071355-marostegui.json
  • 07:11 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1131.eqiad.wmnet with OS bullseye
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P19382 and previous config saved to /var/cache/conftool/dbconfig/20220127-071023-marostegui.json
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 T299479', diff saved to https://phabricator.wikimedia.org/P19381 and previous config saved to /var/cache/conftool/dbconfig/20220127-070821-marostegui.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist from s8 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P19380 and previous config saved to /var/cache/conftool/dbconfig/20220127-070557-marostegui.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from es1021', diff saved to https://phabricator.wikimedia.org/P19379 and previous config saved to /var/cache/conftool/dbconfig/20220127-070532-marostegui.json
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T285149)', diff saved to https://phabricator.wikimedia.org/P19378 and previous config saved to /var/cache/conftool/dbconfig/20220127-070428-marostegui.json
  • 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 07:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 06:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298559)', diff saved to https://phabricator.wikimedia.org/P19377 and previous config saved to /var/cache/conftool/dbconfig/20220127-065519-marostegui.json
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T298559)', diff saved to https://phabricator.wikimedia.org/P19376 and previous config saved to /var/cache/conftool/dbconfig/20220127-065406-marostegui.json
  • 06:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 06:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 04:01 Krinkle: grafana: Temporarily silence resourceloader alert for INM satisfaction ratio, pending T298520.
  • 00:58 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: commonswiki: Add leg.journals.isu.ac.ir to the wgCopyUploadsDomains allowlist (T300217) (duration: 00m 55s)
  • 00:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:24 thcipriani: restarting jenkins

2022-01-26

  • 23:34 brennen: train 1.38.0-wmf.19 (T293960): parking the train at group0 until US morning; we have a probable fix for T300194 but CI is having issues
  • 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:21 brennen: train 1.38.0-wmf.19 (T293960): rolling back due to increase in DBTransactionSizeErrors
  • 20:20 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.38.0-wmf.19 refs T293960"
  • 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:07 brennen@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.19 refs T293960 (duration: 00m 54s)
  • 20:06 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.19 refs T293960
  • 20:01 brennen: train 1.38.0-wmf.19 (T293960): all known blockers patched, logs for wmf.19 quiet - proceeding to group1
  • 20:00 mutante: mw131* - purging remaining font packages
  • 19:53 mutante: labweb1001, labweb1002, cloudweb2001-dev (wikitech hosts) - apt-get remove --purge fonts*; apt-get remove --purge xfonts* | purging font packages that had been installed as dependencies (T294378)
  • 19:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:50 accraze@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 19:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:46 Lucas_WMDE: UTC evening backport window done
  • 19:45 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: fawiki: Add unwatchedpages permission to eliminators (T300126) (duration: 00m 51s)
  • 19:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:42 accraze@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 19:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.19/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWTransclusionDialog.js: Backport: Don't wrap unknown actions with confirmation (T300095) (duration: 00m 51s)
  • 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:33 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.19/includes/skins/Skin.php: Backport: Fix empty div when there's no sitenotice. (T300096) (duration: 00m 51s)
  • 19:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:28 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: bgwiki: Add 'wgNamespaceRobotPolicies' for Draft (Talk) namespace (T299224) (duration: 00m 52s)
  • 19:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:23 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 19:23 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 19:23 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 19:22 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 19:22 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 19:20 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:20 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 19:19 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 19:19 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 19:19 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 19:19 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.19/skins/Timeless/includes/TimelessTemplate.php: Backport: Do not duplicate categories in primary action tabs space (T300100) (duration: 00m 51s)
  • 19:18 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 19:17 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 19:16 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 19:15 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [wmf-config] Undeploy gdi survey on cawiki beta (T299913) (no-op sync, beta only) (duration: 00m 52s)
  • 19:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298559)', diff saved to https://phabricator.wikimedia.org/P19374 and previous config saved to /var/cache/conftool/dbconfig/20220126-191002-marostegui.json
  • 18:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P19373 and previous config saved to /var/cache/conftool/dbconfig/20220126-185457-marostegui.json
  • 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P19372 and previous config saved to /var/cache/conftool/dbconfig/20220126-183953-marostegui.json
  • 18:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298559)', diff saved to https://phabricator.wikimedia.org/P19371 and previous config saved to /var/cache/conftool/dbconfig/20220126-182448-marostegui.json
  • 18:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298559)', diff saved to https://phabricator.wikimedia.org/P19370 and previous config saved to /var/cache/conftool/dbconfig/20220126-182333-marostegui.json
  • 18:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 18:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 18:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298559)', diff saved to https://phabricator.wikimedia.org/P19369 and previous config saved to /var/cache/conftool/dbconfig/20220126-182325-marostegui.json
  • 18:14 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1019.eqiad.wmnet
  • 18:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1019.eqiad.wmnet with OS buster
  • 18:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P19368 and previous config saved to /var/cache/conftool/dbconfig/20220126-180819-marostegui.json
  • 18:02 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 17:59 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:59 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19366 and previous config saved to /var/cache/conftool/dbconfig/20220126-175405-root.json
  • 17:53 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P19365 and previous config saved to /var/cache/conftool/dbconfig/20220126-175315-marostegui.json
  • 17:52 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19364 and previous config saved to /var/cache/conftool/dbconfig/20220126-173901-root.json
  • 17:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298559)', diff saved to https://phabricator.wikimedia.org/P19363 and previous config saved to /var/cache/conftool/dbconfig/20220126-173810-marostegui.json
  • 17:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298559)', diff saved to https://phabricator.wikimedia.org/P19361 and previous config saved to /var/cache/conftool/dbconfig/20220126-173654-marostegui.json
  • 17:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298559)', diff saved to https://phabricator.wikimedia.org/P19360 and previous config saved to /var/cache/conftool/dbconfig/20220126-173647-marostegui.json
  • 17:23 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19359 and previous config saved to /var/cache/conftool/dbconfig/20220126-172358-root.json
  • 17:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P19357 and previous config saved to /var/cache/conftool/dbconfig/20220126-172141-marostegui.json
  • 17:21 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1019.eqiad.wmnet with OS buster
  • 17:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19356 and previous config saved to /var/cache/conftool/dbconfig/20220126-170852-root.json
  • 17:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P19355 and previous config saved to /var/cache/conftool/dbconfig/20220126-170635-marostegui.json
  • 16:53 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19354 and previous config saved to /var/cache/conftool/dbconfig/20220126-165349-root.json
  • 16:53 jayme: published image docker-registry.discovery.wmnet/cfssl-issuer:0.2.1-1 - T299906
  • 16:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298559)', diff saved to https://phabricator.wikimedia.org/P19353 and previous config saved to /var/cache/conftool/dbconfig/20220126-165130-marostegui.json
  • 16:51 ryankemper: [WCQS Deploy] Restarted updaters across fleet: `ryankemper@cumin1001:~$ sudo cumin -b 6 'wcqs*' 'sudo systemctl restart wcqs-updater'`
  • 16:47 moritzm: draining instances off ganeti1007 for reimage
  • 16:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19352 and previous config saved to /var/cache/conftool/dbconfig/20220126-163845-root.json
  • 16:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1019.eqiad.wmnet
  • 16:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on restbase1019.eqiad.wmnet with reason: Firmware upgrade
  • 16:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on restbase1019.eqiad.wmnet with reason: Firmware upgrade
  • 16:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298559)', diff saved to https://phabricator.wikimedia.org/P19351 and previous config saved to /var/cache/conftool/dbconfig/20220126-162810-marostegui.json
  • 16:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 16:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 16:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298559)', diff saved to https://phabricator.wikimedia.org/P19350 and previous config saved to /var/cache/conftool/dbconfig/20220126-162756-marostegui.json
  • 16:26 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19349 and previous config saved to /var/cache/conftool/dbconfig/20220126-162342-root.json
  • 16:23 elukey: restart varnishkafka instances on cp1087
  • 16:17 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P19348 and previous config saved to /var/cache/conftool/dbconfig/20220126-161252-marostegui.json
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19347 and previous config saved to /var/cache/conftool/dbconfig/20220126-160838-root.json
  • 16:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P19346 and previous config saved to /var/cache/conftool/dbconfig/20220126-155747-marostegui.json
  • 15:54 vgutierrez: upgrading varnishkafka to version 1.1.0 on cp[6002,6005,6009-6013].drmrs.wmnet,cp1087.eqiad.wmnet,cp[4021,4033-4034,4036].ulsfo.wmnet
  • 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19345 and previous config saved to /var/cache/conftool/dbconfig/20220126-155334-root.json
  • 15:47 vgutierrez: pool cp4035
  • 15:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298559)', diff saved to https://phabricator.wikimedia.org/P19344 and previous config saved to /var/cache/conftool/dbconfig/20220126-154242-marostegui.json
  • 15:42 vgutierrez: restarting varnish-frontend on cp4035
  • 15:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1025.eqiad.wmnet with OS bullseye
  • 15:40 vgutierrez: depool cp4035
  • 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298559)', diff saved to https://phabricator.wikimedia.org/P19343 and previous config saved to /var/cache/conftool/dbconfig/20220126-154026-marostegui.json
  • 15:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 15:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298559)', diff saved to https://phabricator.wikimedia.org/P19342 and previous config saved to /var/cache/conftool/dbconfig/20220126-154019-marostegui.json
  • 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19341 and previous config saved to /var/cache/conftool/dbconfig/20220126-153831-root.json
  • 15:29 XioNoX: add pay-lvs1003/4 to pfw3-eqiad BGP
  • 15:25 joal@deploy1002: Finished deploy [analytics/refinery@ab7f732] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@ab7f732] (duration: 05m 30s)
  • 15:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P19340 and previous config saved to /var/cache/conftool/dbconfig/20220126-152514-marostegui.json
  • 15:24 ottomata: paused (for meetings) in deploying new CA certs for all eventgate services, still TODO: eventgate-analytics-external, eventgate-main - T296064
  • 15:20 joal@deploy1002: Started deploy [analytics/refinery@ab7f732] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@ab7f732]
  • 15:14 joal@deploy1002: Finished deploy [analytics/refinery@ab7f732] (thin): Regular analytics weekly train THIN [analytics/refinery@ab7f732] (duration: 00m 07s)
  • 15:14 joal@deploy1002: Started deploy [analytics/refinery@ab7f732] (thin): Regular analytics weekly train THIN [analytics/refinery@ab7f732]
  • 15:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1006.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 15:10 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1006.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 15:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P19338 and previous config saved to /var/cache/conftool/dbconfig/20220126-151009-marostegui.json
  • 15:08 joal@deploy1002: Finished deploy [analytics/refinery@ab7f732]: Regular analytics weekly train [analytics/refinery@ab7f732] (duration: 16m 38s)
  • 15:08 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1025.eqiad.wmnet with OS bullseye
  • 15:06 elukey: elukey@cp4035:~$ sudo systemctl restart varnishkafka-eventlogging.service - metrics showing messages stuck for a poll()
  • 15:03 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1025.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 15:00 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync on production
  • 14:58 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync on canary
  • 14:58 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply on canary
  • 14:58 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply on production
  • 14:57 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: sync on production
  • 14:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1005.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 14:56 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: sync on canary
  • 14:56 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply on production
  • 14:56 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply on canary
  • 14:55 elukey: elukey@cp4035:~$ sudo systemctl restart varnishkafka-webrequest.service - metrics showing messages stuck for a poll()
  • 14:55 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: sync on production
  • 14:55 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply on canary
  • 14:55 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply on production
  • 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298559)', diff saved to https://phabricator.wikimedia.org/P19337 and previous config saved to /var/cache/conftool/dbconfig/20220126-145505-marostegui.json
  • 14:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1005.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 14:54 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: sync on production
  • 14:54 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: sync on canary
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298559)', diff saved to https://phabricator.wikimedia.org/P19336 and previous config saved to /var/cache/conftool/dbconfig/20220126-145349-marostegui.json
  • 14:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 14:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298559)', diff saved to https://phabricator.wikimedia.org/P19335 and previous config saved to /var/cache/conftool/dbconfig/20220126-145342-marostegui.json
  • 14:53 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply on canary
  • 14:53 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply on production
  • 14:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:52 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync on production
  • 14:52 joal@deploy1002: Started deploy [analytics/refinery@ab7f732]: Regular analytics weekly train [analytics/refinery@ab7f732]
  • 14:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:50 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync on canary
  • 14:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:50 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply on canary
  • 14:50 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply on production
  • 14:50 volans@cumin1001: START - Cookbook sre.hosts.provision for host es1025.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 14:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:42 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: sync on production
  • 14:41 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply on canary
  • 14:41 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply on production
  • 14:41 ottomata: deploying new CA certs for all eventgate services... T296064
  • 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P19334 and previous config saved to /var/cache/conftool/dbconfig/20220126-143837-marostegui.json
  • 14:38 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: sync on production
  • 14:37 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: sync on canary
  • 14:37 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: sync on canary
  • 14:37 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: sync on production
  • 14:37 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: sync on production
  • 14:36 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: sync on canary
  • 14:36 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: sync on canary
  • 14:36 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: sync on production
  • 14:35 ottomata: roll restarting eventgate-analytics to pick up stream config change https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/757122
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19333 and previous config saved to /var/cache/conftool/dbconfig/20220126-142620-root.json
  • 14:25 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: sync on canary
  • 14:25 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: sync on production
  • 14:24 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: sync on production
  • 14:24 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: sync on canary
  • 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P19332 and previous config saved to /var/cache/conftool/dbconfig/20220126-142332-marostegui.json
  • 14:23 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply on canary
  • 14:23 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply on production
  • 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T285149)', diff saved to https://phabricator.wikimedia.org/P19331 and previous config saved to /var/cache/conftool/dbconfig/20220126-142255-marostegui.json
  • 14:22 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply on canary
  • 14:22 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply on production
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19330 and previous config saved to /var/cache/conftool/dbconfig/20220126-141113-root.json
  • 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298559)', diff saved to https://phabricator.wikimedia.org/P19329 and previous config saved to /var/cache/conftool/dbconfig/20220126-140827-marostegui.json
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P19328 and previous config saved to /var/cache/conftool/dbconfig/20220126-140751-marostegui.json
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298559)', diff saved to https://phabricator.wikimedia.org/P19327 and previous config saved to /var/cache/conftool/dbconfig/20220126-140712-marostegui.json
  • 14:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 14:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 14:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 14:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 14:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 14:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298559)', diff saved to https://phabricator.wikimedia.org/P19326 and previous config saved to /var/cache/conftool/dbconfig/20220126-140629-marostegui.json
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1025', diff saved to https://phabricator.wikimedia.org/P19325 and previous config saved to /var/cache/conftool/dbconfig/20220126-135635-marostegui.json
  • 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1014.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19324 and previous config saved to /var/cache/conftool/dbconfig/20220126-135610-root.json
  • 13:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1014.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 13:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1015.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P19323 and previous config saved to /var/cache/conftool/dbconfig/20220126-135245-marostegui.json
  • 13:52 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1015.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P19322 and previous config saved to /var/cache/conftool/dbconfig/20220126-135124-marostegui.json
  • 13:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1006.eqiad.wmnet
  • 13:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1006.eqiad.wmnet
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19321 and previous config saved to /var/cache/conftool/dbconfig/20220126-134106-root.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T285149)', diff saved to https://phabricator.wikimedia.org/P19320 and previous config saved to /var/cache/conftool/dbconfig/20220126-133740-marostegui.json
  • 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T285149)', diff saved to https://phabricator.wikimedia.org/P19319 and previous config saved to /var/cache/conftool/dbconfig/20220126-133634-marostegui.json
  • 13:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 13:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T285149)', diff saved to https://phabricator.wikimedia.org/P19318 and previous config saved to /var/cache/conftool/dbconfig/20220126-133627-marostegui.json
  • 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P19317 and previous config saved to /var/cache/conftool/dbconfig/20220126-133619-marostegui.json
  • 13:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1005.eqiad.wmnet
  • 13:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1005.eqiad.wmnet
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19316 and previous config saved to /var/cache/conftool/dbconfig/20220126-132603-root.json
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchanges from s8 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P19315 and previous config saved to /var/cache/conftool/dbconfig/20220126-132600-marostegui.json
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P19314 and previous config saved to /var/cache/conftool/dbconfig/20220126-132122-marostegui.json
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298559)', diff saved to https://phabricator.wikimedia.org/P19313 and previous config saved to /var/cache/conftool/dbconfig/20220126-132114-marostegui.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298559)', diff saved to https://phabricator.wikimedia.org/P19311 and previous config saved to /var/cache/conftool/dbconfig/20220126-131959-marostegui.json
  • 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:17 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync on staging
  • 13:16 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
  • 13:16 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
  • 13:16 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
  • 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19310 and previous config saved to /var/cache/conftool/dbconfig/20220126-131047-root.json
  • 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P19309 and previous config saved to /var/cache/conftool/dbconfig/20220126-130611-marostegui.json
  • 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19308 and previous config saved to /var/cache/conftool/dbconfig/20220126-130527-root.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19307 and previous config saved to /var/cache/conftool/dbconfig/20220126-125543-root.json
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T285149)', diff saved to https://phabricator.wikimedia.org/P19306 and previous config saved to /var/cache/conftool/dbconfig/20220126-125107-marostegui.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19305 and previous config saved to /var/cache/conftool/dbconfig/20220126-125023-root.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T285149)', diff saved to https://phabricator.wikimedia.org/P19304 and previous config saved to /var/cache/conftool/dbconfig/20220126-125001-marostegui.json
  • 12:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 12:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T285149)', diff saved to https://phabricator.wikimedia.org/P19303 and previous config saved to /var/cache/conftool/dbconfig/20220126-124953-marostegui.json
  • 12:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:43 dcausse: UTC morning backport done
  • 12:41 dcausse@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Correct wcqs event stream name (duration: 00m 51s)
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19302 and previous config saved to /var/cache/conftool/dbconfig/20220126-124040-root.json
  • 12:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2024 (T300006)', diff saved to https://phabricator.wikimedia.org/P19301 and previous config saved to /var/cache/conftool/dbconfig/20220126-123839-ladsgroup.json
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19300 and previous config saved to /var/cache/conftool/dbconfig/20220126-123520-root.json
  • 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P19299 and previous config saved to /var/cache/conftool/dbconfig/20220126-123448-marostegui.json
  • 12:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:32 dcausse@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: commonswiki: Add www.kew.org to the wgCopyUploadsDomains allowlist (T300101) (duration: 00m 51s)
  • 12:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:28 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on staging
  • 12:28 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
  • 12:28 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
  • 12:28 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
  • 12:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19298 and previous config saved to /var/cache/conftool/dbconfig/20220126-122536-root.json
  • 12:25 dcausse@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: fawiki: Add unwatchedpages permission to patrollers (T300126) (duration: 00m 51s)
  • 12:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:22 moritzm: installing apache security updates
  • 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19297 and previous config saved to /var/cache/conftool/dbconfig/20220126-122016-root.json
  • 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P19296 and previous config saved to /var/cache/conftool/dbconfig/20220126-121944-marostegui.json
  • 12:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2024.codfw.wmnet with OS bullseye
  • 12:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:10 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync on staging
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19294 and previous config saved to /var/cache/conftool/dbconfig/20220126-121032-root.json
  • 12:09 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
  • 12:09 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
  • 12:09 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
  • 12:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1137.eqiad.wmnet with OS bullseye
  • 12:09 dcausse@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Deal with change in MachineVision handler constructor (duration: 00m 51s)
  • 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19293 and previous config saved to /var/cache/conftool/dbconfig/20220126-120513-root.json
  • 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T285149)', diff saved to https://phabricator.wikimedia.org/P19292 and previous config saved to /var/cache/conftool/dbconfig/20220126-120439-marostegui.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T285149)', diff saved to https://phabricator.wikimedia.org/P19291 and previous config saved to /var/cache/conftool/dbconfig/20220126-120132-marostegui.json
  • 12:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 12:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T285149)', diff saved to https://phabricator.wikimedia.org/P19290 and previous config saved to /var/cache/conftool/dbconfig/20220126-120125-marostegui.json
  • 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19288 and previous config saved to /var/cache/conftool/dbconfig/20220126-115009-root.json
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P19287 and previous config saved to /var/cache/conftool/dbconfig/20220126-114619-marostegui.json
  • 11:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 11:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 11:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 11:44 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1137.eqiad.wmnet with OS bullseye
  • 11:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 T300099', diff saved to https://phabricator.wikimedia.org/P19286 and previous config saved to /var/cache/conftool/dbconfig/20220126-114236-marostegui.json
  • 11:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es2024.codfw.wmnet with OS bullseye
  • 11:41 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.19/includes/libs/rdbms/database/Database.php: Backport: rdbms: Pass commented SQL to the GeneralizedSql for logging (T298687) (duration: 00m 54s)
  • 11:41 moritzm: installing libxfont security updates
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19285 and previous config saved to /var/cache/conftool/dbconfig/20220126-113730-root.json
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2024 (T300006)', diff saved to https://phabricator.wikimedia.org/P19284 and previous config saved to /var/cache/conftool/dbconfig/20220126-113626-ladsgroup.json
  • 11:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2024.codfw.wmnet with reason: Maintenance
  • 11:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2024.codfw.wmnet with reason: Maintenance
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19283 and previous config saved to /var/cache/conftool/dbconfig/20220126-113505-root.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P19282 and previous config saved to /var/cache/conftool/dbconfig/20220126-113115-marostegui.json
  • 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2025 (T300006)', diff saved to https://phabricator.wikimedia.org/P19281 and previous config saved to /var/cache/conftool/dbconfig/20220126-112719-ladsgroup.json
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2025 (T300006)', diff saved to https://phabricator.wikimedia.org/P19280 and previous config saved to /var/cache/conftool/dbconfig/20220126-112439-ladsgroup.json
  • 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19279 and previous config saved to /var/cache/conftool/dbconfig/20220126-112227-root.json
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19278 and previous config saved to /var/cache/conftool/dbconfig/20220126-112002-root.json
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T285149)', diff saved to https://phabricator.wikimedia.org/P19277 and previous config saved to /var/cache/conftool/dbconfig/20220126-111610-marostegui.json
  • 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T285149)', diff saved to https://phabricator.wikimedia.org/P19276 and previous config saved to /var/cache/conftool/dbconfig/20220126-111504-marostegui.json
  • 11:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 11:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 11:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 11:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 11:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 11:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 11:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T285149)', diff saved to https://phabricator.wikimedia.org/P19275 and previous config saved to /var/cache/conftool/dbconfig/20220126-111425-marostegui.json
  • 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19274 and previous config saved to /var/cache/conftool/dbconfig/20220126-110723-root.json
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19273 and previous config saved to /var/cache/conftool/dbconfig/20220126-110458-root.json
  • 11:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2025.codfw.wmnet with OS bullseye
  • 11:03 hnowlan@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 22m 16s)
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P19272 and previous config saved to /var/cache/conftool/dbconfig/20220126-105921-marostegui.json
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19271 and previous config saved to /var/cache/conftool/dbconfig/20220126-105220-root.json
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19270 and previous config saved to /var/cache/conftool/dbconfig/20220126-104955-root.json
  • 10:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1020.eqiad.wmnet with OS bullseye
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P19269 and previous config saved to /var/cache/conftool/dbconfig/20220126-104416-marostegui.json
  • 10:41 hnowlan@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
  • 10:41 btullis: re-enabled puppet on all cp-* nodes.
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19268 and previous config saved to /var/cache/conftool/dbconfig/20220126-103716-root.json
  • 10:34 oblivian@deploy1002: Finished deploy [docker-pkg/deploy@62a5e87]: redeploy of 3.0.2, including build2001 (duration: 00m 33s)
  • 10:34 oblivian@deploy1002: Started deploy [docker-pkg/deploy@62a5e87]: redeploy of 3.0.2, including build2001
  • 10:33 oblivian@deploy1002: Finished deploy [docker-pkg/deploy@62a5e87]: redeploy of 3.0.2, including build2001 (duration: 01m 05s)
  • 10:32 oblivian@deploy1002: Started deploy [docker-pkg/deploy@62a5e87]: redeploy of 3.0.2, including build2001
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T285149)', diff saved to https://phabricator.wikimedia.org/P19267 and previous config saved to /var/cache/conftool/dbconfig/20220126-102911-marostegui.json
  • 10:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es2025.codfw.wmnet with OS bullseye
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T285149)', diff saved to https://phabricator.wikimedia.org/P19266 and previous config saved to /var/cache/conftool/dbconfig/20220126-102805-marostegui.json
  • 10:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 10:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T285149)', diff saved to https://phabricator.wikimedia.org/P19265 and previous config saved to /var/cache/conftool/dbconfig/20220126-102758-marostegui.json
  • 10:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1006.eqiad.wmnet with OS buster
  • 10:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2025 (T300006)', diff saved to https://phabricator.wikimedia.org/P19264 and previous config saved to /var/cache/conftool/dbconfig/20220126-102445-ladsgroup.json
  • 10:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2025.codfw.wmnet with reason: Maintenance
  • 10:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2025.codfw.wmnet with reason: Maintenance
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19263 and previous config saved to /var/cache/conftool/dbconfig/20220126-102213-root.json
  • 10:16 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1020.eqiad.wmnet with OS bullseye
  • 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P19261 and previous config saved to /var/cache/conftool/dbconfig/20220126-101253-marostegui.json
  • 10:12 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host es1020.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19260 and previous config saved to /var/cache/conftool/dbconfig/20220126-100709-root.json
  • 10:01 volans@cumin1001: START - Cookbook sre.hosts.provision for host es1020.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P19259 and previous config saved to /var/cache/conftool/dbconfig/20220126-095749-marostegui.json
  • 09:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1006.eqiad.wmnet with OS buster
  • 09:53 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync on staging
  • 09:52 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
  • 09:52 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
  • 09:52 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19258 and previous config saved to /var/cache/conftool/dbconfig/20220126-095205-root.json
  • 09:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1005.eqiad.wmnet with OS buster
  • 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T285149)', diff saved to https://phabricator.wikimedia.org/P19257 and previous config saved to /var/cache/conftool/dbconfig/20220126-094244-marostegui.json
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T285149)', diff saved to https://phabricator.wikimedia.org/P19256 and previous config saved to /var/cache/conftool/dbconfig/20220126-094138-marostegui.json
  • 09:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 09:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T285149)', diff saved to https://phabricator.wikimedia.org/P19255 and previous config saved to /var/cache/conftool/dbconfig/20220126-094131-marostegui.json
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19254 and previous config saved to /var/cache/conftool/dbconfig/20220126-093702-root.json
  • 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1005.eqiad.wmnet with OS buster
  • 09:32 jayme: updated scap to 4.2.0 on A:restbase-canary - T300058
  • 09:28 godog: begin rsync prometheus2004 -> 2005 - T296199
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P19252 and previous config saved to /var/cache/conftool/dbconfig/20220126-092626-marostegui.json
  • 09:25 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1005.eqiad.wmnet with OS buster
  • 09:25 jayme: updated scap to 4.2.0 on A:mw-canary, A:parsoid-canary, A:mw-jobrunner-canary - T300058
  • 09:24 jayme: uploaded scap 4.2.0 to apt.wikimedia.org - T300058
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19251 and previous config saved to /var/cache/conftool/dbconfig/20220126-092158-root.json
  • 09:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1120.eqiad.wmnet with OS bullseye
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P19250 and previous config saved to /var/cache/conftool/dbconfig/20220126-091121-marostegui.json
  • 09:06 jayme: uploaded scap 4.2.0 to apt.wikimedia.org
  • 09:00 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1005.eqiad.wmnet with OS buster
  • 09:00 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1120.eqiad.wmnet with OS bullseye
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 T300099', diff saved to https://phabricator.wikimedia.org/P19249 and previous config saved to /var/cache/conftool/dbconfig/20220126-085733-marostegui.json
  • 08:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1014.eqiad.wmnet with OS buster
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T285149)', diff saved to https://phabricator.wikimedia.org/P19248 and previous config saved to /var/cache/conftool/dbconfig/20220126-085616-marostegui.json
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T285149)', diff saved to https://phabricator.wikimedia.org/P19247 and previous config saved to /var/cache/conftool/dbconfig/20220126-085510-marostegui.json
  • 08:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 08:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T285149)', diff saved to https://phabricator.wikimedia.org/P19246 and previous config saved to /var/cache/conftool/dbconfig/20220126-085503-marostegui.json
  • 08:41 moritzm: draining instances off ganeti1015 for reimage
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P19245 and previous config saved to /var/cache/conftool/dbconfig/20220126-083958-marostegui.json
  • 08:31 jelto: sign puppet cert for gitlab-runner1001.eqiad.wmnet
  • 08:29 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1014.eqiad.wmnet with OS buster
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P19244 and previous config saved to /var/cache/conftool/dbconfig/20220126-082453-marostegui.json
  • 08:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1013.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 08:18 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1013.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T285149)', diff saved to https://phabricator.wikimedia.org/P19243 and previous config saved to /var/cache/conftool/dbconfig/20220126-080948-marostegui.json
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T285149)', diff saved to https://phabricator.wikimedia.org/P19242 and previous config saved to /var/cache/conftool/dbconfig/20220126-080842-marostegui.json
  • 08:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 08:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 08:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 08:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T285149)', diff saved to https://phabricator.wikimedia.org/P19241 and previous config saved to /var/cache/conftool/dbconfig/20220126-080831-marostegui.json
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P19240 and previous config saved to /var/cache/conftool/dbconfig/20220126-075326-marostegui.json
  • 07:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 07:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2131.codfw.wmnet with OS bullseye
  • 07:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 07:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 07:49 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1020.eqiad.wmnet with OS bullseye
  • 07:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 07:45 taavi@deploy1002: Synchronized wmf-config/interwiki.php: Config: Update interwiki cache (duration: 00m 52s)
  • 07:43 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1020.eqiad.wmnet with OS bullseye
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P19239 and previous config saved to /var/cache/conftool/dbconfig/20220126-073822-marostegui.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T285149)', diff saved to https://phabricator.wikimedia.org/P19238 and previous config saved to /var/cache/conftool/dbconfig/20220126-072317-marostegui.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T285149)', diff saved to https://phabricator.wikimedia.org/P19237 and previous config saved to /var/cache/conftool/dbconfig/20220126-072211-marostegui.json
  • 07:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 07:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T285149)', diff saved to https://phabricator.wikimedia.org/P19236 and previous config saved to /var/cache/conftool/dbconfig/20220126-072200-marostegui.json
  • 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2115.codfw.wmnet with OS bullseye
  • 07:14 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2131.codfw.wmnet with OS bullseye
  • 07:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2096.codfw.wmnet with OS bullseye
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P19235 and previous config saved to /var/cache/conftool/dbconfig/20220126-070654-marostegui.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P19234 and previous config saved to /var/cache/conftool/dbconfig/20220126-065149-marostegui.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchangeslinked from s8 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P19233 and previous config saved to /var/cache/conftool/dbconfig/20220126-064653-marostegui.json
  • 06:43 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2115.codfw.wmnet with OS bullseye
  • 06:41 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2096.codfw.wmnet with OS bullseye
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T285149)', diff saved to https://phabricator.wikimedia.org/P19232 and previous config saved to /var/cache/conftool/dbconfig/20220126-063644-marostegui.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1020 T300005', diff saved to https://phabricator.wikimedia.org/P19231 and previous config saved to /var/cache/conftool/dbconfig/20220126-063149-marostegui.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T285149)', diff saved to https://phabricator.wikimedia.org/P19230 and previous config saved to /var/cache/conftool/dbconfig/20220126-063037-marostegui.json
  • 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2086 (s7,s8) T299882', diff saved to https://phabricator.wikimedia.org/P19229 and previous config saved to /var/cache/conftool/dbconfig/20220126-062406-marostegui.json
  • 05:02 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@dc7c5ac] (wcqs): Deploy 0.3.100 to WCQS (duration: 02m 21s)
  • 04:59 ryankemper@deploy1002: Started deploy [wdqs/wdqs@dc7c5ac] (wcqs): Deploy 0.3.100 to WCQS
  • 04:56 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
  • 03:42 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 03:42 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 03:42 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 03:40 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@dc7c5ac]: 0.3.100 (duration: 08m 35s)
  • 03:32 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.100` on canary `wdqs1003`; proceeding to rest of fleet
  • 03:31 ryankemper@deploy1002: Started deploy [wdqs/wdqs@dc7c5ac]: 0.3.100
  • 03:30 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.100`. Pre-deploy tests passing on canary `wdqs1003`
  • 02:49 ryankemper: [WDQS] T299098 `ryankemper@wdqs2003:~$ sudo pool` (forgot to pool after dcops fixed hw issue)
  • 01:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 01:04 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable migration mode on Italian and MediaWiki.org (T299927) (duration: 00m 54s)
  • 01:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 01:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 01:00 catrope@deploy1002: Synchronized php-1.38.0-wmf.18/skins/Vector/: Backport: Do not load common.js twice (T300070) and Fix bug in SkinVersionLookup (T299971) (duration: 00m 51s)
  • 01:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:56 catrope@deploy1002: Synchronized php-1.38.0-wmf.19/skins/Vector/: Backport: Do not load common.js twice (T300070) (duration: 02m 43s)
  • 00:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:11 ryankemper: T294805 Reverted https://gerrit.wikimedia.org/r/c/operations/puppet/+/757003 (elasticsearch-oss dependency issues, will pick this back up tomorrow); re-enabling puppet across elastic1*
  • 00:03 ryankemper: T294805 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/757003; running puppet on `elastic1068` to make it join the fleet

2022-01-25

  • 23:42 ryankemper: T294805 [Elastic] Step 2: Disabling puppet in advance of merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/736117
  • 23:20 ryankemper: T294805 [Elastic] Merged https://gerrit.wikimedia.org/r/736116, step 1 of bringing new eqiad 10G refresh hosts into service
  • 21:20 bblack@cumin1001: conftool action : set/weight=100; selector: dc=drmrs,service=ats-be
  • 21:20 bblack@cumin1001: conftool action : set/weight=1; selector: dc=drmrs,service=varnish-fe
  • 21:20 bblack@cumin1001: conftool action : set/weight=1; selector: dc=drmrs,service=ats-tls
  • 21:03 cwhite: end transition to logstash output opensearch plugin T299168
  • 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:17 cwhite: begin transition to logstash output opensearch plugin T299168
  • 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.19 refs T293960
  • 20:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1008.eqiad.wmnet with OS buster
  • 20:01 brennen: train 1.38.0-wmf.19 (T293960): testwiki sync finished, still no open blockers, proceeding to group0
  • 19:50 brennen@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.19 refs T293960 (duration: 52m 01s)
  • 19:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1008.eqiad.wmnet with OS buster
  • 19:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:35 cmjohnson1: updating firmware ganeti1006 T299527
  • 19:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Make es1028 master of es3 T299911', diff saved to https://phabricator.wikimedia.org/P19221 and previous config saved to /var/cache/conftool/dbconfig/20220125-191238-ladsgroup.json
  • 19:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 (T299911)', diff saved to https://phabricator.wikimedia.org/P19220 and previous config saved to /var/cache/conftool/dbconfig/20220125-190949-ladsgroup.json
  • 19:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1006.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 19:04 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1006.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:58 brennen@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.19 refs T293960
  • 18:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P19219 and previous config saved to /var/cache/conftool/dbconfig/20220125-185444-ladsgroup.json
  • 18:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19218 and previous config saved to /var/cache/conftool/dbconfig/20220125-184714-root.json
  • 18:44 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab-runner1001.eqiad.wmnet
  • 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P19217 and previous config saved to /var/cache/conftool/dbconfig/20220125-183940-ladsgroup.json
  • 18:38 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: sync on production
  • 18:34 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply on production
  • 18:33 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: sync on production
  • 18:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19216 and previous config saved to /var/cache/conftool/dbconfig/20220125-183210-root.json
  • 18:31 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply on production
  • 18:30 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: sync on production
  • 18:29 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply on production
  • 18:28 moritzm: installing policykit-1 security updates on buster
  • 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 (T299911)', diff saved to https://phabricator.wikimedia.org/P19215 and previous config saved to /var/cache/conftool/dbconfig/20220125-182435-ladsgroup.json
  • 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1028.eqiad.wmnet with OS bullseye
  • 18:17 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19214 and previous config saved to /var/cache/conftool/dbconfig/20220125-181706-root.json
  • 18:14 brennen: train 1.38.0-wmf.19 (T293960): no open blockers, starting stage-train script shortly
  • 18:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19213 and previous config saved to /var/cache/conftool/dbconfig/20220125-180203-root.json
  • 18:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 17:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 17:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19212 and previous config saved to /var/cache/conftool/dbconfig/20220125-174659-root.json
  • 17:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es1028.eqiad.wmnet with OS bullseye
  • 17:31 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19211 and previous config saved to /var/cache/conftool/dbconfig/20220125-173156-root.json
  • 17:16 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19210 and previous config saved to /var/cache/conftool/dbconfig/20220125-171652-root.json
  • 17:02 cwhite: upgrade elasticsearch-curator on apifeatureusage1001
  • 17:01 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19209 and previous config saved to /var/cache/conftool/dbconfig/20220125-170148-root.json
  • 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1028 (T299911)', diff saved to https://phabricator.wikimedia.org/P19208 and previous config saved to /var/cache/conftool/dbconfig/20220125-164900-ladsgroup.json
  • 16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
  • 16:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
  • 16:46 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19207 and previous config saved to /var/cache/conftool/dbconfig/20220125-164645-root.json
  • 16:46 taavi: deploy updated patch for T285116
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Make es1031 master of es3 T299911', diff saved to https://phabricator.wikimedia.org/P19206 and previous config saved to /var/cache/conftool/dbconfig/20220125-164324-ladsgroup.json
  • 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034 (T299911)', diff saved to https://phabricator.wikimedia.org/P19204 and previous config saved to /var/cache/conftool/dbconfig/20220125-164118-ladsgroup.json
  • 16:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T285149)', diff saved to https://phabricator.wikimedia.org/P19203 and previous config saved to /var/cache/conftool/dbconfig/20220125-163721-marostegui.json
  • 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19202 and previous config saved to /var/cache/conftool/dbconfig/20220125-163141-root.json
  • 16:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19201 and previous config saved to /var/cache/conftool/dbconfig/20220125-163054-root.json
  • 16:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034', diff saved to https://phabricator.wikimedia.org/P19200 and previous config saved to /var/cache/conftool/dbconfig/20220125-162613-ladsgroup.json
  • 16:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P19199 and previous config saved to /var/cache/conftool/dbconfig/20220125-162217-marostegui.json
  • 16:21 cmjohnson1: updating firmware ganeti1005 T299527
  • 16:18 cmjohnson1: updating firmware ganeti1014 T299527
  • 16:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19198 and previous config saved to /var/cache/conftool/dbconfig/20220125-161550-root.json
  • 16:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034', diff saved to https://phabricator.wikimedia.org/P19197 and previous config saved to /var/cache/conftool/dbconfig/20220125-161108-ladsgroup.json
  • 16:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P19196 and previous config saved to /var/cache/conftool/dbconfig/20220125-160712-marostegui.json
  • 16:06 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-test-coord1001.eqiad.wmnet with reason: Still troubleshooting mariadb issues
  • 16:06 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-test-coord1001.eqiad.wmnet with reason: Still troubleshooting mariadb issues
  • 16:05 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1022.eqiad.wmnet with OS bullseye
  • 16:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T299827)', diff saved to https://phabricator.wikimedia.org/P19195 and previous config saved to /var/cache/conftool/dbconfig/20220125-160522-marostegui.json
  • 16:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19194 and previous config saved to /var/cache/conftool/dbconfig/20220125-160047-root.json
  • 15:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034 (T299911)', diff saved to https://phabricator.wikimedia.org/P19193 and previous config saved to /var/cache/conftool/dbconfig/20220125-155604-ladsgroup.json
  • 15:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1034.eqiad.wmnet with OS bullseye
  • 15:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T285149)', diff saved to https://phabricator.wikimedia.org/P19192 and previous config saved to /var/cache/conftool/dbconfig/20220125-155207-marostegui.json
  • 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T285149)', diff saved to https://phabricator.wikimedia.org/P19191 and previous config saved to /var/cache/conftool/dbconfig/20220125-155101-marostegui.json
  • 15:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 15:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 15:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T285149)', diff saved to https://phabricator.wikimedia.org/P19190 and previous config saved to /var/cache/conftool/dbconfig/20220125-155053-marostegui.json
  • 15:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P19189 and previous config saved to /var/cache/conftool/dbconfig/20220125-155017-marostegui.json
  • 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19187 and previous config saved to /var/cache/conftool/dbconfig/20220125-154543-root.json
  • 15:38 mmandere@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ncredir6002.drmrs.wmnet
  • 15:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P19186 and previous config saved to /var/cache/conftool/dbconfig/20220125-153548-marostegui.json
  • 15:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P19185 and previous config saved to /var/cache/conftool/dbconfig/20220125-153511-marostegui.json
  • 15:34 volans@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
  • 15:32 jelto@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab-runner1001.eqiad.wmnet
  • 15:31 godog: centrallog1001:~# lvextend --resizefs --size +23G /dev/centrallog1001-vg/data
  • 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19184 and previous config saved to /var/cache/conftool/dbconfig/20220125-153040-root.json
  • 15:24 mmandere@cumin1001: START - Cookbook sre.ganeti.makevm for new host ncredir6002.drmrs.wmnet
  • 15:21 mmandere@cumin1001: conftool action : set/pooled=yes; selector: name=ncredir6002.*
  • 15:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es1034.eqiad.wmnet with OS bullseye
  • 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P19183 and previous config saved to /var/cache/conftool/dbconfig/20220125-152044-marostegui.json
  • 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T299827)', diff saved to https://phabricator.wikimedia.org/P19182 and previous config saved to /var/cache/conftool/dbconfig/20220125-152006-marostegui.json
  • 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T299827)', diff saved to https://phabricator.wikimedia.org/P19181 and previous config saved to /var/cache/conftool/dbconfig/20220125-151900-marostegui.json
  • 15:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 15:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T299827)', diff saved to https://phabricator.wikimedia.org/P19180 and previous config saved to /var/cache/conftool/dbconfig/20220125-151852-marostegui.json
  • 15:18 mmandere@cumin1001: conftool action : select; selector: cluster=necredir,dc=drmrs
  • 15:17 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mx1001.wikimedia.org with reason: kernel testing
  • 15:17 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mx1001.wikimedia.org with reason: kernel testing
  • 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19179 and previous config saved to /var/cache/conftool/dbconfig/20220125-151536-root.json
  • 15:09 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2009.codfw.wmnet
  • 15:09 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1009.eqiad.wmnet
  • 15:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T285149)', diff saved to https://phabricator.wikimedia.org/P19178 and previous config saved to /var/cache/conftool/dbconfig/20220125-150539-marostegui.json
  • 15:04 bblack: lvs6002: restarting pybal
  • 15:03 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync on staging
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P19177 and previous config saved to /var/cache/conftool/dbconfig/20220125-150348-marostegui.json
  • 15:03 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
  • 15:03 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
  • 15:03 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
  • 15:03 bblack: lvs600[13]: restarting pybal
  • 15:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1034 (T299911)', diff saved to https://phabricator.wikimedia.org/P19176 and previous config saved to /var/cache/conftool/dbconfig/20220125-150256-ladsgroup.json
  • 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
  • 15:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1031 (T299911)', diff saved to https://phabricator.wikimedia.org/P19175 and previous config saved to /var/cache/conftool/dbconfig/20220125-150052-ladsgroup.json
  • 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19174 and previous config saved to /var/cache/conftool/dbconfig/20220125-150031-root.json
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P19173 and previous config saved to /var/cache/conftool/dbconfig/20220125-144843-marostegui.json
  • 14:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P19172 and previous config saved to /var/cache/conftool/dbconfig/20220125-144548-ladsgroup.json
  • 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19171 and previous config saved to /var/cache/conftool/dbconfig/20220125-144528-root.json
  • 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T299827)', diff saved to https://phabricator.wikimedia.org/P19170 and previous config saved to /var/cache/conftool/dbconfig/20220125-143338-marostegui.json
  • 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T299827)', diff saved to https://phabricator.wikimedia.org/P19169 and previous config saved to /var/cache/conftool/dbconfig/20220125-143232-marostegui.json
  • 14:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 14:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T299827)', diff saved to https://phabricator.wikimedia.org/P19168 and previous config saved to /var/cache/conftool/dbconfig/20220125-143218-marostegui.json
  • 14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P19167 and previous config saved to /var/cache/conftool/dbconfig/20220125-143043-ladsgroup.json
  • 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19166 and previous config saved to /var/cache/conftool/dbconfig/20220125-143024-root.json
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Remove logpager from s8 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P19165 and previous config saved to /var/cache/conftool/dbconfig/20220125-142614-marostegui.json
  • 14:23 jelto@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gitlab-runner1001.eqiad.wmnet on all recursors
  • 14:23 jelto@cumin1001: START - Cookbook sre.dns.wipe-cache gitlab-runner1001.eqiad.wmnet on all recursors
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P19164 and previous config saved to /var/cache/conftool/dbconfig/20220125-141714-marostegui.json
  • 14:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1031 (T299911)', diff saved to https://phabricator.wikimedia.org/P19163 and previous config saved to /var/cache/conftool/dbconfig/20220125-141538-ladsgroup.json
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19162 and previous config saved to /var/cache/conftool/dbconfig/20220125-141520-root.json
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T285149)', diff saved to https://phabricator.wikimedia.org/P19161 and previous config saved to /var/cache/conftool/dbconfig/20220125-141520-marostegui.json
  • 14:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 14:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T285149)', diff saved to https://phabricator.wikimedia.org/P19160 and previous config saved to /var/cache/conftool/dbconfig/20220125-141513-marostegui.json
  • 14:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1026.eqiad.wmnet with OS bullseye
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P19159 and previous config saved to /var/cache/conftool/dbconfig/20220125-140209-marostegui.json
  • 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P19158 and previous config saved to /var/cache/conftool/dbconfig/20220125-140008-marostegui.json
  • 13:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1031.eqiad.wmnet with OS bullseye
  • 13:55 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es1022.eqiad.wmnet with OS bullseye
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086 (s7,s8) T299882', diff saved to https://phabricator.wikimedia.org/P19157 and previous config saved to /var/cache/conftool/dbconfig/20220125-135212-marostegui.json
  • 13:50 volans@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
  • 13:48 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es1022.eqiad.wmnet with OS bullseye
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T299827)', diff saved to https://phabricator.wikimedia.org/P19156 and previous config saved to /var/cache/conftool/dbconfig/20220125-134704-marostegui.json
  • 13:46 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab-runner1001.eqiad.wmnet
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T299827)', diff saved to https://phabricator.wikimedia.org/P19155 and previous config saved to /var/cache/conftool/dbconfig/20220125-134557-marostegui.json
  • 13:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 13:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T299827)', diff saved to https://phabricator.wikimedia.org/P19154 and previous config saved to /var/cache/conftool/dbconfig/20220125-134547-marostegui.json
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P19153 and previous config saved to /var/cache/conftool/dbconfig/20220125-134503-marostegui.json
  • 13:43 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1026.eqiad.wmnet with OS bullseye
  • 13:38 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts gitlab-runner1001.eqiad.wmnet
  • 13:33 _joe_: restarted pybal on lvs6003
  • 13:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1005.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 13:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1005.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 13:31 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=ncredir,name=ncredir6001.drmrs.wmnet
  • 13:30 oblivian@puppetmaster1001: conftool action : set/weight=1; selector: dc=drmrs,cluster=ncredir
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P19151 and previous config saved to /var/cache/conftool/dbconfig/20220125-133042-marostegui.json
  • 13:30 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on gitlab-runner1001.eqiad.wmnet with reason: move gitlab-runner1001 to new ganeti row
  • 13:30 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on gitlab-runner1001.eqiad.wmnet with reason: move gitlab-runner1001 to new ganeti row
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T285149)', diff saved to https://phabricator.wikimedia.org/P19150 and previous config saved to /var/cache/conftool/dbconfig/20220125-132958-marostegui.json
  • 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T285149)', diff saved to https://phabricator.wikimedia.org/P19149 and previous config saved to /var/cache/conftool/dbconfig/20220125-132852-marostegui.json
  • 13:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T285149)', diff saved to https://phabricator.wikimedia.org/P19148 and previous config saved to /var/cache/conftool/dbconfig/20220125-132844-marostegui.json
  • 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:26 volans@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
  • 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es1031.eqiad.wmnet with OS bullseye
  • 13:22 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync on staging
  • 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:20 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
  • 13:20 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
  • 13:20 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
  • 13:19 taavi@deploy1002: Synchronized wmf-config/wikitech.php: Config: wikitech: use ldap-rw.$SITE for ldap access (T295150) (duration: 00m 49s)
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1026 T299889', diff saved to https://phabricator.wikimedia.org/P19147 and previous config saved to /var/cache/conftool/dbconfig/20220125-131727-marostegui.json
  • 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1030 to es2 master T299889', diff saved to https://phabricator.wikimedia.org/P19146 and previous config saved to /var/cache/conftool/dbconfig/20220125-131622-marostegui.json
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P19145 and previous config saved to /var/cache/conftool/dbconfig/20220125-131537-marostegui.json
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P19144 and previous config saved to /var/cache/conftool/dbconfig/20220125-131340-marostegui.json
  • 13:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: reimage for upgrade - T299911
  • 13:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: reimage for upgrade - T299911
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T299827)', diff saved to https://phabricator.wikimedia.org/P19143 and previous config saved to /var/cache/conftool/dbconfig/20220125-130032-marostegui.json
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T299827)', diff saved to https://phabricator.wikimedia.org/P19142 and previous config saved to /var/cache/conftool/dbconfig/20220125-125923-marostegui.json
  • 12:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 12:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 12:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T299827)', diff saved to https://phabricator.wikimedia.org/P19141 and previous config saved to /var/cache/conftool/dbconfig/20220125-125857-marostegui.json
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P19140 and previous config saved to /var/cache/conftool/dbconfig/20220125-125835-marostegui.json
  • 12:56 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync on production
  • 12:55 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync on staging
  • 12:55 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync on production
  • 12:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync on production
  • 12:50 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync on staging
  • 12:50 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync on production
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P19139 and previous config saved to /var/cache/conftool/dbconfig/20220125-124352-marostegui.json
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T285149)', diff saved to https://phabricator.wikimedia.org/P19138 and previous config saved to /var/cache/conftool/dbconfig/20220125-124330-marostegui.json
  • 12:38 Lucas_WMDE: UTC morning backport window done
  • 12:37 kharlan@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/GrowthExperiments/modules: Backport (2/2): Add an image: update onboarding images for desktop (T298109) (duration: 00m 49s)
  • 12:36 kharlan@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/GrowthExperiments/images: Backport (1/2): Add an image: update onboarding images for desktop (T298109) (duration: 00m 50s)
  • 12:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool es1031 (T299911)', diff saved to https://phabricator.wikimedia.org/P19136 and previous config saved to /var/cache/conftool/dbconfig/20220125-123303-ladsgroup.json
  • 12:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P19135 and previous config saved to /var/cache/conftool/dbconfig/20220125-122848-marostegui.json
  • 12:17 hnowlan: removal of restbase2011 from cassandra cluster complete
  • 12:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T299827)', diff saved to https://phabricator.wikimedia.org/P19134 and previous config saved to /var/cache/conftool/dbconfig/20220125-121343-marostegui.json
  • 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable statement usage tracking for Armenian Wikipedia (hywiki) (T296382) (duration: 00m 50s)
  • 12:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T299827)', diff saved to https://phabricator.wikimedia.org/P19133 and previous config saved to /var/cache/conftool/dbconfig/20220125-120632-marostegui.json
  • 12:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 12:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T299827)', diff saved to https://phabricator.wikimedia.org/P19132 and previous config saved to /var/cache/conftool/dbconfig/20220125-120625-marostegui.json
  • 11:57 oblivian@puppetmaster1001: conftool action : set/weight=1; selector: dc=eqiad,cluster=appserver,service=canary
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P19131 and previous config saved to /var/cache/conftool/dbconfig/20220125-115120-marostegui.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T285149)', diff saved to https://phabricator.wikimedia.org/P19130 and previous config saved to /var/cache/conftool/dbconfig/20220125-114311-marostegui.json
  • 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 11:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T285149)', diff saved to https://phabricator.wikimedia.org/P19129 and previous config saved to /var/cache/conftool/dbconfig/20220125-114258-marostegui.json
  • 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P19128 and previous config saved to /var/cache/conftool/dbconfig/20220125-113616-marostegui.json
  • 11:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2021.codfw.wmnet with OS bullseye
  • 11:29 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1011.eqiad.wmnet
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P19127 and previous config saved to /var/cache/conftool/dbconfig/20220125-112753-marostegui.json
  • 11:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2027.codfw.wmnet with OS bullseye
  • 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T299827)', diff saved to https://phabricator.wikimedia.org/P19126 and previous config saved to /var/cache/conftool/dbconfig/20220125-112111-marostegui.json
  • 11:19 moritzm: installing apache security updates
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P19125 and previous config saved to /var/cache/conftool/dbconfig/20220125-111249-marostegui.json
  • 11:07 godog: temp disable alerting on prometheus200[56] - T296199
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T285149)', diff saved to https://phabricator.wikimedia.org/P19124 and previous config saved to /var/cache/conftool/dbconfig/20220125-105744-marostegui.json
  • 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T285149)', diff saved to https://phabricator.wikimedia.org/P19123 and previous config saved to /var/cache/conftool/dbconfig/20220125-105636-marostegui.json
  • 10:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 10:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T285149)', diff saved to https://phabricator.wikimedia.org/P19122 and previous config saved to /var/cache/conftool/dbconfig/20220125-105628-marostegui.json
  • 10:55 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2021.codfw.wmnet with OS bullseye
  • 10:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es2027.codfw.wmnet with OS bullseye
  • 10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2027.codfw.wmnet with reason: reimage for upgrade - T299911
  • 10:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2027.codfw.wmnet with reason: reimage for upgrade - T299911
  • 10:50 hnowlan: disabling puppet on all maps hosts to test cassandra removal
  • 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2011.eqiad.wmnet
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es2020', diff saved to https://phabricator.wikimedia.org/P19121 and previous config saved to /var/cache/conftool/dbconfig/20220125-104331-marostegui.json
  • 10:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2020.codfw.wmnet with OS bullseye
  • 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P19120 and previous config saved to /var/cache/conftool/dbconfig/20220125-104124-marostegui.json
  • 10:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2029.codfw.wmnet with OS bullseye
  • 10:36 hnowlan: nodetool removenode for restbase2011-c
  • 10:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022 T299123', diff saved to https://phabricator.wikimedia.org/P19119 and previous config saved to /var/cache/conftool/dbconfig/20220125-102912-marostegui.json
  • 10:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P19118 and previous config saved to /var/cache/conftool/dbconfig/20220125-102619-marostegui.json
  • 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T299827)', diff saved to https://phabricator.wikimedia.org/P19117 and previous config saved to /var/cache/conftool/dbconfig/20220125-102448-marostegui.json
  • 10:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 10:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 10:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 10:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T299827)', diff saved to https://phabricator.wikimedia.org/P19116 and previous config saved to /var/cache/conftool/dbconfig/20220125-102426-marostegui.json
  • 10:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
  • 10:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T285149)', diff saved to https://phabricator.wikimedia.org/P19115 and previous config saved to /var/cache/conftool/dbconfig/20220125-101114-marostegui.json
  • 10:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P19114 and previous config saved to /var/cache/conftool/dbconfig/20220125-100921-marostegui.json
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T285149)', diff saved to https://phabricator.wikimedia.org/P19113 and previous config saved to /var/cache/conftool/dbconfig/20220125-100907-marostegui.json
  • 10:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 10:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T285149)', diff saved to https://phabricator.wikimedia.org/P19112 and previous config saved to /var/cache/conftool/dbconfig/20220125-100900-marostegui.json
  • 10:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:04 taavi@deploy1002: Synchronized wmf-config/extension-list: Config: Undeploy UserMerge (3) (T216089) (duration: 00m 48s)
  • 10:03 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2020.codfw.wmnet with OS bullseye
  • 10:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es2029.codfw.wmnet with OS bullseye
  • 10:01 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Undeploy UserMerge (2) (T216089) (duration: 00m 49s)
  • 10:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: reimage for upgrade - T299911
  • 10:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2020', diff saved to https://phabricator.wikimedia.org/P19111 and previous config saved to /var/cache/conftool/dbconfig/20220125-100036-marostegui.json
  • 10:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: reimage for upgrade - T299911
  • 09:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:59 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Undeploy UserMerge (1) (T216089) (duration: 00m 49s)
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P19110 and previous config saved to /var/cache/conftool/dbconfig/20220125-095417-marostegui.json
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P19109 and previous config saved to /var/cache/conftool/dbconfig/20220125-095355-marostegui.json
  • 09:40 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:40 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:40 mmandere@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ncredir6001.drmrs.wmnet
  • 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T299827)', diff saved to https://phabricator.wikimedia.org/P19108 and previous config saved to /var/cache/conftool/dbconfig/20220125-093912-marostegui.json
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P19107 and previous config saved to /var/cache/conftool/dbconfig/20220125-093850-marostegui.json
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T299827)', diff saved to https://phabricator.wikimedia.org/P19106 and previous config saved to /var/cache/conftool/dbconfig/20220125-093806-marostegui.json
  • 09:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 09:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 09:23 mmandere@cumin1001: START - Cookbook sre.ganeti.makevm for new host ncredir6001.drmrs.wmnet
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T285149)', diff saved to https://phabricator.wikimedia.org/P19105 and previous config saved to /var/cache/conftool/dbconfig/20220125-092346-marostegui.json
  • 09:23 dcausse: restarting blazegraph on wdqs1004 (jvm stuck for 1h)
  • 09:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1013.eqiad.wmnet with OS buster
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19104 and previous config saved to /var/cache/conftool/dbconfig/20220125-085228-root.json
  • 08:45 moritzm: draining instances off ganeti1005 for reimage
  • 08:44 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1013.eqiad.wmnet with OS buster
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19103 and previous config saved to /var/cache/conftool/dbconfig/20220125-083724-root.json
  • 08:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:32 jayme: kubernetes staging migrated tainted worker node setup - T290967
  • 08:32 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster1001.eqiad.wmnet
  • 08:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:25 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Revert: Promote pc1013 to master in pc3 T299046 (duration: 00m 49s)
  • 08:25 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster1001.eqiad.wmnet
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T285149)', diff saved to https://phabricator.wikimedia.org/P19102 and previous config saved to /var/cache/conftool/dbconfig/20220125-082326-marostegui.json
  • 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T285149)', diff saved to https://phabricator.wikimedia.org/P19101 and previous config saved to /var/cache/conftool/dbconfig/20220125-082319-marostegui.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19100 and previous config saved to /var/cache/conftool/dbconfig/20220125-082220-root.json
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P19099 and previous config saved to /var/cache/conftool/dbconfig/20220125-080814-marostegui.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19098 and previous config saved to /var/cache/conftool/dbconfig/20220125-080717-root.json
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P19097 and previous config saved to /var/cache/conftool/dbconfig/20220125-075309-marostegui.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19096 and previous config saved to /var/cache/conftool/dbconfig/20220125-075213-root.json
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T285149)', diff saved to https://phabricator.wikimedia.org/P19095 and previous config saved to /var/cache/conftool/dbconfig/20220125-073805-marostegui.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19094 and previous config saved to /var/cache/conftool/dbconfig/20220125-073709-root.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T285149)', diff saved to https://phabricator.wikimedia.org/P19093 and previous config saved to /var/cache/conftool/dbconfig/20220125-073457-marostegui.json
  • 07:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 07:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T285149)', diff saved to https://phabricator.wikimedia.org/P19092 and previous config saved to /var/cache/conftool/dbconfig/20220125-073450-marostegui.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19091 and previous config saved to /var/cache/conftool/dbconfig/20220125-072206-root.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P19090 and previous config saved to /var/cache/conftool/dbconfig/20220125-071945-marostegui.json
  • 07:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1013.eqiad.wmnet with OS bullseye
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19089 and previous config saved to /var/cache/conftool/dbconfig/20220125-070702-root.json
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P19088 and previous config saved to /var/cache/conftool/dbconfig/20220125-070441-marostegui.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19087 and previous config saved to /var/cache/conftool/dbconfig/20220125-065158-root.json
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T285149)', diff saved to https://phabricator.wikimedia.org/P19086 and previous config saved to /var/cache/conftool/dbconfig/20220125-064936-marostegui.json
  • 06:49 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1013.eqiad.wmnet with OS bullseye
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T285149)', diff saved to https://phabricator.wikimedia.org/P19085 and previous config saved to /var/cache/conftool/dbconfig/20220125-064829-marostegui.json
  • 06:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 06:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 06:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
  • 06:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
  • 06:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 06:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T285149)', diff saved to https://phabricator.wikimedia.org/P19084 and previous config saved to /var/cache/conftool/dbconfig/20220125-064801-marostegui.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19083 and previous config saved to /var/cache/conftool/dbconfig/20220125-063655-root.json
  • 06:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1030.eqiad.wmnet with OS bullseye
  • 06:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 06:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 06:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P19082 and previous config saved to /var/cache/conftool/dbconfig/20220125-063256-marostegui.json
  • 06:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 06:26 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1014 to master in pc3 T299046 (duration: 00m 49s)
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P19081 and previous config saved to /var/cache/conftool/dbconfig/20220125-061751-marostegui.json
  • 06:07 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1030.eqiad.wmnet with OS bullseye
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T285149)', diff saved to https://phabricator.wikimedia.org/P19080 and previous config saved to /var/cache/conftool/dbconfig/20220125-060247-marostegui.json
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1030 T299889', diff saved to https://phabricator.wikimedia.org/P19079 and previous config saved to /var/cache/conftool/dbconfig/20220125-060241-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T285149)', diff saved to https://phabricator.wikimedia.org/P19078 and previous config saved to /var/cache/conftool/dbconfig/20220125-060128-marostegui.json
  • 06:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 06:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 06:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:29 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Lower The Wikipedia Library editcount (duration: 00m 49s)
  • 00:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:23 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable wgMinervaEnableSiteNotice for bnwiki (T299529) (duration: 00m 49s)
  • 00:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:14 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: bgwiki: fix setup for Draft namespace (T299224) (duration: 00m 49s)
  • 00:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn

2022-01-24

  • 23:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 23:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 23:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 23:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 23:29 dancy@deploy1002: Synchronized multiversion/MWMultiVersion.php: Config: Revert "Choose wikiversions.php file relative to MWMultiVersion.php" (duration: 00m 49s)
  • 22:54 ryankemper: T280001 Removed downtime on `wcqs*`
  • 22:48 root@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1003.eqiad.wmnet with OS buster
  • 22:48 ryankemper: T280001 Moved `wcqs` service state into `production` by merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/756713; running puppet on authdns/alert hosts
  • 22:32 inflatador: T280001 T282117 Merged https://gerrit.wikimedia.org/r/c/operations/dns/+/755806 and ran `sudo -i authdns update` on `authdns1001.wikimedia.org`
  • 21:57 root@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1003.eqiad.wmnet with OS buster
  • 21:57 root@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudmetrics1003.eqiad.wmnet with OS bullseye
  • 21:18 root@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1003.eqiad.wmnet with OS bullseye
  • 21:18 btullis@deploy1002: Finished deploy [analytics/refinery@94ec386] (hadoop-test): (no justification provided) (duration: 00m 02s)
  • 21:18 btullis@deploy1002: Started deploy [analytics/refinery@94ec386] (hadoop-test): (no justification provided)
  • 20:56 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-test-coord1001.eqiad.wmnet with reason: Unmounting /srv to try to repair the filesystem
  • 20:56 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-test-coord1001.eqiad.wmnet with reason: Unmounting /srv to try to repair the filesystem
  • 20:05 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mx1001.wikimedia.org with reason: kernel testing
  • 20:05 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mx1001.wikimedia.org with reason: kernel testing
  • 20:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:57 dancy@deploy1002: Synchronized multiversion/MWMultiVersion.php: Config: MWMultiVersion.php: Reverse logic for wikiversions file selection (duration: 00m 49s)
  • 19:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:52 dancy@deploy1002: Synchronized multiversion/MWMultiVersion.php: Config: Choose wikiversions.php file relative to MWMultiVersion.php (duration: 00m 48s)
  • 19:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:47 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/VisualEditor/lib/ve/: a369e0a: Revert "Follow-up I0802440d9: Allow alien / s to be focused" (deployed via e09d79d; T298609; T299730) (duration: 00m 49s)
  • 19:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:38 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/VisualEditor/modules/ve-mw/ui/dialogs/: 531efd0: Fix showing caption and alt text fields in media and gallery dialogs (T299818) (duration: 00m 48s)
  • 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:36 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 27c5ab3: Enable migration mode on euwiki (T299927) (duration: 00m 48s)
  • 19:35 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.18/skins/Vector/: 4f430a8: Respect useskin when operating in MigrationMode (T299171; 2/2) (duration: 00m 48s)
  • 19:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:34 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.18/skins/Vector/includes/Constants.php: 4f430a8: Respect useskin when operating in MigrationMode (T299171; 1/2) (duration: 00m 48s)
  • 19:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: bf2981b: commonswiki: Add ala-images.s3.ap-southeast-2.amazonaws.com to the wgCopyUploadsDomains allowlist (T299825) (duration: 00m 49s)
  • 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 2029c35: Disable RelatedArticles on ptwikinews (T299873) (duration: 00m 49s)
  • 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:09 taavi: deleted centralauth.global_user_groups for 10 non-existent users T299650
  • 19:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: df86dd4: Create Draft namespace for bgwiki (T299224) (duration: 00m 49s)
  • 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:00 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:22 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-test-coord1001.eqiad.wmnet with reason: Unmounting /srv to try to repair the filesystem
  • 18:22 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on an-test-coord1001.eqiad.wmnet with reason: Unmounting /srv to try to repair the filesystem
  • 17:50 cmjohnson1: updating firmware on ganeti1013 T299527
  • 17:24 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync on staging
  • 17:24 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync on production
  • 17:24 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync on staging
  • 17:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T285149)', diff saved to https://phabricator.wikimedia.org/P19074 and previous config saved to /var/cache/conftool/dbconfig/20220124-170312-marostegui.json
  • 16:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P19072 and previous config saved to /var/cache/conftool/dbconfig/20220124-164807-marostegui.json
  • 16:48 hnowlan: Running nodetool removenode for restbase2011-a
  • 16:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:43 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 49s)
  • 16:42 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 50s)
  • 16:35 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync on staging
  • 16:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P19071 and previous config saved to /var/cache/conftool/dbconfig/20220124-163302-marostegui.json
  • 16:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on restbase2011.codfw.wmnet with reason: bad disk
  • 16:28 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on restbase2011.codfw.wmnet with reason: bad disk
  • 16:25 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync on production
  • 16:25 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync on staging
  • 16:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T285149)', diff saved to https://phabricator.wikimedia.org/P19070 and previous config saved to /var/cache/conftool/dbconfig/20220124-161757-marostegui.json
  • 16:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T285149)', diff saved to https://phabricator.wikimedia.org/P19069 and previous config saved to /var/cache/conftool/dbconfig/20220124-161549-marostegui.json
  • 16:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 16:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 16:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T285149)', diff saved to https://phabricator.wikimedia.org/P19068 and previous config saved to /var/cache/conftool/dbconfig/20220124-161540-marostegui.json
  • 16:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P19067 and previous config saved to /var/cache/conftool/dbconfig/20220124-160035-marostegui.json
  • 15:49 jbond: enable abuse_network blocking globally gerrit:756611
  • 15:48 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/AbuseFilter/includes/ServiceWiring.php: Backport: Use MainStash instead of db-replicated (T272512) (duration: 00m 49s)
  • 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P19066 and previous config saved to /var/cache/conftool/dbconfig/20220124-154531-marostegui.json
  • 15:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 15:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 15:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 15:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T285149)', diff saved to https://phabricator.wikimedia.org/P19065 and previous config saved to /var/cache/conftool/dbconfig/20220124-153026-marostegui.json
  • 15:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T285149)', diff saved to https://phabricator.wikimedia.org/P19064 and previous config saved to /var/cache/conftool/dbconfig/20220124-152820-marostegui.json
  • 15:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 15:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 15:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
  • 15:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
  • 15:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 15:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 15:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 15:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 15:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T285149)', diff saved to https://phabricator.wikimedia.org/P19063 and previous config saved to /var/cache/conftool/dbconfig/20220124-152748-marostegui.json
  • 15:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 15:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 15:25 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 15:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 15:17 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Update wikitech etcd readonly exemption (duration: 00m 49s)
  • 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P19062 and previous config saved to /var/cache/conftool/dbconfig/20220124-151243-marostegui.json
  • 15:05 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 15:04 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons.
  • 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P19061 and previous config saved to /var/cache/conftool/dbconfig/20220124-145738-marostegui.json
  • 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19060 and previous config saved to /var/cache/conftool/dbconfig/20220124-145712-root.json
  • 14:48 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1030.eqiad.wmnet
  • 14:46 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1030.eqiad.wmnet with OS buster
  • 14:44 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons.
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T285149)', diff saved to https://phabricator.wikimedia.org/P19059 and previous config saved to /var/cache/conftool/dbconfig/20220124-144234-marostegui.json
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19058 and previous config saved to /var/cache/conftool/dbconfig/20220124-144208-root.json
  • 14:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2034.codfw.wmnet with OS bullseye
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19057 and previous config saved to /var/cache/conftool/dbconfig/20220124-142705-root.json
  • 14:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19056 and previous config saved to /var/cache/conftool/dbconfig/20220124-141201-root.json
  • 14:01 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS buster
  • 14:00 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1029.eqiad.wmnet
  • 14:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es2034.codfw.wmnet with OS bullseye
  • 14:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1029.eqiad.wmnet with OS buster
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19055 and previous config saved to /var/cache/conftool/dbconfig/20220124-135658-root.json
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T285149)', diff saved to https://phabricator.wikimedia.org/P19054 and previous config saved to /var/cache/conftool/dbconfig/20220124-135216-marostegui.json
  • 13:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 13:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T285149)', diff saved to https://phabricator.wikimedia.org/P19053 and previous config saved to /var/cache/conftool/dbconfig/20220124-135208-marostegui.json
  • 13:50 moritzm: installing util-linux security updates on bullseye
  • 13:42 ladsgroup@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2034.codfw.wmnet with OS bullseye
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19052 and previous config saved to /var/cache/conftool/dbconfig/20220124-134154-root.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P19051 and previous config saved to /var/cache/conftool/dbconfig/20220124-133704-marostegui.json
  • 13:28 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1028.eqiad.wmnet
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19050 and previous config saved to /var/cache/conftool/dbconfig/20220124-132651-root.json
  • 13:26 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS buster
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P19049 and previous config saved to /var/cache/conftool/dbconfig/20220124-132159-marostegui.json
  • 13:19 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1028.eqiad.wmnet with OS buster
  • 13:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es2034.codfw.wmnet with OS bullseye
  • 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19048 and previous config saved to /var/cache/conftool/dbconfig/20220124-131147-root.json
  • 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T285149)', diff saved to https://phabricator.wikimedia.org/P19047 and previous config saved to /var/cache/conftool/dbconfig/20220124-130654-marostegui.json
  • 13:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: reimage for upgrade - T299911
  • 13:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: reimage for upgrade - T299911
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19046 and previous config saved to /var/cache/conftool/dbconfig/20220124-125643-root.json
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19045 and previous config saved to /var/cache/conftool/dbconfig/20220124-124140-root.json
  • 12:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1033.eqiad.wmnet with OS bullseye
  • 12:40 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1028.eqiad.wmnet with OS buster
  • 12:39 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1027.eqiad.wmnet with OS buster
  • 12:39 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1027.eqiad.wmnet
  • 12:39 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
  • 12:39 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 12:39 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
  • 12:38 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 12:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19044 and previous config saved to /var/cache/conftool/dbconfig/20220124-123609-root.json
  • 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:25 urbanecm: UTC morning B&C done
  • 12:25 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1027.eqiad.wmnet with OS buster
  • 12:24 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1026.eqiad.wmnet
  • 12:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 296fe16: Add mwcli.command_execute to wgEventStreams (T293583) (duration: 00m 48s)
  • 12:22 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1026.eqiad.wmnet with OS buster
  • 12:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: db340cc: 5424d69: Update wgCopyUploadsDomains allowlist (T299579, T299881) (duration: 00m 48s)
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19043 and previous config saved to /var/cache/conftool/dbconfig/20220124-122106-root.json
  • 12:21 moritzm: installing ICU security updates on stretch
  • 12:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 2c7b45a: fawiki: Exempt draft namespace from robots control by users (T299850) (duration: 05m 39s)
  • 12:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:14 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1026.eqiad.wmnet with OS buster
  • 12:13 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1033.eqiad.wmnet with OS bullseye
  • 12:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:10 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase102[45].eqiad.wmnet
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1033 T299889', diff saved to https://phabricator.wikimedia.org/P19042 and previous config saved to /var/cache/conftool/dbconfig/20220124-121029-marostegui.json
  • 12:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 97d047b: Remove kea, nod, and sms from wmfGetVariantSettings (T299304; T296286; T298075; T298182) (duration: 00m 49s)
  • 12:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:08 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1025.eqiad.wmnet with OS buster
  • 12:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2026.codfw.wmnet with OS bullseye
  • 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1014.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 12:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1014.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T285149)', diff saved to https://phabricator.wikimedia.org/P19041 and previous config saved to /var/cache/conftool/dbconfig/20220124-120635-marostegui.json
  • 12:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 12:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T285149)', diff saved to https://phabricator.wikimedia.org/P19040 and previous config saved to /var/cache/conftool/dbconfig/20220124-120627-marostegui.json
  • 12:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19039 and previous config saved to /var/cache/conftool/dbconfig/20220124-120602-root.json
  • 12:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1a46361: fawiki: Remove move-rootuserpages flag from users (T299847) (duration: 00m 49s)
  • 12:00 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1025.eqiad.wmnet with OS buster
  • 11:59 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1024.eqiad.wmnet with OS buster
  • 11:58 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 11:56 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'Remove contributions from s8 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P19038 and previous config saved to /var/cache/conftool/dbconfig/20220124-115334-marostegui.json
  • 11:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Remove special groups from s8 codfw T263127', diff saved to https://phabricator.wikimedia.org/P19037 and previous config saved to /var/cache/conftool/dbconfig/20220124-115236-marostegui.json
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P19036 and previous config saved to /var/cache/conftool/dbconfig/20220124-115123-marostegui.json
  • 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19035 and previous config saved to /var/cache/conftool/dbconfig/20220124-115059-root.json
  • 11:50 vgutierrez: pool cp1088 using envoy as TLS termination layer - T271421
  • 11:50 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1024.eqiad.wmnet with OS buster
  • 11:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1088.eqiad.wmnet with OS buster
  • 11:48 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 11:37 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2026.codfw.wmnet with OS bullseye
  • 11:37 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P19034 and previous config saved to /var/cache/conftool/dbconfig/20220124-113618-marostegui.json
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19033 and previous config saved to /var/cache/conftool/dbconfig/20220124-113555-root.json
  • 11:31 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1024.eqiad.wmnet with OS buster
  • 11:31 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1024.eqiad.wmnet with OS buster
  • 11:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2031.codfw.wmnet with OS bullseye
  • 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T285149)', diff saved to https://phabricator.wikimedia.org/P19032 and previous config saved to /var/cache/conftool/dbconfig/20220124-112113-marostegui.json
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19031 and previous config saved to /var/cache/conftool/dbconfig/20220124-112051-root.json
  • 11:19 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1024.eqiad.wmnet with OS buster
  • 11:19 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1023.eqiad.wmnet
  • 11:17 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1023.eqiad.wmnet with OS buster
  • 11:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2033.codfw.wmnet with OS bullseye
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19030 and previous config saved to /var/cache/conftool/dbconfig/20220124-110548-root.json
  • 11:04 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1023.eqiad.wmnet with OS buster
  • 11:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1088.eqiad.wmnet with OS buster
  • 11:02 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1022.eqiad.wmnet
  • 10:59 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1022.eqiad.wmnet with OS buster
  • 10:58 vgutierrez: depool cp1088 to be reimaged as cache::upload_envoy - T271421
  • 10:56 moritzm: installing modsecurity-apache security updates
  • 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2031.codfw.wmnet with OS bullseye
  • 10:51 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2031.codfw.wmnet with OS bullseye
  • 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19028 and previous config saved to /var/cache/conftool/dbconfig/20220124-105044-root.json
  • 10:49 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1022.eqiad.wmnet with OS buster
  • 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T285149)', diff saved to https://phabricator.wikimedia.org/P19027 and previous config saved to /var/cache/conftool/dbconfig/20220124-103958-marostegui.json
  • 10:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 10:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T285149)', diff saved to https://phabricator.wikimedia.org/P19026 and previous config saved to /var/cache/conftool/dbconfig/20220124-103945-marostegui.json
  • 10:35 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2033.codfw.wmnet with OS bullseye
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19025 and previous config saved to /var/cache/conftool/dbconfig/20220124-103540-root.json
  • 10:34 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2031.codfw.wmnet with OS bullseye
  • 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P19024 and previous config saved to /var/cache/conftool/dbconfig/20220124-102440-marostegui.json
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19023 and previous config saved to /var/cache/conftool/dbconfig/20220124-102037-root.json
  • 10:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1027.eqiad.wmnet with OS bullseye
  • 10:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2040.codfw.wmnet with OS buster
  • 10:15 vgutierrez: pool cp2040 using envoy as TLS termination layer - T271421
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P19022 and previous config saved to /var/cache/conftool/dbconfig/20220124-100935-marostegui.json
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19021 and previous config saved to /var/cache/conftool/dbconfig/20220124-095605-root.json
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T285149)', diff saved to https://phabricator.wikimedia.org/P19020 and previous config saved to /var/cache/conftool/dbconfig/20220124-095430-marostegui.json
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T285149)', diff saved to https://phabricator.wikimedia.org/P19019 and previous config saved to /var/cache/conftool/dbconfig/20220124-095324-marostegui.json
  • 09:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 09:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T285149)', diff saved to https://phabricator.wikimedia.org/P19018 and previous config saved to /var/cache/conftool/dbconfig/20220124-095317-marostegui.json
  • 09:50 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1027.eqiad.wmnet with OS bullseye
  • 09:46 kormat: Deploying wmfmariadbpy 0.8.1 T299753
  • 09:46 kormat: uploaded wmfmariadbpy 0.8.1 to apt.wm.o T299753
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1027 T299741', diff saved to https://phabricator.wikimedia.org/P19017 and previous config saved to /var/cache/conftool/dbconfig/20220124-094504-marostegui.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1029 as es1 master T299741', diff saved to https://phabricator.wikimedia.org/P19016 and previous config saved to /var/cache/conftool/dbconfig/20220124-094300-marostegui.json
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19015 and previous config saved to /var/cache/conftool/dbconfig/20220124-094102-root.json
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P19014 and previous config saved to /var/cache/conftool/dbconfig/20220124-093812-marostegui.json
  • 09:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2040.codfw.wmnet with OS buster
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19013 and previous config saved to /var/cache/conftool/dbconfig/20220124-093608-root.json
  • 09:30 vgutierrez: depool cp2040 to be reimaged as cache::upload_envoy - T271421
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19012 and previous config saved to /var/cache/conftool/dbconfig/20220124-092558-root.json
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P19011 and previous config saved to /var/cache/conftool/dbconfig/20220124-092307-marostegui.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19010 and previous config saved to /var/cache/conftool/dbconfig/20220124-092105-root.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19009 and previous config saved to /var/cache/conftool/dbconfig/20220124-091054-root.json
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T285149)', diff saved to https://phabricator.wikimedia.org/P19008 and previous config saved to /var/cache/conftool/dbconfig/20220124-090803-marostegui.json
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T285149)', diff saved to https://phabricator.wikimedia.org/P19007 and previous config saved to /var/cache/conftool/dbconfig/20220124-090657-marostegui.json
  • 09:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 09:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T285149)', diff saved to https://phabricator.wikimedia.org/P19006 and previous config saved to /var/cache/conftool/dbconfig/20220124-090649-marostegui.json
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19005 and previous config saved to /var/cache/conftool/dbconfig/20220124-090601-root.json
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19004 and previous config saved to /var/cache/conftool/dbconfig/20220124-085551-root.json
  • 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1026.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P19003 and previous config saved to /var/cache/conftool/dbconfig/20220124-085144-marostegui.json
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19002 and previous config saved to /var/cache/conftool/dbconfig/20220124-085057-root.json
  • 08:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1026.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 08:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
  • 08:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P19001 and previous config saved to /var/cache/conftool/dbconfig/20220124-084047-root.json
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P19000 and previous config saved to /var/cache/conftool/dbconfig/20220124-083640-marostegui.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18999 and previous config saved to /var/cache/conftool/dbconfig/20220124-083554-root.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18998 and previous config saved to /var/cache/conftool/dbconfig/20220124-082543-root.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T285149)', diff saved to https://phabricator.wikimedia.org/P18997 and previous config saved to /var/cache/conftool/dbconfig/20220124-082135-marostegui.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18996 and previous config saved to /var/cache/conftool/dbconfig/20220124-082050-root.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T285149)', diff saved to https://phabricator.wikimedia.org/P18995 and previous config saved to /var/cache/conftool/dbconfig/20220124-082029-marostegui.json
  • 08:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 08:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T285149)', diff saved to https://phabricator.wikimedia.org/P18994 and previous config saved to /var/cache/conftool/dbconfig/20220124-082022-marostegui.json
  • 08:15 moritzm: draining instances off ganeti1014 for reimage
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18993 and previous config saved to /var/cache/conftool/dbconfig/20220124-081040-root.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18992 and previous config saved to /var/cache/conftool/dbconfig/20220124-080546-root.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P18991 and previous config saved to /var/cache/conftool/dbconfig/20220124-080517-marostegui.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18990 and previous config saved to /var/cache/conftool/dbconfig/20220124-075536-root.json
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18989 and previous config saved to /var/cache/conftool/dbconfig/20220124-075043-root.json
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P18988 and previous config saved to /var/cache/conftool/dbconfig/20220124-075012-marostegui.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18987 and previous config saved to /var/cache/conftool/dbconfig/20220124-073539-root.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T285149)', diff saved to https://phabricator.wikimedia.org/P18986 and previous config saved to /var/cache/conftool/dbconfig/20220124-073507-marostegui.json
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18985 and previous config saved to /var/cache/conftool/dbconfig/20220124-072035-root.json
  • 06:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1022.eqiad.wmnet with OS bullseye
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T285149)', diff saved to https://phabricator.wikimedia.org/P18984 and previous config saved to /var/cache/conftool/dbconfig/20220124-063448-marostegui.json
  • 06:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 06:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T285149)', diff saved to https://phabricator.wikimedia.org/P18983 and previous config saved to /var/cache/conftool/dbconfig/20220124-063440-marostegui.json
  • 06:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1029.eqiad.wmnet with OS bullseye
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P18982 and previous config saved to /var/cache/conftool/dbconfig/20220124-061936-marostegui.json
  • 06:05 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P18981 and previous config saved to /var/cache/conftool/dbconfig/20220124-060431-marostegui.json
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022 T299123', diff saved to https://phabricator.wikimedia.org/P18980 and previous config saved to /var/cache/conftool/dbconfig/20220124-060248-marostegui.json
  • 05:52 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1029.eqiad.wmnet with OS bullseye
  • 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T285149)', diff saved to https://phabricator.wikimedia.org/P18979 and previous config saved to /var/cache/conftool/dbconfig/20220124-054926-marostegui.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1029 for reimage T299741', diff saved to https://phabricator.wikimedia.org/P18978 and previous config saved to /var/cache/conftool/dbconfig/20220124-054349-marostegui.json
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T285149)', diff saved to https://phabricator.wikimedia.org/P18977 and previous config saved to /var/cache/conftool/dbconfig/20220124-054218-marostegui.json
  • 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance

2022-01-23

  • 22:02 ebysans@deploy1002: Finished deploy [airflow-dags/analytics-test@37937f6]: (no justification provided) (duration: 00m 08s)
  • 22:02 ebysans@deploy1002: Started deploy [airflow-dags/analytics-test@37937f6]: (no justification provided)
  • 21:27 ebysans@deploy1002: Finished deploy [airflow-dags/analytics-test@fa62e75]: (no justification provided) (duration: 00m 09s)
  • 21:26 ebysans@deploy1002: Started deploy [airflow-dags/analytics-test@fa62e75]: (no justification provided)

2022-01-22

  • 22:38 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mx1001.wikimedia.org with reason: kernel testing
  • 22:38 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mx1001.wikimedia.org with reason: kernel testing
  • 14:51 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on mx1001.wikimedia.org with reason: kernel testing
  • 14:51 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on mx1001.wikimedia.org with reason: kernel testing
  • 08:35 elukey: `apt-get clean` on an-test-coord1001 to free some space
  • 08:25 elukey: removed the `--debug=true` etcd daemon arg from ml-etcd2002 (only node having it, probably a manual test done in the past) and cleaned up spammy etcd logs to free space
  • 01:30 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on mx1001.wikimedia.org with reason: kernel testing
  • 01:30 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on mx1001.wikimedia.org with reason: kernel testing
  • 00:27 dzahn@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=miscweb

2022-01-21

  • 22:23 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mx1001.wikimedia.org with reason: kernel testing
  • 22:23 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mx1001.wikimedia.org with reason: kernel testing
  • 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 21:38 brennen@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/VisualEditor/modules/ve-mw: Backport: Revert "Re-duplicate deduplicated TemplateStyles" (T287675 T299251 T299767) (duration: 00m 49s)
  • 21:21 topranks: Running homer against cr1-eqiad and cr2-eqiad to remove entries on analytics-in4/6 filters that refer to decommissioned deb mirror host sodium.
  • 19:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:10 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 19:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:01 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 18:46 herron: restarting pybal on lvs1015,lvs1020,lvs2009,lvs2010 to remove legacy elk5 services T299700
  • 18:39 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:36 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 18:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:15 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 17:42 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/python3-imagecatalog/imagecatalog_0.0.4-1_amd64.changes
  • 16:56 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1021.eqiad.wmnet
  • 16:55 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1021.eqiad.wmnet with OS buster
  • 16:47 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1021.eqiad.wmnet with OS buster
  • 16:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1020.eqiad.wmnet
  • 16:46 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1020.eqiad.wmnet with OS buster
  • 16:26 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts sodium.wikimedia.org
  • 16:20 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1020.eqiad.wmnet with OS buster
  • 16:18 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
  • 16:18 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 16:05 jhathaway@cumin1001: START - Cookbook sre.hosts.decommission for hosts sodium.wikimedia.org
  • 16:04 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1019.eqiad.wmnet
  • 16:03 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1019.eqiad.wmnet with OS buster
  • 16:02 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2000 days, 0:00:00 on sodium.wikimedia.org with reason: decom
  • 16:02 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2000 days, 0:00:00 on sodium.wikimedia.org with reason: decom
  • 15:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1013.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 15:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1013.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 15:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1018.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 15:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1018.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 15:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1025.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 15:50 moritzm: added ganeti1025 to Ganeti eqiad cluster T293909
  • 15:29 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on mx1001.wikimedia.org with reason: kernel testing
  • 15:29 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on mx1001.wikimedia.org with reason: kernel testing
  • 15:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2026.codfw.wmnet with OS buster
  • 15:24 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1019.eqiad.wmnet with OS buster
  • 15:24 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1018.eqiad.wmnet
  • 15:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1018.eqiad.wmnet with OS buster
  • 15:07 herron: removing kibana.discovery.wmnet record and switching legacy elk LVS instances to state: lvs_setup T299700
  • 14:52 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 14:41 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:40 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:35 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2026.codfw.wmnet with OS buster
  • 14:35 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 07s)
  • 14:35 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1018.eqiad.wmnet with OS buster
  • 14:35 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 13:13 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2025.codfw.wmnet with OS buster
  • 13:09 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1017.eqiad.wmnet with OS buster
  • 13:07 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1017.eqiad.wmnet
  • 13:05 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2025.codfw.wmnet
  • 13:01 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
  • 13:01 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 12:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1016.eqiad.wmnet
  • 12:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2024.codfw.wmnet
  • 12:25 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1017.eqiad.wmnet with OS buster
  • 12:25 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2025.codfw.wmnet with OS buster
  • 12:13 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2024.codfw.wmnet with OS buster
  • 12:11 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1016.eqiad.wmnet with OS buster
  • 12:10 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1025.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
  • 11:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
  • 11:38 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 11:38 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 11:34 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:34 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 11:31 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:31 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 11:18 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1016.eqiad.wmnet with OS buster
  • 11:18 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2024.codfw.wmnet with OS buster
  • 11:17 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2023.codfw.wmnet
  • 11:15 vgutierrez: pool cp3063 running envoy as TLS termination layer - T271421
  • 11:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2023.codfw.wmnet with OS buster
  • 10:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3063.esams.wmnet with OS buster
  • 10:33 moritzm: migrate primary/secondary instances off ganeti1013
  • 10:14 moritzm: switch kubetcd1006 back to plain disks
  • 10:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd1006.eqiad.wmnet with reason: Switch back to plain disks
  • 10:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd1006.eqiad.wmnet with reason: Switch back to plain disks
  • 10:09 moritzm: switch kubetcd1005 back to plain disks
  • 10:08 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2023.codfw.wmnet with OS buster
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd1005.eqiad.wmnet with reason: Switch back to plain disks
  • 10:07 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd1005.eqiad.wmnet with reason: Switch back to plain disks
  • 09:51 moritzm: switch kubetcd1004 back to plain disks
  • 09:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd1004.eqiad.wmnet with reason: Switch back to plain disks
  • 09:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd1004.eqiad.wmnet with reason: Switch back to plain disks
  • 09:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3063.esams.wmnet with OS buster
  • 09:40 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3063.esams.wmnet with OS buster
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18970 and previous config saved to /var/cache/conftool/dbconfig/20220121-093120-root.json
  • 09:19 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:19 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18969 and previous config saved to /var/cache/conftool/dbconfig/20220121-091617-root.json
  • 09:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:07 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 09:06 ayounsi@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 09:06 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 09:04 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3063.esams.wmnet with OS buster
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18968 and previous config saved to /var/cache/conftool/dbconfig/20220121-090113-root.json
  • 09:00 vgutierrez: depool cp3063 to be reimaged as cache::upload_envoy - T271421
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18967 and previous config saved to /var/cache/conftool/dbconfig/20220121-084609-root.json
  • 08:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1018.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 08:35 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1018.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1018.eqiad.wmnet
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18966 and previous config saved to /var/cache/conftool/dbconfig/20220121-083106-root.json
  • 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1018.eqiad.wmnet
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18965 and previous config saved to /var/cache/conftool/dbconfig/20220121-081602-root.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18964 and previous config saved to /var/cache/conftool/dbconfig/20220121-080058-root.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18963 and previous config saved to /var/cache/conftool/dbconfig/20220121-075801-root.json
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18962 and previous config saved to /var/cache/conftool/dbconfig/20220121-074555-root.json
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18961 and previous config saved to /var/cache/conftool/dbconfig/20220121-074257-root.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18960 and previous config saved to /var/cache/conftool/dbconfig/20220121-073051-root.json
  • 07:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1032.eqiad.wmnet with OS bullseye
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 60%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18959 and previous config saved to /var/cache/conftool/dbconfig/20220121-072754-root.json
  • 07:26 elukey: elukey@stat1007:~$ sudo systemctl reset-failed product-analytics-movement-metrics.service
  • 07:21 elukey: elukey@build2001:~$ sudo systemctl reset-failed ifup@ens13.service
  • 07:19 elukey: systemctl reset-failed session-3.scope on an-test-client1001 (failed, transient unit)
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18958 and previous config saved to /var/cache/conftool/dbconfig/20220121-071250-root.json
  • 07:04 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1032.eqiad.wmnet with OS bullseye
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1032 for reimage T299741', diff saved to https://phabricator.wikimedia.org/P18957 and previous config saved to /var/cache/conftool/dbconfig/20220121-065854-marostegui.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 40%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18956 and previous config saved to /var/cache/conftool/dbconfig/20220121-065746-root.json
  • 06:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2028.codfw.wmnet with OS bullseye
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18955 and previous config saved to /var/cache/conftool/dbconfig/20220121-064243-root.json
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 20%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18954 and previous config saved to /var/cache/conftool/dbconfig/20220121-062739-root.json
  • 06:24 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2028.codfw.wmnet with OS bullseye
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2032 to es1 master T299741', diff saved to https://phabricator.wikimedia.org/P18953 and previous config saved to /var/cache/conftool/dbconfig/20220121-062116-marostegui.json
  • 06:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2030.codfw.wmnet with OS bullseye
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18952 and previous config saved to /var/cache/conftool/dbconfig/20220121-061235-root.json
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18951 and previous config saved to /var/cache/conftool/dbconfig/20220121-055732-root.json
  • 05:49 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2030.codfw.wmnet with OS bullseye
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: repooling after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P18950 and previous config saved to /var/cache/conftool/dbconfig/20220121-054228-root.json

2022-01-20

  • 22:40 inflatador: running puppet-merge for https://gerrit.wikimedia.org/r/755810
  • 22:27 urandom: rolling restart of Cassandra, aqs-next -- T298516
  • 21:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1008.eqiad.wmnet with OS buster
  • 20:58 jhathaway: rebooting mx1001 to test new kernel
  • 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:37 urandom: upgrading Cassandra to 3.11.11, aqs1010 -- T298516
  • 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:36 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.18 refs T293959
  • 20:34 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1008.eqiad.wmnet with OS buster
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:31 jhuneidi@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/DiscussionTools/includes/HeadingItem.php: Backport: Prevent assertion failure caused by empty headings (T299583) (duration: 00m 50s)
  • 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:38 bd808@deploy1002: Synchronized wmf-config/wikitech.php: wikitech: Remove password clear on block (duration: 00m 50s)
  • 19:19 jhathaway: rebooting mx1001 to test new kernel
  • 19:17 dzahn@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: sync on main
  • 19:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:14 dzahn@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply on main
  • 19:13 dzahn@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: sync on main
  • 19:11 cjming: end of UTC evening backport & config window
  • 19:10 dzahn@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply on main
  • 19:10 dzahn@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: sync on main
  • 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:08 dzahn@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply on main
  • 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:07 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable language alert for pilot wikis except thwiki, viwiki. (T295555) (duration: 00m 51s)
  • 19:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:29 taavi@deploy1002: Synchronized php-1.38.0-wmf.18/skins/Vector/includes/Hooks.php: Backport: Do not try to make watchlist collapsible on wikis where watchlist is disabled (T299671) (duration: 00m 50s)
  • 18:27 ppchelko@deploy1002: Synchronized w/tmp_settings_bench.php: Config: gerrit 755741 enhancements for the settings benchmark entrypoint (duration: 00m 51s)
  • 18:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2022.codfw.wmnet
  • 18:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2022.codfw.wmnet with OS buster
  • 18:17 mutante: running puppet on cp403*
  • 17:45 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2022.codfw.wmnet with OS buster
  • 17:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2021.codfw.wmnet
  • 17:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2021.codfw.wmnet with OS buster
  • 17:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1008.eqiad.wmnet with OS buster
  • 17:18 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.18/includes/: Backport: Revert "Make Block objects aware of which wiki they belong to" (duration: 00m 55s)
  • 17:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 17:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1008.eqiad.wmnet with OS buster
  • 17:15 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1008.eqiad.wmnet with OS buster
  • 17:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 17:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1008.eqiad.wmnet with OS buster
  • 17:05 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2021.codfw.wmnet with OS buster
  • 17:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:04 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=inference
  • 17:03 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2020.codfw.wmnet with OS buster
  • 17:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:55 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2020.codfw.wmnet with OS buster
  • 16:55 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2019.codfw.wmnet with OS buster
  • 16:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:50 ppchelko@deploy1002: Synchronized w/tmp_settings_bench.php: Config: gerrit 755399 add temporary entrypoint for settings benchmark (duration: 00m 50s)
  • 16:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:48 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2019.codfw.wmnet with OS buster
  • 16:48 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2019.codfw.wmnet with OS buster
  • 16:40 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2019.codfw.wmnet with OS buster
  • 16:36 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2018.codfw.wmnet
  • 16:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2018.codfw.wmnet with OS buster
  • 15:57 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2018.codfw.wmnet with OS buster
  • 15:47 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
  • 15:46 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 15:43 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster
  • 15:31 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster
  • 15:31 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster
  • 15:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster
  • 15:20 dzahn@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: sync on main
  • 15:16 dzahn@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply on main
  • 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
  • 15:14 dzahn@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: sync on main
  • 15:13 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster
  • 15:12 moritzm: enabled hardware virtualisation in BIOS for ganeti1028 T293909
  • 15:11 dzahn@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply on main
  • 15:05 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster
  • 15:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
  • 15:05 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster
  • 15:05 moritzm: enabled hardware virtualisation in BIOS for ganeti1027 T293909
  • 15:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet
  • 14:58 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster
  • 14:57 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster
  • 14:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet
  • 14:56 moritzm: enabled hardware virtualisation in BIOS for ganeti1026 T293909
  • 14:55 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 11s)
  • 14:55 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 14:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
  • 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
  • 14:34 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster
  • 14:33 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2017.codfw.wmnet with OS buster
  • 14:25 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS buster
  • 14:20 moritzm: enabled hardware virtualisation in BIOS for ganeti1023 T283036
  • 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
  • 14:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
  • 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet
  • 14:03 moritzm: enabled hardware virtualisation in BIOS for ganeti1024 T283036
  • 13:55 marostegui: Power off es1022 for onsite maintenance T299123
  • 13:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet
  • 13:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1024.eqiad.wmnet with reason: Change hw virt setting in BIOS
  • 13:52 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1024.eqiad.wmnet with reason: Change hw virt setting in BIOS
  • 13:51 moritzm: enabled hardware virtualisation in BIOS for ganeti1025 T293909
  • 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
  • 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
  • 13:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1025.eqiad.wmnet with reason: Change KVM setting in BIOS
  • 13:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1025.eqiad.wmnet with reason: Change KVM setting in BIOS
  • 13:13 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/CentralNotice/includes/: Backport: Replace remaining usages of IDatabase::fetchObject()/::numRows() (T286694) (duration: 00m 50s)
  • 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:03 Lucas_WMDE: UTC morning backport window done
  • 13:02 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.18/includes/deferred/LinksUpdate/LinksUpdate.php: Backport: Fix deprecation warning from LinksUpdate::getImages() (T299472) (duration: 00m 50s)
  • 13:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:01 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.18/maintenance/: Backport: Replace remaining usages of IDatabase::fetchObject() (T299471) (2/2) (duration: 00m 50s)
  • 13:00 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.18/includes/: Backport: Replace remaining usages of IDatabase::fetchObject() (T299471) (1/2) (duration: 00m 56s)
  • 12:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable usage tracking for statements in Waray Wikipedia (T296383) (expecting some gradual increase of wbc_entity_usage rows on warwiki) (duration: 00m 51s)
  • 12:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T285149)', diff saved to https://phabricator.wikimedia.org/P18943 and previous config saved to /var/cache/conftool/dbconfig/20220120-121520-marostegui.json
  • 12:10 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: sync on production
  • 12:10 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply on staging
  • 12:10 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply on production
  • 12:09 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: sync on production
  • 12:08 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply on staging
  • 12:08 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply on production
  • 12:07 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: sync on staging
  • 12:06 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on production
  • 12:06 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply on staging
  • 12:06 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on staging
  • 12:05 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on production
  • 12:05 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply on staging
  • 12:05 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on staging
  • 12:05 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on production
  • 12:05 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply on staging
  • 12:04 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on staging
  • 12:04 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on production
  • 12:04 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply on staging
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P18942 and previous config saved to /var/cache/conftool/dbconfig/20220120-120015-marostegui.json
  • 11:49 moritzm: add ganeti1024 to Ganeti eqiad cluster T283036
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P18941 and previous config saved to /var/cache/conftool/dbconfig/20220120-114510-marostegui.json
  • 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet
  • 11:30 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 11:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T285149)', diff saved to https://phabricator.wikimedia.org/P18940 and previous config saved to /var/cache/conftool/dbconfig/20220120-113006-marostegui.json
  • 11:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T285149)', diff saved to https://phabricator.wikimedia.org/P18939 and previous config saved to /var/cache/conftool/dbconfig/20220120-112854-marostegui.json
  • 11:28 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 11:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T285149)', diff saved to https://phabricator.wikimedia.org/P18938 and previous config saved to /var/cache/conftool/dbconfig/20220120-112846-marostegui.json
  • 11:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:24 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: sync on production
  • 11:23 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply on staging
  • 11:23 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply on production
  • 11:22 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
  • 11:22 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 11:21 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 03s)
  • 11:21 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 11:19 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: sync on production
  • 11:18 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
  • 11:18 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 11:18 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply on staging
  • 11:18 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply on production
  • 11:16 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: sync on staging
  • 11:13 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply on production
  • 11:13 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply on staging
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18937 and previous config saved to /var/cache/conftool/dbconfig/20220120-111341-marostegui.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18936 and previous config saved to /var/cache/conftool/dbconfig/20220120-105837-marostegui.json
  • 10:52 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
  • 10:52 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 10:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1018.eqiad.wmnet with OS buster
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T285149)', diff saved to https://phabricator.wikimedia.org/P18935 and previous config saved to /var/cache/conftool/dbconfig/20220120-104332-marostegui.json
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T285149)', diff saved to https://phabricator.wikimedia.org/P18934 and previous config saved to /var/cache/conftool/dbconfig/20220120-104220-marostegui.json
  • 10:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 10:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T285149)', diff saved to https://phabricator.wikimedia.org/P18933 and previous config saved to /var/cache/conftool/dbconfig/20220120-104206-marostegui.json
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P18932 and previous config saved to /var/cache/conftool/dbconfig/20220120-102702-marostegui.json
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P18931 and previous config saved to /var/cache/conftool/dbconfig/20220120-101157-marostegui.json
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T285149)', diff saved to https://phabricator.wikimedia.org/P18930 and previous config saved to /var/cache/conftool/dbconfig/20220120-095652-marostegui.json
  • 09:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1018.eqiad.wmnet with OS buster
  • 09:49 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ganeti1018.eqiad.wmnet with OS buster
  • 09:49 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1018.eqiad.wmnet with OS buster
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T285149)', diff saved to https://phabricator.wikimedia.org/P18929 and previous config saved to /var/cache/conftool/dbconfig/20220120-092232-marostegui.json
  • 09:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 09:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T285149)', diff saved to https://phabricator.wikimedia.org/P18928 and previous config saved to /var/cache/conftool/dbconfig/20220120-092225-marostegui.json
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18927 and previous config saved to /var/cache/conftool/dbconfig/20220120-091127-root.json
  • 09:09 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 09:08 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 09:07 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:07 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P18926 and previous config saved to /var/cache/conftool/dbconfig/20220120-090720-marostegui.json
  • 09:05 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:00 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 09:00 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 09:00 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:00 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:58 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster2001.codfw.wmnet
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18925 and previous config saved to /var/cache/conftool/dbconfig/20220120-085623-root.json
  • 08:55 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2001.codfw.wmnet
  • 08:52 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P18924 and previous config saved to /var/cache/conftool/dbconfig/20220120-085215-marostegui.json
  • 08:52 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 08:51 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:51 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
  • 08:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 08:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18923 and previous config saved to /var/cache/conftool/dbconfig/20220120-084120-root.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T285149)', diff saved to https://phabricator.wikimedia.org/P18922 and previous config saved to /var/cache/conftool/dbconfig/20220120-083711-marostegui.json
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T285149)', diff saved to https://phabricator.wikimedia.org/P18921 and previous config saved to /var/cache/conftool/dbconfig/20220120-083558-marostegui.json
  • 08:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 08:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 08:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 08:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T285149)', diff saved to https://phabricator.wikimedia.org/P18920 and previous config saved to /var/cache/conftool/dbconfig/20220120-083520-marostegui.json
  • 08:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18919 and previous config saved to /var/cache/conftool/dbconfig/20220120-082616-root.json
  • 08:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18918 and previous config saved to /var/cache/conftool/dbconfig/20220120-082015-marostegui.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022 for on-site maintenance T299123', diff saved to https://phabricator.wikimedia.org/P18917 and previous config saved to /var/cache/conftool/dbconfig/20220120-081809-marostegui.json
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18916 and previous config saved to /var/cache/conftool/dbconfig/20220120-081112-root.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18915 and previous config saved to /var/cache/conftool/dbconfig/20220120-080510-marostegui.json
  • 07:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1128.eqiad.wmnet with OS bullseye
  • 07:57 marostegui: Stop mysql on db1117 to clone db1128 T299344
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18913 and previous config saved to /var/cache/conftool/dbconfig/20220120-075609-root.json
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T285149)', diff saved to https://phabricator.wikimedia.org/P18912 and previous config saved to /var/cache/conftool/dbconfig/20220120-075005-marostegui.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T285149)', diff saved to https://phabricator.wikimedia.org/P18911 and previous config saved to /var/cache/conftool/dbconfig/20220120-074753-marostegui.json
  • 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T285149)', diff saved to https://phabricator.wikimedia.org/P18910 and previous config saved to /var/cache/conftool/dbconfig/20220120-074746-marostegui.json
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18909 and previous config saved to /var/cache/conftool/dbconfig/20220120-074105-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P18908 and previous config saved to /var/cache/conftool/dbconfig/20220120-073241-marostegui.json
  • 07:32 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1128.eqiad.wmnet with OS bullseye
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18907 and previous config saved to /var/cache/conftool/dbconfig/20220120-072558-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P18906 and previous config saved to /var/cache/conftool/dbconfig/20220120-071736-marostegui.json
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18905 and previous config saved to /var/cache/conftool/dbconfig/20220120-071054-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T285149)', diff saved to https://phabricator.wikimedia.org/P18904 and previous config saved to /var/cache/conftool/dbconfig/20220120-070231-marostegui.json
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T285149)', diff saved to https://phabricator.wikimedia.org/P18903 and previous config saved to /var/cache/conftool/dbconfig/20220120-070119-marostegui.json
  • 07:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 07:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 07:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 07:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 07:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T285149)', diff saved to https://phabricator.wikimedia.org/P18902 and previous config saved to /var/cache/conftool/dbconfig/20220120-070052-marostegui.json
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18901 and previous config saved to /var/cache/conftool/dbconfig/20220120-065551-root.json
  • 06:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1180.eqiad.wmnet with OS bullseye
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P18900 and previous config saved to /var/cache/conftool/dbconfig/20220120-064547-marostegui.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P18899 and previous config saved to /var/cache/conftool/dbconfig/20220120-063042-marostegui.json
  • 06:17 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1180.eqiad.wmnet with OS bullseye
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T285149)', diff saved to https://phabricator.wikimedia.org/P18898 and previous config saved to /var/cache/conftool/dbconfig/20220120-061538-marostegui.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180 T299479', diff saved to https://phabricator.wikimedia.org/P18897 and previous config saved to /var/cache/conftool/dbconfig/20220120-061529-marostegui.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T285149)', diff saved to https://phabricator.wikimedia.org/P18896 and previous config saved to /var/cache/conftool/dbconfig/20220120-061407-marostegui.json
  • 06:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
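
Note for readers processing this archive: the entries above appear to follow the shape `HH:MM author[@host]: message`, with Phabricator task IDs written as `T` followed by digits. The following is a minimal parsing sketch under that assumption. The names `ENTRY_RE`, `Entry`, `parse_entry` and `task_ids` are hypothetical helpers made up for illustration, not part of any Wikimedia tooling.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed entry shape, inferred from the log lines above:
#   "• HH:MM author[@host]: message"
ENTRY_RE = re.compile(
    r"^\s*•\s*(?P<time>\d{2}:\d{2})\s+"
    r"(?P<author>[^\s@:]+)(?:@(?P<host>[^\s:]+))?:\s*"
    r"(?P<message>.*)$"
)

# Assumed task reference shape: "T" followed by digits, e.g. T285149.
TASK_RE = re.compile(r"\bT\d+\b")


class Entry(NamedTuple):
    time: str            # "HH:MM" as logged (UTC per the log's own wording)
    author: str          # shell user or IRC nick
    host: Optional[str]  # host the tool ran on, if the entry names one
    message: str         # remainder of the line


def parse_entry(line: str) -> Optional[Entry]:
    """Return an Entry for a log bullet, or None for headings and other lines."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    return Entry(m["time"], m["author"], m["host"], m["message"].strip())


def task_ids(message: str) -> List[str]:
    """Collect task references in the order they appear in the message."""
    return TASK_RE.findall(message)
```

Lines that are not bullets, such as the date headings, return `None`, so a caller can track the current date separately and attach it to each parsed entry.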

2022-01-19

  • 23:36 mutante: deploy1002 - checked freshly generated cert in /etc/helmfile-defaults/private/main_services/miscweb/eqiad.yaml with 'openssl x509 -noout -text -in .. | grep DNS'. now has static-bz on it. (T281538)
  • 23:35 mutante: puppetmaster1001 - revoked puppet cert miscweb.discovery.wmnet; updated kube_services.crts.yaml to include static-bugzilla.wikimedia.org, removed miscweb.discovery.wmnet.crt and .csr.pem, used cergen to check and regenerate cert, committed in private repo, ran puppet on deploy1001 - checked cert in /etc/helmfile-defaults/private/main_services/miscweb/eqiad.yaml with 'openssl x509
  • 21:43 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 26s)
  • 21:42 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 20:52 Krinkle: depool mw1340 (api_appserver) for performance and php-apcu testing
  • 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:09 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.18 refs T293959 (duration: 00m 49s)
  • 20:08 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.18 refs T293959
  • 20:04 jhathaway: rebooting mx1001 to debug conntrack
  • 19:52 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.18/tests/phpunit/structure/SettingsTest.php: ed5e634: First pass on creating config-schema.yaml (duration: 00m 49s)
  • 19:49 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.18/includes/: ed5e634: First pass on creating config-schema.yaml (duration: 01m 02s)
  • 19:47 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash1009.eqiad.wmnet
  • 19:47 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash1008.eqiad.wmnet
  • 19:47 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash1007.eqiad.wmnet
  • 19:45 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash2006.codfw.wmnet
  • 19:45 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash2005.codfw.wmnet
  • 19:45 herron@puppetmaster1001: conftool action : set/pooled=no; selector: name=logstash2004.codfw.wmnet
  • 19:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:32 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2016.codfw.wmnet
  • 19:31 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2016.codfw.wmnet with OS buster
  • 19:17 cjming@deploy1002: Synchronized wmf-config/config: Config: Update config for pilot wikis: (T298519) (duration: 00m 49s)
  • 19:13 cjming@deploy1002: Synchronized wmf-config/config: message (duration: 00m 50s)
  • 19:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:12 cjming@deploy1002: Synchronized wmf-config/config/foundationwiki.yaml: Config: Update config for pilot wikis: (T298519) (duration: 00m 49s)
  • 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:11 cjming@deploy1002: Synchronized wmf-config/config/viwiki.yaml: Config: Update config for pilot wikis: (T298519) (duration: 00m 49s)
  • 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:10 cjming@deploy1002: Synchronized wmf-config/config/ptwikinews.yaml: Config: Update config for pilot wikis: (T298519) (duration: 00m 50s)
  • 19:09 cjming@deploy1002: Synchronized dblists/desktop-improvements.dblist: Config: Update config for pilot wikis: (T298519) (duration: 01m 09s)
  • 19:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T239814)', diff saved to https://phabricator.wikimedia.org/P18893 and previous config saved to /var/cache/conftool/dbconfig/20220119-190137-ladsgroup.json
  • 18:50 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS buster
  • 18:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18892 and previous config saved to /var/cache/conftool/dbconfig/20220119-184632-ladsgroup.json
  • 18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18891 and previous config saved to /var/cache/conftool/dbconfig/20220119-183128-ladsgroup.json
  • 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T239814)', diff saved to https://phabricator.wikimedia.org/P18890 and previous config saved to /var/cache/conftool/dbconfig/20220119-181623-ladsgroup.json
  • 18:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1110.eqiad.wmnet
  • 18:10 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2016.codfw.wmnet with OS buster
  • 18:09 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db1110.eqiad.wmnet
  • 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T239814)', diff saved to https://phabricator.wikimedia.org/P18889 and previous config saved to /var/cache/conftool/dbconfig/20220119-180840-ladsgroup.json
  • 18:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 18:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 18:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T239814)', diff saved to https://phabricator.wikimedia.org/P18888 and previous config saved to /var/cache/conftool/dbconfig/20220119-180154-ladsgroup.json
  • 17:58 herron: beginning logstash apifeatureusage switchover T297239
  • 17:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 17:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 17:54 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS buster
  • 17:52 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2016.codfw.wmnet with OS buster
  • 17:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:50 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [wikitech] Drop the cloudadmin user group, no longer used and empty (T237890) (duration: 00m 50s)
  • 17:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 17:47 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable UserMerge (T216089) (duration: 00m 54s)
  • 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P18887 and previous config saved to /var/cache/conftool/dbconfig/20220119-174650-ladsgroup.json
  • 17:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 17:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:42 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Drop CentralAuthUserMerge log channel (T216089) (duration: 01m 05s)
  • 17:36 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS buster
  • 17:35 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2016.codfw.wmnet with OS buster
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P18886 and previous config saved to /var/cache/conftool/dbconfig/20220119-173145-ladsgroup.json
  • 17:26 _joe_: powercycling contint1001 via ipmi, T299542
  • 17:25 cmjohnson1: updating firmware, ganeti1018 T299527
  • 17:19 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS buster
  • 17:18 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2016.codfw.wmnet with OS buster
  • 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T239814)', diff saved to https://phabricator.wikimedia.org/P18885 and previous config saved to /var/cache/conftool/dbconfig/20220119-171640-ladsgroup.json
  • 16:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:56 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS buster
  • 16:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2015.codfw.wmnet
  • 16:54 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2015.codfw.wmnet with OS buster
  • 16:48 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 16:47 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 16:46 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 16:46 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 16:46 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 16:44 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 16:36 hashar: marking contint1001.wikimedia.org as offline in Jenkins since it is dramatically overloaded T299542
  • 16:33 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:32 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T285149)', diff saved to https://phabricator.wikimedia.org/P18883 and previous config saved to /var/cache/conftool/dbconfig/20220119-162717-marostegui.json
  • 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P18882 and previous config saved to /var/cache/conftool/dbconfig/20220119-161212-marostegui.json
  • 16:01 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2015.codfw.wmnet with OS buster
  • 16:00 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase201[134].codfw.wmnet
  • 15:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2014.codfw.wmnet with OS buster
  • 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P18881 and previous config saved to /var/cache/conftool/dbconfig/20220119-155706-marostegui.json
  • 15:54 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:48 moritzm: installing tiff security updates on stretch
  • 15:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T285149)', diff saved to https://phabricator.wikimedia.org/P18879 and previous config saved to /var/cache/conftool/dbconfig/20220119-154201-marostegui.json
  • 15:40 mmandere: cp5005,cp4025: upgrade varnish to 6.0.9 T298758
  • 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T285149)', diff saved to https://phabricator.wikimedia.org/P18878 and previous config saved to /var/cache/conftool/dbconfig/20220119-154046-marostegui.json
  • 15:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 15:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T285149)', diff saved to https://phabricator.wikimedia.org/P18877 and previous config saved to /var/cache/conftool/dbconfig/20220119-154039-marostegui.json
  • 15:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P18876 and previous config saved to /var/cache/conftool/dbconfig/20220119-152534-marostegui.json
  • 15:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
  • 15:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
  • 15:16 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2014.codfw.wmnet with OS buster
  • 15:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P18875 and previous config saved to /var/cache/conftool/dbconfig/20220119-151029-marostegui.json
  • 15:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2013.codfw.wmnet with OS buster
  • 15:07 jbond: updating lldp parent fact
  • 15:01 moritzm: migrate primary/secondary instances off ganeti1022
  • 15:00 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1018.eqiad.wmnet with OS buster
  • 14:57 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T285149)', diff saved to https://phabricator.wikimedia.org/P18873 and previous config saved to /var/cache/conftool/dbconfig/20220119-145525-marostegui.json
  • 14:55 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T285149)', diff saved to https://phabricator.wikimedia.org/P18872 and previous config saved to /var/cache/conftool/dbconfig/20220119-145410-marostegui.json
  • 14:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T285149)', diff saved to https://phabricator.wikimedia.org/P18871 and previous config saved to /var/cache/conftool/dbconfig/20220119-145402-marostegui.json
  • 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P18870 and previous config saved to /var/cache/conftool/dbconfig/20220119-143858-marostegui.json
  • 14:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:35 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1018.eqiad.wmnet with OS buster
  • 14:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:33 jayme: disabled insecure API on all k8s masters - T290967
  • 14:33 mmandere: esams: upgrade varnish to 6.0.9 T298758
  • 14:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1018.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 14:29 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti1018.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2013.codfw.wmnet with OS buster
  • 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P18869 and previous config saved to /var/cache/conftool/dbconfig/20220119-142353-marostegui.json
  • 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T285149)', diff saved to https://phabricator.wikimedia.org/P18868 and previous config saved to /var/cache/conftool/dbconfig/20220119-140848-marostegui.json
  • 14:04 ladsgroup@cumin1001: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for db1100.eqiad.wmnet
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T285149)', diff saved to https://phabricator.wikimedia.org/P18867 and previous config saved to /var/cache/conftool/dbconfig/20220119-140433-marostegui.json
  • 14:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 14:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T285149)', diff saved to https://phabricator.wikimedia.org/P18866 and previous config saved to /var/cache/conftool/dbconfig/20220119-140419-marostegui.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P18865 and previous config saved to /var/cache/conftool/dbconfig/20220119-134915-marostegui.json
  • 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:36 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db1100.eqiad.wmnet
  • 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T239814)', diff saved to https://phabricator.wikimedia.org/P18864 and previous config saved to /var/cache/conftool/dbconfig/20220119-133514-ladsgroup.json
  • 13:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 13:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P18863 and previous config saved to /var/cache/conftool/dbconfig/20220119-133410-marostegui.json
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:26 hashar: Restarting Gerrit
  • 13:24 hashar@deploy1002: Finished deploy [gerrit/gerrit@a340940]: Gerrit upgrade from 3.3.6 to 3.3.9 on gerrit1001 # T299451 (duration: 00m 08s)
  • 13:24 hashar@deploy1002: Started deploy [gerrit/gerrit@a340940]: Gerrit upgrade from 3.3.6 to 3.3.9 on gerrit1001 # T299451
  • 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:22 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.16 (duration: 01m 32s)
  • 13:20 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.12 (duration: 01m 43s)
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:19 hashar: Cleaning all branch with `scap clean --delete 1.38.0-wmf.12` apparently missed in previous train # T293958 T293959
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T285149)', diff saved to https://phabricator.wikimedia.org/P18862 and previous config saved to /var/cache/conftool/dbconfig/20220119-131905-marostegui.json
  • 13:18 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.13 (duration: 03m 11s)
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T285149)', diff saved to https://phabricator.wikimedia.org/P18861 and previous config saved to /var/cache/conftool/dbconfig/20220119-131750-marostegui.json
  • 13:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 13:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T285149)', diff saved to https://phabricator.wikimedia.org/P18860 and previous config saved to /var/cache/conftool/dbconfig/20220119-131743-marostegui.json
  • 13:16 hashar: Cleaning all branch with `scap clean --delete 1.38.0-wmf.13` apparently missed in previous train # T293958 T293959
  • 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:13 Lucas_WMDE: UTC morning backport+config window done
  • 13:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:08 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport: Revert "Undo update to the way the search interface is set" (part 2) (duration: 29m 08s)
  • 13:05 Lucas_WMDE: lucaswerkmeister-wmde@mwdebug1001:~$ sudo -u www-data rm /tmp/URL*.urlupload_ # save space
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P18859 and previous config saved to /var/cache/conftool/dbconfig/20220119-130238-marostegui.json
  • 13:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1128 from dbctl T299344', diff saved to https://phabricator.wikimedia.org/P18858 and previous config saved to /var/cache/conftool/dbconfig/20220119-125658-marostegui.json
  • 12:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1155.eqiad.wmnet with OS bullseye
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P18857 and previous config saved to /var/cache/conftool/dbconfig/20220119-124733-marostegui.json
  • 12:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:38 lucaswerkmeister-wmde@deploy1002: Started scap: Backport: Revert "Undo update to the way the search interface is set" (part 2)
  • 12:38 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/MediaSearch/extension.json: Backport: Revert "Undo update to the way the search interface is set" (part 1) (duration: 01m 34s)
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T285149)', diff saved to https://phabricator.wikimedia.org/P18856 and previous config saved to /var/cache/conftool/dbconfig/20220119-123229-marostegui.json
  • 12:31 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/Flow/modules/flow/ui/widgets/mw.flow.ui.TopicMenuSelectWidget.js: Backport: Fix TopicMenuSelectWidget after OOUI change (T299473) (duration: 01m 08s)
  • 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T285149)', diff saved to https://phabricator.wikimedia.org/P18855 and previous config saved to /var/cache/conftool/dbconfig/20220119-123114-marostegui.json
  • 12:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 12:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T285149)', diff saved to https://phabricator.wikimedia.org/P18854 and previous config saved to /var/cache/conftool/dbconfig/20220119-123106-marostegui.json
  • 12:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase201[12].codfw.wmnet
  • 12:19 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1155.eqiad.wmnet with OS bullseye
  • 12:19 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2012.codfw.wmnet with OS buster
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P18853 and previous config saved to /var/cache/conftool/dbconfig/20220119-121602-marostegui.json
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P18852 and previous config saved to /var/cache/conftool/dbconfig/20220119-120057-marostegui.json
  • 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18851 and previous config saved to /var/cache/conftool/dbconfig/20220119-114949-root.json
  • 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18850 and previous config saved to /var/cache/conftool/dbconfig/20220119-114944-root.json
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T285149)', diff saved to https://phabricator.wikimedia.org/P18849 and previous config saved to /var/cache/conftool/dbconfig/20220119-114552-marostegui.json
  • 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T285149)', diff saved to https://phabricator.wikimedia.org/P18848 and previous config saved to /var/cache/conftool/dbconfig/20220119-114237-marostegui.json
  • 11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 11:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T285149)', diff saved to https://phabricator.wikimedia.org/P18847 and previous config saved to /var/cache/conftool/dbconfig/20220119-114154-marostegui.json
  • 11:38 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2012.codfw.wmnet with OS buster
  • 11:35 moritzm: rebalance ganeti group D in codfw after adding ganeti2026 T282603
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18846 and previous config saved to /var/cache/conftool/dbconfig/20220119-113445-root.json
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18845 and previous config saved to /var/cache/conftool/dbconfig/20220119-113440-root.json
  • 11:32 oblivian@deploy1002: Finished deploy [docker-pkg/deploy@62a5e87]: redeploy of 3.0.2, including build2001 (duration: 18m 27s)
  • 11:28 godog: bounce superset on an-tool1005 - T299383
  • 11:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2011.codfw.wmnet with OS buster
  • 11:28 godog: bounce superset on an-tool1010 - T299383
  • 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P18844 and previous config saved to /var/cache/conftool/dbconfig/20220119-112649-marostegui.json
  • 11:26 godog: bounce navtiming on webperf1001 - T299383
  • 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18843 and previous config saved to /var/cache/conftool/dbconfig/20220119-111942-root.json
  • 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18842 and previous config saved to /var/cache/conftool/dbconfig/20220119-111937-root.json
  • 11:15 moritzm: add ganeti2026 to Ganeti codfw cluster T282603
  • 11:14 oblivian@deploy1002: Started deploy [docker-pkg/deploy@62a5e87]: redeploy of 3.0.2, including build2001
  • 11:12 oblivian@deploy1002: Finished deploy [docker-pkg/deploy@536f77a]: redeploy of 3.0.2, in preparation for deployment on build2001 (duration: 01m 00s)
  • 11:12 filippo@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: Revert "ProductionServices: use graphite2003 for statsd" (T299383) (duration: 02m 09s)
  • 11:11 oblivian@deploy1002: Started deploy [docker-pkg/deploy@536f77a]: redeploy of 3.0.2, in preparation for deployment on build2001
  • 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P18840 and previous config saved to /var/cache/conftool/dbconfig/20220119-111144-marostegui.json
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18839 and previous config saved to /var/cache/conftool/dbconfig/20220119-110438-root.json
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18838 and previous config saved to /var/cache/conftool/dbconfig/20220119-110433-root.json
  • 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
  • 10:58 godog: flip graphite back to eqiad - T299383
  • 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T285149)', diff saved to https://phabricator.wikimedia.org/P18837 and previous config saved to /var/cache/conftool/dbconfig/20220119-105640-marostegui.json
  • 10:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
  • 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T285149)', diff saved to https://phabricator.wikimedia.org/P18836 and previous config saved to /var/cache/conftool/dbconfig/20220119-105523-marostegui.json
  • 10:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 10:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 10:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18835 and previous config saved to /var/cache/conftool/dbconfig/20220119-104934-root.json
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18834 and previous config saved to /var/cache/conftool/dbconfig/20220119-104929-root.json
  • 10:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Release v0.3.0 - ayounsi@cumin1001
  • 10:42 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Release v0.3.0 - ayounsi@cumin1001
  • 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T285149)', diff saved to https://phabricator.wikimedia.org/P18833 and previous config saved to /var/cache/conftool/dbconfig/20220119-104109-marostegui.json
  • 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2011.codfw.wmnet with OS buster
  • 10:40 ayounsi@deploy1002: Finished deploy [homer/deploy@d1fbc5c]: Homer release v0.3.0 (duration: 01m 26s)
  • 10:39 ayounsi@deploy1002: Started deploy [homer/deploy@d1fbc5c]: Homer release v0.3.0
  • 10:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2010.codfw.wmnet
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18832 and previous config saved to /var/cache/conftool/dbconfig/20220119-103431-root.json
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18831 and previous config saved to /var/cache/conftool/dbconfig/20220119-103425-root.json
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18830 and previous config saved to /var/cache/conftool/dbconfig/20220119-102604-marostegui.json
  • 10:21 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync on production
  • 10:20 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply on staging
  • 10:20 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply on production
  • 10:19 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync on production
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18829 and previous config saved to /var/cache/conftool/dbconfig/20220119-101927-root.json
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18828 and previous config saved to /var/cache/conftool/dbconfig/20220119-101922-root.json
  • 10:18 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply on staging
  • 10:18 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply on production
  • 10:17 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync on staging
  • 10:17 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply on production
  • 10:17 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply on staging
  • 10:15 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync on production
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18827 and previous config saved to /var/cache/conftool/dbconfig/20220119-101100-marostegui.json
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18826 and previous config saved to /var/cache/conftool/dbconfig/20220119-100424-root.json
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18825 and previous config saved to /var/cache/conftool/dbconfig/20220119-100418-root.json
  • 10:03 hashar: Upgraded gerrit-replica.wikimedia.org from 3.3.6 to 3.3.9
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T285149)', diff saved to https://phabricator.wikimedia.org/P18824 and previous config saved to /var/cache/conftool/dbconfig/20220119-095555-marostegui.json
  • 09:54 hashar@deploy1002: Finished deploy [gerrit/gerrit@a340940]: Gerrit to 3.3.9 on gerrit 2001 # T299451 (duration: 00m 09s)
  • 09:54 hashar@deploy1002: Started deploy [gerrit/gerrit@a340940]: Gerrit to 3.3.9 on gerrit 2001 # T299451
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T285149)', diff saved to https://phabricator.wikimedia.org/P18823 and previous config saved to /var/cache/conftool/dbconfig/20220119-095428-marostegui.json
  • 09:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 09:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T285149)', diff saved to https://phabricator.wikimedia.org/P18822 and previous config saved to /var/cache/conftool/dbconfig/20220119-095421-marostegui.json
  • 09:49 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 08s)
  • 09:49 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18821 and previous config saved to /var/cache/conftool/dbconfig/20220119-094920-root.json
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18820 and previous config saved to /var/cache/conftool/dbconfig/20220119-094914-root.json
  • 09:48 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply on staging
  • 09:48 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply on production
  • 09:47 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync on production
  • 09:47 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply on staging
  • 09:47 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply on production
  • 09:44 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync on staging
  • 09:43 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply on production
  • 09:43 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply on staging
  • 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P18819 and previous config saved to /var/cache/conftool/dbconfig/20220119-093915-marostegui.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18818 and previous config saved to /var/cache/conftool/dbconfig/20220119-093416-root.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18817 and previous config saved to /var/cache/conftool/dbconfig/20220119-093411-root.json
  • 09:32 XioNoX: enable v6 BGP to HE in eqiad for testing
  • 09:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1098.eqiad.wmnet with OS bullseye
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P18816 and previous config saved to /var/cache/conftool/dbconfig/20220119-092410-marostegui.json
  • 09:20 moritzm: migrate primary/secondary instances off ganeti1018
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T285149)', diff saved to https://phabricator.wikimedia.org/P18813 and previous config saved to /var/cache/conftool/dbconfig/20220119-090905-marostegui.json
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T285149)', diff saved to https://phabricator.wikimedia.org/P18812 and previous config saved to /var/cache/conftool/dbconfig/20220119-090839-marostegui.json
  • 09:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 09:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T285149)', diff saved to https://phabricator.wikimedia.org/P18811 and previous config saved to /var/cache/conftool/dbconfig/20220119-090832-marostegui.json
  • 09:03 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1098.eqiad.wmnet with OS bullseye
  • 09:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2129.codfw.wmnet with OS bullseye
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098 (s6,s7) for Bullseye reimage T299479', diff saved to https://phabricator.wikimedia.org/P18809 and previous config saved to /var/cache/conftool/dbconfig/20220119-085927-marostegui.json
  • 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18808 and previous config saved to /var/cache/conftool/dbconfig/20220119-085327-marostegui.json
  • 08:50 XioNoX: disable v6 BGP to HE in eqiad for testing
  • 08:46 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync on production
  • 08:45 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply on staging
  • 08:45 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply on production
  • 08:42 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync on production
  • 08:40 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply on staging
  • 08:40 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply on production
  • 08:40 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync on staging
  • 08:39 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply on production
  • 08:39 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply on staging
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18807 and previous config saved to /var/cache/conftool/dbconfig/20220119-083822-marostegui.json
  • 08:35 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply on production
  • 08:35 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply on staging
  • 08:34 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply on production
  • 08:34 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply on staging
  • 08:34 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply on production
  • 08:34 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply on staging
  • 08:33 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply on production
  • 08:33 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply on staging
  • 08:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2076.codfw.wmnet with OS bullseye
  • 08:26 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2129.codfw.wmnet with OS bullseye
  • 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2114.codfw.wmnet with OS bullseye
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T285149)', diff saved to https://phabricator.wikimedia.org/P18806 and previous config saved to /var/cache/conftool/dbconfig/20220119-082318-marostegui.json
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T285149)', diff saved to https://phabricator.wikimedia.org/P18805 and previous config saved to /var/cache/conftool/dbconfig/20220119-081650-marostegui.json
  • 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T285149)', diff saved to https://phabricator.wikimedia.org/P18804 and previous config saved to /var/cache/conftool/dbconfig/20220119-081643-marostegui.json
  • 08:11 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2010.codfw.wmnet with OS buster
  • 08:10 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:10 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P18803 and previous config saved to /var/cache/conftool/dbconfig/20220119-080138-marostegui.json
  • 07:57 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:56 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:55 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 07:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2114.codfw.wmnet with OS bullseye
  • 07:53 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2076.codfw.wmnet with OS bullseye
  • 07:52 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply on production
  • 07:52 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply on staging
  • 07:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2124.codfw.wmnet with OS bullseye
  • 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2117.codfw.wmnet with OS bullseye
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P18802 and previous config saved to /var/cache/conftool/dbconfig/20220119-074633-marostegui.json
  • 07:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2089.codfw.wmnet with OS bullseye
  • 07:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2095.codfw.wmnet with OS bullseye
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T285149)', diff saved to https://phabricator.wikimedia.org/P18801 and previous config saved to /var/cache/conftool/dbconfig/20220119-073129-marostegui.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T285149)', diff saved to https://phabricator.wikimedia.org/P18800 and previous config saved to /var/cache/conftool/dbconfig/20220119-072301-marostegui.json
  • 07:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 07:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T285149)', diff saved to https://phabricator.wikimedia.org/P18799 and previous config saved to /var/cache/conftool/dbconfig/20220119-072253-marostegui.json
  • 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2124.codfw.wmnet with OS bullseye
  • 07:14 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2117.codfw.wmnet with OS bullseye
  • 07:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2089.codfw.wmnet with OS bullseye
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P18797 and previous config saved to /var/cache/conftool/dbconfig/20220119-070749-marostegui.json
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s3 weights T263127', diff saved to https://phabricator.wikimedia.org/P18796 and previous config saved to /var/cache/conftool/dbconfig/20220119-065318-marostegui.json
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P18795 and previous config saved to /var/cache/conftool/dbconfig/20220119-065244-marostegui.json
  • 06:42 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2095.codfw.wmnet with OS bullseye
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T285149)', diff saved to https://phabricator.wikimedia.org/P18794 and previous config saved to /var/cache/conftool/dbconfig/20220119-063739-marostegui.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T285149)', diff saved to https://phabricator.wikimedia.org/P18793 and previous config saved to /var/cache/conftool/dbconfig/20220119-063613-marostegui.json
  • 06:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 06:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T285149)', diff saved to https://phabricator.wikimedia.org/P18792 and previous config saved to /var/cache/conftool/dbconfig/20220119-063605-marostegui.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P18791 and previous config saved to /var/cache/conftool/dbconfig/20220119-062100-marostegui.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P18790 and previous config saved to /var/cache/conftool/dbconfig/20220119-060555-marostegui.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T285149)', diff saved to https://phabricator.wikimedia.org/P18789 and previous config saved to /var/cache/conftool/dbconfig/20220119-055051-marostegui.json
  • 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T285149)', diff saved to https://phabricator.wikimedia.org/P18788 and previous config saved to /var/cache/conftool/dbconfig/20220119-054924-marostegui.json
  • 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 05:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 05:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 05:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 05:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 05:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 05:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 05:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 05:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 01:07 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: DiscussionTools: Use bullet indentation on ruwiki (T259864) (duration: 00m 53s)
  • 01:05 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [wmf-config] Deploy the cawiki test safety survey to production. (T296657) (duration: 00m 53s)
  • 01:02 catrope@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/DiscussionTools: Backport: Enable wikis to customize the syntax used for replies (T259864) and Ensure the marker appears in a reasonable place when replying with a bullet (T259864) (duration: 00m 53s)
  • 01:00 catrope@deploy1002: Synchronized php-1.38.0-wmf.18/extensions/AbuseFilter/: Backport: Don't use array keys for OOUI (T299463) and Don't use array keys for OOUI in AbuseFilterViewDiff (T299463) (duration: 00m 54s)
  • 00:49 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Change TheWikipediaLibrary editcount (T288070) (duration: 00m 53s)
  • 00:38 catrope@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Use namespaced CentralAuthUser (T298840) (duration: 00m 54s)
  • 00:35 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "commonswiki: Add peerj.com to wgCopyUploadsDomains whitelist" (duration: 00m 54s)
  • 00:33 WFan: re-enable the disabled jobs for CiviCRM upgrade
  • 00:30 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: azwiki: Change alias Q to QA for the draft namespace (T299332) (duration: 00m 53s)
  • 00:08 WFan: Upgrade CiviCRM from gerrit #755044
  • 00:07 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: fawiki: Exempt userspaces from being indexed by search engines (T299363) (duration: 00m 54s)
  • 00:00 WFan: disabling jobs for CiviCRM upgrade

2022-01-18

  • 23:11 jhathaway: rebooting mx1001 to revert to the old kernel
  • 22:59 sbassett: Deployed security patch for T298434 to 1.38.0-wmf.18
  • 22:57 sbassett: Deployed security patch for T298434 to 1.38.0-wmf.17
  • 21:42 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.18 refs T293959
  • 21:29 jhuneidi@deploy1002: Finished scap: testwikis to 1.38.0-wmf.18 refs T293959 (duration: 38m 31s)
  • 21:26 jhathaway: rebooting mx1001 to test new kernel
  • 20:50 jhuneidi@deploy1002: Started scap: testwikis to 1.38.0-wmf.18 refs T293959
  • 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0ff5874: pwnwiki: Deploy Growth features to newcomers (T298115) (duration: 02m 14s)
  • 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:57 dcausse: restarting blazegraph on wdqs1007 (JVM stuck for 13 hours)
  • 17:37 hashar: restarted zuul on contint2001
  • 17:16 moritzm: installing gmp security updates
  • 16:53 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
  • 16:53 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2010.codfw.wmnet with OS buster
  • 16:52 hashar: contint2001: restarted ferm service
  • 16:49 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2010.codfw.wmnet with OS buster
  • 16:48 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2010.codfw.wmnet with OS buster
  • 16:47 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2010.codfw.wmnet with OS buster
  • 16:45 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
  • 16:21 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2010.codfw.wmnet with OS buster
  • 16:14 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2010.codfw.wmnet with OS buster
  • 16:13 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2010.codfw.wmnet with OS buster
  • 16:11 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2010.codfw.wmnet with OS buster
  • 16:10 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:09 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:07 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2010.codfw.wmnet with OS buster
  • 16:03 moritzm: installing xen security updates on buster (client-side libraries)
  • 15:59 hashar: Shutting down CI for maintenance on contint2001 # T283582
  • 15:54 godog: update kartotherian certs on maps hosts and roll-reload nginx - T297604
  • 15:54 moritzm: installing libssh2 security updates on stretch
  • 15:50 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 09s)
  • 15:50 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 15:47 andrewbogott: resizing the wikitech-static host for T298052
  • 15:45 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 02s)
  • 15:45 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
  • 15:35 godog: regenerate kartotherian certs via cergen - T297604
  • 14:33 kormat: Deploying wmfmariadbpy 0.8 T299406
  • 14:33 kormat: uploaded wmfmariadbpy 0.8 to apt.wm.o
  • 14:31 moritzm: installing rsync security updates on stretch
  • 14:28 moritzm: installing xorg-server security updates on stretch
  • 14:10 moritzm: installing vim security updates on stretch
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T285149)', diff saved to https://phabricator.wikimedia.org/P18780 and previous config saved to /var/cache/conftool/dbconfig/20220118-140540-marostegui.json
  • 13:55 XioNoX: update grafana-plugins on grafana hosts - T251184
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P18779 and previous config saved to /var/cache/conftool/dbconfig/20220118-135036-marostegui.json
  • 13:46 XioNoX: add grafana-plugins 0.3 (with worldmap plugin) to reprepro
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P18778 and previous config saved to /var/cache/conftool/dbconfig/20220118-133531-marostegui.json
  • 13:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:26 Lucas_WMDE: UTC morning backport window done
  • 13:24 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: Monitoring: Add '.Save' to distinguish from '.Click' events (T286366) (duration: 00m 54s)
  • 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T285149)', diff saved to https://phabricator.wikimedia.org/P18777 and previous config saved to /var/cache/conftool/dbconfig/20220118-132026-marostegui.json
  • 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:14 moritzm: installing python-babel security updates on buster
  • 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T285149)', diff saved to https://phabricator.wikimedia.org/P18776 and previous config saved to /var/cache/conftool/dbconfig/20220118-131215-marostegui.json
  • 13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T285149)', diff saved to https://phabricator.wikimedia.org/P18775 and previous config saved to /var/cache/conftool/dbconfig/20220118-131208-marostegui.json
  • 13:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: update requirements - ayounsi@cumin1001
  • 13:05 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: update requirements - ayounsi@cumin1001
  • 13:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:04 ayounsi@deploy1002: Finished deploy [homer/deploy@0f02386]: update requirements (duration: 01m 27s)
  • 13:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:02 ayounsi@deploy1002: Started deploy [homer/deploy@0f02386]: update requirements
  • 12:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:59 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: fawiki: Add flow-delete right to eliminators (T299223) (duration: 00m 51s)
  • 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P18774 and previous config saved to /var/cache/conftool/dbconfig/20220118-125703-marostegui.json
  • 12:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:52 moritzm: installing ghostscript security updates for stretch
  • 12:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:46 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: azwiki: Add draft namespace (T299332) (duration: 00m 51s)
  • 12:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P18773 and previous config saved to /var/cache/conftool/dbconfig/20220118-124159-marostegui.json
  • 12:36 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/GrowthExperiments/modules/ext.growthExperiments.PostEdit/index.js: Backport: Post-edit dialog: Reload page upon dialog closing for structured tasks (T299188) (duration: 00m 51s)
  • 12:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:29 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: commonswiki: Add peerj.com to wgCopyUploadsDomains whitelist (T299247) (duration: 00m 51s)
  • 12:27 moritzm: imported docker-report bullseye rebuild to apt.wikimedia.org T298463
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T285149)', diff saved to https://phabricator.wikimedia.org/P18772 and previous config saved to /var/cache/conftool/dbconfig/20220118-122654-marostegui.json
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T285149)', diff saved to https://phabricator.wikimedia.org/P18771 and previous config saved to /var/cache/conftool/dbconfig/20220118-122546-marostegui.json
  • 12:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 12:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T285149)', diff saved to https://phabricator.wikimedia.org/P18770 and previous config saved to /var/cache/conftool/dbconfig/20220118-122538-marostegui.json
  • 12:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P18769 and previous config saved to /var/cache/conftool/dbconfig/20220118-121034-marostegui.json
  • 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P18768 and previous config saved to /var/cache/conftool/dbconfig/20220118-115529-marostegui.json
  • 11:46 hashar: Rolled back Quibble 1.3.0 jobs due to problems with php configuration files in at least releng/quibble-buster73:1.3.0 # T299389
  • 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T285149)', diff saved to https://phabricator.wikimedia.org/P18767 and previous config saved to /var/cache/conftool/dbconfig/20220118-114024-marostegui.json
  • 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T285149)', diff saved to https://phabricator.wikimedia.org/P18766 and previous config saved to /var/cache/conftool/dbconfig/20220118-113916-marostegui.json
  • 11:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 11:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 11:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 11:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 11:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 11:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 11:28 Amir1: mwscript findBadBlobs.php --wiki=dewiki --revisions 5730218 --mark "T299387"
  • 11:06 moritzm: running gnt-cluster renew-crypto --new-node-certificates for ganeti/eqiad cluster following 2.16 update
  • 11:06 mmandere: start rolling upgrade to varnish 6.0.9 T298758
  • 10:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1117.eqiad.wmnet with OS bullseye
  • 10:46 moritzm: gnt-cluster upgrade --to 2.16 for ganeti/eqiad cluster
  • 10:31 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1117.eqiad.wmnet with OS bullseye
  • 10:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:01 moritzm: running gnt-cluster renew-crypto --new-cluster-certificate --new-rapi-certificate --new-spice-certificate for ganeti/eqiad cluster
  • 10:00 marostegui: Move pc1014 to pc3 T299046
  • 09:59 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Revert: Promote pc1014 to master in pc2 T299046 (duration: 00m 50s)
  • 09:50 taavi: mwscript extensions/GlobalBlocking/maintenance/FixBlockerUsername.php --wiki metawiki "QuiteUnusual" "MarcGarver" # T298707
  • 09:50 moritzm: installing ganeti 2.16.0-1~bpo9+1+wmf1 on ganeti/eqiad servers T296721
  • 09:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:41 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Enable temporary global user groups on production (T153815) (duration: 00m 51s)
  • 09:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:32 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.17/includes: Backport: page: Use MainObjectStash instead of 'db-replicated' cache (T272512) (duration: 00m 56s)
  • 09:31 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/Linter/extension.json: Backport: Disable "inline-media-caption" category (T297443) (duration: 00m 51s)
  • 09:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:06 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.17/includes/watcheditem/WatchedItemStore.php: Backport: watcheditem: Try getting the cached version in resetNotificationTimestamp (duration: 00m 51s)
  • 09:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1012.eqiad.wmnet with OS bullseye
  • 08:55 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for build2001.codfw.wmnet: Renew puppet certificate - jmm@cumin2002
  • 08:55 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for build2001.codfw.wmnet: Renew puppet certificate - jmm@cumin2002
  • 08:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on build2001.codfw.wmnet with reason: reinstallation
  • 08:42 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on build2001.codfw.wmnet with reason: reinstallation
  • 08:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:37 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/ProofreadPage/includes/Page/PageContentHandler.php: Backport: Use fillParserOutputInternal instead of getParserOutput. (T292300) (duration: 00m 51s)
  • 08:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:32 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1012.eqiad.wmnet with OS bullseye
  • 08:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:30 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1014 to master in pc2 T299046 (duration: 00m 51s)
  • 08:20 Amir1: cleaning up commons linter errors T298782
  • 08:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:12 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/Linter/includes/RecordLintJob.php: Backport: Drop 'inline-media-caption' lint requests (T297443 T299302) (duration: 00m 52s)
  • 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1014.eqiad.wmnet with OS bullseye
  • 07:09 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1014.eqiad.wmnet with OS bullseye
  • 06:34 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1014.eqiad.wmnet with OS bullseye
  • 06:23 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1014.eqiad.wmnet with OS bullseye
  • 06:13 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1014.eqiad.wmnet with OS bullseye
  • 06:02 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1014.eqiad.wmnet with OS bullseye
  • 05:59 kart_: Update apertium to 2022-01-18-052631-production (T218184, T202276, T270061, T248653, T248293, T248812, T248654)
  • 05:56 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: sync on production
  • 05:54 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply on staging
  • 05:54 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply on production
  • 05:53 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: sync on production
  • 05:51 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply on staging
  • 05:51 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply on production
  • 05:49 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: sync on staging
  • 05:49 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply on production
  • 05:49 kartik@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply on staging
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist group from s3 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P18764 and previous config saved to /var/cache/conftool/dbconfig/20220118-054659-marostegui.json
  • 02:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn

2022-01-17

  • 23:27 jynus: forced session revocation on phab for a user T299315
  • 20:48 aqu@deploy1002: Finished deploy [airflow-dags/analytics-test@27a4f7a]: (no justification provided) (duration: 00m 02s)
  • 20:48 aqu@deploy1002: Started deploy [airflow-dags/analytics-test@27a4f7a]: (no justification provided)
  • 18:47 krinkle@deploy1002: Finished deploy [integration/docroot@1621c26]: (no justification provided) (duration: 01m 14s)
  • 18:46 krinkle@deploy1002: Started deploy [integration/docroot@1621c26]: (no justification provided)
  • 16:30 moritzm: installing python-virtualenv bugfix updates from bullseye 11.2 point release
  • 16:21 moritzm: installing wget bugfix updates from bullseye 11.2 point release
  • 16:13 moritzm: installing freeipmi bugfix updates from bullseye 11.2 point release
  • 16:02 moritzm: installing curl bugfix updates from bullseye 11.2 point release
  • 15:54 mutante: mw1414,mw1415,mw1416,mw1417,mw1418,mw1447,mw1448,mw1449,mw1450,mw1437,mw1438 (all canaries eqiad) - apt-get remove --purge fonts*; apt-get remove --purge xfonts* (T294378)
  • 15:46 mutante: parse2001, parse2002, wtp1025, wtp1026 (all parsoid canaries) - apt-get remove --purge fonts*; apt-get remove --purge xfonts* (T294378)
  • 15:40 mutante: mw2278, mw2279, mw2374, mw2376 (API and jobrunner canaries codfw) - apt-get remove --purge fonts*; apt-get remove --purge xfonts* (T294378)
  • 15:34 mutante: mw2271, mw2272, mw2251, mw2252 (appserver and API canaries codfw) - apt-get remove --purge fonts*; apt-get remove --purge xfonts* (T294378)
  • 15:01 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-airflow1003.eqiad.wmnet
  • 14:58 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM an-airflow1003.eqiad.wmnet
  • 14:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2132.codfw.wmnet with OS bullseye
  • 14:50 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-airflow1002.eqiad.wmnet
  • 14:48 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM an-airflow1002.eqiad.wmnet
  • 14:45 moritzm: imported cassandra 3.11.11 to component/cassandradev for stretch-wikimedia and buster-wikimedia T298805
  • 14:41 moritzm: systemctl reset-failed ifup@ens5.service on an-airflow1001 T273026
  • 14:39 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-airflow1001.eqiad.wmnet
  • 14:37 hnowlan: removing restbase2009 from cassandra configs
  • 14:30 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM an-airflow1001.eqiad.wmnet
  • 14:16 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2132.codfw.wmnet with OS bullseye
  • 14:15 marostegui: Reimage db2132 to Bullseye T299344
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchanges group from s3 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P18762 and previous config saved to /var/cache/conftool/dbconfig/20220117-134520-marostegui.json
  • 12:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1151.eqiad.wmnet with OS bullseye
  • 12:19 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1151.eqiad.wmnet with OS bullseye
  • 12:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2142.codfw.wmnet with OS bullseye
  • 11:40 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2142.codfw.wmnet with OS bullseye
  • 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafkamon1002.eqiad.wmnet
  • 11:26 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kafkamon1002.eqiad.wmnet
  • 11:08 moritzm: switching kubetcd1006 to DRBD-backed storage (required for ganeti update)
  • 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd1006.eqiad.wmnet with reason: switch to drbd storage
  • 11:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd1006.eqiad.wmnet with reason: switch to drbd storage
  • 11:00 moritzm: systemctl reset-failed ifup@ens5.service on kubetcd1005 T273026
  • 10:56 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
  • 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchangeslinked group from s3 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P18761 and previous config saved to /var/cache/conftool/dbconfig/20220117-104801-marostegui.json
  • 10:47 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
  • 10:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1152.eqiad.wmnet with OS bullseye
  • 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T285149)', diff saved to https://phabricator.wikimedia.org/P18760 and previous config saved to /var/cache/conftool/dbconfig/20220117-104459-marostegui.json
  • 10:44 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
  • 10:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1153.eqiad.wmnet with OS bullseye
  • 10:42 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
  • 10:32 moritzm: switching kubetcd1005 to DRBD-backed storage (required for ganeti update)
  • 10:31 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: sync on staging
  • 10:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd1005.eqiad.wmnet with reason: switch to drbd storage
  • 10:31 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd1005.eqiad.wmnet with reason: switch to drbd storage
  • 10:30 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply on production
  • 10:30 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply on staging
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P18759 and previous config saved to /var/cache/conftool/dbconfig/20220117-102954-marostegui.json
  • 10:17 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1152.eqiad.wmnet with OS bullseye
  • 10:15 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1153.eqiad.wmnet with OS bullseye
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P18758 and previous config saved to /var/cache/conftool/dbconfig/20220117-101450-marostegui.json
  • 10:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2144.codfw.wmnet with OS bullseye
  • 10:04 moritzm: switching kubetcd1004 to DRBD-backed storage (required for ganeti update)
  • 10:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd1004.eqiad.wmnet with reason: switch to drbd storage
  • 10:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd1004.eqiad.wmnet with reason: switch to drbd storage
  • 10:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2143.codfw.wmnet with OS bullseye
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T285149)', diff saved to https://phabricator.wikimedia.org/P18757 and previous config saved to /var/cache/conftool/dbconfig/20220117-095945-marostegui.json
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T285149)', diff saved to https://phabricator.wikimedia.org/P18756 and previous config saved to /var/cache/conftool/dbconfig/20220117-095837-marostegui.json
  • 09:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 09:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T285149)', diff saved to https://phabricator.wikimedia.org/P18755 and previous config saved to /var/cache/conftool/dbconfig/20220117-095830-marostegui.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P18754 and previous config saved to /var/cache/conftool/dbconfig/20220117-094325-marostegui.json
  • 09:30 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2144.codfw.wmnet with OS bullseye
  • 09:30 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2143.codfw.wmnet with OS bullseye
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P18753 and previous config saved to /var/cache/conftool/dbconfig/20220117-092820-marostegui.json
  • 09:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1017.eqiad.wmnet with OS bullseye
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T285149)', diff saved to https://phabricator.wikimedia.org/P18752 and previous config saved to /var/cache/conftool/dbconfig/20220117-091316-marostegui.json
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T285149)', diff saved to https://phabricator.wikimedia.org/P18751 and previous config saved to /var/cache/conftool/dbconfig/20220117-091308-marostegui.json
  • 09:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 09:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T285149)', diff saved to https://phabricator.wikimedia.org/P18750 and previous config saved to /var/cache/conftool/dbconfig/20220117-091300-marostegui.json
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P18749 and previous config saved to /var/cache/conftool/dbconfig/20220117-085756-marostegui.json
  • 08:53 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1017.eqiad.wmnet with OS bullseye
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P18748 and previous config saved to /var/cache/conftool/dbconfig/20220117-084251-marostegui.json
  • 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM schema1003.eqiad.wmnet
  • 08:34 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM schema1003.eqiad.wmnet
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T285149)', diff saved to https://phabricator.wikimedia.org/P18747 and previous config saved to /var/cache/conftool/dbconfig/20220117-082746-marostegui.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T285149)', diff saved to https://phabricator.wikimedia.org/P18746 and previous config saved to /var/cache/conftool/dbconfig/20220117-082638-marostegui.json
  • 08:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 08:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 08:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM schema1004.eqiad.wmnet
  • 08:17 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM schema1004.eqiad.wmnet
  • 06:59 elukey: `systemctl reset-failed ifup@ens5.service` on an-test-client1001 and kafka-test1010
  • 06:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1016.eqiad.wmnet with OS bullseye
  • 05:57 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1016.eqiad.wmnet with OS bullseye

2022-01-16

  • 08:21 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync on production
  • 08:20 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply on staging
  • 08:20 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply on production
  • 08:18 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync on production
  • 08:17 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply on staging
  • 08:17 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply on production

2022-01-15

  • 08:55 legoktm: finished running recountCategories on s4 wikis (T299244)
  • 07:58 legoktm: finished running recountCategories on s7 wikis (T299244)
  • 07:51 legoktm: finished running recountCategories on s2 wikis (T299244)
  • 06:41 legoktm: finished running recountCategories on s3 wikis (T299244)
  • 06:21 legoktm: finished running recountCategories on s6 wikis (T299244)
  • 06:19 legoktm: finished running recountCategories on s5 wikis (T299244)
  • 06:18 legoktm: finished running recountCategories on s8 wikis (T299244)
  • 06:14 legoktm: running recountCategories on s3 wikis
  • 05:20 legoktm: started recountCategories.php --wiki=enwiki --mode pages (T299244)
  • 03:05 legoktm: started refreshLinks --dfn-only via systemd units for s7-s8 (T299244)
  • 03:01 legoktm: started refreshLinks --dfn-only via systemd units for s2-s6 (T299244)
  • 02:55 legoktm: started mwscript refreshLinks.php --wiki=commonswiki --dfn-only (T299244)
  • 02:54 legoktm: started mwscript refreshLinks.php --wiki=enwiki --dfn-only (T299244)
  • 02:52 legoktm: started mwscript refreshLinks.php --wiki=enwiki --dfn-only
  • 01:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 01:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 01:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 01:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 01:04 legoktm: starting recountCategories.php --mode pages --wiki enwiki on mwmaint1002
  • 01:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:58 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.17 refs T293958
  • 00:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:52 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.17 refs T293958 (duration: 00m 52s)
  • 00:51 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.17 refs T293958
  • 00:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:46 jforrester@deploy1002: Finished scap: Revert "LinksUpdate refactor" and follow-ups for T299244 re. T293958 (duration: 03m 58s)
  • 00:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:42 jforrester@deploy1002: Started scap: Revert "LinksUpdate refactor" and follow-ups for T299244 re. T293958
  • 00:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert "all/group1 wikis to 1.38.0-wmf.17"

2022-01-14

  • 23:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2051.codfw.wmnet with OS stretch
  • 22:26 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
  • 18:09 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15 days, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing
  • 18:09 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 15 days, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing
  • 17:44 bblack: drmrs asw: removed native-vlan-id from config on secondary (x-rack) interfaces of lvses to debug network issue
  • 17:26 bblack: reboot lvs600[23]
  • 16:55 bblack: reboot lvs6001
  • 16:30 bblack: rebooting cp60xx where x is 6, 7, 8, 14, 15, 16 (downtimed)
  • 16:15 dancy@deploy1002: Synchronized README: Testing php-fpm restart (duration: 03m 18s)
  • 16:04 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
  • 15:40 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
  • 15:39 bblack: lvs6001 + all services downtimed
  • 15:29 bblack@cumin1001: conftool action : set/pooled=yes; selector: dc=drmrs
  • 15:00 bblack: silenced site=drmrs in alertmanager for one month, I think
  • 15:00 bblack: silenced site=drmrs in alertmanager, I think
  • 13:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2011.codfw.wmnet with OS bullseye
  • 13:20 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
  • 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2011.codfw.wmnet with OS bullseye
  • 12:53 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
  • 12:51 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
  • 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1024.eqiad.wmnet with OS buster
  • 12:22 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1024.eqiad.wmnet with OS buster
  • 12:20 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
  • 12:18 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
  • 11:51 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
  • 11:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing
  • 11:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing
  • 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1023.eqiad.wmnet with OS buster
  • 11:18 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1023.eqiad.wmnet with OS buster
  • 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM archiva1002.wikimedia.org
  • 11:00 moritzm: systemctl reset-failed ifup@ens5.service on archiva1002 T273026
  • 10:56 moritzm: rebooting archiva1002 (running archiva.wikimedia.org)
  • 10:56 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM archiva1002.wikimedia.org
  • 10:55 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
  • 10:50 moritzm: systemctl reset-failed ifup@ens5.service on an-test-ui1001 T273026
  • 10:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-ui1001.eqiad.wmnet
  • 10:42 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-test-ui1001.eqiad.wmnet
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-presto1001.eqiad.wmnet
  • 10:17 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-test-presto1001.eqiad.wmnet
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM matomo1002.eqiad.wmnet
  • 10:05 moritzm: rebooting matomo1002 (running piwik.wikimedia.org)
  • 10:04 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM matomo1002.eqiad.wmnet
  • 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-druid1001.eqiad.wmnet
  • 09:55 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-test-druid1001.eqiad.wmnet
  • 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM apt1001.wikimedia.org
  • 09:35 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM apt1001.wikimedia.org
  • 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM install1003.wikimedia.org
  • 09:28 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM install1003.wikimedia.org
  • 09:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-client1001.eqiad.wmnet
  • 09:19 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-test-client1001.eqiad.wmnet
  • 09:11 marostegui: Move pc1014 from pc1 to pc2 T299046
  • 09:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2013.codfw.wmnet with OS bullseye
  • 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-tool1009.eqiad.wmnet
  • 09:01 moritzm: rebooting an-tool1009 (running hue.wikimedia.org)
  • 09:01 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-tool1009.eqiad.wmnet
  • 09:00 moritzm: systemctl reset-failed ifup@ens5.service on an-tool1005 T273026
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-tool1008.eqiad.wmnet
  • 08:58 moritzm: rebooting an-tool1008 (running yarn.wikimedia.org)
  • 08:58 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-tool1008.eqiad.wmnet
  • 08:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-tool1007.eqiad.wmnet
  • 08:55 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-tool1007.eqiad.wmnet
  • 08:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-tool1005.eqiad.wmnet
  • 08:51 moritzm: rebooting an-tool1007 (running turnilo.wikimedia.org)
  • 08:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM an-tool1005.eqiad.wmnet
  • 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM cuminunpriv1001.eqiad.wmnet
  • 08:34 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cuminunpriv1001.eqiad.wmnet
  • 08:33 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2013.codfw.wmnet with OS bullseye
  • 07:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2012.codfw.wmnet with OS bullseye
  • 07:05 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2012.codfw.wmnet with OS bullseye
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Remove logpager group from s3 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P18735 and previous config saved to /var/cache/conftool/dbconfig/20220114-063554-marostegui.json
  • 06:15 marostegui: Failover m5 proxy from dbproxy1017 to dbproxy1021 T298586
  • 05:16 legoktm: manually restarted discard_held_messages service on lists1001, failed with a spurious sqlalchemy issue about packets being out of order
  • 00:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:23 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.17 refs T293958
  • 00:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:15 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.17 refs T293958 (duration: 01m 06s)
  • 00:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:13 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.17 refs T293958
  • 00:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:09 dduvall@deploy1002: Synchronized php-1.38.0-wmf.17/includes/content/WikitextContentHandler.php: Backport: In WikitextContentHandler always use getFreshParser() (T299149) (duration: 01m 07s)

2022-01-13

  • 22:40 WFan: Updating payment-wiki, revision changed from 8497eae9 to 5cc9d5e0
  • 22:18 dzahn@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=miscweb
  • 22:00 dzahn@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=miscweb
  • 21:48 mutante: running puppet on cp-ulsfo
  • 21:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 21:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:31 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.38.0-wmf.17"
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:29 dduvall: rolling back wmf.17 from group1 due to a large increase in "Parser state cleared while parsing" across commons and group1 wikipedias (T293958, T299149)
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:17 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.17 refs T293958 (duration: 01m 06s)
  • 20:16 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.17 refs T293958
  • 20:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:07 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.17 refs T293958
  • 20:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:43 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:43 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
  • 19:42 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2051.codfw.wmnet with OS stretch
  • 19:40 dzahn@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: sync on main
  • 19:40 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable ArticlePlaceholder on dagwiki (T298349) (duration: 01m 13s)
  • 19:37 dzahn@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply on main
  • 19:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:25 dzahn@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: sync on main
  • 19:23 dzahn@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply on main
  • 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:19 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add event stream config for ios.notification_interaction (T290920) (duration: 01m 13s)
  • 19:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:15 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add event stream config for android.customize_toolbar_interaction (T297818) (duration: 01m 12s)
  • 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:07 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable skin migration mode on the beta cluster (duration: 01m 14s)
  • 18:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
  • 17:49 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
  • 17:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1005.eqiad.wmnet with reason: requires resync after planet sync
  • 17:45 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1005.eqiad.wmnet with reason: requires resync after planet sync
  • 17:37 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 17:34 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 17:33 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 17:29 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 17:29 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 17:29 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 17:28 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 17:28 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 17:22 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
  • 17:22 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
  • 17:11 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
  • 17:07 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
  • 17:01 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
  • 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
  • 16:27 moritzm: import maps-deduped-tilelist 0.0.5 to buster-wikimedia/main T297408
  • 16:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM cuminunpriv1001.eqiad.wmnet
  • 16:00 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cuminunpriv1001.eqiad.wmnet
  • 15:50 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
  • 15:50 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster
  • 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aphlict1001.eqiad.wmnet
  • 15:47 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aphlict1001.eqiad.wmnet
  • 15:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM flowspec1001.eqiad.wmnet
  • 15:40 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM flowspec1001.eqiad.wmnet
  • 15:36 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
  • 15:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica1004.wikimedia.org
  • 15:26 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica1004.wikimedia.org
  • 15:23 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
  • 15:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica1003.wikimedia.org
  • 15:21 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2009.codfw.wmnet with OS buster
  • 15:20 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica1003.wikimedia.org
  • 15:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM seaborgium.wikimedia.org
  • 15:15 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM seaborgium.wikimedia.org
  • 15:10 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
  • 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM urldownloader1002.wikimedia.org
  • 15:03 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM urldownloader1002.wikimedia.org
  • 14:56 mmandere: cp3053: upgrade varnish to 6.0.9-1wm1 T298758
  • 14:56 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster
  • 14:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp1001.wikimedia.org
  • 14:47 moritzm: systemctl reset-failed ifup@ens5.service on idp1001 T273026
  • 14:39 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM idp1001.wikimedia.org
  • 14:15 moritzm: switch ml-etcd1003 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
  • 14:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd1003.eqiad.wmnet with reason: switch to drbd storage
  • 14:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd1003.eqiad.wmnet with reason: switch to drbd storage
  • 13:53 mmandere@cumin1001: conftool action : set/pooled=yes; selector: name=cp6009.drmrs.wmnet
  • 13:49 moritzm: switch ml-etcd1002 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd1002.eqiad.wmnet with reason: switch to drbd storage
  • 13:48 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd1002.eqiad.wmnet with reason: switch to drbd storage
  • 13:45 mmandere@cumin1001: conftool action : set/pooled=yes; selector: name=cp6001.drmrs.wmnet
  • 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM urldownloader1001.wikimedia.org
  • 13:33 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM urldownloader1001.wikimedia.org
  • 13:23 moritzm: switch ml-etcd1001 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
  • 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd1001.eqiad.wmnet with reason: switch to drbd storage
  • 13:21 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd1001.eqiad.wmnet with reason: switch to drbd storage
  • 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM cloudbackup1001-dev.eqiad.wmnet
  • 13:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cloudbackup1001-dev.eqiad.wmnet
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18731 and previous config saved to /var/cache/conftool/dbconfig/20220113-124307-root.json
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Remove contributions group from s3 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P18730 and previous config saved to /var/cache/conftool/dbconfig/20220113-124300-marostegui.json
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Remove all special groups from s3 codfw T263127', diff saved to https://phabricator.wikimedia.org/P18729 and previous config saved to /var/cache/conftool/dbconfig/20220113-124140-marostegui.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from es1021', diff saved to https://phabricator.wikimedia.org/P18728 and previous config saved to /var/cache/conftool/dbconfig/20220113-123744-marostegui.json
  • 12:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM cloudbackup1002-dev.eqiad.wmnet
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18727 and previous config saved to /var/cache/conftool/dbconfig/20220113-122803-root.json
  • 12:27 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cloudbackup1002-dev.eqiad.wmnet
  • 12:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-corp1001.wikimedia.org
  • 12:21 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-corp1001.wikimedia.org
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 60%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18726 and previous config saved to /var/cache/conftool/dbconfig/20220113-121300-root.json
  • 12:03 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM eventlog1003.eqiad.wmnet
  • 11:59 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM eventlog1003.eqiad.wmnet
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18725 and previous config saved to /var/cache/conftool/dbconfig/20220113-115756-root.json
  • 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 40%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18724 and previous config saved to /var/cache/conftool/dbconfig/20220113-114252-root.json
  • 11:34 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1010.eqiad.wmnet
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18723 and previous config saved to /var/cache/conftool/dbconfig/20220113-112749-root.json
  • 11:26 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1010.eqiad.wmnet
  • 11:26 _joe_: update scap everywhere T298986
  • 11:25 oblivian@deploy1002: Finished deploy [restbase/deploy@0848b15]: scap testing (duration: 00m 09s)
  • 11:25 oblivian@deploy1002: Started deploy [restbase/deploy@0848b15]: scap testing
  • 11:24 oblivian@deploy1002: Finished deploy [restbase/deploy@0848b15]: (no justification provided) (duration: 00m 09s)
  • 11:23 oblivian@deploy1002: Started deploy [restbase/deploy@0848b15]: (no justification provided)
  • 11:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testreduce1001.eqiad.wmnet
  • 11:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2022.codfw.wmnet with OS bullseye
  • 11:16 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testreduce1001.eqiad.wmnet
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 20%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18722 and previous config saved to /var/cache/conftool/dbconfig/20220113-111245-root.json
  • 11:11 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1009.eqiad.wmnet
  • 11:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netbox1001.wikimedia.org
  • 11:08 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1009.eqiad.wmnet
  • 11:03 moritzm: rebooting netbox1001 (running netbox.wikimedia.org)
  • 11:03 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netbox1001.wikimedia.org
  • 11:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1001.eqiad.wmnet with OS buster
  • 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netboxdb1001.eqiad.wmnet
  • 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netboxdb1001.eqiad.wmnet
  • 10:58 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1008.eqiad.wmnet
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18721 and previous config saved to /var/cache/conftool/dbconfig/20220113-105741-root.json
  • 10:56 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1008.eqiad.wmnet
  • 10:52 hashar: Restarting Jenkins CI for plugins update T298691
  • 10:47 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1007.eqiad.wmnet
  • 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM search-loader1001.eqiad.wmnet
  • 10:45 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1007.eqiad.wmnet
  • 10:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM search-loader1001.eqiad.wmnet
  • 10:42 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2022.codfw.wmnet with OS bullseye
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18720 and previous config saved to /var/cache/conftool/dbconfig/20220113-104238-root.json
  • 10:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM irc1001.wikimedia.org
  • 10:29 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1001.eqiad.wmnet with OS buster
  • 10:29 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM irc1001.wikimedia.org
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: repooling after reimage', diff saved to https://phabricator.wikimedia.org/P18719 and previous config saved to /var/cache/conftool/dbconfig/20220113-102734-root.json
  • 10:27 moritzm: systemctl reset-failed ifup@ens5.service on lists1001 T273026
  • 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM grafana1002.eqiad.wmnet
  • 10:10 moritzm: rebooting grafana1002 (running grafana.wikimedia.org)
  • 10:10 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM grafana1002.eqiad.wmnet
  • 10:09 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye
  • 10:02 mmandere: cp3052: upgrade varnish to 6.0.9-1wm1 T298758
  • 10:02 joal@deploy1002: Finished deploy [analytics/refinery@94ec386]: Hotfix analytics deploy [analytics/refinery@94ec386] (duration: 21m 47s)
  • 10:02 elukey: run kafka preferred-replica-election on kafka-main1001 to force a rebalance of partition leaders (after kafka-main1002's reimage)
  • 10:00 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1006.eqiad.wmnet
  • 09:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1002.eqiad.wmnet with OS buster
  • 09:56 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1006.eqiad.wmnet
  • 09:49 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
  • 09:46 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye
  • 09:42 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
  • 09:40 joal@deploy1002: Started deploy [analytics/refinery@94ec386]: Hotfix analytics deploy [analytics/refinery@94ec386]
  • 09:40 joal@deploy1002: Finished deploy [analytics/refinery@94ec386] (thin): Hotfix analytics deploy THIN [analytics/refinery@94ec386] (duration: 00m 07s)
  • 09:40 joal@deploy1002: Started deploy [analytics/refinery@94ec386] (thin): Hotfix analytics deploy THIN [analytics/refinery@94ec386]
  • 09:39 joal@deploy1002: Finished deploy [analytics/refinery@94ec386] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@94ec386] (duration: 06m 59s)
  • 09:35 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye
  • 09:32 joal@deploy1002: Started deploy [analytics/refinery@94ec386] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@94ec386]
  • 09:30 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
  • 09:30 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye
  • 09:26 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1002.eqiad.wmnet with OS buster
  • 09:25 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
  • 09:24 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1022.eqiad.wmnet with OS bullseye
  • 09:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM xhgui1001.eqiad.wmnet
  • 09:14 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM xhgui1001.eqiad.wmnet
  • 09:08 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1022.eqiad.wmnet with OS bullseye
  • 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM lists1001.wikimedia.org
  • 09:02 moritzm: rebooting lists1001 (running lists.wikimedia.org) to pick up new KVM setting
  • 09:00 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM lists1001.wikimedia.org
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022, give weight to es1021 T295965 ', diff saved to https://phabricator.wikimedia.org/P18718 and previous config saved to /var/cache/conftool/dbconfig/20220113-085906-marostegui.json
  • 08:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1003.eqiad.wmnet with OS buster
  • 08:39 elukey: ipmi mc reset cold for kafka-main1002, mgmt interface not reachable via ssh
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchanges group from s7 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P18717 and previous config saved to /var/cache/conftool/dbconfig/20220113-083923-marostegui.json
  • 08:28 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.16/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: Take LogicException into consideration (T299111) (duration: 01m 28s)
  • 08:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:21 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: Take LogicException into consideration (T299111) (duration: 01m 28s)
  • 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:08 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1003.eqiad.wmnet with OS buster
  • 08:06 marostegui: Change innodb_checksum_algorithm=full_crc32 on eqiad sanitarium hosts (db1154, db1155) T287244
  • 08:02 elukey: ipmi mc reset cold for kafka-main1003, mgmt interface not reachable via ssh
  • 07:57 elukey: stop kafka* on kafka-main1003 as prep-step for reimage to buster
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchangeslinked group from s7 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P18715 and previous config saved to /var/cache/conftool/dbconfig/20220113-075012-marostegui.json
  • 07:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1015.eqiad.wmnet with OS bullseye
  • 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1015.eqiad.wmnet with OS bullseye
  • 06:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 06:41 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.16/includes/export/WikiExporter.php: Backport: export: Remove ignoring rev_page_id index (T163532) (duration: 01m 28s)
  • 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: repooling after maintenance and reimage', diff saved to https://phabricator.wikimedia.org/P18714 and previous config saved to /var/cache/conftool/dbconfig/20220113-064113-root.json
  • 06:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 06:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 06:38 marostegui: Failover m3 proxy from dbproxy1016 to dbproxy1020 T298586
  • 06:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 06:26 marostegui: Remove rev_page_id from frwiki,jawiki,ruwiki and labswiki from db1096 (s6) T285149
  • 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: repooling after maintenance and reimage', diff saved to https://phabricator.wikimedia.org/P18713 and previous config saved to /var/cache/conftool/dbconfig/20220113-062609-root.json
  • 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: repooling after maintenance and reimage', diff saved to https://phabricator.wikimedia.org/P18712 and previous config saved to /var/cache/conftool/dbconfig/20220113-061105-root.json
  • 06:05 tstarling@deploy1002: Synchronized php-1.38.0-wmf.17/includes/libs/rdbms/database/Database.php: (no justification provided) (duration: 01m 27s)
  • 05:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 05:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 05:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: repooling after maintenance and reimage', diff saved to https://phabricator.wikimedia.org/P18711 and previous config saved to /var/cache/conftool/dbconfig/20220113-055602-root.json
  • 05:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 05:53 tstarling@deploy1002: Synchronized php-1.38.0-wmf.17/tests/phpunit/unit/includes/libs/rdbms/database/DatabaseSQLTest.php: (no justification provided) (duration: 01m 32s)
  • 05:00 TimStarling: doing T299095 restorations on s3 wikis
  • 04:30 TimStarling: on mwmaint1002: inserting 11565 rows into itwiki.pagelinks for T299095
  • 03:33 TimStarling: on mwmaint1002: inserting 1714288 rows into wikidatawiki.pagelinks for T299095
  • 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:30 TimStarling: on mwmaint1002: inserting 4221344 rows into commonswiki.pagelinks to clean up from T299095
  • 02:29 tstarling@deploy1002: Synchronized php-1.38.0-wmf.16/maintenance/sql.php: batch size (duration: 01m 28s)
  • 00:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:31 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable CirrusSearch on it/en Wikivoyage (duration: 01m 28s)
  • 00:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:24 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Skip vector-2022 skin in config, not Vector skin (T298923) (duration: 01m 29s)
  • 00:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:11 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Disambiguator notifications on all wikis (T293319) (duration: 01m 28s)
  • 00:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn

2022-01-12

  • 23:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 23:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 23:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 23:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 23:29 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.38.0-wmf.17
  • 23:07 jhathaway: rebooting mx1001 to get old kernel
  • 22:48 cwhite: end eqiad opensearch upgrade T288621
  • 21:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T297191)', diff saved to https://phabricator.wikimedia.org/P18709 and previous config saved to /var/cache/conftool/dbconfig/20220112-214258-marostegui.json
  • 21:28 mbsantos: mbsantos@maps1009.eqiad.wmnet: start imposm-initial-import - full planet re-import (T299049)
  • 21:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P18708 and previous config saved to /var/cache/conftool/dbconfig/20220112-212753-marostegui.json
  • 21:19 ryankemper: [WDQS] T299098 depooled `wdqs2003` so dc-ops can take a look at the PS2 failure
  • 21:18 joal@deploy1002: Finished deploy [analytics/refinery@988b7d2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@988b7d2] (duration: 06m 57s)
  • 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P18707 and previous config saved to /var/cache/conftool/dbconfig/20220112-211248-marostegui.json
  • 21:11 joal@deploy1002: Started deploy [analytics/refinery@988b7d2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@988b7d2]
  • 21:11 joal@deploy1002: Finished deploy [analytics/refinery@988b7d2] (thin): Regular analytics weekly train THIN [analytics/refinery@988b7d2] (duration: 00m 07s)
  • 21:11 joal@deploy1002: Started deploy [analytics/refinery@988b7d2] (thin): Regular analytics weekly train THIN [analytics/refinery@988b7d2]
  • 21:10 joal@deploy1002: Finished deploy [analytics/refinery@988b7d2]: Regular analytics weekly train [analytics/refinery@988b7d2] (duration: 24m 20s)
  • 20:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T297191)', diff saved to https://phabricator.wikimedia.org/P18706 and previous config saved to /var/cache/conftool/dbconfig/20220112-205744-marostegui.json
  • 20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T297191)', diff saved to https://phabricator.wikimedia.org/P18705 and previous config saved to /var/cache/conftool/dbconfig/20220112-205636-marostegui.json
  • 20:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 20:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T297191)', diff saved to https://phabricator.wikimedia.org/P18704 and previous config saved to /var/cache/conftool/dbconfig/20220112-205629-marostegui.json
  • 20:46 joal@deploy1002: Started deploy [analytics/refinery@988b7d2]: Regular analytics weekly train [analytics/refinery@988b7d2]
  • 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P18703 and previous config saved to /var/cache/conftool/dbconfig/20220112-204124-marostegui.json
  • 20:36 dduvall: 1.38.0-wmf.17 rolled back from group1 due to large spike in db read-only errors and slow queries (T293958)
  • 20:33 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.38.0-wmf.17
  • 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P18702 and previous config saved to /var/cache/conftool/dbconfig/20220112-202619-marostegui.json
  • 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:21 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.17 refs T293958 (duration: 01m 21s)
  • 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:19 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.17 refs T293958
  • 20:19 jgleeson: updated payments from 939cb4bc to 8497eae9
  • 20:17 mutante: applying firewall change on phabricator (VCS, git-ssh), second attempt, first codfw-only
  • 20:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T297191)', diff saved to https://phabricator.wikimedia.org/P18701 and previous config saved to /var/cache/conftool/dbconfig/20220112-201114-marostegui.json
  • 20:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T297191)', diff saved to https://phabricator.wikimedia.org/P18700 and previous config saved to /var/cache/conftool/dbconfig/20220112-200806-marostegui.json
  • 20:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 20:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 20:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T297191)', diff saved to https://phabricator.wikimedia.org/P18699 and previous config saved to /var/cache/conftool/dbconfig/20220112-200759-marostegui.json
  • 19:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P18698 and previous config saved to /var/cache/conftool/dbconfig/20220112-195254-marostegui.json
  • 19:52 hashar: Restarting CI Jenkins once more to apply the Gearman plugin update T298691
  • 19:44 hashar: Clearing /srv partition on integration-castor03
  • 19:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P18697 and previous config saved to /var/cache/conftool/dbconfig/20220112-193749-marostegui.json
  • 19:34 hashar: Upgrading CI Jenkins and Gearman plugin T298691
  • 19:29 mutante: wdqs2003 - one power supply failed so it's not redundant anymore, says Icinga
  • 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:25 cwhite: begin eqiad opensearch upgrade T288621
  • 19:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T297191)', diff saved to https://phabricator.wikimedia.org/P18696 and previous config saved to /var/cache/conftool/dbconfig/20220112-192244-marostegui.json
  • 19:22 mutante: deneb - for some reason the "package builder clean up build directory"-service fails T287222
  • 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:21 cjming: end of UTC evening backport & config window
  • 19:21 mutante: [deneb:~] $ sudo systemctl start package_builder_Clean_up_build_directory.service
  • 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:19 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add new vector skin key to RelatedArticlesFooterAllowedSkins. (T298916) (duration: 01m 21s)
  • 19:18 mutante: pybal-test2002 - apt-get clean after icinga alert about disk space running out
  • 19:17 mutante: zookeeper-test1002 - CRITICAL - degraded: The following units failed: ifup@ens5.service - for this issue see T273026 (T268074)
  • 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:14 mutante: elastic10180 - one power supply seemingly failed - see icinga IPMI alert - [Status = Critical, PS Redundancy = Critical] T294805
  • 19:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T297191)', diff saved to https://phabricator.wikimedia.org/P18695 and previous config saved to /var/cache/conftool/dbconfig/20220112-191436-marostegui.json
  • 19:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 19:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 19:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T297191)', diff saved to https://phabricator.wikimedia.org/P18694 and previous config saved to /var/cache/conftool/dbconfig/20220112-191428-marostegui.json
  • 19:13 cjming@deploy1002: Synchronized php-1.38.0-wmf.17/includes/export/WikiExporter.php: Backport: Partial revert of I1a691f01cd82e60bf41207d32501edb4b9835e37 to unbreak dumps (T299020) (duration: 01m 22s)
  • 19:12 mutante: mirror1001 - CRITICAL - degraded: The following units failed: update-ubuntu-mirror.service - T286898
  • 19:09 hashar: Upgraded releases Jenkins from 2.319.1 to 2.319.2 # T298691
  • 19:06 moritzm: imported jenkins 2.319.2 to thirdparty/ci for buster-wikimedia
  • 19:05 mutante: [mwmaint1002:~] $ sudo systemctl status mediawiki_job_updatequerypages_mostlinked_s3@13.service (running fine but had failed for unknown reason last time it was supposed to run automatically)
  • 18:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P18693 and previous config saved to /var/cache/conftool/dbconfig/20220112-185923-marostegui.json
  • 18:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
  • 18:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
  • 18:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P18692 and previous config saved to /var/cache/conftool/dbconfig/20220112-184418-marostegui.json
  • 18:40 mutante: phab1001 - temp disabling puppet - deployed firewall change on phab2001 - debugging - no impact
  • 18:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T297191)', diff saved to https://phabricator.wikimedia.org/P18691 and previous config saved to /var/cache/conftool/dbconfig/20220112-182913-marostegui.json
  • 18:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T297191)', diff saved to https://phabricator.wikimedia.org/P18690 and previous config saved to /var/cache/conftool/dbconfig/20220112-182806-marostegui.json
  • 18:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 18:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 18:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 18:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 18:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 18:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 18:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 18:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 18:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 18:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T297191)', diff saved to https://phabricator.wikimedia.org/P18689 and previous config saved to /var/cache/conftool/dbconfig/20220112-182725-marostegui.json
  • 18:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P18688 and previous config saved to /var/cache/conftool/dbconfig/20220112-181220-marostegui.json
  • 17:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P18687 and previous config saved to /var/cache/conftool/dbconfig/20220112-175715-marostegui.json
  • 17:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T297191)', diff saved to https://phabricator.wikimedia.org/P18686 and previous config saved to /var/cache/conftool/dbconfig/20220112-174211-marostegui.json
  • 17:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T297191)', diff saved to https://phabricator.wikimedia.org/P18685 and previous config saved to /var/cache/conftool/dbconfig/20220112-174103-marostegui.json
  • 17:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 17:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T297191)', diff saved to https://phabricator.wikimedia.org/P18684 and previous config saved to /var/cache/conftool/dbconfig/20220112-174056-marostegui.json
  • 17:38 _joe_: deploying scap 4.1.1 to the restbase canaries T298986
  • 17:34 _joe_: deploying scap 4.1.1 to the mediawiki canaries T298986
  • 17:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1169.eqiad.wmnet with OS bullseye
  • 17:27 dancy@deploy1002: Started scap: testing
  • 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P18683 and previous config saved to /var/cache/conftool/dbconfig/20220112-172551-marostegui.json
  • 17:25 dancy@deploy1002: Started scap: testing
  • 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P18682 and previous config saved to /var/cache/conftool/dbconfig/20220112-171047-marostegui.json
  • 17:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 17:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 17:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:06 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 01m 21s)
  • 17:00 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1169.eqiad.wmnet with OS bullseye
  • 16:58 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM poolcounter1005.eqiad.wmnet
  • 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T297191)', diff saved to https://phabricator.wikimedia.org/P18681 and previous config saved to /var/cache/conftool/dbconfig/20220112-165542-marostegui.json
  • 16:54 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T297191)', diff saved to https://phabricator.wikimedia.org/P18680 and previous config saved to /var/cache/conftool/dbconfig/20220112-165434-marostegui.json
  • 16:54 akosiaris@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM poolcounter1005.eqiad.wmnet
  • 16:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 16:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 16:53 hnowlan: Decommissioning cassandra instance restbase2009-c via nodetool
  • 16:48 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 16:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:46 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 01m 21s)
  • 16:45 elukey: elukey@prometheus2004:~$ sudo apt-get remove linux-image-4.9.0-8-amd64 linux-image-4.9.0-9-amd64 linux-image-4.9.0-11-amd64 linux-image-4.9.0-12-amd64 linux-image-4.9.0-13-amd64
  • 16:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:44 elukey: elukey@prometheus2003:~$ sudo apt-get remove linux-image-4.9.0-8-amd64 linux-image-4.9.0-9-amd64 linux-image-4.9.0-11-amd64 linux-image-4.9.0-12-amd64 linux-image-4.9.0-13-amd64
  • 16:40 elukey: elukey@prometheus1004:~$ sudo apt-get remove linux-image-4.9.0-8-amd64 linux-image-4.9.0-9-amd64 linux-image-4.9.0-11-amd64 linux-image-4.9.0-12-amd64 linux-image-4.9.0-13-amd64
  • 16:39 elukey: elukey@prometheus1003:~$ sudo apt-get remove linux-image-4.9.0-11-amd64 linux-image-4.9.0-12-amd64 linux-image-4.9.0-13-amd64 linux-image-4.9.0-8-amd64 linux-image-4.9.0-9-amd64
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P18678 and previous config saved to /var/cache/conftool/dbconfig/20220112-163919-marostegui.json
  • 16:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mx1001.wikimedia.org
  • 16:36 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM poolcounter1004.eqiad.wmnet
  • 16:35 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mx1001.wikimedia.org
  • 16:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:31 akosiaris@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM poolcounter1004.eqiad.wmnet
  • 16:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:25 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 01m 16s)
  • 16:25 elukey: stop kafka* on kafka-main1003 to allow dcops maintenance (nic/bios upgrades) - T298867
  • 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P18677 and previous config saved to /var/cache/conftool/dbconfig/20220112-162414-marostegui.json
  • 16:20 moritzm: switch kubestagetcd1006 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
  • 16:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd1006.eqiad.wmnet with reason: switch to DRBD disk storage
  • 16:19 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd1006.eqiad.wmnet with reason: switch to DRBD disk storage
  • 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T297191)', diff saved to https://phabricator.wikimedia.org/P18676 and previous config saved to /var/cache/conftool/dbconfig/20220112-160910-marostegui.json
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T297191)', diff saved to https://phabricator.wikimedia.org/P18675 and previous config saved to /var/cache/conftool/dbconfig/20220112-160802-marostegui.json
  • 16:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 16:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 16:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T297191)', diff saved to https://phabricator.wikimedia.org/P18674 and previous config saved to /var/cache/conftool/dbconfig/20220112-160755-marostegui.json
  • 16:02 elukey: stop kafka* on kafka-main1002 to allow dcops maintenance (nic/bios upgrades) - T298867
  • 15:57 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync on main
  • 15:56 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
  • 15:56 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch
  • 15:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P18673 and previous config saved to /var/cache/conftool/dbconfig/20220112-155250-marostegui.json
  • 15:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P18672 and previous config saved to /var/cache/conftool/dbconfig/20220112-153745-marostegui.json
  • 15:23 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch
  • 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T297191)', diff saved to https://phabricator.wikimedia.org/P18671 and previous config saved to /var/cache/conftool/dbconfig/20220112-152240-marostegui.json
  • 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T297191)', diff saved to https://phabricator.wikimedia.org/P18670 and previous config saved to /var/cache/conftool/dbconfig/20220112-152133-marostegui.json
  • 15:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 15:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 15:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 15:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T297191)', diff saved to https://phabricator.wikimedia.org/P18669 and previous config saved to /var/cache/conftool/dbconfig/20220112-152121-marostegui.json
  • 15:14 elukey: stop kafka* on kafka-main1001 to allow dcops maintenance (nic/bios upgrades) - T298867
  • 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P18668 and previous config saved to /var/cache/conftool/dbconfig/20220112-150616-marostegui.json
  • 14:59 moritzm: switch kubestagetcd1005 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
  • 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd1005.eqiad.wmnet with reason: switch to DRBD disk storage
  • 14:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd1005.eqiad.wmnet with reason: switch to DRBD disk storage
  • 14:56 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync on main
  • 14:55 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
  • 14:54 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply on main
  • 14:54 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P18667 and previous config saved to /var/cache/conftool/dbconfig/20220112-145111-marostegui.json
  • 14:42 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync on main
  • 14:42 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
  • 14:40 jelto: remove helm2 from deployment_server T251305 https://gerrit.wikimedia.org/r/c/operations/puppet/+/753026
  • 14:37 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: sync on staging
  • 14:37 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply on production
  • 14:37 jelto@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply on staging
  • 14:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow1002.eqiad.wmnet
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T297191)', diff saved to https://phabricator.wikimedia.org/P18666 and previous config saved to /var/cache/conftool/dbconfig/20220112-143606-marostegui.json
  • 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T297191)', diff saved to https://phabricator.wikimedia.org/P18665 and previous config saved to /var/cache/conftool/dbconfig/20220112-143258-marostegui.json
  • 14:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 14:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 14:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 14:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 14:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 14:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T297191)', diff saved to https://phabricator.wikimedia.org/P18664 and previous config saved to /var/cache/conftool/dbconfig/20220112-143241-marostegui.json
  • 14:30 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow1002.eqiad.wmnet
  • 14:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:23 moritzm: switch kubestagetcd1004 to DRBD (needed to be able to shuffle instances around for the Ganeti buster update)
  • 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd1004.eqiad.wmnet with reason: switch to DRBD disk storage
  • 14:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd1004.eqiad.wmnet with reason: switch to DRBD disk storage
  • 14:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P18663 and previous config saved to /var/cache/conftool/dbconfig/20220112-141736-marostegui.json
  • 14:17 ladsgroup@deploy1002: Synchronized wmf-config: Config: Merge db-codfw.php and db-eqiad.php into db-production.php (T260297), Part III (duration: 01m 07s)
  • 14:15 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Merge db-codfw.php and db-eqiad.php into db-production.php (T260297), Part II (duration: 01m 08s)
  • 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM webperf1002.eqiad.wmnet
  • 14:14 ladsgroup@deploy1002: Synchronized wmf-config/db-production.php: Config: Merge db-codfw.php and db-eqiad.php into db-production.php (T260297), Part I (duration: 01m 07s)
  • 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:09 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM webperf1002.eqiad.wmnet
  • 14:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM webperf1001.eqiad.wmnet
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P18662 and previous config saved to /var/cache/conftool/dbconfig/20220112-140232-marostegui.json
  • 14:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM webperf1001.eqiad.wmnet
  • 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 T295965', diff saved to https://phabricator.wikimedia.org/P18661 and previous config saved to /var/cache/conftool/dbconfig/20220112-135858-marostegui.json
  • 13:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T297191)', diff saved to https://phabricator.wikimedia.org/P18659 and previous config saved to /var/cache/conftool/dbconfig/20220112-134727-marostegui.json
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T297191)', diff saved to https://phabricator.wikimedia.org/P18658 and previous config saved to /var/cache/conftool/dbconfig/20220112-134620-marostegui.json
  • 13:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 13:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18657 and previous config saved to /var/cache/conftool/dbconfig/20220112-134103-root.json
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:30 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable flaggedrevs stable template inclusion in ruwikisource (T226054) (duration: 01m 08s)
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18656 and previous config saved to /var/cache/conftool/dbconfig/20220112-132600-root.json
  • 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:23 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter1004.eqiad.wmnet
  • 13:20 urbanecm@deploy1002: Finished scap: 4b1e241: Undo update to the way the search interface is set (duration: 19m 19s)
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetboard1002.eqiad.wmnet
  • 13:18 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter1004.eqiad.wmnet
  • 13:14 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM puppetboard1002.eqiad.wmnet
  • 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:11 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter1003.eqiad.wmnet
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18655 and previous config saved to /var/cache/conftool/dbconfig/20220112-131056-root.json
  • 13:08 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter1003.eqiad.wmnet
  • 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM debmonitor1002.eqiad.wmnet
  • 13:01 urbanecm@deploy1002: Started scap: 4b1e241: Undo update to the way the search interface is set
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 T295965', diff saved to https://phabricator.wikimedia.org/P18654 and previous config saved to /var/cache/conftool/dbconfig/20220112-130050-marostegui.json
  • 13:00 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM debmonitor1002.eqiad.wmnet
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18653 and previous config saved to /var/cache/conftool/dbconfig/20220112-125552-root.json
  • 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM failoid1002.eqiad.wmnet
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist group from s7 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P18652 and previous config saved to /var/cache/conftool/dbconfig/20220112-125402-marostegui.json
  • 12:52 awight: EU deployment reopened :-)
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P18651 and previous config saved to /var/cache/conftool/dbconfig/20220112-125208-marostegui.json
  • 12:51 awight: EU deployment complete
  • 12:50 awight@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/TemplateData: Backport: Allow aliases to be integers in addition to strings (T298795) (duration: 01m 07s)
  • 12:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM failoid1002.eqiad.wmnet
  • 12:48 Amir1: removing orphan lint error reports in all wikis (T298782)
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T297191)', diff saved to https://phabricator.wikimedia.org/P18650 and previous config saved to /var/cache/conftool/dbconfig/20220112-124514-marostegui.json
  • 12:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P18649 and previous config saved to /var/cache/conftool/dbconfig/20220112-123010-marostegui.json
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 T295965', diff saved to https://phabricator.wikimedia.org/P18648 and previous config saved to /var/cache/conftool/dbconfig/20220112-122742-marostegui.json
  • 12:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P18647 and previous config saved to /var/cache/conftool/dbconfig/20220112-121505-marostegui.json
  • 12:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cfe389a: fawiki: Add extendedmover usergroup (T299038) (duration: 01m 08s)
  • 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doc1002.eqiad.wmnet
  • 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 T295965', diff saved to https://phabricator.wikimedia.org/P18646 and previous config saved to /var/cache/conftool/dbconfig/20220112-120931-marostegui.json
  • 12:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doc1002.eqiad.wmnet
  • 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doc1001.eqiad.wmnet
  • 12:03 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doc1001.eqiad.wmnet
  • 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM releases1002.eqiad.wmnet
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T297191)', diff saved to https://phabricator.wikimedia.org/P18645 and previous config saved to /var/cache/conftool/dbconfig/20220112-120000-marostegui.json
  • 11:58 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM releases1002.eqiad.wmnet
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 T295965', diff saved to https://phabricator.wikimedia.org/P18644 and previous config saved to /var/cache/conftool/dbconfig/20220112-115259-marostegui.json
  • 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T297191)', diff saved to https://phabricator.wikimedia.org/P18643 and previous config saved to /var/cache/conftool/dbconfig/20220112-115031-marostegui.json
  • 11:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 11:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T297191)', diff saved to https://phabricator.wikimedia.org/P18642 and previous config saved to /var/cache/conftool/dbconfig/20220112-115024-marostegui.json
  • 11:42 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync on main
  • 11:42 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P18641 and previous config saved to /var/cache/conftool/dbconfig/20220112-113518-marostegui.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 T295965', diff saved to https://phabricator.wikimedia.org/P18640 and previous config saved to /var/cache/conftool/dbconfig/20220112-113119-marostegui.json
  • 11:21 elukey: move kafka-jumbo nodes to fixed kafka uid/gid - T296990
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P18639 and previous config saved to /var/cache/conftool/dbconfig/20220112-112013-marostegui.json
  • 11:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 11:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 11:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T297191)', diff saved to https://phabricator.wikimedia.org/P18638 and previous config saved to /var/cache/conftool/dbconfig/20220112-110508-marostegui.json
  • 11:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM dborch1001.wikimedia.org
  • 11:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM dborch1001.wikimedia.org
  • 10:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:59 moritzm: rebalance ganeti/codfw row B (all nodes reimaged to Buster)
  • 10:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Give more traffic to db1128 in s1 T295965', diff saved to https://phabricator.wikimedia.org/P18637 and previous config saved to /var/cache/conftool/dbconfig/20220112-105650-marostegui.json
  • 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T297191)', diff saved to https://phabricator.wikimedia.org/P18636 and previous config saved to /var/cache/conftool/dbconfig/20220112-105540-marostegui.json
  • 10:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 10:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T297191)', diff saved to https://phabricator.wikimedia.org/P18635 and previous config saved to /var/cache/conftool/dbconfig/20220112-105532-marostegui.json
  • 10:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM dbmonitor1002.wikimedia.org
  • 10:52 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: sync on main
  • 10:50 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply on main
  • 10:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM dbmonitor1002.wikimedia.org
  • 10:50 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply on main
  • 10:50 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply on main
  • 10:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:48 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: sync on main
  • 10:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:47 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply on main
  • 10:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:42 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply on main
  • 10:42 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply on main
  • 10:41 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply on main
  • 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P18634 and previous config saved to /var/cache/conftool/dbconfig/20220112-104028-marostegui.json
  • 10:39 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: sync on main
  • 10:38 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply on main
  • 10:37 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: sync on main
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1128 in s1 with minimal weight T295965', diff saved to https://phabricator.wikimedia.org/P18633 and previous config saved to /var/cache/conftool/dbconfig/20220112-103619-marostegui.json
  • 10:33 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply on main
  • 10:33 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply on main
  • 10:33 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply on main
  • 10:33 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply on main
  • 10:33 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply on main
  • 10:33 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: sync on main
  • 10:32 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply on main
  • 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1128', diff saved to https://phabricator.wikimedia.org/P18632 and previous config saved to /var/cache/conftool/dbconfig/20220112-103144-marostegui.json
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1128 in s1 with minimal weight T295965', diff saved to https://phabricator.wikimedia.org/P18631 and previous config saved to /var/cache/conftool/dbconfig/20220112-102938-marostegui.json
  • 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P18630 and previous config saved to /var/cache/conftool/dbconfig/20220112-102523-marostegui.json
  • 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T297191)', diff saved to https://phabricator.wikimedia.org/P18629 and previous config saved to /var/cache/conftool/dbconfig/20220112-101018-marostegui.json
  • 10:08 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab1001.wikimedia.org
  • 10:06 jelto@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab1001.wikimedia.org
  • 10:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:57 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Revert: Promote pc1014 to master in pc1 (duration: 01m 07s)
  • 09:54 hnowlan: Decommissioning cassandra instance restbase2009-b via nodetool
  • 09:53 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab-runner1001.eqiad.wmnet
  • 09:51 moritzm: reverting kubetcd2006 back to "plain" storage
  • 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2006.codfw.wmnet with reason: switch to plain disk storage
  • 09:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2006.codfw.wmnet with reason: switch to plain disk storage
  • 09:51 jelto@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab-runner1001.eqiad.wmnet
  • 09:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1011.eqiad.wmnet with OS bullseye
  • 09:21 moritzm: reverting kubetcd2005 back to "plain" storage
  • 09:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2005.codfw.wmnet with reason: switch to plain disk storage
  • 09:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2005.codfw.wmnet with reason: switch to plain disk storage
  • 09:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1011.eqiad.wmnet with OS bullseye
  • 09:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T297191)', diff saved to https://phabricator.wikimedia.org/P18628 and previous config saved to /var/cache/conftool/dbconfig/20220112-090959-marostegui.json
  • 09:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:08 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1014 to master in pc1 (duration: 01m 08s)
  • 09:05 marostegui: Reset replication on pc1014
  • 08:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
  • 08:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
  • 08:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 08:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T297191)', diff saved to https://phabricator.wikimedia.org/P18627 and previous config saved to /var/cache/conftool/dbconfig/20220112-085024-marostegui.json
  • 08:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM miscweb1002.eqiad.wmnet
  • 08:37 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM miscweb1002.eqiad.wmnet
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P18626 and previous config saved to /var/cache/conftool/dbconfig/20220112-083520-marostegui.json
  • 08:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mwdebug1002.eqiad.wmnet
  • 08:27 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mwdebug1002.eqiad.wmnet
  • 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mwdebug1001.eqiad.wmnet
  • 08:22 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mwdebug1001.eqiad.wmnet
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P18625 and previous config saved to /var/cache/conftool/dbconfig/20220112-082015-marostegui.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T297191)', diff saved to https://phabricator.wikimedia.org/P18624 and previous config saved to /var/cache/conftool/dbconfig/20220112-080510-marostegui.json
  • 08:00 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync on main
  • 07:59 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
  • 07:57 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: sync on main
  • 07:56 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply on main
  • 07:53 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: sync on main
  • 07:52 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply on main
  • 07:47 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: sync on main
  • 07:46 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply on main
  • 07:44 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: sync on main
  • 07:41 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply on main
  • 07:41 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: sync on main
  • 07:40 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply on main
  • 07:40 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: sync on main
  • 07:37 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply on main
  • 07:37 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: sync on main
  • 07:37 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply on main
  • 07:29 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: sync on main
  • 07:28 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply on main
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T297191)', diff saved to https://phabricator.wikimedia.org/P18623 and previous config saved to /var/cache/conftool/dbconfig/20220112-072826-marostegui.json
  • 07:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 07:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 07:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 07:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 07:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T297191)', diff saved to https://phabricator.wikimedia.org/P18622 and previous config saved to /var/cache/conftool/dbconfig/20220112-071003-marostegui.json
  • 07:02 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1169.eqiad.wmnet with OS bullseye
  • 06:58 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: sync on main
  • 06:58 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply on main
  • 06:58 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: sync on main
  • 06:57 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply on main
  • 06:57 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: sync on main
  • 06:55 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply on main
  • 06:55 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: sync on main
  • 06:55 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply on main
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P18621 and previous config saved to /var/cache/conftool/dbconfig/20220112-065458-marostegui.json
  • 06:53 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: sync on main
  • 06:52 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply on main
  • 06:51 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: sync on main
  • 06:50 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply on main
  • 06:49 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: sync on main
  • 06:48 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply on main
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P18620 and previous config saved to /var/cache/conftool/dbconfig/20220112-063953-marostegui.json
  • 06:38 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1169.eqiad.wmnet with OS bullseye
  • 06:36 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1169.eqiad.wmnet with OS bullseye
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T297191)', diff saved to https://phabricator.wikimedia.org/P18619 and previous config saved to /var/cache/conftool/dbconfig/20220112-062449-marostegui.json
  • 06:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1169.eqiad.wmnet with OS bullseye
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T297191)', diff saved to https://phabricator.wikimedia.org/P18618 and previous config saved to /var/cache/conftool/dbconfig/20220112-060923-marostegui.json
  • 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169 for Bullseye reimage T295965', diff saved to https://phabricator.wikimedia.org/P18617 and previous config saved to /var/cache/conftool/dbconfig/20220112-060803-marostegui.json
  • 06:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 00:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:19 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 00:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:09 urbanecm: UTC late evening B&C done
  • 00:09 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 00:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 24a2639: Enable Disambiguator notifications for French Wikipedia (T293319) (duration: 01m 08s)
  • 00:05 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 00:03 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)

2022-01-11

  • 23:56 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 23:48 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 23:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 23:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 23:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 23:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 23:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 23:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 23:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 23:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 23:05 dduvall@deploy1002: Synchronized php-1.38.0-wmf.17/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.DesktopArticleTarget.js: Backport: Watchlist API update: Call correct method (T298999) (duration: 02m 40s)
  • 23:04 dduvall: syncing backport to fix VE regression that followed testwiki/group0 deployment (cc T293958)
  • 21:29 mutante: mw1418 - apt-get remove --purge fonts*; apt-get remove --purge xfonts*; running puppet - nothing gets reinstalled and with --purge it means 'dpkg -l | grep fonts' is actually empty, not full of "rc" still - T294378
  • 21:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T297191)', diff saved to https://phabricator.wikimedia.org/P18615 and previous config saved to /var/cache/conftool/dbconfig/20220111-211134-marostegui.json
  • 20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P18614 and previous config saved to /var/cache/conftool/dbconfig/20220111-205629-marostegui.json
  • 20:56 mutante: mw1418 (lowest numbered canary appserver that we use for httpbb hourly tests on cumin1001) - apt-get autoremove - removed font* and python3* packages - reason: T294378
  • 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:42 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1009.eqiad.wmnet
  • 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P18613 and previous config saved to /var/cache/conftool/dbconfig/20220111-204124-marostegui.json
  • 20:38 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1009.eqiad.wmnet
  • 20:38 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.17 refs T293958
  • 20:36 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1008.eqiad.wmnet
  • 20:32 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1008.eqiad.wmnet
  • 20:31 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1007.eqiad.wmnet
  • 20:31 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1032.eqiad.wmnet
  • 20:27 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1007.eqiad.wmnet
  • 20:27 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1032.eqiad.wmnet
  • 20:26 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1031.eqiad.wmnet
  • 20:26 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1030.eqiad.wmnet
  • 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T297191)', diff saved to https://phabricator.wikimedia.org/P18612 and previous config saved to /var/cache/conftool/dbconfig/20220111-202620-marostegui.json
  • 20:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T297191)', diff saved to https://phabricator.wikimedia.org/P18611 and previous config saved to /var/cache/conftool/dbconfig/20220111-202513-marostegui.json
  • 20:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 20:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 20:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T297191)', diff saved to https://phabricator.wikimedia.org/P18610 and previous config saved to /var/cache/conftool/dbconfig/20220111-202505-marostegui.json
  • 20:23 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1031.eqiad.wmnet
  • 20:23 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1030.eqiad.wmnet
  • 20:17 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1024.eqiad.wmnet
  • 20:17 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1025.eqiad.wmnet
  • 20:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P18609 and previous config saved to /var/cache/conftool/dbconfig/20220111-201000-marostegui.json
  • 20:09 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1025.eqiad.wmnet
  • 20:08 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1024.eqiad.wmnet
  • 20:01 dduvall@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.17 refs T293958 (duration: 39m 38s)
  • 19:59 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash1023.eqiad.wmnet
  • 19:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P18608 and previous config saved to /var/cache/conftool/dbconfig/20220111-195456-marostegui.json
  • 19:53 cwhite@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash1023.eqiad.wmnet
  • 19:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T297191)', diff saved to https://phabricator.wikimedia.org/P18607 and previous config saved to /var/cache/conftool/dbconfig/20220111-193951-marostegui.json
  • 19:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T297191)', diff saved to https://phabricator.wikimedia.org/P18606 and previous config saved to /var/cache/conftool/dbconfig/20220111-193844-marostegui.json
  • 19:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 19:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 19:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T297191)', diff saved to https://phabricator.wikimedia.org/P18605 and previous config saved to /var/cache/conftool/dbconfig/20220111-193836-marostegui.json
  • 19:30 sukhe: upload pdns-recursor_4.6.0-1wm1 to apt.wm.o (buster) - T252132
  • 19:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P18604 and previous config saved to /var/cache/conftool/dbconfig/20220111-192331-marostegui.json
  • 19:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:21 dduvall@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.17 refs T293958
  • 19:17 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum1002.eqiad.wmnet
  • 19:13 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM durum1002.eqiad.wmnet
  • 19:13 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum1001.eqiad.wmnet
  • 19:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P18603 and previous config saved to /var/cache/conftool/dbconfig/20220111-190827-marostegui.json
  • 19:05 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM durum1001.eqiad.wmnet
  • 19:05 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh1002.wikimedia.org
  • 19:04 dduvall@deploy1002: Pruned MediaWiki: 1.38.0-wmf.9 (duration: 15m 51s)
  • 19:01 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh1002.wikimedia.org
  • 19:00 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh1001.wikimedia.org
  • 18:58 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh1001.wikimedia.org
  • 18:57 ebernhardson: clear wcqs.jnl and aliases.map for all wcqs instances T296470
  • 18:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T297191)', diff saved to https://phabricator.wikimedia.org/P18602 and previous config saved to /var/cache/conftool/dbconfig/20220111-185322-marostegui.json
  • 18:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T297191)', diff saved to https://phabricator.wikimedia.org/P18601 and previous config saved to /var/cache/conftool/dbconfig/20220111-185215-marostegui.json
  • 18:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 18:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 18:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T297191)', diff saved to https://phabricator.wikimedia.org/P18600 and previous config saved to /var/cache/conftool/dbconfig/20220111-185208-marostegui.json
  • 18:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:41 _joe_: also ran apt-get autoremove on mwdebug1002
  • 18:41 _joe_: installed scap 4.1.1 on mwdebug1002 T298986, ran scap pull successfully
  • 18:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P18599 and previous config saved to /var/cache/conftool/dbconfig/20220111-183703-marostegui.json
  • 18:34 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-coord1002.eqiad.wmnet with OS buster
  • 18:29 _joe_: uploaded scap 4.1.1-1 to apt T298986
  • 18:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P18598 and previous config saved to /var/cache/conftool/dbconfig/20220111-182158-marostegui.json
  • 18:08 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS buster
  • 18:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T297191)', diff saved to https://phabricator.wikimedia.org/P18597 and previous config saved to /var/cache/conftool/dbconfig/20220111-180653-marostegui.json
  • 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T297191)', diff saved to https://phabricator.wikimedia.org/P18596 and previous config saved to /var/cache/conftool/dbconfig/20220111-180547-marostegui.json
  • 18:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 18:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 18:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 18:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T297191)', diff saved to https://phabricator.wikimedia.org/P18595 and previous config saved to /var/cache/conftool/dbconfig/20220111-180534-marostegui.json
  • 17:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P18594 and previous config saved to /var/cache/conftool/dbconfig/20220111-175029-marostegui.json
  • 17:44 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2009.codfw.wmnet
  • 17:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 17:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P18593 and previous config saved to /var/cache/conftool/dbconfig/20220111-173524-marostegui.json
  • 17:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 17:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T297191)', diff saved to https://phabricator.wikimedia.org/P18592 and previous config saved to /var/cache/conftool/dbconfig/20220111-172019-marostegui.json
  • 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T297191)', diff saved to https://phabricator.wikimedia.org/P18591 and previous config saved to /var/cache/conftool/dbconfig/20220111-171912-marostegui.json
  • 17:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T297191)', diff saved to https://phabricator.wikimedia.org/P18590 and previous config saved to /var/cache/conftool/dbconfig/20220111-171905-marostegui.json
  • 17:13 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@65895c0]: Remove cassandra from kartotherian sources (duration: 02m 04s)
  • 17:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir1002.eqiad.wmnet
  • 17:11 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@65895c0]: Remove cassandra from kartotherian sources
  • 17:10 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@65895c0]: Remove cassandra from kartotherian sources (duration: 03m 33s)
  • 17:08 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ncredir1002.eqiad.wmnet
  • 17:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir1001.eqiad.wmnet
  • 17:07 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@65895c0]: Remove cassandra from kartotherian sources
  • 17:06 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:06 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:04 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P18589 and previous config saved to /var/cache/conftool/dbconfig/20220111-170400-marostegui.json
  • 17:03 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:03 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ncredir1001.eqiad.wmnet
  • 17:03 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:00 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 16:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P18588 and previous config saved to /var/cache/conftool/dbconfig/20220111-164856-marostegui.json
  • 16:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T297191)', diff saved to https://phabricator.wikimedia.org/P18587 and previous config saved to /var/cache/conftool/dbconfig/20220111-163351-marostegui.json
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T297191)', diff saved to https://phabricator.wikimedia.org/P18586 and previous config saved to /var/cache/conftool/dbconfig/20220111-163244-marostegui.json
  • 16:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 16:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T297191)', diff saved to https://phabricator.wikimedia.org/P18585 and previous config saved to /var/cache/conftool/dbconfig/20220111-163237-marostegui.json
  • 16:29 arturo: aborrero@apt1001:~ $ sudo -i reprepro clearvanished
  • 16:23 arturo: aborrero@apt1001:~ $ sudo -i reprepro --noskipold --component thirdparty/kubeadm-k8s-1-21 update buster-wikimedia
  • 16:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P18584 and previous config saved to /var/cache/conftool/dbconfig/20220111-161732-marostegui.json
  • 16:03 cwhite: begin rolling restart of opensearch in codfw - jvm upgrade
  • 16:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P18583 and previous config saved to /var/cache/conftool/dbconfig/20220111-160227-marostegui.json
  • 15:59 vgutierrez: re-enable puppet on acme-chief clients after acmechief1001 reboot - T294120
  • 15:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief1001.eqiad.wmnet
  • 15:56 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM acmechief1001.eqiad.wmnet
  • 15:56 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2009.codfw.wmnet with reason: Decommissioning - hnowlan
  • 15:56 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2009.codfw.wmnet with reason: Decommissioning - hnowlan
  • 15:55 vgutierrez: disable puppet on acme-chief clients for acmechief1001 reboot - T294120
  • 15:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief-test1001.eqiad.wmnet
  • 15:51 ebernhardson: restart elasticsearch_6@production-search-psi-eqiad on elastic1049 to resolve issue with full heap
  • 15:47 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM acmechief-test1001.eqiad.wmnet
  • 15:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T297191)', diff saved to https://phabricator.wikimedia.org/P18582 and previous config saved to /var/cache/conftool/dbconfig/20220111-154722-marostegui.json
  • 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T297191)', diff saved to https://phabricator.wikimedia.org/P18580 and previous config saved to /var/cache/conftool/dbconfig/20220111-154615-marostegui.json
  • 15:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 15:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T297191)', diff saved to https://phabricator.wikimedia.org/P18579 and previous config saved to /var/cache/conftool/dbconfig/20220111-154608-marostegui.json
  • 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P18578 and previous config saved to /var/cache/conftool/dbconfig/20220111-153103-marostegui.json
  • 15:30 hnowlan: Decommissioning cassandra instance restbase2009-a via nodetool
  • 15:22 arnoldokoth: systemctl reset-failed ifup@ens5.service on otrs1001 T273026
  • 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P18577 and previous config saved to /var/cache/conftool/dbconfig/20220111-151558-marostegui.json
  • 15:10 aokoth@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM otrs1001.eqiad.wmnet
  • 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM rpki1001.eqiad.wmnet
  • 15:04 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM rpki1001.eqiad.wmnet
  • 15:02 aokoth@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM otrs1001.eqiad.wmnet
  • 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T297191)', diff saved to https://phabricator.wikimedia.org/P18576 and previous config saved to /var/cache/conftool/dbconfig/20220111-150054-marostegui.json
  • 15:00 aokoth@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM etherpad1002.eqiad.wmnet
  • 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T297191)', diff saved to https://phabricator.wikimedia.org/P18575 and previous config saved to /var/cache/conftool/dbconfig/20220111-145947-marostegui.json
  • 14:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 14:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T297191)', diff saved to https://phabricator.wikimedia.org/P18574 and previous config saved to /var/cache/conftool/dbconfig/20220111-145939-marostegui.json
  • 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM zookeeper-test1002.eqiad.wmnet
  • 14:56 aokoth@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM etherpad1002.eqiad.wmnet
  • 14:48 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM zookeeper-test1002.eqiad.wmnet
  • 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ping1002.eqiad.wmnet
  • 14:44 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ping1002.eqiad.wmnet
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P18573 and previous config saved to /var/cache/conftool/dbconfig/20220111-144435-marostegui.json
  • 14:38 XioNoX: disable ping-offload in eqiad
  • 14:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:35 marostegui: Upgrade pc1014 mysql
  • 14:33 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Clean up nova-network remains (2/2) (duration: 02m 40s)
  • 14:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:31 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Clean up nova-network remains (1/2) (duration: 02m 49s)
  • 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P18572 and previous config saved to /var/cache/conftool/dbconfig/20220111-142930-marostegui.json
  • 14:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:25 taavi@deploy1002: Synchronized wmf-config/reverse-proxy.php: Config: reverse-proxy: add drmrs ranges (T282787) (duration: 01m 36s)
  • 14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1021.eqiad.wmnet with OS bullseye
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T297191)', diff saved to https://phabricator.wikimedia.org/P18571 and previous config saved to /var/cache/conftool/dbconfig/20220111-141425-marostegui.json
  • 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T297191)', diff saved to https://phabricator.wikimedia.org/P18570 and previous config saved to /var/cache/conftool/dbconfig/20220111-141318-marostegui.json
  • 14:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 14:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 14:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
  • 14:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
  • 14:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 14:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 14:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T297191)', diff saved to https://phabricator.wikimedia.org/P18569 and previous config saved to /var/cache/conftool/dbconfig/20220111-141249-marostegui.json
  • 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P18568 and previous config saved to /var/cache/conftool/dbconfig/20220111-135744-marostegui.json
  • 13:50 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1021.eqiad.wmnet with OS bullseye
  • 13:43 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
  • 13:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P18567 and previous config saved to /var/cache/conftool/dbconfig/20220111-134239-marostegui.json
  • 13:36 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
  • 13:36 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 13:33 moritzm: installing 4.9.290 kernels on stretch systems (no reboots yet)
  • 13:29 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T297191)', diff saved to https://phabricator.wikimedia.org/P18565 and previous config saved to /var/cache/conftool/dbconfig/20220111-132734-marostegui.json
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T297191)', diff saved to https://phabricator.wikimedia.org/P18564 and previous config saved to /var/cache/conftool/dbconfig/20220111-132627-marostegui.json
  • 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 13:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 13:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM people1003.eqiad.wmnet
  • 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:07 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM people1003.eqiad.wmnet
  • 13:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM planet1002.eqiad.wmnet
  • 12:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM planet1002.eqiad.wmnet
  • 12:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T297191)', diff saved to https://phabricator.wikimedia.org/P18563 and previous config saved to /var/cache/conftool/dbconfig/20220111-122143-marostegui.json
  • 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:15 cparle@deploy1002: Synchronized wmf-config: Config: Enable support for references (T230315) (duration: 01m 00s)
  • 12:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kubetcd2004.codfw.wmnet with reason: switch to plain disk storage
  • 12:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kubetcd2004.codfw.wmnet with reason: switch to plain disk storage
  • 12:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18562 and previous config saved to /var/cache/conftool/dbconfig/20220111-121025-root.json
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P18561 and previous config saved to /var/cache/conftool/dbconfig/20220111-120638-marostegui.json
  • 12:00 moritzm: reverting kubetcd2004.codfw.wmnet back to "plain" storage
  • 11:56 moritzm: rebalance ganeti row A (all nodes reimaged to Buster)
  • 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18560 and previous config saved to /var/cache/conftool/dbconfig/20220111-115522-root.json
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P18559 and previous config saved to /var/cache/conftool/dbconfig/20220111-115133-marostegui.json
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
  • 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18558 and previous config saved to /var/cache/conftool/dbconfig/20220111-114018-root.json
  • 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T297191)', diff saved to https://phabricator.wikimedia.org/P18557 and previous config saved to /var/cache/conftool/dbconfig/20220111-113628-marostegui.json
  • 11:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
  • 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T297191)', diff saved to https://phabricator.wikimedia.org/P18556 and previous config saved to /var/cache/conftool/dbconfig/20220111-113216-marostegui.json
  • 11:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 11:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T297191)', diff saved to https://phabricator.wikimedia.org/P18555 and previous config saved to /var/cache/conftool/dbconfig/20220111-113208-marostegui.json
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2023.codfw.wmnet
  • 11:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18554 and previous config saved to /var/cache/conftool/dbconfig/20220111-112514-root.json
  • 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
  • 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18553 and previous config saved to /var/cache/conftool/dbconfig/20220111-111704-marostegui.json
  • 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18551 and previous config saved to /var/cache/conftool/dbconfig/20220111-110159-marostegui.json
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T297191)', diff saved to https://phabricator.wikimedia.org/P18550 and previous config saved to /var/cache/conftool/dbconfig/20220111-104654-marostegui.json
  • 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T297191)', diff saved to https://phabricator.wikimedia.org/P18549 and previous config saved to /var/cache/conftool/dbconfig/20220111-103941-marostegui.json
  • 10:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 10:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T297191)', diff saved to https://phabricator.wikimedia.org/P18548 and previous config saved to /var/cache/conftool/dbconfig/20220111-103927-marostegui.json
  • 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P18547 and previous config saved to /var/cache/conftool/dbconfig/20220111-102421-marostegui.json
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P18546 and previous config saved to /var/cache/conftool/dbconfig/20220111-100917-marostegui.json
  • 09:58 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=eqiad
  • 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2019.codfw.wmnet with OS buster
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T297191)', diff saved to https://phabricator.wikimedia.org/P18545 and previous config saved to /var/cache/conftool/dbconfig/20220111-095408-marostegui.json
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T297191)', diff saved to https://phabricator.wikimedia.org/P18544 and previous config saved to /var/cache/conftool/dbconfig/20220111-095254-marostegui.json
  • 09:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 09:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T297191)', diff saved to https://phabricator.wikimedia.org/P18543 and previous config saved to /var/cache/conftool/dbconfig/20220111-095246-marostegui.json
  • 09:51 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=eqiad
  • 09:40 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster1001.eqiad.wmnet
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P18542 and previous config saved to /var/cache/conftool/dbconfig/20220111-093741-marostegui.json
  • 09:35 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1001.eqiad.wmnet
  • 09:33 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster1001.eqiad.wmnet
  • 09:29 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1001.eqiad.wmnet
  • 09:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T296143)', diff saved to https://phabricator.wikimedia.org/P18541 and previous config saved to /var/cache/conftool/dbconfig/20220111-092706-ladsgroup.json
  • 09:25 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2019.codfw.wmnet with OS buster
  • 09:23 ema: cp4021 (upload), cp4027 (text): upgrade varnish to 6.0.9-1wm1 T298758
  • 09:23 hashar: Upgrading Jenkins and Apache on releases1002 & releases2002
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P18540 and previous config saved to /var/cache/conftool/dbconfig/20220111-092236-marostegui.json
  • 09:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2078.codfw.wmnet with OS bullseye
  • 09:15 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1002.eqiad.wmnet
  • 09:13 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1002.eqiad.wmnet
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P18539 and previous config saved to /var/cache/conftool/dbconfig/20220111-091201-ladsgroup.json
  • 09:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2023.codfw.wmnet with OS buster
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T297191)', diff saved to https://phabricator.wikimedia.org/P18538 and previous config saved to /var/cache/conftool/dbconfig/20220111-090732-marostegui.json
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T297191)', diff saved to https://phabricator.wikimedia.org/P18537 and previous config saved to /var/cache/conftool/dbconfig/20220111-090119-marostegui.json
  • 09:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 09:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T297191)', diff saved to https://phabricator.wikimedia.org/P18536 and previous config saved to /var/cache/conftool/dbconfig/20220111-090111-marostegui.json
  • 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P18535 and previous config saved to /var/cache/conftool/dbconfig/20220111-085656-ladsgroup.json
  • 08:48 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2078.codfw.wmnet with OS bullseye
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P18534 and previous config saved to /var/cache/conftool/dbconfig/20220111-084606-marostegui.json
  • 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T296143)', diff saved to https://phabricator.wikimedia.org/P18533 and previous config saved to /var/cache/conftool/dbconfig/20220111-084151-ladsgroup.json
  • 08:40 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2023.codfw.wmnet with OS buster
  • 08:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2124.codfw.wmnet
  • 08:33 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db2124.codfw.wmnet
  • 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T296143)', diff saved to https://phabricator.wikimedia.org/P18532 and previous config saved to /var/cache/conftool/dbconfig/20220111-083322-ladsgroup.json
  • 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 08:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T296143)', diff saved to https://phabricator.wikimedia.org/P18531 and previous config saved to /var/cache/conftool/dbconfig/20220111-083314-ladsgroup.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P18530 and previous config saved to /var/cache/conftool/dbconfig/20220111-083102-marostegui.json
  • 08:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1020.eqiad.wmnet with OS bullseye
  • 08:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P18529 and previous config saved to /var/cache/conftool/dbconfig/20220111-081809-ladsgroup.json
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T297191)', diff saved to https://phabricator.wikimedia.org/P18528 and previous config saved to /var/cache/conftool/dbconfig/20220111-081557-marostegui.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T297191)', diff saved to https://phabricator.wikimedia.org/P18527 and previous config saved to /var/cache/conftool/dbconfig/20220111-081442-marostegui.json
  • 08:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 08:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 08:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 08:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T297191)', diff saved to https://phabricator.wikimedia.org/P18526 and previous config saved to /var/cache/conftool/dbconfig/20220111-081400-marostegui.json
  • 08:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P18525 and previous config saved to /var/cache/conftool/dbconfig/20220111-080305-ladsgroup.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18524 and previous config saved to /var/cache/conftool/dbconfig/20220111-075856-marostegui.json
  • 07:55 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1020.eqiad.wmnet with OS bullseye
  • 07:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 07:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 07:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 07:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T296143)', diff saved to https://phabricator.wikimedia.org/P18523 and previous config saved to /var/cache/conftool/dbconfig/20220111-074800-ladsgroup.json
  • 07:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2117.codfw.wmnet
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18522 and previous config saved to /var/cache/conftool/dbconfig/20220111-074351-marostegui.json
  • 07:42 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db2117.codfw.wmnet
  • 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T296143)', diff saved to https://phabricator.wikimedia.org/P18521 and previous config saved to /var/cache/conftool/dbconfig/20220111-074202-ladsgroup.json
  • 07:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 07:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T296143)', diff saved to https://phabricator.wikimedia.org/P18520 and previous config saved to /var/cache/conftool/dbconfig/20220111-074154-ladsgroup.json
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T297191)', diff saved to https://phabricator.wikimedia.org/P18519 and previous config saved to /var/cache/conftool/dbconfig/20220111-072847-marostegui.json
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P18518 and previous config saved to /var/cache/conftool/dbconfig/20220111-072649-ladsgroup.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T297191)', diff saved to https://phabricator.wikimedia.org/P18517 and previous config saved to /var/cache/conftool/dbconfig/20220111-071729-marostegui.json
  • 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T297191)', diff saved to https://phabricator.wikimedia.org/P18516 and previous config saved to /var/cache/conftool/dbconfig/20220111-071721-marostegui.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18515 and previous config saved to /var/cache/conftool/dbconfig/20220111-071254-root.json
  • 07:12 taavi: extensions/CentralAuth/maintenance/migrateHiddenLevel.php finished - T289068
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P18514 and previous config saved to /var/cache/conftool/dbconfig/20220111-071144-ladsgroup.json
  • 07:07 marostegui: Failover m2 proxy from dbproxy1015 to dbproxy1013 T298586
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P18513 and previous config saved to /var/cache/conftool/dbconfig/20220111-070216-marostegui.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18512 and previous config saved to /var/cache/conftool/dbconfig/20220111-065750-root.json
  • 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T296143)', diff saved to https://phabricator.wikimedia.org/P18511 and previous config saved to /var/cache/conftool/dbconfig/20220111-065640-ladsgroup.json
  • 06:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2114.codfw.wmnet
  • 06:51 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db2114.codfw.wmnet
  • 06:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2114 (T296143)', diff saved to https://phabricator.wikimedia.org/P18510 and previous config saved to /var/cache/conftool/dbconfig/20220111-065118-ladsgroup.json
  • 06:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 06:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 06:50 Amir1: upgrading mysql on ['db2114', 'db2117', 'db2124']
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P18509 and previous config saved to /var/cache/conftool/dbconfig/20220111-064712-marostegui.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18508 and previous config saved to /var/cache/conftool/dbconfig/20220111-064247-root.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T297191)', diff saved to https://phabricator.wikimedia.org/P18507 and previous config saved to /var/cache/conftool/dbconfig/20220111-063207-marostegui.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T297191)', diff saved to https://phabricator.wikimedia.org/P18506 and previous config saved to /var/cache/conftool/dbconfig/20220111-063052-marostegui.json
  • 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1012.eqiad.wmnet with OS bullseye
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P18505 and previous config saved to /var/cache/conftool/dbconfig/20220111-062743-root.json
  • 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es2032 after Bullseye reimage T295965', diff saved to https://phabricator.wikimedia.org/P18504 and previous config saved to /var/cache/conftool/dbconfig/20220111-062620-marostegui.json
  • 06:21 taavi: starting extensions/CentralAuth/maintenance/migrateHiddenLevel.php on a mwmaint1002 screen session - T289068
  • 06:00 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1012.eqiad.wmnet with OS bullseye
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T297191)', diff saved to https://phabricator.wikimedia.org/P18503 and previous config saved to /var/cache/conftool/dbconfig/20220111-054417-marostegui.json
  • 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 02:41 eileen: revision d90542c2 -> 2956a622 (latest)
  • 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 01:42 eileen: revision 277989d7 -> d90542c2 (latest) civicrm
  • 00:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:24 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.16/skins/Vector/resources/skins.vector.js/dropdownMenus.js: 79b33f2: Fix TypeError: document.querySelectorAll(...).forEach is not a function (T298910) (duration: 00m 59s)
  • 00:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
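
Many entries above come in START / END pairs logged by cookbooks. As a rough illustration of how such pairs could be matched up to estimate how long a run took, here is a minimal sketch. The regular expression and the `cookbook_durations` helper are assumptions made for this example, not part of any Wikimedia tooling, and it only handles entries within a single day at minute resolution.

```python
import re
from datetime import datetime

# Hypothetical pattern for entries such as:
#   "13:50 user@cumin1001: START - Cookbook sre.hosts.reimage for host ..."
#   "14:19 user@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ..."
ENTRY = re.compile(
    r"^(?P<time>\d{2}:\d{2}) (?P<who>\S+): "
    r"(?P<state>START|END \((?P<result>\w+)\)) - Cookbook (?P<cookbook>\S+)"
    r"(?: \(exit_code=\d+\))? (?P<target>.*)$"
)


def cookbook_durations(lines):
    """Pair START/END cookbook entries and return (cookbook, target, minutes).

    `lines` are the entries of one day, newest first, as on the log page.
    Entries that are not cookbook START/END lines are ignored.
    """
    started = {}
    durations = []
    for line in reversed(lines):  # walk from oldest to newest
        match = ENTRY.match(line.strip().lstrip("• ").strip())
        if not match:
            continue
        key = (match["who"], match["cookbook"], match["target"])
        when = datetime.strptime(match["time"], "%H:%M")
        if match["state"] == "START":
            started[key] = when
        elif key in started:
            minutes = int((when - started.pop(key)).total_seconds() // 60)
            durations.append((match["cookbook"], match["target"], minutes))
    return durations


if __name__ == "__main__":
    sample = [
        "  • 14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1021.eqiad.wmnet with OS bullseye",
        "  • 13:50 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1021.eqiad.wmnet with OS bullseye",
    ]
    for cookbook, target, minutes in cookbook_durations(sample):
        print(f"{cookbook} {target}: {minutes} min")
```

Applied to the dbproxy1021 reimage entries from this day (START at 13:50, END at 14:19), the sketch reports 29 minutes. Because the log records times only to the minute, the result is an approximation.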

2022-01-10

  • 22:36 dzahn@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: sync on main
  • 22:34 dzahn@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply on main
  • 20:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T297191)', diff saved to https://phabricator.wikimedia.org/P18502 and previous config saved to /var/cache/conftool/dbconfig/20220110-202728-marostegui.json
  • 20:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18501 and previous config saved to /var/cache/conftool/dbconfig/20220110-201224-marostegui.json
  • 19:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18500 and previous config saved to /var/cache/conftool/dbconfig/20220110-195719-marostegui.json
  • 19:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T297191)', diff saved to https://phabricator.wikimedia.org/P18499 and previous config saved to /var/cache/conftool/dbconfig/20220110-194214-marostegui.json
  • 19:32 ejegg: updated fundraising civicrm from 3d334f30 to 277989d7
  • 19:29 urbanecm: UTC evening B&C finished
  • 19:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8f5ca9a: Enable TheWikipediaLibrary on most wikis (T288070) (duration: 01m 00s)
  • 19:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T297191)', diff saved to https://phabricator.wikimedia.org/P18497 and previous config saved to /var/cache/conftool/dbconfig/20220110-184154-marostegui.json
  • 18:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 18:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 18:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T297191)', diff saved to https://phabricator.wikimedia.org/P18496 and previous config saved to /var/cache/conftool/dbconfig/20220110-184147-marostegui.json
  • 18:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P18495 and previous config saved to /var/cache/conftool/dbconfig/20220110-182642-marostegui.json
  • 18:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P18494 and previous config saved to /var/cache/conftool/dbconfig/20220110-181137-marostegui.json
  • 17:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T297191)', diff saved to https://phabricator.wikimedia.org/P18493 and previous config saved to /var/cache/conftool/dbconfig/20220110-175633-marostegui.json
  • 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T297191)', diff saved to https://phabricator.wikimedia.org/P18492 and previous config saved to /var/cache/conftool/dbconfig/20220110-175503-marostegui.json
  • 17:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 17:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T297191)', diff saved to https://phabricator.wikimedia.org/P18491 and previous config saved to /var/cache/conftool/dbconfig/20220110-175455-marostegui.json
  • 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18489 and previous config saved to /var/cache/conftool/dbconfig/20220110-173950-marostegui.json
  • 17:34 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1016.eqiad.wmnet
  • 17:32 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1016.eqiad.wmnet
  • 17:30 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1015.eqiad.wmnet
  • 17:28 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1015.eqiad.wmnet
  • 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18488 and previous config saved to /var/cache/conftool/dbconfig/20220110-172446-marostegui.json
  • 17:23 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1006.eqiad.wmnet
  • 17:21 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1006.eqiad.wmnet
  • 17:16 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1005.eqiad.wmnet
  • 17:14 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1005.eqiad.wmnet
  • 17:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T297191)', diff saved to https://phabricator.wikimedia.org/P18487 and previous config saved to /var/cache/conftool/dbconfig/20220110-170941-marostegui.json
  • 17:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T297191)', diff saved to https://phabricator.wikimedia.org/P18486 and previous config saved to /var/cache/conftool/dbconfig/20220110-170811-marostegui.json
  • 17:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 17:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 17:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T297191)', diff saved to https://phabricator.wikimedia.org/P18485 and previous config saved to /var/cache/conftool/dbconfig/20220110-170804-marostegui.json
  • 16:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P18484 and previous config saved to /var/cache/conftool/dbconfig/20220110-165259-marostegui.json
  • 16:52 ema: varnish 6.0.9-1wm1 uploaded to buster-wikimedia - component/varnish6 T298758
  • 16:47 moritzm: installing 5.10.84 kernels on bullseye hosts (no reboots involved, just installing the new kernels in parallel)
  • 16:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P18483 and previous config saved to /var/cache/conftool/dbconfig/20220110-163754-marostegui.json
  • 16:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T297191)', diff saved to https://phabricator.wikimedia.org/P18482 and previous config saved to /var/cache/conftool/dbconfig/20220110-162249-marostegui.json
  • 16:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti2023.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 16:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti2023.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 16:21 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry1004.eqiad.wmnet
  • 16:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T297191)', diff saved to https://phabricator.wikimedia.org/P18481 and previous config saved to /var/cache/conftool/dbconfig/20220110-162122-marostegui.json
  • 16:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 16:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 16:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T297191)', diff saved to https://phabricator.wikimedia.org/P18480 and previous config saved to /var/cache/conftool/dbconfig/20220110-162114-marostegui.json
  • 16:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti2019.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 16:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti2019.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
  • 16:19 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM registry1004.eqiad.wmnet
  • 16:18 root@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:13 root@cumin1001: START - Cookbook sre.dns.netbox
  • 16:09 damilare: process-control config ecf09aa0 -> 66e69bda
  • 16:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P18479 and previous config saved to /var/cache/conftool/dbconfig/20220110-160608-marostegui.json
  • 16:00 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM chartmuseum1001.eqiad.wmnet
  • 16:00 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry1003.eqiad.wmnet
  • 15:57 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM registry1003.eqiad.wmnet
  • 15:56 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM chartmuseum1001.eqiad.wmnet
  • 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P18478 and previous config saved to /var/cache/conftool/dbconfig/20220110-155103-marostegui.json
  • 15:49 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=eqiad
  • 15:49 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM dragonfly-supernode1001.eqiad.wmnet
  • 15:45 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM dragonfly-supernode1001.eqiad.wmnet
  • 15:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T297191)', diff saved to https://phabricator.wikimedia.org/P18476 and previous config saved to /var/cache/conftool/dbconfig/20220110-153559-marostegui.json
  • 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T297191)', diff saved to https://phabricator.wikimedia.org/P18475 and previous config saved to /var/cache/conftool/dbconfig/20220110-153429-marostegui.json
  • 15:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 15:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T297191)', diff saved to https://phabricator.wikimedia.org/P18474 and previous config saved to /var/cache/conftool/dbconfig/20220110-153421-marostegui.json
  • 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P18472 and previous config saved to /var/cache/conftool/dbconfig/20220110-151917-marostegui.json
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P18471 and previous config saved to /var/cache/conftool/dbconfig/20220110-150412-marostegui.json
  • 14:55 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetdb1002.eqiad.wmnet
  • 14:51 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 14:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:49 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.16/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: Give priority to PreparedUpdate (T288639) (duration: 01m 00s)
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T297191)', diff saved to https://phabricator.wikimedia.org/P18470 and previous config saved to /var/cache/conftool/dbconfig/20220110-144907-marostegui.json
  • 14:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T297191)', diff saved to https://phabricator.wikimedia.org/P18469 and previous config saved to /var/cache/conftool/dbconfig/20220110-144737-marostegui.json
  • 14:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 14:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 14:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 14:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 14:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 14:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 14:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 14:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 14:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 14:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 14:36 jbond@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM puppetdb1002.eqiad.wmnet
  • 14:32 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test1001.wikimedia.org
  • 14:27 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM idp-test1001.wikimedia.org
  • 14:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM moscovium.eqiad.wmnet
  • 14:19 jelto: upload wmf-sre-laptop 0.5.3 deb package
  • 14:19 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM moscovium.eqiad.wmnet
  • 14:07 jbond: disable puppet fleet wide for puppetdb restart
  • 13:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 13:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 13:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 13:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 13:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 13:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 13:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 13:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 13:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 13:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 13:54 btullis: upgrading oozie packages in reprepro in order to pick up new log4j version
  • 13:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2032.codfw.wmnet with OS bullseye
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T297191)', diff saved to https://phabricator.wikimedia.org/P18468 and previous config saved to /var/cache/conftool/dbconfig/20220110-131523-marostegui.json
  • 13:02 moritzm: installing ghostscript security updates
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P18467 and previous config saved to /var/cache/conftool/dbconfig/20220110-130018-marostegui.json
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P18466 and previous config saved to /var/cache/conftool/dbconfig/20220110-124513-marostegui.json
  • 12:44 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2032.codfw.wmnet with OS bullseye
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2032 for Bullseye reimage T295965', diff saved to https://phabricator.wikimedia.org/P18465 and previous config saved to /var/cache/conftool/dbconfig/20220110-124222-marostegui.json
  • 12:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:36 taavi: UTC morning deploys done
  • 12:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:34 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: hewikisource: remove "קטע" namespace and its talk page (T298430) (duration: 00m 58s)
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T297191)', diff saved to https://phabricator.wikimedia.org/P18464 and previous config saved to /var/cache/conftool/dbconfig/20220110-123009-marostegui.json
  • 12:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T297191)', diff saved to https://phabricator.wikimedia.org/P18463 and previous config saved to /var/cache/conftool/dbconfig/20220110-122847-marostegui.json
  • 12:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T297191)', diff saved to https://phabricator.wikimedia.org/P18462 and previous config saved to /var/cache/conftool/dbconfig/20220110-122840-marostegui.json
  • 12:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:24 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Growth: Add GEMentorDashboardDeploymentMode (T298792) (duration: 00m 59s)
  • 12:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:18 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: uzwiki: Amend Babel configuration (T131924) (duration: 00m 59s)
  • 12:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P18460 and previous config saved to /var/cache/conftool/dbconfig/20220110-121335-marostegui.json
  • 12:10 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add MediaSearch profiles (T297863) (duration: 00m 59s)
  • 12:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P18459 and previous config saved to /var/cache/conftool/dbconfig/20220110-115830-marostegui.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T297191)', diff saved to https://phabricator.wikimedia.org/P18458 and previous config saved to /var/cache/conftool/dbconfig/20220110-114326-marostegui.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T297191)', diff saved to https://phabricator.wikimedia.org/P18457 and previous config saved to /var/cache/conftool/dbconfig/20220110-114305-marostegui.json
  • 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Maintenance
  • 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Maintenance
  • 11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 11:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 11:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 11:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 11:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 9 hosts with reason: Maintenance
  • 11:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on 9 hosts with reason: Maintenance
  • 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T297191)', diff saved to https://phabricator.wikimedia.org/P18456 and previous config saved to /var/cache/conftool/dbconfig/20220110-114043-marostegui.json
  • 11:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P18455 and previous config saved to /var/cache/conftool/dbconfig/20220110-112538-marostegui.json
  • 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P18454 and previous config saved to /var/cache/conftool/dbconfig/20220110-111034-marostegui.json
  • 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T297191)', diff saved to https://phabricator.wikimedia.org/P18453 and previous config saved to /var/cache/conftool/dbconfig/20220110-105529-marostegui.json
  • 10:53 moritzm: installing openjdk-11 security updates
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Remove logpager group from s7 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P18452 and previous config saved to /var/cache/conftool/dbconfig/20220110-104445-marostegui.json
  • 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T297191)', diff saved to https://phabricator.wikimedia.org/P18451 and previous config saved to /var/cache/conftool/dbconfig/20220110-104004-marostegui.json
  • 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 10:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 10:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:38 elukey: stop/start kafka daemons on kafka-main1* nodes to move the kafka user to fixed uid/gid - T296641
  • 10:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 10:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 10:16 Amir1: removing echo objectcache entries on all wikis (T272512)
  • 09:56 moritzm: migrating primary/secondary instances off ganeti2019
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T297191)', diff saved to https://phabricator.wikimedia.org/P18449 and previous config saved to /var/cache/conftool/dbconfig/20220110-093534-marostegui.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Remove contributions group from s7 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P18448 and previous config saved to /var/cache/conftool/dbconfig/20220110-092605-marostegui.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P18447 and previous config saved to /var/cache/conftool/dbconfig/20220110-092029-marostegui.json
  • 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P18446 and previous config saved to /var/cache/conftool/dbconfig/20220110-090525-marostegui.json
  • 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove all groups from s7 codfw T263127', diff saved to https://phabricator.wikimedia.org/P18445 and previous config saved to /var/cache/conftool/dbconfig/20220110-085402-marostegui.json
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T297191)', diff saved to https://phabricator.wikimedia.org/P18444 and previous config saved to /var/cache/conftool/dbconfig/20220110-085020-marostegui.json
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T297191)', diff saved to https://phabricator.wikimedia.org/P18443 and previous config saved to /var/cache/conftool/dbconfig/20220110-084912-marostegui.json
  • 08:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 08:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 08:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 08:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T297191)', diff saved to https://phabricator.wikimedia.org/P18442 and previous config saved to /var/cache/conftool/dbconfig/20220110-084858-marostegui.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P18441 and previous config saved to /var/cache/conftool/dbconfig/20220110-083354-marostegui.json
  • 08:25 moritzm: migrating primary/secondary instances off ganeti2023
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P18440 and previous config saved to /var/cache/conftool/dbconfig/20220110-081849-marostegui.json
  • 08:13 marostegui: Drop table wikishared.wikimedia_editor_tasks_targets_passed T264225
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T297191)', diff saved to https://phabricator.wikimedia.org/P18439 and previous config saved to /var/cache/conftool/dbconfig/20220110-080344-marostegui.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T297191)', diff saved to https://phabricator.wikimedia.org/P18438 and previous config saved to /var/cache/conftool/dbconfig/20220110-080236-marostegui.json
  • 08:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db[1155,1165].eqiad.wmnet with reason: Maintenance
  • 08:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db[1155,1165].eqiad.wmnet with reason: Maintenance
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T297191)', diff saved to https://phabricator.wikimedia.org/P18437 and previous config saved to /var/cache/conftool/dbconfig/20220110-080225-marostegui.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P18436 and previous config saved to /var/cache/conftool/dbconfig/20220110-074720-marostegui.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P18435 and previous config saved to /var/cache/conftool/dbconfig/20220110-073216-marostegui.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T297191)', diff saved to https://phabricator.wikimedia.org/P18434 and previous config saved to /var/cache/conftool/dbconfig/20220110-071711-marostegui.json
  • 07:16 marostegui: Failover m1 proxy from dbproxy1012 to dbproxy1014 T298586
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T297191)', diff saved to https://phabricator.wikimedia.org/P18433 and previous config saved to /var/cache/conftool/dbconfig/20220110-071603-marostegui.json
  • 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T297191)', diff saved to https://phabricator.wikimedia.org/P18432 and previous config saved to /var/cache/conftool/dbconfig/20220110-071556-marostegui.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P18431 and previous config saved to /var/cache/conftool/dbconfig/20220110-070051-marostegui.json
  • 06:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1014.eqiad.wmnet with OS bullseye
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P18430 and previous config saved to /var/cache/conftool/dbconfig/20220110-064546-marostegui.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T297191)', diff saved to https://phabricator.wikimedia.org/P18429 and previous config saved to /var/cache/conftool/dbconfig/20220110-063042-marostegui.json
  • 06:29 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1014.eqiad.wmnet with OS bullseye
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T297191)', diff saved to https://phabricator.wikimedia.org/P18428 and previous config saved to /var/cache/conftool/dbconfig/20220110-062934-marostegui.json
  • 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 06:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T297191)', diff saved to https://phabricator.wikimedia.org/P18427 and previous config saved to /var/cache/conftool/dbconfig/20220110-062925-marostegui.json
  • 06:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1013.eqiad.wmnet with OS bullseye
  • 06:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 06:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 06:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 06:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 06:16 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.16/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: Use PreparedUpdate to avoid double parse (T288639) (duration: 01m 00s)
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P18426 and previous config saved to /var/cache/conftool/dbconfig/20220110-061420-marostegui.json
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P18425 and previous config saved to /var/cache/conftool/dbconfig/20220110-055915-marostegui.json
  • 05:58 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1013.eqiad.wmnet with OS bullseye
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T297191)', diff saved to https://phabricator.wikimedia.org/P18424 and previous config saved to /var/cache/conftool/dbconfig/20220110-054410-marostegui.json
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T297191)', diff saved to https://phabricator.wikimedia.org/P18423 and previous config saved to /var/cache/conftool/dbconfig/20220110-054100-marostegui.json
  • 05:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 05:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance

2022-01-08

  • 10:51 elukey: restart hive daemons on an-coord1002 (after my last upgrade/rollback of packages the prometheus agent settings were not picked up, so no metrics)

2022-01-07

  • 22:07 eileen: config revision changed from 3df415c1 to ecf09aa0 - disable eoy email jobs
  • 20:08 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/{zhwikinews,zhwikinews-1.5x,zhwikinews-2x,zhwikinews-hans,zhwikinews-hans-1.5x,zhwikinews-hans-2x}.png via purgeList.php
  • 19:49 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host apifeatureusage2001.codfw.wmnet
  • 19:41 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host apifeatureusage1001.eqiad.wmnet
  • 19:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS bullseye
  • 19:21 herron@cumin1001: START - Cookbook sre.ganeti.makevm for new host apifeatureusage2001.codfw.wmnet
  • 19:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS bullseye
  • 19:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS bullseye
  • 19:11 herron@cumin1001: START - Cookbook sre.ganeti.makevm for new host apifeatureusage1001.eqiad.wmnet
  • 18:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS bullseye
  • 15:18 taavi: reset email address for Ollie Shotton developer account per T298779
  • 15:08 ottomata: creating mediainfo-streaming-updater.mutation topics on kafka main-eqiad and main-codfw and setting retention to 30 days - T296470
  • 14:05 ema: upgrade varnish on deployment-cache-text06 to 6.0.9 T298758
  • 12:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:14 taavi@deploy1002: Synchronized php-1.38.0-wmf.16/extensions/ProofreadPage/modules/page: Backport: Makes sure $imgContHorizontal is always initialized (T298694) (duration: 00m 59s)
  • 11:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 11:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 11:56 taavi@deploy1002: Synchronized php-1.38.0-wmf.16/extensions/Flow: Backport: Revert "Use strict equality when safe to do so" (T298760) (duration: 01m 00s)
  • 11:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 11:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:40 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 10:33 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T297191)', diff saved to https://phabricator.wikimedia.org/P18413 and previous config saved to /var/cache/conftool/dbconfig/20220107-072742-marostegui.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P18412 and previous config saved to /var/cache/conftool/dbconfig/20220107-071237-marostegui.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P18411 and previous config saved to /var/cache/conftool/dbconfig/20220107-065733-marostegui.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T297191)', diff saved to https://phabricator.wikimedia.org/P18410 and previous config saved to /var/cache/conftool/dbconfig/20220107-064228-marostegui.json
  • 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T297191)', diff saved to https://phabricator.wikimedia.org/P18409 and previous config saved to /var/cache/conftool/dbconfig/20220107-064119-marostegui.json
  • 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 06:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2089.codfw.wmnet with reason: Maintenance
  • 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2089.codfw.wmnet with reason: Maintenance
  • 06:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db[2076,2095].codfw.wmnet with reason: Maintenance
  • 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db[2076,2095].codfw.wmnet with reason: Maintenance
  • 05:47 marostegui: rename wikishared.wikimedia_editor_tasks_targets_passed on db1120 T264225
  • 00:23 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: viwiktionary: add namespaces "Appendix" and "Appendix talk" (T298289) (duration: 00m 59s)
  • 00:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn

2022-01-06

  • 23:52 jhathaway: bouncing blazegraph on wdqs1004
  • 23:23 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-coord1002.eqiad.wmnet with OS buster
  • 22:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS buster
  • 22:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 22:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 22:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 22:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 22:25 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.16 refs T293958
  • 22:14 twentyafterfour@deploy1002: Synchronized php-1.38.0-wmf.16/extensions/Scribunto/: sync Scribunto to deploy https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Scribunto/+/752006/ (duration: 01m 08s)
  • 22:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 22:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 22:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:23 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@3297991]: update rdf-spark-tools jar to 0.3.98 (duration: 02m 15s)
  • 20:21 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@3297991]: update rdf-spark-tools jar to 0.3.98
  • 20:19 inflatador: banned elastic2051 from both chi and omega search clusters - T298674
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:01 twentyafterfour@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Sync https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/751841 (duration: 01m 08s)
  • 20:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:59 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@63c162d]: generate entity revision maps for commons / wcqs (duration: 02m 07s)
  • 19:57 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@63c162d]: generate entity revision maps for commons / wcqs
  • 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:07 taavi: UTC evening deploys done
  • 19:05 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add data.nhm.ac.uk to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T298451) (duration: 01m 09s)
  • 19:02 razzi: systemctl restart haproxy on dbproxy1018 to repool clouddb1018 for T298505
  • 18:59 mutante: puppetmaster1001 - creating missing Icinga contact for jgleeson in private puppet repo T298649
  • 18:51 mutante: contint1001 - after contint2001 also re-enabled puppet and deployed 751816 zuul-merger refactor - service git-daemon refreshed and running
  • 18:50 razzi: run sudo maintain-views --databases centralauth --replace-all on clouddb1018 for T298505
  • 18:47 mutante: contint* - deploying zuul-merger puppet refactor change, first codfw-only
  • 18:00 btullis@deploy1002: Finished deploy [cassandra/logstash-logback-encoder@fb10de1] (aqs): Deploying logstash-logback-encoder to production (duration: 00m 09s)
  • 18:00 btullis@deploy1002: Started deploy [cassandra/logstash-logback-encoder@fb10de1] (aqs): Deploying logstash-logback-encoder to production
  • 17:45 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@6f5caf9]: allow for null columns in export to relforge (duration: 02m 11s)
  • 17:42 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@6f5caf9]: allow for null columns in export to relforge
  • 16:42 otto@deploy1002: Finished deploy [cassandra/logstash-logback-encoder@fb10de1] (aqs): Deploying logstash-logback-encoder to production (duration: 00m 34s)
  • 16:41 otto@deploy1002: Started deploy [cassandra/logstash-logback-encoder@fb10de1] (aqs): Deploying logstash-logback-encoder to production
  • 16:37 inflatador: restarting elastic2052 for configuration change - T298674
  • 16:33 taavi: reset wikitech email for User:Iniquity per T298683
  • 16:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:21 taavi@deploy1002: Synchronized wmf-config/wikitech.php: wikitech: Re-enable Phabricator and Gerrit users after unblock (duration: 01m 09s)
  • 16:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:19 btullis@deploy1002: Finished deploy [cassandra/logstash-logback-encoder@fb10de1] (aqs): Deploying logstash-logback-encoder to production (duration: 00m 41s)
  • 16:18 btullis@deploy1002: Started deploy [cassandra/logstash-logback-encoder@fb10de1] (aqs): Deploying logstash-logback-encoder to production
  • 16:18 btullis@deploy1002: Finished deploy [cassandra/logstash-logback-encoder@fb10de1] (aqs): Deploying logstash-logback-encoder to production (duration: 07m 16s)
  • 16:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:10 btullis@deploy1002: Started deploy [cassandra/logstash-logback-encoder@fb10de1] (aqs): Deploying logstash-logback-encoder to production
  • 16:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 15:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 15:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 15:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 15:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 15:09 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1004.eqiad.wmnet
  • 15:00 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1004.eqiad.wmnet
  • 13:51 jbond: deploy cfssl_1.6.1-0+deb9u1_amd64 to stretch systems
  • 09:57 hashar: Restarting zuul-merger on contint2001 and contint1001 | https://gerrit.wikimedia.org/r/c/operations/puppet/+/738370/ | T187897
  • 07:06 Amir1: revoke DROP from wikiadmin globally
  • 02:34 eileen: civicrm revision changed from 67264062 to 3d334f30
  • 00:32 dancy@deploy1002: Synchronized wmf-config/logos.php: Config: Change the Traditional Chinese and Simplified Chinese logo for zhwikinews (T298550) (duration: 01m 07s)
  • 00:30 dancy@deploy1002: Synchronized logos/config.yaml: Config: Change the Traditional Chinese and Simplified Chinese logo for zhwikinews (T298550) (duration: 01m 07s)
  • 00:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn

2022-01-05

  • 23:50 razzi: sudo systemctl reload haproxy on dbproxy1019 to repool clouddb1014 for T298505
  • 23:26 razzi: run sudo maintain-views --databases centralauth --debug --replace-all on clouddb1014 for T298505
  • 22:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 22:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 22:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 22:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 21:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 21:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 21:25 eileen: civicrm revision 32d7370a -> 67264062
  • 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:39 razzi: reload haproxy on dbproxy1019 to repool clouddb1014 for T298505
  • 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:25 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.16 refs T293957 (duration: 01m 07s)
  • 20:23 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.16 refs T293957
  • 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:14 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.16 refs T293957
  • 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:11 twentyafterfour@deploy1002: Synchronized php-1.38.0-wmf.16/includes/changetags/ChangeTags.php: unblock the train, refs T293957 (duration: 01m 09s)
  • 20:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:47 urbanecm@deploy1002: Finished scap: 485e72b: Add it namespace aliases in scn (T297844) (duration: 11m 40s)
  • 19:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:41 razzi: reload haproxy on dbproxy1019 (previously incorrectly reloaded dbproxy1018) for T298505
  • 19:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:35 urbanecm@deploy1002: Started scap: 485e72b: Add it namespace aliases in scn (T297844)
  • 19:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f2da5be: Deploy sticky header (T295976) (duration: 01m 42s)
  • 19:31 razzi: reload haproxy on dbproxy1018 for T298505
  • 19:27 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.13/skins/Vector/resources/skins.vector.es6/stickyHeader.js: f6424f3: Dont use ts-ignore. It is hiding real errors (T297119) (duration: 01m 08s)
  • 19:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: aff4ac3: Add www.artsobservasjoner.no to the wgCopyUploadsDomains allowlist of Commons (T298449) (duration: 01m 08s)
  • 19:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:47 jgleeson: localsettings changed from 2d371ed1 to 3df415c1
  • 18:22 bd808: Toolhub: ran `poetry run ./manage.py migrate` against m5-master
  • 18:18 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: sync on main
  • 18:16 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply on main
  • 18:07 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync on main
  • 18:06 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply on main
  • 18:04 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync on main
  • 18:03 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply on main
  • 18:03 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync on main
  • 18:02 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply on main
  • 18:02 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply on main
  • 17:57 btullis@deploy1002: Finished deploy [analytics/superset/deploy@09094de]: Deployment of Superset 1.3.2 to production (duration: 00m 29s)
  • 17:57 btullis@deploy1002: Started deploy [analytics/superset/deploy@09094de]: Deployment of Superset 1.3.2 to production
  • 17:55 andrew@deploy1002: Finished deploy [horizon/deploy@b300fa6]: minor code format update (duration: 04m 09s)
  • 17:53 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: sync on main
  • 17:51 andrew@deploy1002: Started deploy [horizon/deploy@b300fa6]: minor code format update
  • 17:50 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply on main
  • 17:48 btullis@deploy1002: Finished deploy [analytics/superset/deploy@09094de]: Deployment of Superset 1.3.2 to staging (duration: 00m 39s)
  • 17:47 btullis@deploy1002: Started deploy [analytics/superset/deploy@09094de]: Deployment of Superset 1.3.2 to staging
  • 17:46 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: sync on main
  • 17:46 btullis@deploy1002: Finished deploy [analytics/superset/deploy@09094de]: Deployment of Superset 1.3.2 to staging (duration: 03m 11s)
  • 17:42 btullis@deploy1002: Started deploy [analytics/superset/deploy@09094de]: Deployment of Superset 1.3.2 to staging
  • 17:42 btullis@deploy1002: Started deploy [analytics/superset/deploy@09094de]: Deployment for something important
  • 17:36 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply on main
  • 17:26 andrew@deploy1002: Finished deploy [horizon/deploy@15efe04]: sudo panel update (duration: 04m 00s)
  • 17:21 andrew@deploy1002: Started deploy [horizon/deploy@15efe04]: sudo panel update
  • 17:21 andrew@deploy1002: Finished deploy [horizon/deploy@15efe04]: sudo panel update (codfw1dev) (duration: 01m 54s)
  • 17:19 andrew@deploy1002: Started deploy [horizon/deploy@15efe04]: sudo panel update (codfw1dev)
  • 17:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 17:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 17:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:11 sbassett: Deployed security fix for T298581 to wmf.16
  • 17:04 sbassett@deploy1002: Synchronized php-1.38.0-wmf.13/extensions/MobileFrontend/includes/specials/SpecialMobileContributions.php: Deploy security fix for T298581 (duration: 01m 08s)
  • 16:51 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:51 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:38 andrew@deploy1002: Finished deploy [horizon/deploy@5e57e78]: sudo panel update (codfw1dev) (duration: 02m 08s)
  • 16:36 andrew@deploy1002: Started deploy [horizon/deploy@5e57e78]: sudo panel update (codfw1dev)
  • 16:27 andrew@deploy1002: Finished deploy [horizon/deploy@5e57e78]: sudo panel update (duration: 03m 53s)
  • 16:26 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:26 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:23 andrew@deploy1002: Started deploy [horizon/deploy@5e57e78]: sudo panel update
  • 14:54 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:50 aokoth@cumin1001: START - Cookbook sre.dns.netbox
  • 13:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2087:3316, db2087:3317 after reimage T295965', diff saved to https://phabricator.wikimedia.org/P18402 and previous config saved to /var/cache/conftool/dbconfig/20220105-134827-marostegui.json
  • 13:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2001.codfw.wmnet with OS bullseye
  • 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:38 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.16/extensions/TrustedXFF/: ce7113b: Add more Zscaler ranges (T298241) (duration: 01m 09s)
  • 13:37 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.13/extensions/TrustedXFF/: d35e36f: Add more Zscaler ranges (T298241) (duration: 01m 09s)
  • 13:33 Amir1: delete echo keys from objectchange in frwiki (T272512)
  • 13:23 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: sync on main
  • 13:22 jelto@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply on main
  • 13:11 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2001.codfw.wmnet with OS bullseye
  • 13:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2002.codfw.wmnet with OS bullseye
  • 12:38 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2002.codfw.wmnet with OS bullseye
  • 12:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:18 taavi: UTC morning deploys done
  • 12:16 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add akwiki as an import source for twwiki (T298296) (duration: 01m 09s)
  • 12:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:07 vgutierrez: pool cp5005 running envoyproxy as TLS terminator - T271421
  • 12:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 11:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2003.codfw.wmnet with OS bullseye
  • 11:56 jbond: rollout cfssl 1.6.1
  • 11:55 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: sync on staging
  • 11:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5005.eqsin.wmnet with OS buster
  • 11:55 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply on production
  • 11:55 jelto@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply on staging
  • 11:34 aokoth@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts kubestage1002.eqiad.wmnet
  • 11:24 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2003.codfw.wmnet with OS bullseye
  • 11:24 btullis: updating hive packages in reprepro for log4j update
  • 11:24 aokoth@cumin1001: START - Cookbook sre.hosts.decommission for hosts kubestage1002.eqiad.wmnet
  • 11:20 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy2003.codfw.wmnet with OS bullseye
  • 10:54 jbond: upload cfssl 1.6.1
  • 10:52 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2003.codfw.wmnet with OS bullseye
  • 10:48 hashar: CI: switching MediaWiki selenium from php built-in server to Apache # https://gerrit.wikimedia.org/r/751697
  • 10:40 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:37 aokoth@cumin1001: START - Cookbook sre.dns.netbox
  • 10:02 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 10:01 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8137ffc: pwnwiki: Enable Growth features in dark mode (T298115; 3/3) (duration: 01m 07s)
  • 10:00 dcausse@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 10:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:59 urbanecm@deploy1002: Synchronized wmf-config/config/pwnwiki.yaml: 8137ffc: pwnwiki: Enable Growth features in dark mode (T298115; 2/3) (duration: 01m 07s)
  • 09:59 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 09:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:58 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 8137ffc: pwnwiki: Enable Growth features in dark mode (T298115; 1/3) (duration: 01m 07s)
  • 09:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:53 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.13/extensions/GrowthExperiments/includes/Mentorship/Hooks/MentorFilterHooks.php: 24e15e1: MentorFilterHooks: Include only primary mentors (T298031) (duration: 01m 07s)
  • 09:48 aokoth@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts kubestage1001.eqiad.wmnet
  • 09:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:37 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.16/extensions/TrustedXFF/trusted-hosts.php: ab8fe98: Add Zscaler to list of trusted hosts for XFF (T298241) (duration: 01m 08s)
  • 09:35 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.13/extensions/TrustedXFF/trusted-hosts.php: 010d96b: Add Zscaler to list of trusted hosts for XFF (T298241) (duration: 01m 09s)
  • 09:33 aokoth@cumin1001: START - Cookbook sre.hosts.decommission for hosts kubestage1001.eqiad.wmnet
  • 09:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5005.eqsin.wmnet with OS buster
  • 09:24 vgutierrez: depool cp5005 to be reimaged as cache::upload_envoy - T271421
  • 08:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2087.codfw.wmnet with OS bullseye
  • 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2087.codfw.wmnet with OS bullseye
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316, db2087:3317 for Buster reimage T295965', diff saved to https://phabricator.wikimedia.org/P18399 and previous config saved to /var/cache/conftool/dbconfig/20220105-082529-marostegui.json
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T297191)', diff saved to https://phabricator.wikimedia.org/P18398 and previous config saved to /var/cache/conftool/dbconfig/20220105-081600-marostegui.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P18397 and previous config saved to /var/cache/conftool/dbconfig/20220105-080055-marostegui.json
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P18396 and previous config saved to /var/cache/conftool/dbconfig/20220105-074551-marostegui.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T297191)', diff saved to https://phabricator.wikimedia.org/P18395 and previous config saved to /var/cache/conftool/dbconfig/20220105-073046-marostegui.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T297191)', diff saved to https://phabricator.wikimedia.org/P18394 and previous config saved to /var/cache/conftool/dbconfig/20220105-072937-marostegui.json
  • 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 07:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 02:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:13 Amir1: running foreachwikiindblist all maintenance/refreshImageMetadata.php --force --verbose --mediatype=OFFICE --oldimage (T298417)
  • 02:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:11 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.16/maintenance/refreshImageMetadata.php: Backport: maintenance: Add support for oldimage table metadata refresh (T298417) (duration: 01m 07s)
  • 02:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:09 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.13/maintenance/refreshImageMetadata.php: Backport: maintenance: Add support for oldimage table metadata refresh (T298417) (duration: 01m 08s)
  • 01:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 01:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 01:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 01:49 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Delete Tematica namespace (NS:104) in Italian Wikivoyage (T298315) (duration: 01m 07s)
  • 01:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 01:43 ebernhardson@deploy1002: Synchronized static/images/mobile/copyright/wikivoyage-wordmark-bn.svg: Config: Update bnwikivoyage wordmark logo (T298033) (duration: 01m 07s)
  • 01:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 01:41 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Update bnwikivoyage wordmark logo (T298033) (duration: 01m 07s)
  • 01:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 01:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 01:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 01:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 01:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 01:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 01:12 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Move CirrusSearch more_like traffic to eqiad (duration: 01m 07s)
  • 01:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 01:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 01:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 01:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 01:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 01:01 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 34bf91e: GrowthExperiments: Add campaign pattern for JOSA (T298057) (duration: 01m 08s)
  • 00:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7aff17f: Fix wordmark svgs for strategywiki, viwikibooks (T290091; 2/2) (duration: 01m 07s)
  • 00:52 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: 7aff17f: Fix wordmark svgs for strategywiki, viwikibooks (T290091; 1/2) (duration: 01m 07s)
  • 00:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6c220f0: Enable slow-parsoid logs (duration: 01m 08s)
  • 00:40 twentyafterfour@deploy1002: Synchronized php-1.38.0-wmf.16/includes/content/ContentModelChange.php: fix patch application failure (duration: 01m 07s)
  • 00:37 twentyafterfour@deploy1002: Synchronized php-1.38.0-wmf.16/extensions/VisualEditor/: fix patch application failure (duration: 01m 09s)

2022-01-04

  • 22:55 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.16 refs T293957 (duration: 37m 56s)
  • 22:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 22:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 22:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 22:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 22:17 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.16 refs T293957
  • 21:15 eileen: process-control checkout revision (e58e4e50 -> eb83f208)
  • 21:02 eileen: process-control config 40467fc2 -> e58e4e50
  • 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:43 eileen: config b26653a4 -> 40467fc2 (latest)
  • 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:34 eileen: civicrm revision aaceb4ab -> 328c8542
  • 20:33 twentyafterfour_: MediaWiki train for 1.38.0-wmf.16 - ran `scap prep` T293957
  • 16:57 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@b38fb58]: Switch mjolnir norm_query_clustering to the shaded refinery jar (duration: 02m 11s)
  • 16:55 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@b38fb58]: Switch mjolnir norm_query_clustering to the shaded refinery jar
  • 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T277354)', diff saved to https://phabricator.wikimedia.org/P18388 and previous config saved to /var/cache/conftool/dbconfig/20220104-160930-marostegui.json
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P18387 and previous config saved to /var/cache/conftool/dbconfig/20220104-155425-marostegui.json
  • 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P18386 and previous config saved to /var/cache/conftool/dbconfig/20220104-153920-marostegui.json
  • 15:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T277354)', diff saved to https://phabricator.wikimedia.org/P18384 and previous config saved to /var/cache/conftool/dbconfig/20220104-152416-marostegui.json
  • 15:07 aokoth@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 14:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:21 oblivian@deploy1002: Synchronized docroot: Config: Make symlinks relative so they work on a local checkout too (T285232) (duration: 00m 57s)
  • 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:12 oblivian@deploy1002: Synchronized images: Config: Remove dead symlinks (T285232) (duration: 00m 58s)
  • 14:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:57 godog: bump prometheus k8s + ops space in eqiad
  • 13:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2087.codfw.wmnet with reason: Maintenance
  • 13:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2087.codfw.wmnet with reason: Maintenance
  • 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T277354)', diff saved to https://phabricator.wikimedia.org/P18382 and previous config saved to /var/cache/conftool/dbconfig/20220104-134410-marostegui.json
  • 13:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db[1121,1155].eqiad.wmnet with reason: Maintenance
  • 13:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db[1121,1155].eqiad.wmnet with reason: Maintenance
  • 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T277354)', diff saved to https://phabricator.wikimedia.org/P18381 and previous config saved to /var/cache/conftool/dbconfig/20220104-134359-marostegui.json
  • 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P18380 and previous config saved to /var/cache/conftool/dbconfig/20220104-132854-marostegui.json
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P18379 and previous config saved to /var/cache/conftool/dbconfig/20220104-131349-marostegui.json
  • 13:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 13:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298316)', diff saved to https://phabricator.wikimedia.org/P18378 and previous config saved to /var/cache/conftool/dbconfig/20220104-130816-marostegui.json
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T277354)', diff saved to https://phabricator.wikimedia.org/P18377 and previous config saved to /var/cache/conftool/dbconfig/20220104-125845-marostegui.json
  • 12:53 taavi: UTC morning deploys done
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P18376 and previous config saved to /var/cache/conftool/dbconfig/20220104-125312-marostegui.json
  • 12:52 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: prod: WRITE_BOTH for centralauth hidden level migration (T289068) (duration: 00m 57s)
  • 12:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchanges from s2 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P18375 and previous config saved to /var/cache/conftool/dbconfig/20220104-123845-marostegui.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P18374 and previous config saved to /var/cache/conftool/dbconfig/20220104-123807-marostegui.json
  • 12:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:34 taavi@deploy1002: Synchronized php-1.38.0-wmf.13/extensions/LdapAuthentication/includes/LdapAuthenticationPlugin.php: Backport: Include ldap errno on account creation debug logs (T298508) (duration: 00m 58s)
  • 12:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298316)', diff saved to https://phabricator.wikimedia.org/P18373 and previous config saved to /var/cache/conftool/dbconfig/20220104-122302-marostegui.json
  • 12:22 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Create autopatroller and patroller groups on bnwiktionary (T298187) (duration: 00m 57s)
  • 12:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298316)', diff saved to https://phabricator.wikimedia.org/P18372 and previous config saved to /var/cache/conftool/dbconfig/20220104-121643-marostegui.json
  • 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 12:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 12:15 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Make reply tool available as opt-out on specieswiki (T297535) (duration: 00m 57s)
  • 12:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:13 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Make reply tool available as opt-out on metawiki (T297534) (duration: 00m 59s)
  • 12:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 12:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 12:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db[1106,1154].eqiad.wmnet with reason: Maintenance
  • 12:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db[1106,1154].eqiad.wmnet with reason: Maintenance
  • 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 15 hosts with reason: Maintenance
  • 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on 15 hosts with reason: Maintenance
  • 11:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 11:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 11:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 11:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298316)', diff saved to https://phabricator.wikimedia.org/P18370 and previous config saved to /var/cache/conftool/dbconfig/20220104-114503-marostegui.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P18369 and previous config saved to /var/cache/conftool/dbconfig/20220104-112959-marostegui.json
  • 11:20 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:20 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:18 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:17 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P18368 and previous config saved to /var/cache/conftool/dbconfig/20220104-111454-marostegui.json
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298316)', diff saved to https://phabricator.wikimedia.org/P18367 and previous config saved to /var/cache/conftool/dbconfig/20220104-105949-marostegui.json
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T277354)', diff saved to https://phabricator.wikimedia.org/P18366 and previous config saved to /var/cache/conftool/dbconfig/20220104-105922-marostegui.json
  • 10:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 10:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T277354)', diff saved to https://phabricator.wikimedia.org/P18365 and previous config saved to /var/cache/conftool/dbconfig/20220104-105914-marostegui.json
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298316)', diff saved to https://phabricator.wikimedia.org/P18364 and previous config saved to /var/cache/conftool/dbconfig/20220104-105244-marostegui.json
  • 10:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 10:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 10:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P18362 and previous config saved to /var/cache/conftool/dbconfig/20220104-104410-marostegui.json
  • 10:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 10:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 10:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 10:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 10:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 10:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 10:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 10:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P18360 and previous config saved to /var/cache/conftool/dbconfig/20220104-102905-marostegui.json
  • 10:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 10:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 10:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T277354)', diff saved to https://phabricator.wikimedia.org/P18359 and previous config saved to /var/cache/conftool/dbconfig/20220104-101400-marostegui.json
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298316)', diff saved to https://phabricator.wikimedia.org/P18358 and previous config saved to /var/cache/conftool/dbconfig/20220104-094920-marostegui.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P18357 and previous config saved to /var/cache/conftool/dbconfig/20220104-093415-marostegui.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P18356 and previous config saved to /var/cache/conftool/dbconfig/20220104-091910-marostegui.json
  • 09:04 dcaro: start merging puppet cleanup patches
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298316)', diff saved to https://phabricator.wikimedia.org/P18355 and previous config saved to /var/cache/conftool/dbconfig/20220104-090406-marostegui.json
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298316)', diff saved to https://phabricator.wikimedia.org/P18354 and previous config saved to /var/cache/conftool/dbconfig/20220104-085127-marostegui.json
  • 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 08:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298316)', diff saved to https://phabricator.wikimedia.org/P18353 and previous config saved to /var/cache/conftool/dbconfig/20220104-085118-marostegui.json
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P18352 and previous config saved to /var/cache/conftool/dbconfig/20220104-083613-marostegui.json
  • 08:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2094.codfw.wmnet with OS bullseye
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T277354)', diff saved to https://phabricator.wikimedia.org/P18351 and previous config saved to /var/cache/conftool/dbconfig/20220104-082306-marostegui.json
  • 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T277354)', diff saved to https://phabricator.wikimedia.org/P18350 and previous config saved to /var/cache/conftool/dbconfig/20220104-082259-marostegui.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P18349 and previous config saved to /var/cache/conftool/dbconfig/20220104-082109-marostegui.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P18348 and previous config saved to /var/cache/conftool/dbconfig/20220104-080754-marostegui.json
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298316)', diff saved to https://phabricator.wikimedia.org/P18347 and previous config saved to /var/cache/conftool/dbconfig/20220104-080604-marostegui.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298316)', diff saved to https://phabricator.wikimedia.org/P18346 and previous config saved to /var/cache/conftool/dbconfig/20220104-080051-marostegui.json
  • 08:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 07:56 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2094.codfw.wmnet with OS bullseye
  • 07:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Maintenance
  • 07:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Maintenance
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P18345 and previous config saved to /var/cache/conftool/dbconfig/20220104-075249-marostegui.json
  • 07:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 07:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 07:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 07:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 07:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 07:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 9 hosts with reason: Maintenance
  • 07:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on 9 hosts with reason: Maintenance
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298316)', diff saved to https://phabricator.wikimedia.org/P18344 and previous config saved to /var/cache/conftool/dbconfig/20220104-074456-marostegui.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T277354)', diff saved to https://phabricator.wikimedia.org/P18343 and previous config saved to /var/cache/conftool/dbconfig/20220104-073745-marostegui.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P18342 and previous config saved to /var/cache/conftool/dbconfig/20220104-072951-marostegui.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P18341 and previous config saved to /var/cache/conftool/dbconfig/20220104-071446-marostegui.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298316)', diff saved to https://phabricator.wikimedia.org/P18340 and previous config saved to /var/cache/conftool/dbconfig/20220104-065942-marostegui.json
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298316)', diff saved to https://phabricator.wikimedia.org/P18339 and previous config saved to /var/cache/conftool/dbconfig/20220104-063714-marostegui.json
  • 06:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 06:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 06:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 06:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 06:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 06:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 04:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T277354)', diff saved to https://phabricator.wikimedia.org/P18338 and previous config saved to /var/cache/conftool/dbconfig/20220104-042116-marostegui.json
  • 04:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 04:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 04:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T277354)', diff saved to https://phabricator.wikimedia.org/P18337 and previous config saved to /var/cache/conftool/dbconfig/20220104-042109-marostegui.json
  • 04:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P18335 and previous config saved to /var/cache/conftool/dbconfig/20220104-040604-marostegui.json
  • 04:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2144.codfw.wmnet
  • 04:01 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db2144.codfw.wmnet
  • 03:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P18334 and previous config saved to /var/cache/conftool/dbconfig/20220104-035059-marostegui.json
  • 03:50 ladsgroup@cumin1001: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for db2144.codfw.wmnet
  • 03:50 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db2144.codfw.wmnet
  • 03:36 ladsgroup@cumin1001: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for db2144.codfw.wmnet
  • 03:36 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db2144.codfw.wmnet
  • 03:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T277354)', diff saved to https://phabricator.wikimedia.org/P18333 and previous config saved to /var/cache/conftool/dbconfig/20220104-033555-marostegui.json
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 01:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T277354)', diff saved to https://phabricator.wikimedia.org/P18332 and previous config saved to /var/cache/conftool/dbconfig/20220104-015125-marostegui.json
  • 01:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 01:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 01:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T297094)', diff saved to https://phabricator.wikimedia.org/P18331 and previous config saved to /var/cache/conftool/dbconfig/20220104-012506-marostegui.json
  • 01:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P18330 and previous config saved to /var/cache/conftool/dbconfig/20220104-011001-marostegui.json
  • 00:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P18329 and previous config saved to /var/cache/conftool/dbconfig/20220104-005456-marostegui.json
  • 00:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T297094)', diff saved to https://phabricator.wikimedia.org/P18328 and previous config saved to /var/cache/conftool/dbconfig/20220104-003951-marostegui.json
  • 00:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 00:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 00:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T277354)', diff saved to https://phabricator.wikimedia.org/P18327 and previous config saved to /var/cache/conftool/dbconfig/20220104-000947-marostegui.json

2022-01-03

  • 23:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P18326 and previous config saved to /var/cache/conftool/dbconfig/20220103-235443-marostegui.json
  • 23:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P18325 and previous config saved to /var/cache/conftool/dbconfig/20220103-233938-marostegui.json
  • 23:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T277354)', diff saved to https://phabricator.wikimedia.org/P18324 and previous config saved to /var/cache/conftool/dbconfig/20220103-232433-marostegui.json
  • 21:50 cwhite: manually upgrade to grafana 8 on grafana-next (T282863)
  • 21:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T277354)', diff saved to https://phabricator.wikimedia.org/P18323 and previous config saved to /var/cache/conftool/dbconfig/20220103-212216-marostegui.json
  • 21:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 21:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 21:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T277354)', diff saved to https://phabricator.wikimedia.org/P18322 and previous config saved to /var/cache/conftool/dbconfig/20220103-212209-marostegui.json
  • 21:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P18321 and previous config saved to /var/cache/conftool/dbconfig/20220103-210704-marostegui.json
  • 20:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P18320 and previous config saved to /var/cache/conftool/dbconfig/20220103-205159-marostegui.json
  • 20:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T277354)', diff saved to https://phabricator.wikimedia.org/P18319 and previous config saved to /var/cache/conftool/dbconfig/20220103-203654-marostegui.json
  • 18:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T277354)', diff saved to https://phabricator.wikimedia.org/P18318 and previous config saved to /var/cache/conftool/dbconfig/20220103-185305-marostegui.json
  • 18:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 18:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 18:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T277354)', diff saved to https://phabricator.wikimedia.org/P18317 and previous config saved to /var/cache/conftool/dbconfig/20220103-185257-marostegui.json
  • 18:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P18316 and previous config saved to /var/cache/conftool/dbconfig/20220103-183752-marostegui.json
  • 18:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T297094)', diff saved to https://phabricator.wikimedia.org/P18315 and previous config saved to /var/cache/conftool/dbconfig/20220103-183130-marostegui.json
  • 18:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 18:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 18:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T297094)', diff saved to https://phabricator.wikimedia.org/P18314 and previous config saved to /var/cache/conftool/dbconfig/20220103-183122-marostegui.json
  • 18:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P18313 and previous config saved to /var/cache/conftool/dbconfig/20220103-182248-marostegui.json
  • 18:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P18312 and previous config saved to /var/cache/conftool/dbconfig/20220103-181617-marostegui.json
  • 18:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T277354)', diff saved to https://phabricator.wikimedia.org/P18311 and previous config saved to /var/cache/conftool/dbconfig/20220103-180743-marostegui.json
  • 18:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P18310 and previous config saved to /var/cache/conftool/dbconfig/20220103-180112-marostegui.json
  • 17:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T297094)', diff saved to https://phabricator.wikimedia.org/P18309 and previous config saved to /var/cache/conftool/dbconfig/20220103-174608-marostegui.json
  • 17:13 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudbackup1003.eqiad.wmnet with OS buster
  • 16:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1088.eqiad.wmnet with OS buster
  • 16:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1086.eqiad.wmnet with OS buster
  • 16:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T297094)', diff saved to https://phabricator.wikimedia.org/P18308 and previous config saved to /var/cache/conftool/dbconfig/20220103-164652-marostegui.json
  • 16:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 16:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 16:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T297094)', diff saved to https://phabricator.wikimedia.org/P18307 and previous config saved to /var/cache/conftool/dbconfig/20220103-164645-marostegui.json
  • 16:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1003.eqiad.wmnet with OS buster
  • 16:43 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudbackup1003.eqiad.wmnet with OS buster
  • 16:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1003.eqiad.wmnet with OS buster
  • 16:37 ladsgroup@cumin1001: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for db2144.codfw.wmnet
  • 16:37 ladsgroup@cumin1001: START - Cookbook sre.mysql.upgrade for db2144.codfw.wmnet
  • 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18306 and previous config saved to /var/cache/conftool/dbconfig/20220103-163140-marostegui.json
  • 16:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1088.eqiad.wmnet with OS buster
  • 16:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1086.eqiad.wmnet with OS buster
  • 16:29 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1087.eqiad.wmnet with OS buster
  • 16:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1085.eqiad.wmnet with OS buster
  • 16:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1084.eqiad.wmnet with OS buster
  • 16:22 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1088.eqiad.wmnet with OS buster
  • 16:18 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1086.eqiad.wmnet with OS buster
  • 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18305 and previous config saved to /var/cache/conftool/dbconfig/20220103-161635-marostegui.json
  • 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T277354)', diff saved to https://phabricator.wikimedia.org/P18304 and previous config saved to /var/cache/conftool/dbconfig/20220103-161232-marostegui.json
  • 16:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 16:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T277354)', diff saved to https://phabricator.wikimedia.org/P18303 and previous config saved to /var/cache/conftool/dbconfig/20220103-161224-marostegui.json
  • 16:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1088.eqiad.wmnet with OS buster
  • 16:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1087.eqiad.wmnet with OS buster
  • 16:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1086.eqiad.wmnet with OS buster
  • 16:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1085.eqiad.wmnet with OS buster
  • 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T297094)', diff saved to https://phabricator.wikimedia.org/P18302 and previous config saved to /var/cache/conftool/dbconfig/20220103-160131-marostegui.json
  • 16:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1084.eqiad.wmnet with OS buster
  • 15:58 vgutierrez: pool cp2029 - T298293
  • 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P18301 and previous config saved to /var/cache/conftool/dbconfig/20220103-155720-marostegui.json
  • 15:53 moritzm: installing publicsuffix 20211207.1025-0+deb11u1 on bullseye hosts
  • 15:50 moritzm: installing gmp security updates
  • 15:43 moritzm: installing datatables.js security updates
  • 15:42 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cp2029.codfw.wmnet with reason: Swapping faulty DIMM with B1
  • 15:42 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cp2029.codfw.wmnet with reason: Swapping faulty DIMM with B1
  • 15:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P18300 and previous config saved to /var/cache/conftool/dbconfig/20220103-154215-marostegui.json
  • 15:41 moritzm: installing edk2 security updates
  • 15:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T277354)', diff saved to https://phabricator.wikimedia.org/P18299 and previous config saved to /var/cache/conftool/dbconfig/20220103-152710-marostegui.json
  • 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T297094)', diff saved to https://phabricator.wikimedia.org/P18298 and previous config saved to /var/cache/conftool/dbconfig/20220103-151558-marostegui.json
  • 15:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 15:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T297094)', diff saved to https://phabricator.wikimedia.org/P18297 and previous config saved to /var/cache/conftool/dbconfig/20220103-151550-marostegui.json
  • 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18296 and previous config saved to /var/cache/conftool/dbconfig/20220103-150045-marostegui.json
  • 15:00 hashar: Restarting Gerrit primary on gerrit1001
  • 14:59 hashar: Restarting Gerrit replica on gerrit2001
  • 14:46 jayme: published image docker-registry.discovery.wmnet/cfssl-issuer:0.2.0-1 - T294560
  • 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18295 and previous config saved to /var/cache/conftool/dbconfig/20220103-144539-marostegui.json
  • 14:42 XioNoX: push CR744782 "Deprecate interface-range external" to all routers
  • 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T297094)', diff saved to https://phabricator.wikimedia.org/P18293 and previous config saved to /var/cache/conftool/dbconfig/20220103-143034-marostegui.json
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T297094)', diff saved to https://phabricator.wikimedia.org/P18292 and previous config saved to /var/cache/conftool/dbconfig/20220103-140232-marostegui.json
  • 14:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155,1158].eqiad.wmnet with reason: Maintenance
  • 14:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155,1158].eqiad.wmnet with reason: Maintenance
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T297094)', diff saved to https://phabricator.wikimedia.org/P18291 and previous config saved to /var/cache/conftool/dbconfig/20220103-140221-marostegui.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P18290 and previous config saved to /var/cache/conftool/dbconfig/20220103-134716-marostegui.json
  • 13:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T277354)', diff saved to https://phabricator.wikimedia.org/P18289 and previous config saved to /var/cache/conftool/dbconfig/20220103-134227-marostegui.json
  • 13:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 13:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P18288 and previous config saved to /var/cache/conftool/dbconfig/20220103-133212-marostegui.json
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T297094)', diff saved to https://phabricator.wikimedia.org/P18287 and previous config saved to /var/cache/conftool/dbconfig/20220103-131707-marostegui.json
  • 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host build2001.codfw.wmnet
  • 12:46 moritzm: installing openjdk-11 security updates on buster
  • 12:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:41 taavi: UTC morning deploys done
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T297094)', diff saved to https://phabricator.wikimedia.org/P18286 and previous config saved to /var/cache/conftool/dbconfig/20220103-124117-marostegui.json
  • 12:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 12:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 12:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:40 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Use new class names for CentralAuth RC feed (duration: 00m 57s)
  • 12:35 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add towiki.ru to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T294190) (duration: 00m 57s)
  • 12:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:29 taavi@deploy1002: Synchronized wmf-config/logos.php: Config: Add a logo for amiwiki (T298439) (3/3) (duration: 00m 57s)
  • 12:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:28 taavi@deploy1002: Synchronized logos/config.yaml: Config: Add a logo for amiwiki (T298439) (2/3) (duration: 00m 57s)
  • 12:26 taavi@deploy1002: Synchronized static/images/project-logos: Config: Add a logo for amiwiki (T298439) (1/3) (duration: 00m 58s)
  • 12:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:22 taavi@deploy1002: Synchronized wmf-config/logos.php: Config: Add a logo for pwnwiki (T298438) (3/3) (duration: 00m 57s)
  • 12:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:21 taavi@deploy1002: Synchronized logos/config.yaml: Config: Add a logo for pwnwiki (T298438) (2/3) (duration: 00m 57s)
  • 12:20 taavi@deploy1002: Synchronized static/images/project-logos: Config: Add a logo for pwnwiki (T298438) (1/2) (duration: 00m 58s)
  • 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:15 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set ContentTranslationContentImportForSectionTranslation for SX (T294642) (duration: 00m 59s)
  • 12:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 12:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T297094)', diff saved to https://phabricator.wikimedia.org/P18285 and previous config saved to /var/cache/conftool/dbconfig/20220103-121131-marostegui.json
  • 12:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:01 moritzm: installing wireshark security updates on stretch
  • 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T277354)', diff saved to https://phabricator.wikimedia.org/P18284 and previous config saved to /var/cache/conftool/dbconfig/20220103-120011-marostegui.json
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P18283 and previous config saved to /var/cache/conftool/dbconfig/20220103-115627-marostegui.json
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist from s2 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P18282 and previous config saved to /var/cache/conftool/dbconfig/20220103-115403-marostegui.json
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P18281 and previous config saved to /var/cache/conftool/dbconfig/20220103-114507-marostegui.json
  • 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P18280 and previous config saved to /var/cache/conftool/dbconfig/20220103-114122-marostegui.json
  • 11:37 moritzm: rebalance row_A ganeti group in codfw (to eventually allow freeing ganeti2023 of instances)
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P18279 and previous config saved to /var/cache/conftool/dbconfig/20220103-113002-marostegui.json
  • 11:29 elukey: restart cassandra-b on aqs1010 and aqs1015 (instances stuck / thrashing, new cluster, not serving live traffic atm)
  • 11:27 oblivian@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy1002.eqiad.wmnet
  • 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T297094)', diff saved to https://phabricator.wikimedia.org/P18278 and previous config saved to /var/cache/conftool/dbconfig/20220103-112617-marostegui.json
  • 11:19 oblivian@cumin2002: START - Cookbook sre.hosts.reboot-single for host deploy1002.eqiad.wmnet
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T297094)', diff saved to https://phabricator.wikimedia.org/P18277 and previous config saved to /var/cache/conftool/dbconfig/20220103-111638-marostegui.json
  • 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T297094)', diff saved to https://phabricator.wikimedia.org/P18276 and previous config saved to /var/cache/conftool/dbconfig/20220103-111631-marostegui.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T277354)', diff saved to https://phabricator.wikimedia.org/P18275 and previous config saved to /var/cache/conftool/dbconfig/20220103-111457-marostegui.json
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P18274 and previous config saved to /var/cache/conftool/dbconfig/20220103-110126-marostegui.json
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P18273 and previous config saved to /var/cache/conftool/dbconfig/20220103-104621-marostegui.json
  • 10:41 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy2002.codfw.wmnet
  • 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchangeslinked from s2 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P18272 and previous config saved to /var/cache/conftool/dbconfig/20220103-103909-marostegui.json
  • 10:32 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single for host deploy2002.codfw.wmnet
  • 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T297094)', diff saved to https://phabricator.wikimedia.org/P18271 and previous config saved to /var/cache/conftool/dbconfig/20220103-103116-marostegui.json
  • 10:22 elukey: powercycle an-worker1114 (CPU soft lockup errors in mgmt console)
  • 10:20 elukey: powercycle an-worker1120 (CPU soft lockup errors in mgmt console)
  • 10:19 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host build2001.codfw.wmnet
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T297094)', diff saved to https://phabricator.wikimedia.org/P18270 and previous config saved to /var/cache/conftool/dbconfig/20220103-101116-marostegui.json
  • 10:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 10:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 09:59 moritzm: installing ruby2.3 security updates
  • 09:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 09:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T277354)', diff saved to https://phabricator.wikimedia.org/P18269 and previous config saved to /var/cache/conftool/dbconfig/20220103-093003-marostegui.json
  • 09:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 09:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 09:24 moritzm: installing djvulibre security updates on buster
  • 09:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 09:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Remove contributions and logpager from s2 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P18268 and previous config saved to /var/cache/conftool/dbconfig/20220103-085824-marostegui.json
  • 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove special slaves from s2 codfw T263127', diff saved to https://phabricator.wikimedia.org/P18267 and previous config saved to /var/cache/conftool/dbconfig/20220103-085428-marostegui.json
  • 08:49 moritzm: installing libpcap security updates
  • 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 08:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 08:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:28 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 08:25 moritzm: installing zziplib security updates
  • 08:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 08:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 08:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 08:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 07:51 moritzm: draining primary and secondary instances off ganeti2023
  • 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 07:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2087.codfw.wmnet with reason: Maintenance
  • 07:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2087.codfw.wmnet with reason: Maintenance
  • 07:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2086.codfw.wmnet with reason: Maintenance
  • 07:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2086.codfw.wmnet with reason: Maintenance
  • 07:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 07:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 07:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 07:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 07:02 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Full roll out of wgMaxExecutionTimeForExpensiveQueries (T297708), Part I (duration: 01m 20s)
  • 07:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 07:00 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Full roll out of wgMaxExecutionTimeForExpensiveQueries (T297708), Part I (duration: 00m 58s)
  • 06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db[2077,2095].codfw.wmnet with reason: Maintenance
  • 06:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db[2077,2095].codfw.wmnet with reason: Maintenance
  • 04:21 Amir1: start of running populating actor in revision table on rest of sections. It will take two months to finish (T275246)
