Server Admin Log/Archive 50

From Wikitech

2022-03-31

  • 23:45 mutante: gitlab2001 - fdisk /dev/vdb (g, w) (create partition table), (n, w) (create partition) ; mkfs.ext4 /dev/vdb1 (create filesystem); systemctl reset-failed (fix Icinga alert); mkdir /mnt/gitlab-backup; mount /dev/vdb1 /mnt/gitlab-backup ; blkid (get UUID); edit /etc/fstab and insert "UUID=c5235682-ac21-46a9-85ee-9603f694a6a4 /mnt/gitlab-backup ext4 errors=remount-ro 0 2" T274463
  • 23:27 mutante: gitlab2001 - rebooted on ganeti level (needed when adding new virtual hardware), then ran into the usual bug T272555 where you have to manually fix the interface in /etc/network/interfaces T274463
  • 23:21 mutante: gitlab2001 (gitlab-replica.wikimedia.org) - rebooting to add new virtual disk T274463
  • 23:11 ejegg: updated payments-wiki from 47d9bd27 to 6f888c28
  • 23:01 bblack: esams->drmrs failover test begins - T304089
  • 22:34 moritzm: updated CAS to 6.4.6.2
  • 22:28 mutante: ganeti - creating new 100G virtual disk on gitlab1001 T274463
  • 22:24 mutante: ganeti - creating new 100G virtual disk on gitlab2001 T274463
  • 22:16 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 22:03 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 22:02 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 21:51 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 21:48 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 21:40 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 21:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:19 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=^(cp1075|cp1079|cp2035|cp3050|cp3051|cp3052|cp3054|cp4022|cp5013|cp5014|cp5015).*
  • 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:17 bblack@cumin1001: conftool action : select; selector: name="^(cp1075|cp1079|cp2035|cp3050|cp3051|cp3052|cp3054|cp4022|cp5013|cp5014|cp5015).*"
  • 21:13 catrope@deploy1002: Synchronized wmf-config/CommonSettings.php: Remove unused Flow config (duration: 00m 49s)
  • 21:07 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp5012.eqsin.wmnet
  • 21:07 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 21:06 thcipriani: UTC late backport complete
  • 21:03 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 20:59 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:56 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 20:56 thcipriani@deploy1002: Synchronized php-1.39.0-wmf.5/extensions/GrowthExperiments/modules/ext.growthExperiments.Homepage.SuggestedEdits/MatchModeSelectWidget.less: Backport: Newcomer tasks: always align button and text to the right (T301825) (duration: 00m 50s)
  • 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:49 thcipriani@deploy1002: Synchronized tests: Config (noop -- tests) (duration: 00m 50s)
  • 20:47 thcipriani@deploy1002: Synchronized src/StaticSiteConfiguration.php: Config (noop -- comment change): phpcs: enable and fix PropertyDocumentation.MissingVar (T171115) (duration: 00m 50s)
  • 20:46 thcipriani@deploy1002: Synchronized phpcs.xml: Config (noop): phpcs: enable and fix PropertyDocumentation.MissingVar (T171115) phpcs: rename test files to match class names (T171115) phpcs: enable rules that are already passing (T171115) (duration: 00m 49s)
  • 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:40 mutante: reserving port 4017 for new k8s service request 'image-suggestions' T304891
  • 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:36 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Stop writing to $wmfLocalServices (T45956) (duration: 00m 50s)
  • 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:29 thcipriani@deploy1002: Synchronized wmf-config: Config: Migrate $wmfLocalServices to $wmgLocalServices (T45956) (duration: 00m 51s)
  • 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:24 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2007.codfw.wmnet
  • 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:22 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6001.drmrs.wmnet
  • 20:22 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Start writing to $wmgLocalServices the same value as to $wmfLocalServices (T45956) (duration: 00m 50s)
  • 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:21 mutante: contint2002 - reboot (insetup host)
  • 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:18 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6001.drmrs.wmnet
  • 20:17 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2007.codfw.wmnet
  • 20:16 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=ats-be
  • 20:16 thcipriani@deploy1002: Synchronized wmf-config/PhpAutoPrepend.php: Config: Migrate $wmfServiceConfig to $wmgServiceConfig (T45956) (duration: 00m 50s)
  • 20:14 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1017.eqiad.wmnet
  • 20:12 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5001.eqsin.wmnet
  • 20:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1075.eqiad.wmnet
  • 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:11 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2376.codfw.wmnet
  • 20:10 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2374.codfw.wmnet
  • 20:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:09 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2272.codfw.wmnet
  • 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:09 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2252.codfw.wmnet
  • 20:08 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2271.codfw.wmnet
  • 20:08 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2251.codfw.wmnet
  • 20:07 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1017.eqiad.wmnet
  • 20:07 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5001.eqsin.wmnet
  • 20:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5014.eqsin.wmnet
  • 20:05 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2376.codfw.wmnet
  • 20:05 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2374.codfw.wmnet
  • 20:04 mutante: mw2271,mw2222 - canary appserver, rebooting
  • 20:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2035.codfw.wmnet
  • 20:04 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4005.ulsfo.wmnet
  • 20:01 mutante: mw2251,mw2252 - canary appserver, rebooting
  • 20:00 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4005.ulsfo.wmnet
  • 19:59 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2272.codfw.wmnet
  • 19:59 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2271.codfw.wmnet
  • 19:58 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2252.codfw.wmnet
  • 19:57 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2251.codfw.wmnet
  • 19:55 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3006.esams.wmnet
  • 19:46 mutante: phab2001 - systemctl restart ssh-phab
  • 19:45 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3006.esams.wmnet
  • 19:44 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3052.esams.wmnet
  • 19:43 rzl: Rolling-restarted zotero to un-wedge wedged pods with offscale high CPU
  • 19:42 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: sync
  • 19:42 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: sync
  • 19:38 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2008.codfw.wmnet
  • 19:33 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5014.eqsin.wmnet
  • 19:31 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3052.esams.wmnet
  • 19:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3051.esams.wmnet
  • 19:28 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1016.eqiad.wmnet
  • 19:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5015.eqsin.wmnet
  • 19:26 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2008.codfw.wmnet
  • 19:24 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=phab2001-vcs.codfw.wmnet
  • 19:24 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1016.eqiad.wmnet
  • 19:24 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1015.eqiad.wmnet
  • 19:23 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1018.eqiad.wmnet
  • 19:21 cwhite: remove openjdk-8-jre from eqiad logstash nodes T301770
  • 19:21 mutante: phab2001 - powercycling via mgmt
  • 19:20 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1015.eqiad.wmnet
  • 19:20 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1014.eqiad.wmnet
  • 19:19 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1018.eqiad.wmnet
  • 19:17 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=phab2001-vcs.codfw.wmnet
  • 19:15 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1014.eqiad.wmnet
  • 19:15 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1013.eqiad.wmnet
  • 19:14 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6002.drmrs.wmnet
  • 19:14 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3051.esams.wmnet
  • 19:14 mutante: phab2001 - git-ssh.codfw - rebooting - might cause pybal alert
  • 19:13 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5015.eqsin.wmnet
  • 19:12 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4022.ulsfo.wmnet
  • 19:11 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1013.eqiad.wmnet
  • 19:09 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6002.drmrs.wmnet
  • 19:08 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2035.codfw.wmnet
  • 19:07 bblack@cumin1001: conftool action : set/pooled=yes; selector: cluster=ml_staging
  • 19:07 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1075.eqiad.wmnet
  • 19:07 bblack@cumin1001: conftool action : set/weight=1; selector: cluster=ml_staging
  • 19:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5013.eqsin.wmnet
  • 19:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3050.esams.wmnet
  • 19:06 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5002.eqsin.wmnet
  • 19:05 mutante: doc.wikimedia.org - short downtime due to maintenance, rebooting doc1001
  • 19:02 mutante: testreduce1001 - needed manual nginx restart after reboot to make https://parsoid-rt-tests.wikimedia.org/ work again
  • 19:01 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5002.eqsin.wmnet
  • 19:00 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include bullseye-wikimedia /home/rzl/httpbb/bullseye/httpbb_0.0.1-1+deb11u1_source.changes
  • 19:00 mutante: testreduce1001 - rebooting
  • 18:59 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4006.ulsfo.wmnet
  • 18:59 mutante: https://parsoid-rt-tests.wikimedia.org/ - short downtime due to maintenance
  • 18:59 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4022.ulsfo.wmnet
  • 18:56 mutante: scandium - rebooting
  • 18:54 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4006.ulsfo.wmnet
  • 18:53 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3050.esams.wmnet
  • 18:53 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5013.eqsin.wmnet
  • 18:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3054.esams.wmnet
  • 18:50 mutante: mwdebug1001 - rebooting
  • 18:49 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3005.esams.wmnet
  • 18:43 duesen: removing /var/run/php/use-config-schema from canaries mw1415, mw1438, and mw1448 to disable config schema loading (T304460)
  • 18:41 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3005.esams.wmnet
  • 18:36 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3054.esams.wmnet
  • 18:36 mutante: gerrit-replica.wikimedia.org short downtime, rebooting gerrit2001
  • 18:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:23 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.5/extensions/TimedMediaHandler/resources/ext.tmh.player.styles.less: Backport: Set noflip for css rule that needs it (T305156) (duration: 00m 51s)
  • 18:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:20 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2009.codfw.wmnet
  • 18:19 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@ba88f51]: 0.3.109 (duration: 07m 24s)
  • 18:14 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host authdns2001.wikimedia.org
  • 18:13 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.109` on canary `wdqs1003`; proceeding to rest of fleet
  • 18:11 ryankemper@deploy1002: Started deploy [wdqs/wdqs@ba88f51]: 0.3.109
  • 18:11 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.109`. Pre-deploy tests passing on canary `wdqs1003`
  • 18:08 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2009.codfw.wmnet
  • 18:03 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1019.eqiad.wmnet
  • 17:57 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1019.eqiad.wmnet
  • 17:52 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host authdns2001.wikimedia.org
  • 17:47 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host authdns1001.wikimedia.org
  • 17:41 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host authdns1001.wikimedia.org
  • 17:37 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6003.drmrs.wmnet
  • 17:31 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns1001.wikimedia.org
  • 17:30 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6003.drmrs.wmnet
  • 17:30 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5003.eqsin.wmnet
  • 17:25 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns1001.wikimedia.org
  • 17:25 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns2001.wikimedia.org
  • 17:24 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5003.eqsin.wmnet
  • 17:24 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4007.ulsfo.wmnet
  • 17:17 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4007.ulsfo.wmnet
  • 17:17 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3007.esams.wmnet
  • 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: Maint', diff saved to https://phabricator.wikimedia.org/P24019 and previous config saved to /var/cache/conftool/dbconfig/20220331-171724-ladsgroup.json
  • 17:10 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3007.esams.wmnet
  • 17:10 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2010.codfw.wmnet
  • 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: Maint', diff saved to https://phabricator.wikimedia.org/P24018 and previous config saved to /var/cache/conftool/dbconfig/20220331-170221-ladsgroup.json
  • 16:58 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2010.codfw.wmnet
  • 16:58 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1020.eqiad.wmnet
  • 16:57 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns6002.wikimedia.org
  • 16:55 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns2001.wikimedia.org
  • 16:54 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3001.wikimedia.org
  • 16:51 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1020.eqiad.wmnet
  • 16:51 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns6002.wikimedia.org
  • 16:51 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5002.wikimedia.org
  • 16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: Maint', diff saved to https://phabricator.wikimedia.org/P24017 and previous config saved to /var/cache/conftool/dbconfig/20220331-164717-ladsgroup.json
  • 16:47 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns3001.wikimedia.org
  • 16:47 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4001.wikimedia.org
  • 16:42 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns5002.wikimedia.org
  • 16:42 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4002.wikimedia.org
  • 16:37 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns4001.wikimedia.org
  • 16:37 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5001.wikimedia.org
  • 16:33 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns4002.wikimedia.org
  • 16:33 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3002.wikimedia.org
  • 16:33 duesen: creating /var/run/php/use-config-schema on canaries mw1415, mw1438, and mw1448 to enable config schema loading (T304460)
  • 16:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: Maint', diff saved to https://phabricator.wikimedia.org/P24016 and previous config saved to /var/cache/conftool/dbconfig/20220331-163213-ladsgroup.json
  • 16:28 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns5001.wikimedia.org
  • 16:28 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns6001.wikimedia.org
  • 16:25 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns3002.wikimedia.org
  • 16:25 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns1002.wikimedia.org
  • 16:20 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns6001.wikimedia.org
  • 16:19 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns1002.wikimedia.org
  • 16:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: Maint', diff saved to https://phabricator.wikimedia.org/P24015 and previous config saved to /var/cache/conftool/dbconfig/20220331-161709-ladsgroup.json
  • 16:17 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns2002.wikimedia.org
  • 16:11 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns2002.wikimedia.org
  • 16:11 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dns2002.wikimedia.org
  • 16:11 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns2002.wikimedia.org
  • 15:59 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:45 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 15:45 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 15:44 mmandere: pool cp6016 with HAProxy as TLS termination layer - T290005
  • 15:41 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6016.drmrs.wmnet with OS buster
  • 15:40 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 15:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 15:35 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:18 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
  • 15:15 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
  • 15:13 mmandere: pool cp5009 with HAProxy as TLS termination layer - T290005
  • 15:13 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:11 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 15:10 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 15:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 12 hosts with reason: reboot for update T304938
  • 15:10 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5009.eqsin.wmnet with OS buster
  • 15:10 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on 12 hosts with reason: reboot for update T304938
  • 15:06 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 15:06 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 15:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on durum[1001-1002].eqiad.wmnet with reason: reboot for update T304938
  • 15:05 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 15:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on durum[1001-1002].eqiad.wmnet with reason: reboot for update T304938
  • 15:05 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:57 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6016.drmrs.wmnet with OS buster
  • 14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doh6002.wikimedia.org with reason: reboot for kernel update T304938
  • 14:56 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on doh6002.wikimedia.org with reason: reboot for kernel update T304938
  • 14:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doh6001.wikimedia.org with reason: reboot for kernel update T304938
  • 14:56 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on doh6001.wikimedia.org with reason: reboot for kernel update T304938
  • 14:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doh5002.wikimedia.org with reason: reboot for kernel update T304938
  • 14:52 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on doh5002.wikimedia.org with reason: reboot for kernel update T304938
  • 14:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doh5001.wikimedia.org with reason: reboot for kernel update T304938
  • 14:52 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on doh5001.wikimedia.org with reason: reboot for kernel update T304938
  • 14:52 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:50 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:47 mmandere: depool cp6016 for reimage - T290005
  • 14:46 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:44 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doh4002.wikimedia.org with reason: reboot for kernel update T304938
  • 14:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on doh4002.wikimedia.org with reason: reboot for kernel update T304938
  • 14:44 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on doh4001.wikimedia.org with reason: reboot for kernel update T304938
  • 14:43 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on doh4001.wikimedia.org with reason: reboot for kernel update T304938
  • 14:39 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5009.eqsin.wmnet with reason: host reimage
  • 14:36 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5009.eqsin.wmnet with reason: host reimage
  • 14:22 duesen: (late) about 5 hours ago, I removed /var/run/php/use-config-schema from mw1415 to disable config schema loading (T304460)
  • 14:09 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5009.eqsin.wmnet with OS buster
  • 14:05 mmandere@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5009.eqsin.wmnet with OS buster
  • 14:03 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5009.eqsin.wmnet with OS buster
  • 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:02 moritzm: installing vim security updates on buster
  • 14:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon1002.wikimedia.org
  • 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:56 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:55 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.5/includes/changetags/ChangeTags.php: Backport: ChangeTags: Use localizer with correct page title to parse messages (T302754) (duration: 00m 51s)
  • 13:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:53 mmandere: depool cp5009 for reimage - T290005
  • 13:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon1002.wikimedia.org
  • 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2001.wikimedia.org
  • 13:51 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.5/resources/src/mediawiki.special.createaccount/HtmlformChecker.js: Backport: Fix error/warning boxes on signup form (T305098) (duration: 00m 50s)
  • 13:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon2001.wikimedia.org
  • 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.5/extensions/CentralAuth/includes/Special/GlobalUsersPager.php: Backport: Revert "GlobalUsersPager: add gu_id to GROUP BY" (duration: 00m 50s)
  • 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:20 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.5/tests/phpunit/structure/SpecialPageFatalTest.php: Backport: Revert "Add SpecialPageFatalTest to @group Database" (no-op) (duration: 00m 50s)
  • 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Configure `mul` language code on Test Wikidata and its clients (T297393) (2/2) (duration: 00m 50s)
  • 13:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Configure `mul` language code on Test Wikidata and its clients (T297393) (1/2) (duration: 00m 51s)
  • 13:03 mmandere: pool cp4023 with HAProxy as TLS termination layer - T290005
  • 12:53 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4023.ulsfo.wmnet with OS buster
  • 12:53 mmandere: pool cp3057 with HAProxy as TLS termination layer - T290005
  • 12:50 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3057.esams.wmnet with OS buster
  • 12:48 XioNoX: analytics1-b/c/d-eqiad: replace firewall filter with strict uRPF - T298087
  • 12:31 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4023.ulsfo.wmnet with reason: host reimage
  • 12:28 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4023.ulsfo.wmnet with reason: host reimage
  • 12:25 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3057.esams.wmnet with reason: host reimage
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298557)', diff saved to https://phabricator.wikimedia.org/P24013 and previous config saved to /var/cache/conftool/dbconfig/20220331-122247-marostegui.json
  • 12:22 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3057.esams.wmnet with reason: host reimage
  • 12:12 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4023.ulsfo.wmnet with OS buster
  • 12:07 mmandere: depool cp4023 for reimage - T290005
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P24012 and previous config saved to /var/cache/conftool/dbconfig/20220331-120742-marostegui.json
  • 12:04 moritzm: installing wireshark security updates
  • 11:54 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3057.esams.wmnet with OS buster
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P24011 and previous config saved to /var/cache/conftool/dbconfig/20220331-115235-marostegui.json
  • 11:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2003.codfw.wmnet
  • 11:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pybal-test2003.codfw.wmnet
  • 11:39 mmandere: depool cp3057 for reimage - T290005
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298557)', diff saved to https://phabricator.wikimedia.org/P24010 and previous config saved to /var/cache/conftool/dbconfig/20220331-113730-marostegui.json
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2002.codfw.wmnet
  • 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pybal-test2002.codfw.wmnet
  • 11:19 moritzm: installing libpcap security updates
  • 11:16 mmandere: pool cp3056 with HAProxy as TLS termination layer - T290005
  • 11:08 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3056.esams.wmnet with OS buster
  • 10:55 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:55 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:53 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:53 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:44 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3056.esams.wmnet with reason: host reimage
  • 10:41 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3056.esams.wmnet with reason: host reimage
  • 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1002.eqiad.wmnet
  • 10:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor1002.eqiad.wmnet
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T297189)', diff saved to https://phabricator.wikimedia.org/P24009 and previous config saved to /var/cache/conftool/dbconfig/20220331-102819-marostegui.json
  • 10:26 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:26 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:26 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 10:26 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2002.codfw.wmnet
  • 10:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor2002.codfw.wmnet
  • 10:14 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3056.esams.wmnet with OS buster
  • 10:13 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P24007 and previous config saved to /var/cache/conftool/dbconfig/20220331-101314-marostegui.json
  • 10:12 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:12 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host miscweb1002.eqiad.wmnet
  • 10:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host miscweb1002.eqiad.wmnet
  • 10:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host miscweb2002.codfw.wmnet
  • 10:00 mmandere: pool cp4029 with HAProxy as TLS termination layer - T290005
  • 10:00 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:59 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P24006 and previous config saved to /var/cache/conftool/dbconfig/20220331-095809-marostegui.json
  • 09:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host miscweb2002.codfw.wmnet
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T298557)', diff saved to https://phabricator.wikimedia.org/P24005 and previous config saved to /var/cache/conftool/dbconfig/20220331-095319-marostegui.json
  • 09:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 09:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 09:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 09:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P24004 and previous config saved to /var/cache/conftool/dbconfig/20220331-095228-root.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T297189)', diff saved to https://phabricator.wikimedia.org/P24003 and previous config saved to /var/cache/conftool/dbconfig/20220331-094304-marostegui.json
  • 09:43 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4029.ulsfo.wmnet with OS buster
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P24002 and previous config saved to /var/cache/conftool/dbconfig/20220331-093725-root.json
  • 09:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1003.eqiad.wmnet
  • 09:26 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3056.esams.wmnet with OS buster
  • 09:25 duesen: removed /var/run/php/use-config-schema from mwdebug1002 to disable config schema loading (T304460)
  • 09:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1003.eqiad.wmnet
  • 09:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1002.eqiad.wmnet
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P24001 and previous config saved to /var/cache/conftool/dbconfig/20220331-092221-root.json
  • 09:21 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4029.ulsfo.wmnet with reason: host reimage
  • 09:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana1002.eqiad.wmnet
  • 09:18 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4029.ulsfo.wmnet with reason: host reimage
  • 09:18 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1002.eqiad.wmnet
  • 09:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host grafana1002.eqiad.wmnet
  • 09:16 duesen: created /var/run/php/use-config-schema on canary mw1415 to enable config schema loading (T304460)
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T297189)', diff saved to https://phabricator.wikimedia.org/P24000 and previous config saved to /var/cache/conftool/dbconfig/20220331-091626-marostegui.json
  • 09:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 09:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 09:09 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on ms-be1069.eqiad.wmnet with reason: Puppet errors during reimage
  • 09:09 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on ms-be1069.eqiad.wmnet with reason: Puppet errors during reimage
  • 09:09 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be1069.eqiad.wmnet with reason: Puppet errors during reimage
  • 09:08 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on ms-be1069.eqiad.wmnet with reason: Puppet errors during reimage
  • 09:08 duesen: created /var/run/php/use-config-schema on mwdebug1002 to enable config schema loading (T304460)
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23999 and previous config saved to /var/cache/conftool/dbconfig/20220331-090717-root.json
  • 09:02 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4029.ulsfo.wmnet with OS buster
  • 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-corp1001.wikimedia.org
  • 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana2001.codfw.wmnet
  • 08:58 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1069.eqiad.wmnet with OS stretch
  • 08:57 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3056.esams.wmnet with reason: host reimage
  • 08:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
  • 08:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-corp1001.wikimedia.org
  • 08:54 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3056.esams.wmnet with reason: host reimage
  • 08:53 mmandere: depool cp4029 for reimage - T290005
  • 08:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1001.eqiad.wmnet
  • 08:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:42 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1001.eqiad.wmnet
  • 08:42 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-ui1001.eqiad.wmnet
  • 08:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:40 XioNoX: analytics1-a-eqiad: replace firewall filter with strict uRPF - T298087
  • 08:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-ui1001.eqiad.wmnet
  • 08:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-corp2001.wikimedia.org
  • 08:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:35 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.5 refs T300204
  • 08:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-corp2001.wikimedia.org
  • 08:30 hashar@deploy1002: Synchronized php-1.39.0-wmf.5/extensions/OATHAuth/src/OATHUserRepository.php: Backport: Revert "OATHUserRepository: Stop handling legacy single-key" (T305029) (duration: 00m 51s)
  • 08:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 08:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298557)', diff saved to https://phabricator.wikimedia.org/P23997 and previous config saved to /var/cache/conftool/dbconfig/20220331-082525-marostegui.json
  • 08:25 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3056.esams.wmnet with OS buster
  • 08:19 daniel@deploy1002: Synchronized php-1.39.0-wmf.5/extensions/GrowthExperiments/modules/ext.growthExperiments.PostEdit/index.js: Backport: Post-edit dialog: check for presence of preferences.topicFilters (T305057) (duration: 00m 53s)
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P23996 and previous config saved to /var/cache/conftool/dbconfig/20220331-081020-marostegui.json
  • 08:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P23995 and previous config saved to /var/cache/conftool/dbconfig/20220331-075515-marostegui.json
  • 07:41 mmandere: depool cp3056 for reimage - T290005
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298557)', diff saved to https://phabricator.wikimedia.org/P23994 and previous config saved to /var/cache/conftool/dbconfig/20220331-074010-marostegui.json
  • 07:30 daniel@deploy1002: Synchronized multiversion/defines.php: Config: Set MW_USE_CONFIG_SCHEMA constant if file exists. (T304460) (duration: 00m 52s)
  • 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:18 moritzm: updating libapache2-mod-auth-cas on buster hosts
  • 07:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:49 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 06:48 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 06:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23993 and previous config saved to /var/cache/conftool/dbconfig/20220331-063429-ladsgroup.json
  • 06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23992 and previous config saved to /var/cache/conftool/dbconfig/20220331-061923-ladsgroup.json
  • 06:12 marostegui: dbmaint s5@eqiad T300381
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 T303798', diff saved to https://phabricator.wikimedia.org/P23991 and previous config saved to /var/cache/conftool/dbconfig/20220331-060820-root.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T298557)', diff saved to https://phabricator.wikimedia.org/P23990 and previous config saved to /var/cache/conftool/dbconfig/20220331-060517-marostegui.json
  • 06:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 06:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298557)', diff saved to https://phabricator.wikimedia.org/P23989 and previous config saved to /var/cache/conftool/dbconfig/20220331-060509-marostegui.json
  • 06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23988 and previous config saved to /var/cache/conftool/dbconfig/20220331-060418-ladsgroup.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1100 to s5 primary and set section read-write T303798', diff saved to https://phabricator.wikimedia.org/P23987 and previous config saved to /var/cache/conftool/dbconfig/20220331-060122-root.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T303798', diff saved to https://phabricator.wikimedia.org/P23986 and previous config saved to /var/cache/conftool/dbconfig/20220331-060042-root.json
  • 06:00 marostegui: Starting s5 eqiad failover from db1130 to db1100 - T303798
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P23985 and previous config saved to /var/cache/conftool/dbconfig/20220331-055004-marostegui.json
  • 05:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23984 and previous config saved to /var/cache/conftool/dbconfig/20220331-054913-ladsgroup.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P23983 and previous config saved to /var/cache/conftool/dbconfig/20220331-053459-marostegui.json
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298557)', diff saved to https://phabricator.wikimedia.org/P23981 and previous config saved to /var/cache/conftool/dbconfig/20220331-051954-marostegui.json
  • 04:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23980 and previous config saved to /var/cache/conftool/dbconfig/20220331-044859-ladsgroup.json
  • 04:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 04:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 04:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23979 and previous config saved to /var/cache/conftool/dbconfig/20220331-044851-ladsgroup.json
  • 04:39 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1100 with weight 0 T303798', diff saved to https://phabricator.wikimedia.org/P23978 and previous config saved to /var/cache/conftool/dbconfig/20220331-043906-marostegui.json
  • 04:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 22 hosts with reason: Primary switchover s5 T303798
  • 04:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 22 hosts with reason: Primary switchover s5 T303798
  • 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23977 and previous config saved to /var/cache/conftool/dbconfig/20220331-043346-ladsgroup.json
  • 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23976 and previous config saved to /var/cache/conftool/dbconfig/20220331-041841-ladsgroup.json
  • 04:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23975 and previous config saved to /var/cache/conftool/dbconfig/20220331-040940-ladsgroup.json
  • 04:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 04:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 04:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23974 and previous config saved to /var/cache/conftool/dbconfig/20220331-040916-ladsgroup.json
  • 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23973 and previous config saved to /var/cache/conftool/dbconfig/20220331-040336-ladsgroup.json
  • 03:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23972 and previous config saved to /var/cache/conftool/dbconfig/20220331-035411-ladsgroup.json
  • 03:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T298557)', diff saved to https://phabricator.wikimedia.org/P23971 and previous config saved to /var/cache/conftool/dbconfig/20220331-034709-marostegui.json
  • 03:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 03:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 03:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298557)', diff saved to https://phabricator.wikimedia.org/P23970 and previous config saved to /var/cache/conftool/dbconfig/20220331-034701-marostegui.json
  • 03:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23969 and previous config saved to /var/cache/conftool/dbconfig/20220331-033906-ladsgroup.json
  • 03:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P23968 and previous config saved to /var/cache/conftool/dbconfig/20220331-033156-marostegui.json
  • 03:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23967 and previous config saved to /var/cache/conftool/dbconfig/20220331-032401-ladsgroup.json
  • 03:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P23966 and previous config saved to /var/cache/conftool/dbconfig/20220331-031651-marostegui.json
  • 03:15 ejegg: civicrm revision changed from a6f49bb3 to 84c737b6
  • 03:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23965 and previous config saved to /var/cache/conftool/dbconfig/20220331-030531-ladsgroup.json
  • 03:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 03:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 03:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23964 and previous config saved to /var/cache/conftool/dbconfig/20220331-030523-ladsgroup.json
  • 03:04 eileen: civicrm revision changed from a9c323af to a6f49bb3
  • 03:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23963 and previous config saved to /var/cache/conftool/dbconfig/20220331-030321-ladsgroup.json
  • 03:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 03:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 03:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23962 and previous config saved to /var/cache/conftool/dbconfig/20220331-030313-ladsgroup.json
  • 03:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298557)', diff saved to https://phabricator.wikimedia.org/P23961 and previous config saved to /var/cache/conftool/dbconfig/20220331-030146-marostegui.json
  • 02:50 catrope@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: Code style-only change to MWConfigCacheGenerator.php (duration: 00m 52s)
  • 02:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23960 and previous config saved to /var/cache/conftool/dbconfig/20220331-025018-ladsgroup.json
  • 02:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23959 and previous config saved to /var/cache/conftool/dbconfig/20220331-024808-ladsgroup.json
  • 02:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23958 and previous config saved to /var/cache/conftool/dbconfig/20220331-023513-ladsgroup.json
  • 02:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23957 and previous config saved to /var/cache/conftool/dbconfig/20220331-023303-ladsgroup.json
  • 02:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23956 and previous config saved to /var/cache/conftool/dbconfig/20220331-022008-ladsgroup.json
  • 02:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23955 and previous config saved to /var/cache/conftool/dbconfig/20220331-021758-ladsgroup.json
  • 02:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23954 and previous config saved to /var/cache/conftool/dbconfig/20220331-021450-ladsgroup.json
  • 02:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 02:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 02:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23953 and previous config saved to /var/cache/conftool/dbconfig/20220331-021413-ladsgroup.json
  • 02:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23952 and previous config saved to /var/cache/conftool/dbconfig/20220331-020643-ladsgroup.json
  • 02:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 02:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 02:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23951 and previous config saved to /var/cache/conftool/dbconfig/20220331-020635-ladsgroup.json
  • 01:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23950 and previous config saved to /var/cache/conftool/dbconfig/20220331-015908-ladsgroup.json
  • 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23949 and previous config saved to /var/cache/conftool/dbconfig/20220331-015130-ladsgroup.json
  • 01:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23948 and previous config saved to /var/cache/conftool/dbconfig/20220331-014403-ladsgroup.json
  • 01:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T300775)', diff saved to https://phabricator.wikimedia.org/P23947 and previous config saved to /var/cache/conftool/dbconfig/20220331-014140-marostegui.json
  • 01:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 01:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 01:38 eileen: revision changed from 4bb3ec09 to a9c323af
  • 01:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23946 and previous config saved to /var/cache/conftool/dbconfig/20220331-013625-ladsgroup.json
  • 01:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23945 and previous config saved to /var/cache/conftool/dbconfig/20220331-012858-ladsgroup.json
  • 01:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T298557)', diff saved to https://phabricator.wikimedia.org/P23944 and previous config saved to /var/cache/conftool/dbconfig/20220331-012734-marostegui.json
  • 01:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 01:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 01:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298557)', diff saved to https://phabricator.wikimedia.org/P23943 and previous config saved to /var/cache/conftool/dbconfig/20220331-012726-marostegui.json
  • 01:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23942 and previous config saved to /var/cache/conftool/dbconfig/20220331-012650-ladsgroup.json
  • 01:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 01:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 01:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23941 and previous config saved to /var/cache/conftool/dbconfig/20220331-012637-ladsgroup.json
  • 01:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23940 and previous config saved to /var/cache/conftool/dbconfig/20220331-012120-ladsgroup.json
  • 01:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P23939 and previous config saved to /var/cache/conftool/dbconfig/20220331-011221-marostegui.json
  • 01:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23938 and previous config saved to /var/cache/conftool/dbconfig/20220331-011132-ladsgroup.json
  • 00:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P23937 and previous config saved to /var/cache/conftool/dbconfig/20220331-005716-marostegui.json
  • 00:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23936 and previous config saved to /var/cache/conftool/dbconfig/20220331-005627-ladsgroup.json
  • 00:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298557)', diff saved to https://phabricator.wikimedia.org/P23935 and previous config saved to /var/cache/conftool/dbconfig/20220331-004211-marostegui.json
  • 00:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23934 and previous config saved to /var/cache/conftool/dbconfig/20220331-004122-ladsgroup.json
  • 00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23933 and previous config saved to /var/cache/conftool/dbconfig/20220331-003914-ladsgroup.json
  • 00:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 00:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23932 and previous config saved to /var/cache/conftool/dbconfig/20220331-003906-ladsgroup.json
  • 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23931 and previous config saved to /var/cache/conftool/dbconfig/20220331-003834-ladsgroup.json
  • 00:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 00:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23930 and previous config saved to /var/cache/conftool/dbconfig/20220331-003826-ladsgroup.json
  • 00:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23929 and previous config saved to /var/cache/conftool/dbconfig/20220331-002401-ladsgroup.json
  • 00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23928 and previous config saved to /var/cache/conftool/dbconfig/20220331-002321-ladsgroup.json
  • 00:17 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/httpbb/buster/httpbb_0.0.1-1_source.changes # T299705
  • 00:13 eileen: revision changed from 951ffb1d to 4bb3ec09
  • 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23927 and previous config saved to /var/cache/conftool/dbconfig/20220331-000856-ladsgroup.json
  • 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23926 and previous config saved to /var/cache/conftool/dbconfig/20220331-000816-ladsgroup.json

2022-03-30

  • 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23925 and previous config saved to /var/cache/conftool/dbconfig/20220330-235351-ladsgroup.json
  • 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23924 and previous config saved to /var/cache/conftool/dbconfig/20220330-235311-ladsgroup.json
  • 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23923 and previous config saved to /var/cache/conftool/dbconfig/20220330-235143-ladsgroup.json
  • 23:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 23:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 23:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 23:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23922 and previous config saved to /var/cache/conftool/dbconfig/20220330-235131-ladsgroup.json
  • 23:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23921 and previous config saved to /var/cache/conftool/dbconfig/20220330-233625-ladsgroup.json
  • 23:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23920 and previous config saved to /var/cache/conftool/dbconfig/20220330-232120-ladsgroup.json
  • 23:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23919 and previous config saved to /var/cache/conftool/dbconfig/20220330-230914-ladsgroup.json
  • 23:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 23:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 23:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23918 and previous config saved to /var/cache/conftool/dbconfig/20220330-230905-ladsgroup.json
  • 23:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T298557)', diff saved to https://phabricator.wikimedia.org/P23917 and previous config saved to /var/cache/conftool/dbconfig/20220330-230803-marostegui.json
  • 23:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 23:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 23:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298557)', diff saved to https://phabricator.wikimedia.org/P23916 and previous config saved to /var/cache/conftool/dbconfig/20220330-230755-marostegui.json
  • 23:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23915 and previous config saved to /var/cache/conftool/dbconfig/20220330-230615-ladsgroup.json
  • 23:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23914 and previous config saved to /var/cache/conftool/dbconfig/20220330-230408-ladsgroup.json
  • 23:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 23:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 23:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23913 and previous config saved to /var/cache/conftool/dbconfig/20220330-230336-ladsgroup.json
  • 22:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23912 and previous config saved to /var/cache/conftool/dbconfig/20220330-225401-ladsgroup.json
  • 22:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P23911 and previous config saved to /var/cache/conftool/dbconfig/20220330-225250-marostegui.json
  • 22:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23910 and previous config saved to /var/cache/conftool/dbconfig/20220330-224831-ladsgroup.json
  • 22:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23909 and previous config saved to /var/cache/conftool/dbconfig/20220330-223856-ladsgroup.json
  • 22:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P23908 and previous config saved to /var/cache/conftool/dbconfig/20220330-223745-marostegui.json
  • 22:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23907 and previous config saved to /var/cache/conftool/dbconfig/20220330-223325-ladsgroup.json
  • 22:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23906 and previous config saved to /var/cache/conftool/dbconfig/20220330-222351-ladsgroup.json
  • 22:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298557)', diff saved to https://phabricator.wikimedia.org/P23905 and previous config saved to /var/cache/conftool/dbconfig/20220330-222240-marostegui.json
  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23904 and previous config saved to /var/cache/conftool/dbconfig/20220330-221820-ladsgroup.json
  • 22:15 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 21:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 21:21 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 21:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 21:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23903 and previous config saved to /var/cache/conftool/dbconfig/20220330-211806-ladsgroup.json
  • 21:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 21:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 21:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23902 and previous config saved to /var/cache/conftool/dbconfig/20220330-211758-ladsgroup.json
  • 21:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 21:03 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23900 and previous config saved to /var/cache/conftool/dbconfig/20220330-210253-ladsgroup.json
  • 20:56 ejegg: updated fundraising python tools from 8f5119f6 to af97fc4a
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23899 and previous config saved to /var/cache/conftool/dbconfig/20220330-205529-ladsgroup.json
  • 20:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 20:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23898 and previous config saved to /var/cache/conftool/dbconfig/20220330-205521-ladsgroup.json
  • 20:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 20:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 20:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23897 and previous config saved to /var/cache/conftool/dbconfig/20220330-204748-ladsgroup.json
  • 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P23896 and previous config saved to /var/cache/conftool/dbconfig/20220330-204016-ladsgroup.json
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23895 and previous config saved to /var/cache/conftool/dbconfig/20220330-203243-ladsgroup.json
  • 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23894 and previous config saved to /var/cache/conftool/dbconfig/20220330-203035-ladsgroup.json
  • 20:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 20:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23893 and previous config saved to /var/cache/conftool/dbconfig/20220330-203028-ladsgroup.json
  • 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P23892 and previous config saved to /var/cache/conftool/dbconfig/20220330-202511-ladsgroup.json
  • 20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23891 and previous config saved to /var/cache/conftool/dbconfig/20220330-201522-ladsgroup.json
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23890 and previous config saved to /var/cache/conftool/dbconfig/20220330-201006-ladsgroup.json
  • 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T298557)', diff saved to https://phabricator.wikimedia.org/P23889 and previous config saved to /var/cache/conftool/dbconfig/20220330-200236-marostegui.json
  • 20:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 20:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 20:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298557)', diff saved to https://phabricator.wikimedia.org/P23888 and previous config saved to /var/cache/conftool/dbconfig/20220330-200229-marostegui.json
  • 20:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23887 and previous config saved to /var/cache/conftool/dbconfig/20220330-200017-ladsgroup.json
  • 19:56 razzi@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka test-eqiad cluster: Reboot kafka nodes
  • 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P23886 and previous config saved to /var/cache/conftool/dbconfig/20220330-194723-marostegui.json
  • 19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23885 and previous config saved to /var/cache/conftool/dbconfig/20220330-194512-ladsgroup.json
  • 19:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P23884 and previous config saved to /var/cache/conftool/dbconfig/20220330-193218-marostegui.json
  • 19:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23883 and previous config saved to /var/cache/conftool/dbconfig/20220330-192355-ladsgroup.json
  • 19:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 19:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 19:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23882 and previous config saved to /var/cache/conftool/dbconfig/20220330-192347-ladsgroup.json
  • 19:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298557)', diff saved to https://phabricator.wikimedia.org/P23881 and previous config saved to /var/cache/conftool/dbconfig/20220330-191713-marostegui.json
  • 19:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P23880 and previous config saved to /var/cache/conftool/dbconfig/20220330-190842-ladsgroup.json
  • 18:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P23879 and previous config saved to /var/cache/conftool/dbconfig/20220330-185337-ladsgroup.json
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23878 and previous config saved to /var/cache/conftool/dbconfig/20220330-184458-ladsgroup.json
  • 18:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 18:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 18:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 18:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23877 and previous config saved to /var/cache/conftool/dbconfig/20220330-184445-ladsgroup.json
  • 18:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23876 and previous config saved to /var/cache/conftool/dbconfig/20220330-183832-ladsgroup.json
  • 18:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23875 and previous config saved to /var/cache/conftool/dbconfig/20220330-182940-ladsgroup.json
  • 18:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23874 and previous config saved to /var/cache/conftool/dbconfig/20220330-182537-ladsgroup.json
  • 18:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 18:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 18:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 18:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23873 and previous config saved to /var/cache/conftool/dbconfig/20220330-181435-ladsgroup.json
  • 18:11 razzi@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka test-eqiad cluster: Reboot kafka nodes
  • 18:08 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zookeeper-test1002.eqiad.wmnet
  • 18:03 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
  • 18:01 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet
  • 18:00 razzi@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host zookeeper-test1002.eqiad.wmnet
  • 18:00 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet
  • 18:00 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
  • 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23872 and previous config saved to /var/cache/conftool/dbconfig/20220330-175930-ladsgroup.json
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23871 and previous config saved to /var/cache/conftool/dbconfig/20220330-175822-ladsgroup.json
  • 17:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23870 and previous config saved to /var/cache/conftool/dbconfig/20220330-175814-ladsgroup.json
  • 17:47 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1069.eqiad.wmnet with OS stretch
  • 17:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 17:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 17:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 17:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 17:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T298557)', diff saved to https://phabricator.wikimedia.org/P23869 and previous config saved to /var/cache/conftool/dbconfig/20220330-174426-marostegui.json
  • 17:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 17:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 17:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298557)', diff saved to https://phabricator.wikimedia.org/P23868 and previous config saved to /var/cache/conftool/dbconfig/20220330-174418-marostegui.json
  • 17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23867 and previous config saved to /var/cache/conftool/dbconfig/20220330-174309-ladsgroup.json
  • 17:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P23866 and previous config saved to /var/cache/conftool/dbconfig/20220330-172913-marostegui.json
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23865 and previous config saved to /var/cache/conftool/dbconfig/20220330-172804-ladsgroup.json
  • 17:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T297189)', diff saved to https://phabricator.wikimedia.org/P23864 and previous config saved to /var/cache/conftool/dbconfig/20220330-171732-marostegui.json
  • 17:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P23862 and previous config saved to /var/cache/conftool/dbconfig/20220330-171408-marostegui.json
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23861 and previous config saved to /var/cache/conftool/dbconfig/20220330-171259-ladsgroup.json
  • 17:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 17:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P23859 and previous config saved to /var/cache/conftool/dbconfig/20220330-170227-marostegui.json
  • 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23858 and previous config saved to /var/cache/conftool/dbconfig/20220330-170150-ladsgroup.json
  • 17:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 17:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23857 and previous config saved to /var/cache/conftool/dbconfig/20220330-170142-ladsgroup.json
  • 16:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298557)', diff saved to https://phabricator.wikimedia.org/P23856 and previous config saved to /var/cache/conftool/dbconfig/20220330-165903-marostegui.json
  • 16:52 topranks: "Manually decommissioning xe-0/0/1 on lsw1-e2-eqiad before reimage of ms-be1069 from scratch, attempt to replicate ARP error seen previously while running debug."
  • 16:52 volans: sudo systemctl reload icinga.service on alert1001
  • 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P23855 and previous config saved to /var/cache/conftool/dbconfig/20220330-164722-marostegui.json
  • 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23854 and previous config saved to /var/cache/conftool/dbconfig/20220330-164637-ladsgroup.json
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T297189)', diff saved to https://phabricator.wikimedia.org/P23853 and previous config saved to /var/cache/conftool/dbconfig/20220330-163217-marostegui.json
  • 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23852 and previous config saved to /var/cache/conftool/dbconfig/20220330-163132-ladsgroup.json
  • 16:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 16:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 16:28 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
  • 16:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
  • 16:21 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
  • 16:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
  • 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23850 and previous config saved to /var/cache/conftool/dbconfig/20220330-161626-ladsgroup.json
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23849 and previous config saved to /var/cache/conftool/dbconfig/20220330-161418-ladsgroup.json
  • 16:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 16:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 16:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 16:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 16:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 16:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 16:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 16:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 16:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 16:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23848 and previous config saved to /var/cache/conftool/dbconfig/20220330-161337-ladsgroup.json
  • 16:04 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab1001.wikimedia.org
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23847 and previous config saved to /var/cache/conftool/dbconfig/20220330-155832-ladsgroup.json
  • 15:52 jelto@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab1001.wikimedia.org
  • 15:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T297189)', diff saved to https://phabricator.wikimedia.org/P23845 and previous config saved to /var/cache/conftool/dbconfig/20220330-155139-marostegui.json
  • 15:51 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab-runner2001.codfw.wmnet
  • 15:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 15:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T297189)', diff saved to https://phabricator.wikimedia.org/P23844 and previous config saved to /var/cache/conftool/dbconfig/20220330-155126-marostegui.json
  • 15:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:47 jelto@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab-runner2001.codfw.wmnet
  • 15:46 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab-runner1001.eqiad.wmnet
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23843 and previous config saved to /var/cache/conftool/dbconfig/20220330-154326-ladsgroup.json
  • 15:43 jelto@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab-runner1001.eqiad.wmnet
  • 15:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P23842 and previous config saved to /var/cache/conftool/dbconfig/20220330-153621-marostegui.json
  • 15:32 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1001.eqiad.wmnet
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23841 and previous config saved to /var/cache/conftool/dbconfig/20220330-152821-ladsgroup.json
  • 15:26 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1001.eqiad.wmnet
  • 15:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23840 and previous config saved to /var/cache/conftool/dbconfig/20220330-152613-ladsgroup.json
  • 15:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 15:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23839 and previous config saved to /var/cache/conftool/dbconfig/20220330-152539-ladsgroup.json
  • 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2009.codfw.wmnet
  • 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P23838 and previous config saved to /var/cache/conftool/dbconfig/20220330-152116-marostegui.json
  • 15:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-client1001.eqiad.wmnet
  • 15:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-client1001.eqiad.wmnet
  • 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2009.codfw.wmnet
  • 15:15 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1003.eqiad.wmnet
  • 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T298557)', diff saved to https://phabricator.wikimedia.org/P23837 and previous config saved to /var/cache/conftool/dbconfig/20220330-151346-marostegui.json
  • 15:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 15:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298557)', diff saved to https://phabricator.wikimedia.org/P23836 and previous config saved to /var/cache/conftool/dbconfig/20220330-151338-marostegui.json
  • 15:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23835 and previous config saved to /var/cache/conftool/dbconfig/20220330-151034-ladsgroup.json
  • 15:10 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1003.eqiad.wmnet
  • 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T297189)', diff saved to https://phabricator.wikimedia.org/P23834 and previous config saved to /var/cache/conftool/dbconfig/20220330-150611-marostegui.json
  • 15:05 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1002.eqiad.wmnet
  • 15:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2008.codfw.wmnet
  • 14:59 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1002.eqiad.wmnet
  • 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P23833 and previous config saved to /var/cache/conftool/dbconfig/20220330-145833-marostegui.json
  • 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2001.codfw.wmnet
  • 14:56 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2008.codfw.wmnet
  • 14:55 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2007.codfw.wmnet
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23831 and previous config saved to /var/cache/conftool/dbconfig/20220330-145529-ladsgroup.json
  • 14:55 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1001.eqiad.wmnet
  • 14:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pybal-test2001.codfw.wmnet
  • 14:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2093.codfw.wmnet with OS bullseye
  • 14:50 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2007.codfw.wmnet
  • 14:47 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1001.eqiad.wmnet
  • 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp2001.wikimedia.org
  • 14:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P23830 and previous config saved to /var/cache/conftool/dbconfig/20220330-144328-marostegui.json
  • 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23829 and previous config saved to /var/cache/conftool/dbconfig/20220330-144023-ladsgroup.json
  • 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp2001.wikimedia.org
  • 14:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1002.eqiad.wmnet
  • 14:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2006.codfw.wmnet
  • 14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 14:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23828 and previous config saved to /var/cache/conftool/dbconfig/20220330-143252-ladsgroup.json
  • 14:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2093.codfw.wmnet with reason: host reimage
  • 14:31 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-db1002.eqiad.wmnet
  • 14:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1001.eqiad.wmnet
  • 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2093.codfw.wmnet with reason: host reimage
  • 14:29 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2006.codfw.wmnet
  • 14:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2005.codfw.wmnet
  • 14:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298557)', diff saved to https://phabricator.wikimedia.org/P23827 and previous config saved to /var/cache/conftool/dbconfig/20220330-142823-marostegui.json
  • 14:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-db1001.eqiad.wmnet
  • 14:22 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2005.codfw.wmnet
  • 14:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:19 moritzm: installing remaining tiff security updates
  • 14:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2004.codfw.wmnet
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P23826 and previous config saved to /var/cache/conftool/dbconfig/20220330-141747-ladsgroup.json
  • 14:15 hashar: deploy1002: `git fetch && git rebase` to catch up with `group1 wikis to 1.39.0-wmf.5` commit which did not get sent to Gerrit but got deployed earlier today
  • 14:13 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2004.codfw.wmnet
  • 14:11 kormat@cumin1001: START - Cookbook sre.hosts.reimage for host db2093.codfw.wmnet with OS bullseye
  • 14:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2003.codfw.wmnet
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T297189)', diff saved to https://phabricator.wikimedia.org/P23825 and previous config saved to /var/cache/conftool/dbconfig/20220330-140556-marostegui.json
  • 14:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 14:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T297189)', diff saved to https://phabricator.wikimedia.org/P23824 and previous config saved to /var/cache/conftool/dbconfig/20220330-140549-marostegui.json
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P23823 and previous config saved to /var/cache/conftool/dbconfig/20220330-140242-ladsgroup.json
  • 14:01 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2003.codfw.wmnet
  • 13:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2002.codfw.wmnet
  • 13:55 kormat: stopping orchestrator for backend move T301315
  • 13:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2002.codfw.wmnet
  • 13:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2001.codfw.wmnet
  • 13:51 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:51 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:51 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:51 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P23822 and previous config saved to /var/cache/conftool/dbconfig/20220330-135044-marostegui.json
  • 13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23821 and previous config saved to /var/cache/conftool/dbconfig/20220330-134737-ladsgroup.json
  • 13:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2001.codfw.wmnet
  • 13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23820 and previous config saved to /var/cache/conftool/dbconfig/20220330-134010-ladsgroup.json
  • 13:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 13:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23819 and previous config saved to /var/cache/conftool/dbconfig/20220330-134002-ladsgroup.json
  • 13:36 jayme: restarting pybal on lvs1019 and lvs2009
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P23818 and previous config saved to /var/cache/conftool/dbconfig/20220330-133538-marostegui.json
  • 13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23817 and previous config saved to /var/cache/conftool/dbconfig/20220330-133436-ladsgroup.json
  • 13:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 13:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23816 and previous config saved to /var/cache/conftool/dbconfig/20220330-133423-ladsgroup.json
  • 13:33 jayme: restarting pybal on lvs1020 and lvs2010
  • 13:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2003.codfw.wmnet
  • 13:30 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2003.codfw.wmnet
  • 13:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2002.codfw.wmnet
  • 13:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23815 and previous config saved to /var/cache/conftool/dbconfig/20220330-132457-ladsgroup.json
  • 13:22 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2002.codfw.wmnet
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T297189)', diff saved to https://phabricator.wikimedia.org/P23814 and previous config saved to /var/cache/conftool/dbconfig/20220330-132033-marostegui.json
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P23813 and previous config saved to /var/cache/conftool/dbconfig/20220330-131918-ladsgroup.json
  • 13:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2001.codfw.wmnet
  • 13:15 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2001.codfw.wmnet
  • 13:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23812 and previous config saved to /var/cache/conftool/dbconfig/20220330-130952-ladsgroup.json
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P23811 and previous config saved to /var/cache/conftool/dbconfig/20220330-130413-ladsgroup.json
  • 12:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23810 and previous config saved to /var/cache/conftool/dbconfig/20220330-125447-ladsgroup.json
  • 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23809 and previous config saved to /var/cache/conftool/dbconfig/20220330-125239-ladsgroup.json
  • 12:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23808 and previous config saved to /var/cache/conftool/dbconfig/20220330-125201-ladsgroup.json
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23807 and previous config saved to /var/cache/conftool/dbconfig/20220330-124908-ladsgroup.json
  • 12:41 Amir1: start of templatelinks backfill on s3 (T299424)
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T298557)', diff saved to https://phabricator.wikimedia.org/P23806 and previous config saved to /var/cache/conftool/dbconfig/20220330-123931-marostegui.json
  • 12:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 12:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23805 and previous config saved to /var/cache/conftool/dbconfig/20220330-123656-ladsgroup.json
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T297189)', diff saved to https://phabricator.wikimedia.org/P23804 and previous config saved to /var/cache/conftool/dbconfig/20220330-123249-marostegui.json
  • 12:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:27 mmandere: pool cp2028 with HAProxy as TLS termination layer - T290005
  • 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23802 and previous config saved to /var/cache/conftool/dbconfig/20220330-122151-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23801 and previous config saved to /var/cache/conftool/dbconfig/20220330-120646-ladsgroup.json
  • 12:05 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2028.codfw.wmnet with OS buster
  • 12:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23800 and previous config saved to /var/cache/conftool/dbconfig/20220330-120439-ladsgroup.json
  • 12:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 12:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 12:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23799 and previous config saved to /var/cache/conftool/dbconfig/20220330-120426-ladsgroup.json
  • 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23798 and previous config saved to /var/cache/conftool/dbconfig/20220330-115839-ladsgroup.json
  • 11:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 11:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23797 and previous config saved to /var/cache/conftool/dbconfig/20220330-115831-ladsgroup.json
  • 11:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23796 and previous config saved to /var/cache/conftool/dbconfig/20220330-114921-ladsgroup.json
  • 11:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 6 hosts with reason: Maintenance
  • 11:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 6 hosts with reason: Maintenance
  • 11:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 11:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 11:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2028.codfw.wmnet with reason: host reimage
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23795 and previous config saved to /var/cache/conftool/dbconfig/20220330-114326-ladsgroup.json
  • 11:43 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2028.codfw.wmnet with reason: host reimage
  • 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23794 and previous config saved to /var/cache/conftool/dbconfig/20220330-113416-ladsgroup.json
  • 11:30 moritzm: updating libapache2-mod-auth-cas on buster hosts
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23793 and previous config saved to /var/cache/conftool/dbconfig/20220330-112821-ladsgroup.json
  • 11:24 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp2028.codfw.wmnet with OS buster
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23792 and previous config saved to /var/cache/conftool/dbconfig/20220330-111911-ladsgroup.json
  • 11:19 XioNoX: apply urpf strict filter to eqiad cloud-hosts vlan - T285461
  • 11:15 mmandere: depool cp2028 for reimage - T290005
  • 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23791 and previous config saved to /var/cache/conftool/dbconfig/20220330-111316-ladsgroup.json
  • 11:12 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: sync
  • 11:12 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: sync
  • 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23790 and previous config saved to /var/cache/conftool/dbconfig/20220330-110701-ladsgroup.json
  • 11:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 11:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 11:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23789 and previous config saved to /var/cache/conftool/dbconfig/20220330-110654-ladsgroup.json
  • 11:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 12 hosts with reason: Maintenance
  • 11:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 12 hosts with reason: Maintenance
  • 11:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 11:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 11:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 11:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:59 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1009.eqiad.wmnet
  • 10:52 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1009.eqiad.wmnet
  • 10:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: Maintenance
  • 10:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: Maintenance
  • 10:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 10:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T300775)', diff saved to https://phabricator.wikimedia.org/P23788 and previous config saved to /var/cache/conftool/dbconfig/20220330-105210-marostegui.json
  • 10:52 moritzm: installing glibc updates from Bullseye 11.3 point release
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23787 and previous config saved to /var/cache/conftool/dbconfig/20220330-105149-ladsgroup.json
  • 10:40 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1008.eqiad.wmnet
  • 10:38 mmandere: pool cp2030 with HAProxy as TLS termination layer - T290005
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P23786 and previous config saved to /var/cache/conftool/dbconfig/20220330-103705-marostegui.json
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23785 and previous config saved to /var/cache/conftool/dbconfig/20220330-103644-ladsgroup.json
  • 10:34 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1008.eqiad.wmnet
  • 10:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T297189)', diff saved to https://phabricator.wikimedia.org/P23784 and previous config saved to /var/cache/conftool/dbconfig/20220330-102701-marostegui.json
  • 10:26 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2030.codfw.wmnet with OS buster
  • 10:24 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1007.eqiad.wmnet
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P23783 and previous config saved to /var/cache/conftool/dbconfig/20220330-102200-marostegui.json
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23782 and previous config saved to /var/cache/conftool/dbconfig/20220330-102138-ladsgroup.json
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23781 and previous config saved to /var/cache/conftool/dbconfig/20220330-101931-ladsgroup.json
  • 10:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23780 and previous config saved to /var/cache/conftool/dbconfig/20220330-101918-ladsgroup.json
  • 10:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23779 and previous config saved to /var/cache/conftool/dbconfig/20220330-101847-ladsgroup.json
  • 10:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 10:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 10:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23778 and previous config saved to /var/cache/conftool/dbconfig/20220330-101839-ladsgroup.json
  • 10:14 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1007.eqiad.wmnet
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P23777 and previous config saved to /var/cache/conftool/dbconfig/20220330-101156-marostegui.json
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T300775)', diff saved to https://phabricator.wikimedia.org/P23776 and previous config saved to /var/cache/conftool/dbconfig/20220330-100654-marostegui.json
  • 10:06 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1006.eqiad.wmnet
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23775 and previous config saved to /var/cache/conftool/dbconfig/20220330-100413-ladsgroup.json
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23774 and previous config saved to /var/cache/conftool/dbconfig/20220330-100333-ladsgroup.json
  • 10:03 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2030.codfw.wmnet with reason: host reimage
  • 10:01 XioNoX: cumin1001:~$ sudo cumin 'ganeti[1005-1028].eqiad.wmnet' 'sysctl -w net.ipv6.conf.analytics.accept_ra=0' - T305034
  • 09:59 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2030.codfw.wmnet with reason: host reimage
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P23773 and previous config saved to /var/cache/conftool/dbconfig/20220330-095651-marostegui.json
  • 09:55 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1006.eqiad.wmnet
  • 09:51 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1005.eqiad.wmnet
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23772 and previous config saved to /var/cache/conftool/dbconfig/20220330-094908-ladsgroup.json
  • 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp1001.wikimedia.org
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23771 and previous config saved to /var/cache/conftool/dbconfig/20220330-094829-ladsgroup.json
  • 09:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp1001.wikimedia.org
  • 09:43 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1005.eqiad.wmnet
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T297189)', diff saved to https://phabricator.wikimedia.org/P23770 and previous config saved to /var/cache/conftool/dbconfig/20220330-094146-marostegui.json
  • 09:40 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp2030.codfw.wmnet with OS buster
  • 09:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 09:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 09:35 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1004.eqiad.wmnet
  • 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23769 and previous config saved to /var/cache/conftool/dbconfig/20220330-093403-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23768 and previous config saved to /var/cache/conftool/dbconfig/20220330-093324-ladsgroup.json
  • 09:32 mmandere: depool cp2030 for reimage - T290005
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23767 and previous config saved to /var/cache/conftool/dbconfig/20220330-093156-ladsgroup.json
  • 09:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 09:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 09:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23766 and previous config saved to /var/cache/conftool/dbconfig/20220330-093148-ladsgroup.json
  • 09:27 XioNoX: ganeti1025:~$ sudo sysctl -w net.ipv6.conf.analytics.accept_ra=0 - T305034
  • 09:26 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1004.eqiad.wmnet
  • 09:26 klausman@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ores1004.eqiad.wmnet
  • 09:26 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1004.eqiad.wmnet
  • 09:25 klausman@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ores1004.eqiad.wmnet
  • 09:25 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1004.eqiad.wmnet
  • 09:25 klausman@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ores1004.eqiad.wmnet
  • 09:24 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1004.eqiad.wmnet
  • 09:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install2003.wikimedia.org
  • 09:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23765 and previous config saved to /var/cache/conftool/dbconfig/20220330-091643-ladsgroup.json
  • 09:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install2003.wikimedia.org
  • 09:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install1003.wikimedia.org
  • 09:09 mmandere: pool cp2032 with HAProxy as TLS termination layer - T290005
  • 09:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install1003.wikimedia.org
  • 09:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23764 and previous config saved to /var/cache/conftool/dbconfig/20220330-090138-ladsgroup.json
  • 08:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6001.wikimedia.org
  • 08:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install5001.wikimedia.org
  • 08:53 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2032.codfw.wmnet with OS buster
  • 08:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6001.wikimedia.org
  • 08:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install5001.wikimedia.org
  • 08:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23763 and previous config saved to /var/cache/conftool/dbconfig/20220330-085010-ladsgroup.json
  • 08:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23762 and previous config saved to /var/cache/conftool/dbconfig/20220330-085003-ladsgroup.json
  • 08:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install4001.wikimedia.org
  • 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23761 and previous config saved to /var/cache/conftool/dbconfig/20220330-084633-ladsgroup.json
  • 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23760 and previous config saved to /var/cache/conftool/dbconfig/20220330-084425-ladsgroup.json
  • 08:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 08:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 08:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23759 and previous config saved to /var/cache/conftool/dbconfig/20220330-084353-ladsgroup.json
  • 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3001.wikimedia.org
  • 08:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install4001.wikimedia.org
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1157 (T297189)', diff saved to https://phabricator.wikimedia.org/P23758 and previous config saved to /var/cache/conftool/dbconfig/20220330-083826-marostegui.json
  • 08:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3001.wikimedia.org
  • 08:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 08:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T297189)', diff saved to https://phabricator.wikimedia.org/P23757 and previous config saved to /var/cache/conftool/dbconfig/20220330-083819-marostegui.json
  • 08:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23756 and previous config saved to /var/cache/conftool/dbconfig/20220330-083458-ladsgroup.json
  • 08:33 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2032.codfw.wmnet with reason: host reimage
  • 08:30 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2032.codfw.wmnet with reason: host reimage
  • 08:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23755 and previous config saved to /var/cache/conftool/dbconfig/20220330-082848-ladsgroup.json
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P23754 and previous config saved to /var/cache/conftool/dbconfig/20220330-082314-marostegui.json
  • 08:20 XioNoX: temporarily apply log-only RPF filter on eqiad analytics-a
  • 08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23753 and previous config saved to /var/cache/conftool/dbconfig/20220330-081952-ladsgroup.json
  • 08:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23752 and previous config saved to /var/cache/conftool/dbconfig/20220330-081343-ladsgroup.json
  • 08:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 08:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298557)', diff saved to https://phabricator.wikimedia.org/P23751 and previous config saved to /var/cache/conftool/dbconfig/20220330-081128-marostegui.json
  • 08:11 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp2032.codfw.wmnet with OS buster
  • 08:10 hashar@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.5 (duration: 01m 00s)
  • 08:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.5
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P23750 and previous config saved to /var/cache/conftool/dbconfig/20220330-080808-marostegui.json
  • 08:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter1004.eqiad.wmnet
  • 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23749 and previous config saved to /var/cache/conftool/dbconfig/20220330-080447-ladsgroup.json
  • 08:03 mmandere: depool cp2032 for reimage - T290005
  • 08:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1003.eqiad.wmnet
  • 08:02 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter1004.eqiad.wmnet
  • 08:00 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter1003.eqiad.wmnet
  • 07:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23748 and previous config saved to /var/cache/conftool/dbconfig/20220330-075838-ladsgroup.json
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P23747 and previous config saved to /var/cache/conftool/dbconfig/20220330-075623-marostegui.json
  • 07:55 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1003.eqiad.wmnet
  • 07:54 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter1003.eqiad.wmnet
  • 07:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter2004.codfw.wmnet
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T297189)', diff saved to https://phabricator.wikimedia.org/P23746 and previous config saved to /var/cache/conftool/dbconfig/20220330-075303-marostegui.json
  • 07:50 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter2004.codfw.wmnet
  • 07:48 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1002.eqiad.wmnet
  • 07:48 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter2003.codfw.wmnet
  • 07:44 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter2003.codfw.wmnet
  • 07:42 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1002.eqiad.wmnet
  • 07:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1001.eqiad.wmnet
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P23745 and previous config saved to /var/cache/conftool/dbconfig/20220330-074118-marostegui.json
  • 07:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2002.codfw.wmnet
  • 07:33 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
  • 07:33 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1001.eqiad.wmnet
  • 07:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
  • 07:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1002.eqiad.wmnet
  • 07:31 moritzm: updating libapache2-mod-auth-cas on bullseye hosts
  • 07:27 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
  • 07:26 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1002.eqiad.wmnet
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298557)', diff saved to https://phabricator.wikimedia.org/P23744 and previous config saved to /var/cache/conftool/dbconfig/20220330-072613-marostegui.json
  • 07:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet
  • 07:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23743 and previous config saved to /var/cache/conftool/dbconfig/20220330-072045-ladsgroup.json
  • 07:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23742 and previous config saved to /var/cache/conftool/dbconfig/20220330-072037-ladsgroup.json
  • 07:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T297189)', diff saved to https://phabricator.wikimedia.org/P23741 and previous config saved to /var/cache/conftool/dbconfig/20220330-071650-marostegui.json
  • 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 07:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:15 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1001.eqiad.wmnet
  • 07:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2009.codfw.wmnet
  • 07:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2009.codfw.wmnet
  • 07:10 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2008.codfw.wmnet
  • 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:08 taavi: UTC morning deploys done
  • 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:08 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Realtime Preview on testwiki (T302506) (duration: 00m 56s)
  • 07:06 elukey: restart rsyslog on ml-serve1002
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P23740 and previous config saved to /var/cache/conftool/dbconfig/20220330-070604-root.json
  • 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23739 and previous config saved to /var/cache/conftool/dbconfig/20220330-070532-ladsgroup.json
  • 07:03 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2008.codfw.wmnet
  • 06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23738 and previous config saved to /var/cache/conftool/dbconfig/20220330-065822-ladsgroup.json
  • 06:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 06:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23737 and previous config saved to /var/cache/conftool/dbconfig/20220330-065814-ladsgroup.json
  • 06:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2007.codfw.wmnet
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23736 and previous config saved to /var/cache/conftool/dbconfig/20220330-065100-root.json
  • 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23735 and previous config saved to /var/cache/conftool/dbconfig/20220330-065027-ladsgroup.json
  • 06:49 jayme: updated scap to 4.5.0 on all hosts - T304134
  • 06:48 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2007.codfw.wmnet
  • 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23734 and previous config saved to /var/cache/conftool/dbconfig/20220330-064309-ladsgroup.json
  • 06:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23733 and previous config saved to /var/cache/conftool/dbconfig/20220330-064037-root.json
  • 06:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2006.codfw.wmnet
  • 06:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23732 and previous config saved to /var/cache/conftool/dbconfig/20220330-063556-root.json
  • 06:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23731 and previous config saved to /var/cache/conftool/dbconfig/20220330-063522-ladsgroup.json
  • 06:35 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
  • 06:34 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ml-serve1001.eqiad.wmnet
  • 06:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
  • 06:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2006.codfw.wmnet
  • 06:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2005.codfw.wmnet
  • 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23730 and previous config saved to /var/cache/conftool/dbconfig/20220330-062804-ladsgroup.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23729 and previous config saved to /var/cache/conftool/dbconfig/20220330-062533-root.json
  • 06:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23728 and previous config saved to /var/cache/conftool/dbconfig/20220330-062203-ladsgroup.json
  • 06:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 06:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 06:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23727 and previous config saved to /var/cache/conftool/dbconfig/20220330-062155-ladsgroup.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23726 and previous config saved to /var/cache/conftool/dbconfig/20220330-062052-root.json
  • 06:20 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2005.codfw.wmnet
  • 06:15 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2004.codfw.wmnet
  • 06:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23725 and previous config saved to /var/cache/conftool/dbconfig/20220330-061259-ladsgroup.json
  • 06:11 elukey: restart rsyslogd on ml-serve1001
  • 06:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23724 and previous config saved to /var/cache/conftool/dbconfig/20220330-061051-ladsgroup.json
  • 06:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 06:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 06:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23723 and previous config saved to /var/cache/conftool/dbconfig/20220330-061042-ladsgroup.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23722 and previous config saved to /var/cache/conftool/dbconfig/20220330-061029-root.json
  • 06:07 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2004.codfw.wmnet
  • 06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23721 and previous config saved to /var/cache/conftool/dbconfig/20220330-060650-ladsgroup.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23720 and previous config saved to /var/cache/conftool/dbconfig/20220330-060548-root.json
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23719 and previous config saved to /var/cache/conftool/dbconfig/20220330-055537-ladsgroup.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23718 and previous config saved to /var/cache/conftool/dbconfig/20220330-055525-root.json
  • 05:51 marostegui: dbmaint s6@eqiad T297189
  • 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23717 and previous config saved to /var/cache/conftool/dbconfig/20220330-055145-ladsgroup.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P23716 and previous config saved to /var/cache/conftool/dbconfig/20220330-055045-root.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T298557)', diff saved to https://phabricator.wikimedia.org/P23715 and previous config saved to /var/cache/conftool/dbconfig/20220330-054548-marostegui.json
  • 05:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 05:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23714 and previous config saved to /var/cache/conftool/dbconfig/20220330-054032-ladsgroup.json
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23713 and previous config saved to /var/cache/conftool/dbconfig/20220330-054021-root.json
  • 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P23712 and previous config saved to /var/cache/conftool/dbconfig/20220330-053745-root.json
  • 05:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23711 and previous config saved to /var/cache/conftool/dbconfig/20220330-053640-ladsgroup.json
  • 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23710 and previous config saved to /var/cache/conftool/dbconfig/20220330-052525-ladsgroup.json
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23709 and previous config saved to /var/cache/conftool/dbconfig/20220330-052516-root.json
  • 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23708 and previous config saved to /var/cache/conftool/dbconfig/20220330-052344-ladsgroup.json
  • 05:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23707 and previous config saved to /var/cache/conftool/dbconfig/20220330-052320-ladsgroup.json
  • 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23706 and previous config saved to /var/cache/conftool/dbconfig/20220330-052312-ladsgroup.json
  • 05:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23705 and previous config saved to /var/cache/conftool/dbconfig/20220330-052259-ladsgroup.json
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P23704 and previous config saved to /var/cache/conftool/dbconfig/20220330-052241-root.json
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for reboot', diff saved to https://phabricator.wikimedia.org/P23703 and previous config saved to /var/cache/conftool/dbconfig/20220330-051524-root.json
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 1%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23702 and previous config saved to /var/cache/conftool/dbconfig/20220330-051012-root.json
  • 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23701 and previous config saved to /var/cache/conftool/dbconfig/20220330-050808-ladsgroup.json
  • 05:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23700 and previous config saved to /var/cache/conftool/dbconfig/20220330-050754-ladsgroup.json
  • 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 for downgrade', diff saved to https://phabricator.wikimedia.org/P23699 and previous config saved to /var/cache/conftool/dbconfig/20220330-050406-root.json
  • 04:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T297189)', diff saved to https://phabricator.wikimedia.org/P23698 and previous config saved to /var/cache/conftool/dbconfig/20220330-045747-marostegui.json
  • 04:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 04:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23697 and previous config saved to /var/cache/conftool/dbconfig/20220330-045303-ladsgroup.json
  • 04:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23696 and previous config saved to /var/cache/conftool/dbconfig/20220330-045249-ladsgroup.json
  • 04:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23695 and previous config saved to /var/cache/conftool/dbconfig/20220330-043758-ladsgroup.json
  • 04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23694 and previous config saved to /var/cache/conftool/dbconfig/20220330-043744-ladsgroup.json
  • 04:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23693 and previous config saved to /var/cache/conftool/dbconfig/20220330-043536-ladsgroup.json
  • 04:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 04:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 04:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23692 and previous config saved to /var/cache/conftool/dbconfig/20220330-043528-ladsgroup.json
  • 04:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23691 and previous config saved to /var/cache/conftool/dbconfig/20220330-042443-ladsgroup.json
  • 04:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 04:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 04:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23690 and previous config saved to /var/cache/conftool/dbconfig/20220330-042435-ladsgroup.json
  • 04:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23689 and previous config saved to /var/cache/conftool/dbconfig/20220330-042023-ladsgroup.json
  • 04:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23688 and previous config saved to /var/cache/conftool/dbconfig/20220330-040930-ladsgroup.json
  • 04:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23687 and previous config saved to /var/cache/conftool/dbconfig/20220330-040518-ladsgroup.json
  • 03:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23686 and previous config saved to /var/cache/conftool/dbconfig/20220330-035425-ladsgroup.json
  • 03:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23685 and previous config saved to /var/cache/conftool/dbconfig/20220330-035013-ladsgroup.json
  • 03:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23684 and previous config saved to /var/cache/conftool/dbconfig/20220330-033920-ladsgroup.json
  • 03:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23683 and previous config saved to /var/cache/conftool/dbconfig/20220330-032617-ladsgroup.json
  • 03:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 03:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 03:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23682 and previous config saved to /var/cache/conftool/dbconfig/20220330-032610-ladsgroup.json
  • 03:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23681 and previous config saved to /var/cache/conftool/dbconfig/20220330-032201-ladsgroup.json
  • 03:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 03:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 03:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23680 and previous config saved to /var/cache/conftool/dbconfig/20220330-032154-ladsgroup.json
  • 03:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23679 and previous config saved to /var/cache/conftool/dbconfig/20220330-031105-ladsgroup.json
  • 03:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23678 and previous config saved to /var/cache/conftool/dbconfig/20220330-030649-ladsgroup.json
  • 02:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23677 and previous config saved to /var/cache/conftool/dbconfig/20220330-025600-ladsgroup.json
  • 02:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20220330-025139-ladsgroup.json
  • 02:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23676 and previous config saved to /var/cache/conftool/dbconfig/20220330-024055-ladsgroup.json
  • 02:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23675 and previous config saved to /var/cache/conftool/dbconfig/20220330-023634-ladsgroup.json
  • 02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23674 and previous config saved to /var/cache/conftool/dbconfig/20220330-023426-ladsgroup.json
  • 02:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 02:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 02:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 02:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 02:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23673 and previous config saved to /var/cache/conftool/dbconfig/20220330-023344-ladsgroup.json
  • 02:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23672 and previous config saved to /var/cache/conftool/dbconfig/20220330-021839-ladsgroup.json
  • 02:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T300775)', diff saved to https://phabricator.wikimedia.org/P23671 and previous config saved to /var/cache/conftool/dbconfig/20220330-021111-marostegui.json
  • 02:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 02:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 02:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 02:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 02:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300775)', diff saved to https://phabricator.wikimedia.org/P23670 and previous config saved to /var/cache/conftool/dbconfig/20220330-021058-marostegui.json
  • 02:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23669 and previous config saved to /var/cache/conftool/dbconfig/20220330-020334-ladsgroup.json
  • 01:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P23668 and previous config saved to /var/cache/conftool/dbconfig/20220330-015552-marostegui.json
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23667 and previous config saved to /var/cache/conftool/dbconfig/20220330-015527-ladsgroup.json
  • 01:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23666 and previous config saved to /var/cache/conftool/dbconfig/20220330-015519-ladsgroup.json
  • 01:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23665 and previous config saved to /var/cache/conftool/dbconfig/20220330-014829-ladsgroup.json
  • 01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23664 and previous config saved to /var/cache/conftool/dbconfig/20220330-014621-ladsgroup.json
  • 01:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 01:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23663 and previous config saved to /var/cache/conftool/dbconfig/20220330-014549-ladsgroup.json
  • 01:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P23662 and previous config saved to /var/cache/conftool/dbconfig/20220330-014047-marostegui.json
  • 01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23661 and previous config saved to /var/cache/conftool/dbconfig/20220330-014014-ladsgroup.json
  • 01:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23660 and previous config saved to /var/cache/conftool/dbconfig/20220330-013044-ladsgroup.json
  • 01:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300775)', diff saved to https://phabricator.wikimedia.org/P23659 and previous config saved to /var/cache/conftool/dbconfig/20220330-012542-marostegui.json
  • 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23658 and previous config saved to /var/cache/conftool/dbconfig/20220330-012509-ladsgroup.json
  • 01:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23657 and previous config saved to /var/cache/conftool/dbconfig/20220330-011539-ladsgroup.json
  • 01:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23656 and previous config saved to /var/cache/conftool/dbconfig/20220330-011004-ladsgroup.json
  • 01:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23655 and previous config saved to /var/cache/conftool/dbconfig/20220330-010034-ladsgroup.json
  • 00:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23654 and previous config saved to /var/cache/conftool/dbconfig/20220330-002523-ladsgroup.json
  • 00:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 00:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 00:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23653 and previous config saved to /var/cache/conftool/dbconfig/20220330-002515-ladsgroup.json
  • 00:24 catrope@deploy1002: Finished scap: Update Kashmiri namespace names (T304790) (duration: 12m 29s)
  • 00:12 catrope@deploy1002: Started scap: Update Kashmiri namespace names (T304790)
  • 00:12 catrope@deploy1002: scap failed: RuntimeError Scap failed!: 9/9 canaries failed their endpoint checks(https://en.wikipedia.org). WARNING: canaries have not been rolled back. (duration: 00m 28s)
  • 00:11 catrope@deploy1002: Scap failed!: 9/9 canaries failed their endpoint checks(https://en.wikipedia.org). WARNING: canaries have not been rolled back.
  • 00:11 catrope@deploy1002: Started scap: Update Kashmiri namespace names (T304790)
  • 00:10 catrope@deploy1002: scap failed: RuntimeError Scap failed!: 8/9 canaries failed their endpoint checks(https://en.wikipedia.org). WARNING: canaries have not been rolled back. (duration: 00m 28s)
  • 00:10 catrope@deploy1002: Scap failed!: 8/9 canaries failed their endpoint checks(https://en.wikipedia.org). WARNING: canaries have not been rolled back.
  • 00:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23652 and previous config saved to /var/cache/conftool/dbconfig/20220330-001010-ladsgroup.json
  • 00:09 catrope@deploy1002: Started scap: Update Kashmiri namespace names (T304790)
  • 00:07 catrope@deploy1002: scap failed: RuntimeError Scap failed!: 6/9 canaries failed their endpoint checks(https://en.wikipedia.org). WARNING: canaries have not been rolled back. (duration: 04m 32s)
  • 00:07 catrope@deploy1002: Scap failed!: 6/9 canaries failed their endpoint checks(https://en.wikipedia.org). WARNING: canaries have not been rolled back.
  • 00:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:02 catrope@deploy1002: Started scap: Update Kashmiri namespace names (T304790)
  • 00:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23651 and previous config saved to /var/cache/conftool/dbconfig/20220330-000019-ladsgroup.json
  • 00:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 00:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 00:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23650 and previous config saved to /var/cache/conftool/dbconfig/20220330-000011-ladsgroup.json

2022-03-29

  • 23:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23649 and previous config saved to /var/cache/conftool/dbconfig/20220329-235505-ladsgroup.json
  • 23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23648 and previous config saved to /var/cache/conftool/dbconfig/20220329-234506-ladsgroup.json
  • 23:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23647 and previous config saved to /var/cache/conftool/dbconfig/20220329-234000-ladsgroup.json
  • 23:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23646 and previous config saved to /var/cache/conftool/dbconfig/20220329-233001-ladsgroup.json
  • 23:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23645 and previous config saved to /var/cache/conftool/dbconfig/20220329-231456-ladsgroup.json
  • 23:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23644 and previous config saved to /var/cache/conftool/dbconfig/20220329-231248-ladsgroup.json
  • 23:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23643 and previous config saved to /var/cache/conftool/dbconfig/20220329-231205-ladsgroup.json
  • 22:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23642 and previous config saved to /var/cache/conftool/dbconfig/20220329-225700-ladsgroup.json
  • 22:50 mutante: cumin1001 - systemctl start httpbb_hourly_appserver fixed Icinga alert after gerrit:774981 T205361
  • 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23641 and previous config saved to /var/cache/conftool/dbconfig/20220329-224652-ladsgroup.json
  • 22:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 22:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23640 and previous config saved to /var/cache/conftool/dbconfig/20220329-224644-ladsgroup.json
  • 22:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23639 and previous config saved to /var/cache/conftool/dbconfig/20220329-224155-ladsgroup.json
  • 22:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 8 hosts with reason: Maintenance
  • 22:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 8 hosts with reason: Maintenance
  • 22:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 22:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 22:38 mutante: mwdebug2001 - rebooting
  • 22:36 mutante: mwdebug2002 - rebooting
  • 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P23638 and previous config saved to /var/cache/conftool/dbconfig/20220329-223139-ladsgroup.json
  • 22:31 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 22:30 mutante: moscovium (rt.wikimedia.org) - rebooting
  • 22:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23637 and previous config saved to /var/cache/conftool/dbconfig/20220329-222650-ladsgroup.json
  • 22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23636 and previous config saved to /var/cache/conftool/dbconfig/20220329-222141-ladsgroup.json
  • 22:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 22:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23635 and previous config saved to /var/cache/conftool/dbconfig/20220329-222128-ladsgroup.json
  • 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P23634 and previous config saved to /var/cache/conftool/dbconfig/20220329-221634-ladsgroup.json
  • 22:14 mutante: doc1001 - rebooting (doc.wikimedia.org)
  • 22:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23633 and previous config saved to /var/cache/conftool/dbconfig/20220329-220623-ladsgroup.json
  • 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23632 and previous config saved to /var/cache/conftool/dbconfig/20220329-220128-ladsgroup.json
  • 21:54 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 21:54 mutante: cumin1001 systemctl start httpbb_hourly_appserver
  • 21:54 mutante: cumin1001 systemctl status httpbb_hourly_appserver
  • 21:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23631 and previous config saved to /var/cache/conftool/dbconfig/20220329-215118-ladsgroup.json
  • 21:48 mutante: doc1002 - rebooting
  • 21:46 mutante: doc2001 - rebooting
  • 21:38 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23630 and previous config saved to /var/cache/conftool/dbconfig/20220329-213613-ladsgroup.json
  • 21:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 21:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 21:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23629 and previous config saved to /var/cache/conftool/dbconfig/20220329-212804-ladsgroup.json
  • 21:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 21:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 21:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23628 and previous config saved to /var/cache/conftool/dbconfig/20220329-212756-ladsgroup.json
  • 21:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:22 mutante: aphlict1001 - manually starting aphlict service after reboot (was needed for some reason)
  • 21:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:19 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs2007.codfw.wmnet
  • 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:17 mutante: aphlict1001 - rebooting - this will temp break Phabricator realtime notifications but will be back shortly
  • 21:17 mutante: planet1002 - rebooting
  • 21:14 mutante: planet2002 - rebooting
  • 21:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23627 and previous config saved to /var/cache/conftool/dbconfig/20220329-211251-ladsgroup.json
  • 21:10 mutante: phab1004 - rebooting
  • 21:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23626 and previous config saved to /var/cache/conftool/dbconfig/20220329-210856-ladsgroup.json
  • 21:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 21:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23625 and previous config saved to /var/cache/conftool/dbconfig/20220329-210848-ladsgroup.json
  • 21:05 mutante: phab2002 - rebooting
  • 21:04 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 20:59 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Fix I7ce58529cdd320a9500dc215291ef1c369cee9d3: Rearranging restriction levels and add editautopatrolprotected for eliminators. (T303579) (duration: 00m 56s)
  • 20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23624 and previous config saved to /var/cache/conftool/dbconfig/20220329-205746-ladsgroup.json
  • 20:57 catrope@deploy1002: Synchronized php-1.39.0-wmf.5/skins/Vector/skin.json: Backport: Restore the classes skin-vector and skin-vector-search-vue to body (duration: 00m 55s)
  • 20:54 catrope@deploy1002: Synchronized php-1.39.0-wmf.4/skins/Vector: Backport: Revert: End migration mode (duration: 00m 53s)
  • 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P23623 and previous config saved to /var/cache/conftool/dbconfig/20220329-205343-ladsgroup.json
  • 20:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:50 catrope@deploy1002: Scap failed!: 9/9 canaries failed their endpoint checks (https://en.wikipedia.org). WARNING: canaries have not been rolled back.
  • 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23622 and previous config saved to /var/cache/conftool/dbconfig/20220329-204241-ladsgroup.json
  • 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23621 and previous config saved to /var/cache/conftool/dbconfig/20220329-204034-ladsgroup.json
  • 20:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 20:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 20:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 20:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23620 and previous config saved to /var/cache/conftool/dbconfig/20220329-204021-ladsgroup.json
  • 20:39 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add wikimedia.com to wgNoFollowDomainExceptions (T304555) (duration: 01m 06s)
  • 20:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P23619 and previous config saved to /var/cache/conftool/dbconfig/20220329-203838-ladsgroup.json
  • 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:36 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [config]: Deploy gdi-safety-survey to ES,EN,FR and PT wikis (duration: 00m 56s)
  • 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:30 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Config for new android schemas (duration: 01m 00s)
  • 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23618 and previous config saved to /var/cache/conftool/dbconfig/20220329-202516-ladsgroup.json
  • 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23617 and previous config saved to /var/cache/conftool/dbconfig/20220329-202333-ladsgroup.json
  • 20:20 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 20:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 20:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 20:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T297189)', diff saved to https://phabricator.wikimedia.org/P23616 and previous config saved to /var/cache/conftool/dbconfig/20220329-201611-marostegui.json
  • 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23615 and previous config saved to /var/cache/conftool/dbconfig/20220329-201041-ladsgroup.json
  • 20:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 20:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 20:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 20:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23614 and previous config saved to /var/cache/conftool/dbconfig/20220329-201011-ladsgroup.json
  • 20:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P23613 and previous config saved to /var/cache/conftool/dbconfig/20220329-200106-marostegui.json
  • 19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23612 and previous config saved to /var/cache/conftool/dbconfig/20220329-195505-ladsgroup.json
  • 19:48 eileen: civicrm revision changed from 1c5d10e1 to 951ffb1d
  • 19:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P23611 and previous config saved to /var/cache/conftool/dbconfig/20220329-194601-marostegui.json
  • 19:43 mforns@deploy1002: Finished deploy [analytics/refinery@8e9f97c] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@8e9f97c] (duration: 07m 17s)
  • 19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23610 and previous config saved to /var/cache/conftool/dbconfig/20220329-194256-ladsgroup.json
  • 19:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 19:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23609 and previous config saved to /var/cache/conftool/dbconfig/20220329-194248-ladsgroup.json
  • 19:40 moritzm: uploaded cachelib 0.4.1-2~wmf1 to bullseye-wikimedia T301638
  • 19:35 mforns@deploy1002: Started deploy [analytics/refinery@8e9f97c] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@8e9f97c]
  • 19:35 mforns@deploy1002: Finished deploy [analytics/refinery@8e9f97c] (thin): Regular analytics weekly train THIN [analytics/refinery@8e9f97c] (duration: 00m 08s)
  • 19:35 mforns@deploy1002: Started deploy [analytics/refinery@8e9f97c] (thin): Regular analytics weekly train THIN [analytics/refinery@8e9f97c]
  • 19:35 mforns@deploy1002: Finished deploy [analytics/refinery@8e9f97c]: Regular analytics weekly train [analytics/refinery@8e9f97c] (duration: 21m 13s)
  • 19:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T297189)', diff saved to https://phabricator.wikimedia.org/P23608 and previous config saved to /var/cache/conftool/dbconfig/20220329-193055-marostegui.json
  • 19:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 19:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 19:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 19:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23607 and previous config saved to /var/cache/conftool/dbconfig/20220329-192743-ladsgroup.json
  • 19:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T300775)', diff saved to https://phabricator.wikimedia.org/P23606 and previous config saved to /var/cache/conftool/dbconfig/20220329-191738-marostegui.json
  • 19:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 19:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 19:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T300775)', diff saved to https://phabricator.wikimedia.org/P23605 and previous config saved to /var/cache/conftool/dbconfig/20220329-191731-marostegui.json
  • 19:14 mforns@deploy1002: Started deploy [analytics/refinery@8e9f97c]: Regular analytics weekly train [analytics/refinery@8e9f97c]
  • 19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23604 and previous config saved to /var/cache/conftool/dbconfig/20220329-191238-ladsgroup.json
  • 19:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2001.wikimedia.org
  • 19:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P23603 and previous config saved to /var/cache/conftool/dbconfig/20220329-190226-marostegui.json
  • 19:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
  • 18:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1001.wikimedia.org
  • 18:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1001.wikimedia.org
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23602 and previous config saved to /var/cache/conftool/dbconfig/20220329-185733-ladsgroup.json
  • 18:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23601 and previous config saved to /var/cache/conftool/dbconfig/20220329-185526-ladsgroup.json
  • 18:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 18:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23600 and previous config saved to /var/cache/conftool/dbconfig/20220329-185454-ladsgroup.json
  • 18:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P23599 and previous config saved to /var/cache/conftool/dbconfig/20220329-184720-marostegui.json
  • 18:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 18:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23598 and previous config saved to /var/cache/conftool/dbconfig/20220329-183949-ladsgroup.json
  • 18:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T300775)', diff saved to https://phabricator.wikimedia.org/P23597 and previous config saved to /var/cache/conftool/dbconfig/20220329-183215-marostegui.json
  • 18:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T297189)', diff saved to https://phabricator.wikimedia.org/P23596 and previous config saved to /var/cache/conftool/dbconfig/20220329-183041-marostegui.json
  • 18:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 18:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 18:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T297189)', diff saved to https://phabricator.wikimedia.org/P23595 and previous config saved to /var/cache/conftool/dbconfig/20220329-183034-marostegui.json
  • 18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23594 and previous config saved to /var/cache/conftool/dbconfig/20220329-182444-ladsgroup.json
  • 18:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P23593 and previous config saved to /var/cache/conftool/dbconfig/20220329-181529-marostegui.json
  • 18:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23592 and previous config saved to /var/cache/conftool/dbconfig/20220329-180938-ladsgroup.json
  • 18:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 18:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 18:02 moritzm: restarting fpm on mw canaries to pick up new libtiff
  • 18:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P23591 and previous config saved to /var/cache/conftool/dbconfig/20220329-180023-marostegui.json
  • 17:47 moritzm: installing tiff security updates
  • 17:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T297189)', diff saved to https://phabricator.wikimedia.org/P23590 and previous config saved to /var/cache/conftool/dbconfig/20220329-174518-marostegui.json
  • 17:29 mutante: gitlab2001 - systemctl reset-failed
  • 17:23 mutante: gitlab2001 - did not come back from reboot via cookbook. logged in via console. then "s/ens5/ens13" in /etc/network/interfaces ; reboot ; issue was like T272555 and others
  • 17:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 17:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 17:13 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-tool1005.eqiad.wmnet with reason: Testing deploy of superset 1.4.2 to staging
  • 17:13 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-tool1005.eqiad.wmnet with reason: Testing deploy of superset 1.4.2 to staging
  • 17:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2003.codfw.wmnet
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23589 and previous config saved to /var/cache/conftool/dbconfig/20220329-170924-ladsgroup.json
  • 17:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 17:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23588 and previous config saved to /var/cache/conftool/dbconfig/20220329-170916-ladsgroup.json
  • 17:04 klausman@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-staging2001.codfw.wmnet
  • 17:04 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
  • 17:03 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2003.codfw.wmnet
  • 17:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2002.codfw.wmnet
  • 17:00 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 17:00 aokoth@cumin1001: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM gitlab2001.wikimedia.org
  • 16:55 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2002.codfw.wmnet
  • 16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23586 and previous config saved to /var/cache/conftool/dbconfig/20220329-165411-ladsgroup.json
  • 16:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2001.codfw.wmnet
  • 16:45 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2001.codfw.wmnet
  • 16:39 hashar@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
  • 16:39 hashar@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
  • 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23584 and previous config saved to /var/cache/conftool/dbconfig/20220329-163906-ladsgroup.json
  • 16:39 hashar@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
  • 16:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:38 hashar@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
  • 16:38 hashar@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 16:37 hashar@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 16:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T297189)', diff saved to https://phabricator.wikimedia.org/P23583 and previous config saved to /var/cache/conftool/dbconfig/20220329-163503-marostegui.json
  • 16:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 16:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 16:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T297189)', diff saved to https://phabricator.wikimedia.org/P23582 and previous config saved to /var/cache/conftool/dbconfig/20220329-163455-marostegui.json
  • 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23581 and previous config saved to /var/cache/conftool/dbconfig/20220329-162401-ladsgroup.json
  • 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23580 and previous config saved to /var/cache/conftool/dbconfig/20220329-162153-ladsgroup.json
  • 16:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 16:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23579 and previous config saved to /var/cache/conftool/dbconfig/20220329-162146-ladsgroup.json
  • 16:21 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 16:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P23578 and previous config saved to /var/cache/conftool/dbconfig/20220329-161950-marostegui.json
  • 16:19 aokoth@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab2001.wikimedia.org
  • 16:17 aokoth@cumin1001: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM gitlab2001.wikimedia.org
  • 16:17 aokoth@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab2001.wikimedia.org
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23577 and previous config saved to /var/cache/conftool/dbconfig/20220329-160640-ladsgroup.json
  • 16:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P23576 and previous config saved to /var/cache/conftool/dbconfig/20220329-160446-marostegui.json
  • 15:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 15:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 15:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 15:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23575 and previous config saved to /var/cache/conftool/dbconfig/20220329-155415-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23574 and previous config saved to /var/cache/conftool/dbconfig/20220329-155135-ladsgroup.json
  • 15:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T297189)', diff saved to https://phabricator.wikimedia.org/P23573 and previous config saved to /var/cache/conftool/dbconfig/20220329-154941-marostegui.json
  • 15:47 jayme@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 00m 18s)
  • 15:47 jayme@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
  • 15:46 jayme: updated scap to 4.5.0 on canary hosts - T304134
  • 15:43 jayme: imported scap 4.5.0 to stretch-/buster-/bullseye-wikimedia - T304134
  • 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P23572 and previous config saved to /var/cache/conftool/dbconfig/20220329-153910-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23571 and previous config saved to /var/cache/conftool/dbconfig/20220329-153630-ladsgroup.json
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23570 and previous config saved to /var/cache/conftool/dbconfig/20220329-153423-ladsgroup.json
  • 15:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 15:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 15:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23569 and previous config saved to /var/cache/conftool/dbconfig/20220329-153410-ladsgroup.json
  • 15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P23568 and previous config saved to /var/cache/conftool/dbconfig/20220329-152405-ladsgroup.json
  • 15:22 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 15:20 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 15:20 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 15:19 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 15:19 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23567 and previous config saved to /var/cache/conftool/dbconfig/20220329-151905-ladsgroup.json
  • 15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23566 and previous config saved to /var/cache/conftool/dbconfig/20220329-150900-ladsgroup.json
  • 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23565 and previous config saved to /var/cache/conftool/dbconfig/20220329-150359-ladsgroup.json
  • 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T297189)', diff saved to https://phabricator.wikimedia.org/P23564 and previous config saved to /var/cache/conftool/dbconfig/20220329-150253-marostegui.json
  • 15:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 15:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T297189)', diff saved to https://phabricator.wikimedia.org/P23563 and previous config saved to /var/cache/conftool/dbconfig/20220329-150245-marostegui.json
  • 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23562 and previous config saved to /var/cache/conftool/dbconfig/20220329-144854-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23561 and previous config saved to /var/cache/conftool/dbconfig/20220329-144848-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23560 and previous config saved to /var/cache/conftool/dbconfig/20220329-144835-ladsgroup.json
  • 14:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23559 and previous config saved to /var/cache/conftool/dbconfig/20220329-144747-ladsgroup.json
  • 14:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P23558 and previous config saved to /var/cache/conftool/dbconfig/20220329-144740-marostegui.json
  • 14:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23557 and previous config saved to /var/cache/conftool/dbconfig/20220329-144739-ladsgroup.json
  • 14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P23556 and previous config saved to /var/cache/conftool/dbconfig/20220329-143330-ladsgroup.json
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23555 and previous config saved to /var/cache/conftool/dbconfig/20220329-143234-ladsgroup.json
  • 14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P23554 and previous config saved to /var/cache/conftool/dbconfig/20220329-141825-ladsgroup.json
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23553 and previous config saved to /var/cache/conftool/dbconfig/20220329-141729-ladsgroup.json
  • 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23552 and previous config saved to /var/cache/conftool/dbconfig/20220329-140320-ladsgroup.json
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23551 and previous config saved to /var/cache/conftool/dbconfig/20220329-140224-ladsgroup.json
  • 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23550 and previous config saved to /var/cache/conftool/dbconfig/20220329-140017-ladsgroup.json
  • 14:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 14:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23549 and previous config saved to /var/cache/conftool/dbconfig/20220329-140009-ladsgroup.json
  • 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23548 and previous config saved to /var/cache/conftool/dbconfig/20220329-134504-ladsgroup.json
  • 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23547 and previous config saved to /var/cache/conftool/dbconfig/20220329-132959-ladsgroup.json
  • 13:27 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set write both for all wikis except s1 and s4 (T299421) (duration: 00m 55s)
  • 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:19 urbanecm: UTC afternoon B&C window done
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:18 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: d632476: 64226d7: Set IPInfo config for path to MaxMind files (T304604) (duration: 00m 54s)
  • 13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23546 and previous config saved to /var/cache/conftool/dbconfig/20220329-131453-ladsgroup.json
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T297189)', diff saved to https://phabricator.wikimedia.org/P23545 and previous config saved to /var/cache/conftool/dbconfig/20220329-131251-marostegui.json
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23544 and previous config saved to /var/cache/conftool/dbconfig/20220329-131246-ladsgroup.json
  • 13:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T297189)', diff saved to https://phabricator.wikimedia.org/P23543 and previous config saved to /var/cache/conftool/dbconfig/20220329-131238-marostegui.json
  • 13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23542 and previous config saved to /var/cache/conftool/dbconfig/20220329-131159-ladsgroup.json
  • 13:10 XioNoX: rollback: temporarily apply urpf with action: log only, on cr1-eqiad:xe-3/0/4.1118
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23541 and previous config saved to /var/cache/conftool/dbconfig/20220329-130741-ladsgroup.json
  • 13:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 13:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23540 and previous config saved to /var/cache/conftool/dbconfig/20220329-130733-ladsgroup.json
  • 13:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P23539 and previous config saved to /var/cache/conftool/dbconfig/20220329-125733-marostegui.json
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23538 and previous config saved to /var/cache/conftool/dbconfig/20220329-125654-ladsgroup.json
  • 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23537 and previous config saved to /var/cache/conftool/dbconfig/20220329-125228-ladsgroup.json
  • 12:51 XioNoX: temporarily apply urpf with action: log only, on cr1-eqiad:xe-3/0/4.1118
  • 12:44 mmandere: pool cp2034 with HAProxy as TLS termination layer - T290005
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P23536 and previous config saved to /var/cache/conftool/dbconfig/20220329-124227-marostegui.json
  • 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23535 and previous config saved to /var/cache/conftool/dbconfig/20220329-124148-ladsgroup.json
  • 12:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23534 and previous config saved to /var/cache/conftool/dbconfig/20220329-123723-ladsgroup.json
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T297189)', diff saved to https://phabricator.wikimedia.org/P23533 and previous config saved to /var/cache/conftool/dbconfig/20220329-122722-marostegui.json
  • 12:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23532 and previous config saved to /var/cache/conftool/dbconfig/20220329-122643-ladsgroup.json
  • 12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23531 and previous config saved to /var/cache/conftool/dbconfig/20220329-122436-ladsgroup.json
  • 12:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23530 and previous config saved to /var/cache/conftool/dbconfig/20220329-122404-ladsgroup.json
  • 12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23529 and previous config saved to /var/cache/conftool/dbconfig/20220329-122218-ladsgroup.json
  • 12:17 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2034.codfw.wmnet with OS buster
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T300775)', diff saved to https://phabricator.wikimedia.org/P23528 and previous config saved to /var/cache/conftool/dbconfig/20220329-121248-marostegui.json
  • 12:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 12:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T300775)', diff saved to https://phabricator.wikimedia.org/P23527 and previous config saved to /var/cache/conftool/dbconfig/20220329-121240-marostegui.json
  • 12:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23526 and previous config saved to /var/cache/conftool/dbconfig/20220329-120859-ladsgroup.json
  • 12:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:02 hashar@deploy1002: Synchronized php-1.39.0-wmf.5/skins/Timeless/includes/TimelessTemplate.php: Use null coalescing operator - T304917 (duration: 06m 50s)
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P23525 and previous config saved to /var/cache/conftool/dbconfig/20220329-115735-marostegui.json
  • 11:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:56 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp2034.codfw.wmnet with reason: host reimage
  • 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23524 and previous config saved to /var/cache/conftool/dbconfig/20220329-115354-ladsgroup.json
  • 11:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2034.codfw.wmnet with reason: host reimage
  • 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P23523 and previous config saved to /var/cache/conftool/dbconfig/20220329-114230-marostegui.json
  • 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23522 and previous config saved to /var/cache/conftool/dbconfig/20220329-113849-ladsgroup.json
  • 11:33 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp2034.codfw.wmnet with OS buster
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P23521 and previous config saved to /var/cache/conftool/dbconfig/20220329-112958-root.json
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T300775)', diff saved to https://phabricator.wikimedia.org/P23520 and previous config saved to /var/cache/conftool/dbconfig/20220329-112725-marostegui.json
  • 11:25 mmandere: depool cp2034 for reimage - T290005
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23519 and previous config saved to /var/cache/conftool/dbconfig/20220329-112109-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23518 and previous config saved to /var/cache/conftool/dbconfig/20220329-112101-ladsgroup.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P23517 and previous config saved to /var/cache/conftool/dbconfig/20220329-111454-root.json
  • 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23516 and previous config saved to /var/cache/conftool/dbconfig/20220329-110555-ladsgroup.json
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T297189)', diff saved to https://phabricator.wikimedia.org/P23515 and previous config saved to /var/cache/conftool/dbconfig/20220329-110024-marostegui.json
  • 11:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 11:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T297189)', diff saved to https://phabricator.wikimedia.org/P23514 and previous config saved to /var/cache/conftool/dbconfig/20220329-110016-marostegui.json
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P23513 and previous config saved to /var/cache/conftool/dbconfig/20220329-105950-root.json
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23512 and previous config saved to /var/cache/conftool/dbconfig/20220329-105050-ladsgroup.json
  • 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P23511 and previous config saved to /var/cache/conftool/dbconfig/20220329-104511-marostegui.json
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P23510 and previous config saved to /var/cache/conftool/dbconfig/20220329-104446-root.json
  • 10:43 mmandere: pool cp2027 with HAProxy as TLS termination layer - T290005
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23509 and previous config saved to /var/cache/conftool/dbconfig/20220329-103834-ladsgroup.json
  • 10:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 10:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23508 and previous config saved to /var/cache/conftool/dbconfig/20220329-103826-ladsgroup.json
  • 10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23507 and previous config saved to /var/cache/conftool/dbconfig/20220329-103544-ladsgroup.json
  • 10:35 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2027.codfw.wmnet with OS buster
  • 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P23506 and previous config saved to /var/cache/conftool/dbconfig/20220329-103006-marostegui.json
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P23505 and previous config saved to /var/cache/conftool/dbconfig/20220329-102942-root.json
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23504 and previous config saved to /var/cache/conftool/dbconfig/20220329-102321-ladsgroup.json
  • 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T297189)', diff saved to https://phabricator.wikimedia.org/P23503 and previous config saved to /var/cache/conftool/dbconfig/20220329-101501-marostegui.json
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 5%: After reimage', diff saved to https://phabricator.wikimedia.org/P23502 and previous config saved to /var/cache/conftool/dbconfig/20220329-101439-root.json
  • 10:13 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
  • 10:10 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P23501 and previous config saved to /var/cache/conftool/dbconfig/20220329-100821-root.json
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23500 and previous config saved to /var/cache/conftool/dbconfig/20220329-100816-ladsgroup.json
  • 10:02 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 10:02 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 10:02 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 1%: After reimage', diff saved to https://phabricator.wikimedia.org/P23499 and previous config saved to /var/cache/conftool/dbconfig/20220329-095935-root.json
  • 09:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1157.eqiad.wmnet with OS bullseye
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23498 and previous config saved to /var/cache/conftool/dbconfig/20220329-095317-root.json
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23497 and previous config saved to /var/cache/conftool/dbconfig/20220329-095310-ladsgroup.json
  • 09:51 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS buster
  • 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23496 and previous config saved to /var/cache/conftool/dbconfig/20220329-095103-ladsgroup.json
  • 09:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 09:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 09:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 09:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 09:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 09:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 09:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 09:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 09:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 09:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 09:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23495 and previous config saved to /var/cache/conftool/dbconfig/20220329-095026-ladsgroup.json
  • 09:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23494 and previous config saved to /var/cache/conftool/dbconfig/20220329-094342-ladsgroup.json
  • 09:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23493 and previous config saved to /var/cache/conftool/dbconfig/20220329-094334-ladsgroup.json
  • 09:43 mmandere: depool cp2027 for reimage - T290005
  • 09:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1157.eqiad.wmnet with reason: host reimage
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23492 and previous config saved to /var/cache/conftool/dbconfig/20220329-093807-root.json
  • 09:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1157.eqiad.wmnet with reason: host reimage
  • 09:35 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 09:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23491 and previous config saved to /var/cache/conftool/dbconfig/20220329-093521-ladsgroup.json
  • 09:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:31 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.5 refs T300204
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23490 and previous config saved to /var/cache/conftool/dbconfig/20220329-092829-ladsgroup.json
  • 09:28 hashar@deploy1002: Pruned MediaWiki: 1.39.0-wmf.1 (duration: 03m 49s)
  • 09:24 hashar@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.5 (duration: 77m 17s)
  • 09:24 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1157.eqiad.wmnet with OS bullseye
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23489 and previous config saved to /var/cache/conftool/dbconfig/20220329-092303-root.json
  • 09:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23488 and previous config saved to /var/cache/conftool/dbconfig/20220329-092016-ladsgroup.json
  • 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23487 and previous config saved to /var/cache/conftool/dbconfig/20220329-091324-ladsgroup.json
  • 09:11 marostegui: dbmaint s3@eqiad T298294
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23486 and previous config saved to /var/cache/conftool/dbconfig/20220329-090759-root.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T297189)', diff saved to https://phabricator.wikimedia.org/P23485 and previous config saved to /var/cache/conftool/dbconfig/20220329-090737-marostegui.json
  • 09:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 09:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 09:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 09:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 09:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23484 and previous config saved to /var/cache/conftool/dbconfig/20220329-090510-ladsgroup.json
  • 09:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23483 and previous config saved to /var/cache/conftool/dbconfig/20220329-090303-ladsgroup.json
  • 09:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 09:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23482 and previous config saved to /var/cache/conftool/dbconfig/20220329-090250-ladsgroup.json
  • 08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23481 and previous config saved to /var/cache/conftool/dbconfig/20220329-085819-ladsgroup.json
  • 08:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23480 and previous config saved to /var/cache/conftool/dbconfig/20220329-084745-ladsgroup.json
  • 08:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:41 marostegui: dbmaint s3@eqiad T298557
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23479 and previous config saved to /var/cache/conftool/dbconfig/20220329-083240-ladsgroup.json
  • 08:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23478 and previous config saved to /var/cache/conftool/dbconfig/20220329-081735-ladsgroup.json
  • 08:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23477 and previous config saved to /var/cache/conftool/dbconfig/20220329-081527-ladsgroup.json
  • 08:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 08:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 08:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23476 and previous config saved to /var/cache/conftool/dbconfig/20220329-081519-ladsgroup.json
  • 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23475 and previous config saved to /var/cache/conftool/dbconfig/20220329-081124-ladsgroup.json
  • 08:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23474 and previous config saved to /var/cache/conftool/dbconfig/20220329-081116-ladsgroup.json
  • 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:07 hashar@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.5
  • 08:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:02 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host hppxetest2001.codfw.wmnet with OS bullseye
  • 08:01 marostegui: dbmaint s3@eqiad T298563
  • 08:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23473 and previous config saved to /var/cache/conftool/dbconfig/20220329-080014-ladsgroup.json
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23472 and previous config saved to /var/cache/conftool/dbconfig/20220329-075611-ladsgroup.json
  • 07:48 marostegui: dbmaint s3@eqiad T298554
  • 07:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23471 and previous config saved to /var/cache/conftool/dbconfig/20220329-074509-ladsgroup.json
  • 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23470 and previous config saved to /var/cache/conftool/dbconfig/20220329-074106-ladsgroup.json
  • 07:37 marostegui: dbmaint s6@eqiad T297189
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P23469 and previous config saved to /var/cache/conftool/dbconfig/20220329-073703-root.json
  • 07:36 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host hppxetest2001.codfw.wmnet with OS bullseye
  • 07:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host hppxetest2001.codfw.wmnet
  • 07:34 ayounsi@cumin1001: START - Cookbook sre.hosts.dhcp for host hppxetest2001.codfw.wmnet
  • 07:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23468 and previous config saved to /var/cache/conftool/dbconfig/20220329-073004-ladsgroup.json
  • 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23467 and previous config saved to /var/cache/conftool/dbconfig/20220329-072756-ladsgroup.json
  • 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 07:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 07:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23466 and previous config saved to /var/cache/conftool/dbconfig/20220329-072744-ladsgroup.json
  • 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23465 and previous config saved to /var/cache/conftool/dbconfig/20220329-072601-ladsgroup.json
  • 07:24 taavi: UTC morning deploys done
  • 07:23 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add viwiki eliminators to wgContentTranslationPublishRequirements (T299636) (duration: 00m 50s)
  • 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23464 and previous config saved to /var/cache/conftool/dbconfig/20220329-071239-ladsgroup.json
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23463 and previous config saved to /var/cache/conftool/dbconfig/20220329-071148-ladsgroup.json
  • 07:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 07:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23462 and previous config saved to /var/cache/conftool/dbconfig/20220329-071140-ladsgroup.json
  • 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23461 and previous config saved to /var/cache/conftool/dbconfig/20220329-065734-ladsgroup.json
  • 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23460 and previous config saved to /var/cache/conftool/dbconfig/20220329-065635-ladsgroup.json
  • 06:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23459 and previous config saved to /var/cache/conftool/dbconfig/20220329-064229-ladsgroup.json
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23458 and previous config saved to /var/cache/conftool/dbconfig/20220329-064130-ladsgroup.json
  • 06:40 _joe_: restarting varnish text-fe on cp1079
  • 06:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23457 and previous config saved to /var/cache/conftool/dbconfig/20220329-064021-ladsgroup.json
  • 06:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 06:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 06:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23456 and previous config saved to /var/cache/conftool/dbconfig/20220329-064013-ladsgroup.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298556)', diff saved to https://phabricator.wikimedia.org/P23455 and previous config saved to /var/cache/conftool/dbconfig/20220329-062912-marostegui.json
  • 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23454 and previous config saved to /var/cache/conftool/dbconfig/20220329-062625-ladsgroup.json
  • 06:25 marostegui: dbmaint s3@eqiad T300775
  • 06:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23453 and previous config saved to /var/cache/conftool/dbconfig/20220329-062508-ladsgroup.json
  • 06:17 marostegui: dbmaint s3@eqiad T300381
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P23452 and previous config saved to /var/cache/conftool/dbconfig/20220329-061407-marostegui.json
  • 06:11 marostegui: Maintenance on db1157 (old s3 master) T301848
  • 06:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23451 and previous config saved to /var/cache/conftool/dbconfig/20220329-061004-ladsgroup.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157 T301850', diff saved to https://phabricator.wikimedia.org/P23450 and previous config saved to /var/cache/conftool/dbconfig/20220329-060532-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1123 to s3 primary and set section read-write T301850', diff saved to https://phabricator.wikimedia.org/P23449 and previous config saved to /var/cache/conftool/dbconfig/20220329-060059-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T301850', diff saved to https://phabricator.wikimedia.org/P23448 and previous config saved to /var/cache/conftool/dbconfig/20220329-060024-marostegui.json
  • 06:00 marostegui: Starting s3 eqiad failover from db1157 to db1123 - T301850
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P23447 and previous config saved to /var/cache/conftool/dbconfig/20220329-055902-marostegui.json
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23446 and previous config saved to /var/cache/conftool/dbconfig/20220329-055544-ladsgroup.json
  • 05:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 05:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23445 and previous config saved to /var/cache/conftool/dbconfig/20220329-055458-ladsgroup.json
  • 05:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23444 and previous config saved to /var/cache/conftool/dbconfig/20220329-055251-ladsgroup.json
  • 05:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 05:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298556)', diff saved to https://phabricator.wikimedia.org/P23443 and previous config saved to /var/cache/conftool/dbconfig/20220329-054357-marostegui.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T298556)', diff saved to https://phabricator.wikimedia.org/P23442 and previous config saved to /var/cache/conftool/dbconfig/20220329-052331-marostegui.json
  • 05:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 05:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T300775)', diff saved to https://phabricator.wikimedia.org/P23441 and previous config saved to /var/cache/conftool/dbconfig/20220329-051951-marostegui.json
  • 05:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 05:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T300775)', diff saved to https://phabricator.wikimedia.org/P23440 and previous config saved to /var/cache/conftool/dbconfig/20220329-051943-marostegui.json
  • 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P23439 and previous config saved to /var/cache/conftool/dbconfig/20220329-050438-marostegui.json
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1123 with weight 0 T301850', diff saved to https://phabricator.wikimedia.org/P23438 and previous config saved to /var/cache/conftool/dbconfig/20220329-050234-root.json
  • 05:02 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 20 hosts with reason: Primary switchover s3 T301850
  • 05:02 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 20 hosts with reason: Primary switchover s3 T301850
  • 04:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P23437 and previous config saved to /var/cache/conftool/dbconfig/20220329-044933-marostegui.json
  • 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T300775)', diff saved to https://phabricator.wikimedia.org/P23436 and previous config saved to /var/cache/conftool/dbconfig/20220329-043428-marostegui.json
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
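
The cookbook entries above record their outcome on the `END` line, as a status in parentheses and an `exit_code`. A minimal Python sketch follows of how one might pull the non-zero exits out of a day's entries. The `HH:MM actor: message` shape and the `END (...) - Cookbook ... (exit_code=N)` shape are inferred from the lines in this log, not from any documented format. The names `ENTRY_RE`, `END_RE`, `CookbookResult`, `parse_cookbook_end` and `failed_cookbooks` are hypothetical and are not part of any Wikimedia tooling.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed entry shape, inferred from this log: "HH:MM actor: message",
# optionally preceded by a bullet.
ENTRY_RE = re.compile(r"^\s*(?:•\s*)?(\d{2}:\d{2})\s+(\S+?):\s+(.*)$")
# Assumed shape of a cookbook END message, also inferred from this log.
END_RE = re.compile(r"^END \((\w+)\) - Cookbook (\S+) \(exit_code=(\d+)\)")


class CookbookResult(NamedTuple):
    time: str
    actor: str
    cookbook: str
    status: str
    exit_code: int


def parse_cookbook_end(line: str) -> Optional[CookbookResult]:
    """Return the cookbook result recorded on a log line, or None."""
    entry = ENTRY_RE.match(line)
    if entry is None:
        return None
    time, actor, message = entry.groups()
    end = END_RE.match(message)
    if end is None:
        return None  # not a cookbook END line (START, dbctl, helmfile, ...)
    status, cookbook, exit_code = end.groups()
    return CookbookResult(time, actor, cookbook, status, int(exit_code))


def failed_cookbooks(lines: Iterable[str]) -> List[CookbookResult]:
    """Keep only cookbook END lines whose exit code is non-zero."""
    results = (parse_cookbook_end(line) for line in lines)
    return [r for r in results if r is not None and r.exit_code != 0]


# Three entries copied from the 2022-03-29 section above.
SAMPLE = [
    "  • 09:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance",
    "  • 08:02 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host hppxetest2001.codfw.wmnet with OS bullseye",
    "  • 07:24 taavi: UTC morning deploys done",
]

if __name__ == "__main__":
    for result in failed_cookbooks(SAMPLE):
        print(result.time, result.actor, result.cookbook, result.status, result.exit_code)
```

Lines that are not cookbook `END` entries are ignored rather than treated as errors, because most entries in this log are free-form.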

2022-03-28

  • 23:15 eileen: civicrm revision 15d22bd1 -> 1c5d10e1
  • 23:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T300775)', diff saved to https://phabricator.wikimedia.org/P23434 and previous config saved to /var/cache/conftool/dbconfig/20220328-230012-marostegui.json
  • 23:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 23:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 23:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300775)', diff saved to https://phabricator.wikimedia.org/P23433 and previous config saved to /var/cache/conftool/dbconfig/20220328-230004-marostegui.json
  • 22:52 ejegg: updated fundraising python tools from 409c80b7 to 8f5119f6
  • 22:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P23431 and previous config saved to /var/cache/conftool/dbconfig/20220328-224459-marostegui.json
  • 22:39 rzl: rzl@cumin2002:~$ sudo cumin A:mw 'enable-puppet T205361'
  • 22:31 rzl: rzl@cumin2002:~$ sudo cumin A:mw 'disable-puppet T205361'
  • 22:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P23430 and previous config saved to /var/cache/conftool/dbconfig/20220328-222953-marostegui.json
  • 22:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300775)', diff saved to https://phabricator.wikimedia.org/P23429 and previous config saved to /var/cache/conftool/dbconfig/20220328-221448-marostegui.json
  • 21:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:32 sbassett: Undeployed sec patch for T285159, which caused a high volume of errors on the canaries
  • 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:12 eileen: civicrm revision 4e5b37c3 -> 15d22bd1
  • 21:09 eileen: tools revision changed from d1d7b100 to 409c80b7
  • 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:06 eileen: revision changed from d1d7b100 to 409c80b7
  • 21:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:03 sbassett@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Deploy CS-labs.php config to set StopForumSpam to enforce on beta (duration: 01m 03s)
  • 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:34 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.4/extensions/VisualEditor/modules/ve-mw/ui/ve.ui.MWSequenceRegistry.js: f32ae21: Disable backtick sequence in ve-mw while conflict with Catalan is investigated (T304804) (duration: 00m 57s)
  • 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:22 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: dfa9638: Stop writing to $wmfAllServices (T45956) (duration: 00m 55s)
  • 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e8a5b3b: GrowthExperiments: Add more expanded topics for GLAM campaign (T301029) (duration: 00m 50s)
  • 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:14 herron: pruned /var/log/apache2/puppetmaster.puppet.log.[123]* on puppetmaster1001 T304898
  • 19:20 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=phab2001-vcs.codfw.wmnet
  • 19:09 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=phab2001-vcs.codfw.wmnet
  • 19:09 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=phab2001-vcs.codfw.wmnet
  • 19:07 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=phab2001.codfw.wmnet
  • 18:53 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=phab2001.codfw.wmnet
  • 18:50 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=phab2001.codfw.wmnet
  • 17:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298556)', diff saved to https://phabricator.wikimedia.org/P23426 and previous config saved to /var/cache/conftool/dbconfig/20220328-173340-marostegui.json
  • 17:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P23425 and previous config saved to /var/cache/conftool/dbconfig/20220328-171835-marostegui.json
  • 17:05 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1142.eqiad.wmnet with OS buster
  • 17:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P23424 and previous config saved to /var/cache/conftool/dbconfig/20220328-170330-marostegui.json
  • 16:59 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 16:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298556)', diff saved to https://phabricator.wikimedia.org/P23423 and previous config saved to /var/cache/conftool/dbconfig/20220328-164825-marostegui.json
  • 16:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1142.eqiad.wmnet with OS buster
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T300775)', diff saved to https://phabricator.wikimedia.org/P23422 and previous config saved to /var/cache/conftool/dbconfig/20220328-163903-marostegui.json
  • 16:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 16:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 16:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T300775)', diff saved to https://phabricator.wikimedia.org/P23421 and previous config saved to /var/cache/conftool/dbconfig/20220328-163855-marostegui.json
  • 16:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 16:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T298556)', diff saved to https://phabricator.wikimedia.org/P23420 and previous config saved to /var/cache/conftool/dbconfig/20220328-162644-marostegui.json
  • 16:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 16:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 16:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298556)', diff saved to https://phabricator.wikimedia.org/P23419 and previous config saved to /var/cache/conftool/dbconfig/20220328-162633-marostegui.json
  • 16:24 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P23418 and previous config saved to /var/cache/conftool/dbconfig/20220328-162350-marostegui.json
  • 16:22 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:20 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 16:19 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 16:16 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 16:14 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:13 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P23417 and previous config saved to /var/cache/conftool/dbconfig/20220328-161128-marostegui.json
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P23416 and previous config saved to /var/cache/conftool/dbconfig/20220328-160845-marostegui.json
  • 15:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P23415 and previous config saved to /var/cache/conftool/dbconfig/20220328-155622-marostegui.json
  • 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T300775)', diff saved to https://phabricator.wikimedia.org/P23414 and previous config saved to /var/cache/conftool/dbconfig/20220328-155340-marostegui.json
  • 15:52 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@b5b63c3]: (no justification provided) (duration: 02m 09s)
  • 15:50 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@b5b63c3]: (no justification provided)
  • 15:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298556)', diff saved to https://phabricator.wikimedia.org/P23413 and previous config saved to /var/cache/conftool/dbconfig/20220328-154117-marostegui.json
  • 15:39 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 15:38 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 54s)
  • 15:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2027.codfw.wmnet with OS buster
  • 15:23 moritzm: imported libapache2-mod-auth-cas 1.2-1+wmf11u2 to apt.wikimedia.org/bullseye-wikimedia
  • 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T298556)', diff saved to https://phabricator.wikimedia.org/P23412 and previous config saved to /var/cache/conftool/dbconfig/20220328-152114-marostegui.json
  • 15:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 15:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298556)', diff saved to https://phabricator.wikimedia.org/P23411 and previous config saved to /var/cache/conftool/dbconfig/20220328-152105-marostegui.json
  • 15:15 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubernetes[1001-1004].eqiad.wmnet
  • 15:15 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubernetes[2001-2004].codfw.wmnet
  • 15:11 moritzm: imported libapache2-mod-auth-cas 1.2-1+wmf10u2 to apt.wikimedia.org/buster-wikimedia
  • 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P23410 and previous config saved to /var/cache/conftool/dbconfig/20220328-150600-marostegui.json
  • 15:02 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2027.codfw.wmnet with OS buster
  • 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P23409 and previous config saved to /var/cache/conftool/dbconfig/20220328-145055-marostegui.json
  • 14:48 akosiaris@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:47 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:47 inflatador: 'bking@cumin1001 repooling wdqs services in IAD ref T302494'
  • 14:46 bking@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal,name=eqiad
  • 14:45 bking@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs*,name=eqiad
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298556)', diff saved to https://phabricator.wikimedia.org/P23408 and previous config saved to /var/cache/conftool/dbconfig/20220328-143550-marostegui.json
  • 14:28 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:20 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 14:20 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:19 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T298556)', diff saved to https://phabricator.wikimedia.org/P23407 and previous config saved to /var/cache/conftool/dbconfig/20220328-141552-marostegui.json
  • 14:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 14:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298556)', diff saved to https://phabricator.wikimedia.org/P23406 and previous config saved to /var/cache/conftool/dbconfig/20220328-141544-marostegui.json
  • 14:15 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts kubernetes[2001-2004].codfw.wmnet
  • 14:13 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts kubernetes[1001-1004].eqiad.wmnet
  • 14:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:12 akosiaris: decommission kubernetes100[1-4]. T303044
  • 14:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:07 mmandere: pool cp2029 with HAProxy as TLS termination layer - T290005
  • 14:06 taavi: deploy security patch for T226212
  • 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P23405 and previous config saved to /var/cache/conftool/dbconfig/20220328-140039-marostegui.json
  • 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: Migrate $wmfAllServices to $wmgAllServices (T45956) (5/5, prod noop) (duration: 01m 04s)
  • 13:57 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CirrusSearch-labs.php: Config: Migrate $wmfAllServices to $wmgAllServices (T45956) (4/5, prod noop) (duration: 01m 07s)
  • 13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:56 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/filebackend.php: Config: Migrate $wmfAllServices to $wmgAllServices (T45956) (3/5) (duration: 00m 51s)
  • 13:54 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Migrate $wmfAllServices to $wmgAllServices (T45956) (2/5) (duration: 00m 56s)
  • 13:53 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CirrusSearch-production.php: Config: Migrate $wmfAllServices to $wmgAllServices (T45956) (1/5) (duration: 00m 51s)
  • 13:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:46 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/throttle.php: Config: Throttle: Add rule for Bard College class project on enwiki (T304687) (duration: 00m 54s)
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P23404 and previous config saved to /var/cache/conftool/dbconfig/20220328-134534-marostegui.json
  • 13:40 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2029.codfw.wmnet with OS buster
  • 13:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298556)', diff saved to https://phabricator.wikimedia.org/P23403 and previous config saved to /var/cache/conftool/dbconfig/20220328-133029-marostegui.json
  • 13:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:23 lucaswerkmeister-wmde@deploy1002: Synchronized phpcs.xml: Config: phpcs: narrow some exclusions only needed for cirrusTest.php (T171115) (2/2) (duration: 00m 55s)
  • 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:22 lucaswerkmeister-wmde@deploy1002: Synchronized tests/cirrusTest.php: Config: phpcs: narrow some exclusions only needed for cirrusTest.php (T171115) (1/2) (duration: 00m 56s)
  • 13:18 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2029.codfw.wmnet with reason: host reimage
  • 13:17 lucaswerkmeister-wmde@deploy1002: Synchronized phpcs.xml: Config: phpcs: enable passing rule UnusedGlobalVariables (T171115) (includes phpcs.xml change from previous sync) (duration: 00m 56s)
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:15 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2029.codfw.wmnet with reason: host reimage
  • 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:13 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: phpcs: enable and fix SingleSpaceBeforeSingleLineComment (T171115) (phpcs.xml will be synced with next patch) (duration: 01m 01s)
  • 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:07 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Write "unexpectedUnconnectedPage" page prop everywhere (duration: 00m 56s)
  • 13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:03 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 12:57 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp2029.codfw.wmnet with OS buster
  • 12:50 mmandere: depool cp2029 for reimage - T290005
  • 12:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:44 moritzm: installing Intel microcode updates 2022-02-07 on Buster
  • 12:44 mmandere: pool cp2031 with HAProxy as TLS termination layer - T290005
  • 12:43 urbanecm: Clear signup authentication throttle per https://wikitech.wikimedia.org/wiki/Increasing_account_creation_threshold for 195.113.155.4 (T304836)
  • 12:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:41 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: 3ba524d: throttle: Add rule for Czech Wikigap 2022 (T304836) (duration: 00m 52s)
  • 12:40 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 12:39 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 12:38 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2031.codfw.wmnet with OS buster
  • 12:36 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 12:36 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 12:34 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 12:34 jayme@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 12:32 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 12:31 jayme@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T298556)', diff saved to https://phabricator.wikimedia.org/P23402 and previous config saved to /var/cache/conftool/dbconfig/20220328-123015-marostegui.json
  • 12:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 12:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298556)', diff saved to https://phabricator.wikimedia.org/P23401 and previous config saved to /var/cache/conftool/dbconfig/20220328-123007-marostegui.json
  • 12:16 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2031.codfw.wmnet with reason: host reimage
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P23400 and previous config saved to /var/cache/conftool/dbconfig/20220328-121501-marostegui.json
  • 12:13 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2031.codfw.wmnet with reason: host reimage
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P23399 and previous config saved to /var/cache/conftool/dbconfig/20220328-115956-marostegui.json
  • 11:55 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS buster
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298556)', diff saved to https://phabricator.wikimedia.org/P23398 and previous config saved to /var/cache/conftool/dbconfig/20220328-114451-marostegui.json
  • 11:44 mmandere: depool cp2031 for reimage - T290005
  • 11:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 11:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 11:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 11:25 moritzm: installing Intel microcode updates 2022-02-07 on Bullseye
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T298556)', diff saved to https://phabricator.wikimedia.org/P23397 and previous config saved to /var/cache/conftool/dbconfig/20220328-112352-marostegui.json
  • 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298556)', diff saved to https://phabricator.wikimedia.org/P23396 and previous config saved to /var/cache/conftool/dbconfig/20220328-112345-marostegui.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P23395 and previous config saved to /var/cache/conftool/dbconfig/20220328-110839-marostegui.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P23394 and previous config saved to /var/cache/conftool/dbconfig/20220328-105333-marostegui.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298556)', diff saved to https://phabricator.wikimedia.org/P23393 and previous config saved to /var/cache/conftool/dbconfig/20220328-103828-marostegui.json
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T300775)', diff saved to https://phabricator.wikimedia.org/P23392 and previous config saved to /var/cache/conftool/dbconfig/20220328-102915-marostegui.json
  • 10:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 10:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23391 and previous config saved to /var/cache/conftool/dbconfig/20220328-102014-root.json
  • 10:17 mmandere: pool cp2033 with HAProxy as TLS termination layer - T290005
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T298556)', diff saved to https://phabricator.wikimedia.org/P23390 and previous config saved to /var/cache/conftool/dbconfig/20220328-101712-marostegui.json
  • 10:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 10:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298556)', diff saved to https://phabricator.wikimedia.org/P23389 and previous config saved to /var/cache/conftool/dbconfig/20220328-101704-marostegui.json
  • 10:13 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2033.codfw.wmnet with OS buster
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23387 and previous config saved to /var/cache/conftool/dbconfig/20220328-100511-root.json
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P23386 and previous config saved to /var/cache/conftool/dbconfig/20220328-100159-marostegui.json
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23385 and previous config saved to /var/cache/conftool/dbconfig/20220328-095007-root.json
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P23384 and previous config saved to /var/cache/conftool/dbconfig/20220328-094653-marostegui.json
  • 09:46 moritzm: installing Linux 4.9.303 on Stretch hosts
  • 09:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2033.codfw.wmnet with reason: host reimage
  • 09:43 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2033.codfw.wmnet with reason: host reimage
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23383 and previous config saved to /var/cache/conftool/dbconfig/20220328-093503-root.json
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298556)', diff saved to https://phabricator.wikimedia.org/P23382 and previous config saved to /var/cache/conftool/dbconfig/20220328-093148-marostegui.json
  • 09:24 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp2033.codfw.wmnet with OS buster
  • 09:13 moritzm: installing Linux 4.19.235 on Buster hosts
  • 09:11 mmandere: depool cp2033 for reimage - T290005
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T298556)', diff saved to https://phabricator.wikimedia.org/P23379 and previous config saved to /var/cache/conftool/dbconfig/20220328-091041-marostegui.json
  • 09:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 09:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298556)', diff saved to https://phabricator.wikimedia.org/P23378 and previous config saved to /var/cache/conftool/dbconfig/20220328-091033-marostegui.json
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: After downgrade ', diff saved to https://phabricator.wikimedia.org/P23377 and previous config saved to /var/cache/conftool/dbconfig/20220328-090445-root.json
  • 09:03 moritzm: installing Linux 5.10.106 on Bullseye hosts
  • 08:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P23376 and previous config saved to /var/cache/conftool/dbconfig/20220328-085528-marostegui.json
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P23375 and previous config saved to /var/cache/conftool/dbconfig/20220328-085507-root.json
  • 08:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:50 jynus: deploy new alerting (0.7.1) for db backups at alert1001 T138562
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: After downgrade ', diff saved to https://phabricator.wikimedia.org/P23374 and previous config saved to /var/cache/conftool/dbconfig/20220328-084941-root.json
  • 08:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:47 marostegui: dbmaint s1@eqiad T304812
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169 T304812', diff saved to https://phabricator.wikimedia.org/P23373 and previous config saved to /var/cache/conftool/dbconfig/20220328-084705-marostegui.json
  • 08:46 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable WRITE BOTH for templatelinks normalization in more wikis (T299421) (duration: 00m 54s)
  • 08:46 _joe_: uploading conftool 2.0.0, T302471
  • 08:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:43 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable videojs in the second batch of wikis (T248418) (duration: 00m 55s)
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P23371 and previous config saved to /var/cache/conftool/dbconfig/20220328-084023-marostegui.json
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23370 and previous config saved to /var/cache/conftool/dbconfig/20220328-084003-root.json
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: After downgrade ', diff saved to https://phabricator.wikimedia.org/P23369 and previous config saved to /var/cache/conftool/dbconfig/20220328-083437-root.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298556)', diff saved to https://phabricator.wikimedia.org/P23368 and previous config saved to /var/cache/conftool/dbconfig/20220328-082518-marostegui.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23367 and previous config saved to /var/cache/conftool/dbconfig/20220328-082459-root.json
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: After downgrade ', diff saved to https://phabricator.wikimedia.org/P23366 and previous config saved to /var/cache/conftool/dbconfig/20220328-081933-root.json
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23365 and previous config saved to /var/cache/conftool/dbconfig/20220328-080955-root.json
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23364 and previous config saved to /var/cache/conftool/dbconfig/20220328-080841-root.json
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: After downgrade ', diff saved to https://phabricator.wikimedia.org/P23363 and previous config saved to /var/cache/conftool/dbconfig/20220328-080429-root.json
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T298556)', diff saved to https://phabricator.wikimedia.org/P23362 and previous config saved to /var/cache/conftool/dbconfig/20220328-080409-marostegui.json
  • 08:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 08:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298556)', diff saved to https://phabricator.wikimedia.org/P23361 and previous config saved to /var/cache/conftool/dbconfig/20220328-080401-marostegui.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23360 and previous config saved to /var/cache/conftool/dbconfig/20220328-075451-root.json
  • 07:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23359 and previous config saved to /var/cache/conftool/dbconfig/20220328-075337-root.json
  • 07:51 marostegui: dbmaint s1@codfw T304812
  • 07:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P23358 and previous config saved to /var/cache/conftool/dbconfig/20220328-074856-marostegui.json
  • 07:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:39 moritzm: updated d-i images for Buster 10.12 release T304546
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23357 and previous config saved to /var/cache/conftool/dbconfig/20220328-073833-root.json
  • 07:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:34 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Remove unused CentralAuth settings (2/2) (duration: 00m 55s)
  • 07:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:33 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove unused CentralAuth settings (1/2) (duration: 00m 56s)
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P23356 and previous config saved to /var/cache/conftool/dbconfig/20220328-073351-marostegui.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23355 and previous config saved to /var/cache/conftool/dbconfig/20220328-072329-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298556)', diff saved to https://phabricator.wikimedia.org/P23354 and previous config saved to /var/cache/conftool/dbconfig/20220328-071846-marostegui.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P23353 and previous config saved to /var/cache/conftool/dbconfig/20220328-071427-marostegui.json
  • 07:13 moritzm: updated d-i images for Bullseye 11.3 release T304599
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 10%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23352 and previous config saved to /var/cache/conftool/dbconfig/20220328-070825-root.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After schema downgrade ', diff saved to https://phabricator.wikimedia.org/P23351 and previous config saved to /var/cache/conftool/dbconfig/20220328-070700-root.json
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23350 and previous config saved to /var/cache/conftool/dbconfig/20220328-070154-root.json
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23349 and previous config saved to /var/cache/conftool/dbconfig/20220328-070139-root.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175 for downgrade', diff saved to https://phabricator.wikimedia.org/P23348 and previous config saved to /var/cache/conftool/dbconfig/20220328-070056-marostegui.json
  • 06:52 elukey: reboot ml-serve-ctrl1002 - ganeti console available but slow (attempted root login but never got to input the password)
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After schema downgrade ', diff saved to https://phabricator.wikimedia.org/P23347 and previous config saved to /var/cache/conftool/dbconfig/20220328-065156-root.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T298556)', diff saved to https://phabricator.wikimedia.org/P23346 and previous config saved to /var/cache/conftool/dbconfig/20220328-065048-marostegui.json
  • 06:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 06:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298556)', diff saved to https://phabricator.wikimedia.org/P23345 and previous config saved to /var/cache/conftool/dbconfig/20220328-065040-marostegui.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23344 and previous config saved to /var/cache/conftool/dbconfig/20220328-064650-root.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23343 and previous config saved to /var/cache/conftool/dbconfig/20220328-064635-root.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After schema downgrade ', diff saved to https://phabricator.wikimedia.org/P23342 and previous config saved to /var/cache/conftool/dbconfig/20220328-063652-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P23341 and previous config saved to /var/cache/conftool/dbconfig/20220328-063535-marostegui.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23340 and previous config saved to /var/cache/conftool/dbconfig/20220328-063146-root.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23339 and previous config saved to /var/cache/conftool/dbconfig/20220328-063131-root.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After schema downgrade ', diff saved to https://phabricator.wikimedia.org/P23338 and previous config saved to /var/cache/conftool/dbconfig/20220328-062149-root.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P23337 and previous config saved to /var/cache/conftool/dbconfig/20220328-062030-marostegui.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23336 and previous config saved to /var/cache/conftool/dbconfig/20220328-061642-root.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23335 and previous config saved to /var/cache/conftool/dbconfig/20220328-061627-root.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After schema downgrade ', diff saved to https://phabricator.wikimedia.org/P23334 and previous config saved to /var/cache/conftool/dbconfig/20220328-060645-root.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298556)', diff saved to https://phabricator.wikimedia.org/P23333 and previous config saved to /var/cache/conftool/dbconfig/20220328-060525-marostegui.json
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 for downgrade', diff saved to https://phabricator.wikimedia.org/P23332 and previous config saved to /var/cache/conftool/dbconfig/20220328-060239-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23331 and previous config saved to /var/cache/conftool/dbconfig/20220328-060138-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 10%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23330 and previous config saved to /var/cache/conftool/dbconfig/20220328-060123-root.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099 for downgrade', diff saved to https://phabricator.wikimedia.org/P23329 and previous config saved to /var/cache/conftool/dbconfig/20220328-054552-marostegui.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T298556)', diff saved to https://phabricator.wikimedia.org/P23328 and previous config saved to /var/cache/conftool/dbconfig/20220328-053816-marostegui.json
  • 05:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 05:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 05:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 12 hosts with reason: Maintenance
  • 05:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 12 hosts with reason: Maintenance
  • 05:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 05:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 05:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 05:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 05:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 05:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 05:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 05:32 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: sync
  • 05:32 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: sync
  • 04:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23327 and previous config saved to /var/cache/conftool/dbconfig/20220328-042334-ladsgroup.json
  • 04:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23326 and previous config saved to /var/cache/conftool/dbconfig/20220328-040829-ladsgroup.json
  • 03:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23325 and previous config saved to /var/cache/conftool/dbconfig/20220328-035323-ladsgroup.json
  • 03:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23324 and previous config saved to /var/cache/conftool/dbconfig/20220328-033818-ladsgroup.json
  • 02:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23323 and previous config saved to /var/cache/conftool/dbconfig/20220328-023804-ladsgroup.json
  • 02:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 02:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 02:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23322 and previous config saved to /var/cache/conftool/dbconfig/20220328-023756-ladsgroup.json
  • 02:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23321 and previous config saved to /var/cache/conftool/dbconfig/20220328-022251-ladsgroup.json
  • 02:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23320 and previous config saved to /var/cache/conftool/dbconfig/20220328-020746-ladsgroup.json
  • 01:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23319 and previous config saved to /var/cache/conftool/dbconfig/20220328-015241-ladsgroup.json
  • 01:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23318 and previous config saved to /var/cache/conftool/dbconfig/20220328-012553-ladsgroup.json
  • 01:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 01:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23317 and previous config saved to /var/cache/conftool/dbconfig/20220328-012543-ladsgroup.json
  • 01:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23316 and previous config saved to /var/cache/conftool/dbconfig/20220328-011038-ladsgroup.json
  • 00:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23315 and previous config saved to /var/cache/conftool/dbconfig/20220328-005533-ladsgroup.json
  • 00:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23314 and previous config saved to /var/cache/conftool/dbconfig/20220328-004027-ladsgroup.json
  • 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23313 and previous config saved to /var/cache/conftool/dbconfig/20220328-001707-ladsgroup.json
  • 00:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 00:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance

2022-03-27

  • 23:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 23:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 23:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23312 and previous config saved to /var/cache/conftool/dbconfig/20220327-235516-ladsgroup.json
  • 23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23311 and previous config saved to /var/cache/conftool/dbconfig/20220327-234011-ladsgroup.json
  • 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23310 and previous config saved to /var/cache/conftool/dbconfig/20220327-232506-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23309 and previous config saved to /var/cache/conftool/dbconfig/20220327-231001-ladsgroup.json
  • 22:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23308 and previous config saved to /var/cache/conftool/dbconfig/20220327-224707-ladsgroup.json
  • 22:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 22:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 22:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23307 and previous config saved to /var/cache/conftool/dbconfig/20220327-224659-ladsgroup.json
  • 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23306 and previous config saved to /var/cache/conftool/dbconfig/20220327-223154-ladsgroup.json
  • 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23305 and previous config saved to /var/cache/conftool/dbconfig/20220327-221649-ladsgroup.json
  • 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23304 and previous config saved to /var/cache/conftool/dbconfig/20220327-220143-ladsgroup.json
  • 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23303 and previous config saved to /var/cache/conftool/dbconfig/20220327-215440-ladsgroup.json
  • 21:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 21:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23302 and previous config saved to /var/cache/conftool/dbconfig/20220327-215432-ladsgroup.json
  • 21:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23301 and previous config saved to /var/cache/conftool/dbconfig/20220327-213927-ladsgroup.json
  • 21:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23300 and previous config saved to /var/cache/conftool/dbconfig/20220327-212422-ladsgroup.json
  • 21:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23299 and previous config saved to /var/cache/conftool/dbconfig/20220327-210917-ladsgroup.json
  • 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23298 and previous config saved to /var/cache/conftool/dbconfig/20220327-204604-ladsgroup.json
  • 20:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 20:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 20:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 20:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 20:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 20:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 20:20 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: sync
  • 20:20 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: sync
  • 19:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 19:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 19:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 19:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 19:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 19:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23297 and previous config saved to /var/cache/conftool/dbconfig/20220327-195258-ladsgroup.json
  • 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23296 and previous config saved to /var/cache/conftool/dbconfig/20220327-193753-ladsgroup.json
  • 19:35 _joe_: $ sudo cumin -b1 -s20 'A:mw-api and P{mw13[56-82].eqiad.wmnet}' 'restart-php7.2-fpm'
  • 19:25 _joe_: restarting php on mw1380
  • 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23295 and previous config saved to /var/cache/conftool/dbconfig/20220327-192247-ladsgroup.json
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23294 and previous config saved to /var/cache/conftool/dbconfig/20220327-190742-ladsgroup.json
  • 18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23293 and previous config saved to /var/cache/conftool/dbconfig/20220327-184107-ladsgroup.json
  • 18:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 18:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23292 and previous config saved to /var/cache/conftool/dbconfig/20220327-184059-ladsgroup.json
  • 18:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23291 and previous config saved to /var/cache/conftool/dbconfig/20220327-182554-ladsgroup.json
  • 18:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23290 and previous config saved to /var/cache/conftool/dbconfig/20220327-181049-ladsgroup.json
  • 17:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23289 and previous config saved to /var/cache/conftool/dbconfig/20220327-175544-ladsgroup.json
  • 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23288 and previous config saved to /var/cache/conftool/dbconfig/20220327-165530-ladsgroup.json
  • 16:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 16:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23287 and previous config saved to /var/cache/conftool/dbconfig/20220327-165522-ladsgroup.json
  • 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23286 and previous config saved to /var/cache/conftool/dbconfig/20220327-164017-ladsgroup.json
  • 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23285 and previous config saved to /var/cache/conftool/dbconfig/20220327-162511-ladsgroup.json
  • 16:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23284 and previous config saved to /var/cache/conftool/dbconfig/20220327-161006-ladsgroup.json
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23283 and previous config saved to /var/cache/conftool/dbconfig/20220327-154357-ladsgroup.json
  • 15:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 15:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 15:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 15:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 15:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 15:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 14:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 14:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23282 and previous config saved to /var/cache/conftool/dbconfig/20220327-145341-ladsgroup.json
  • 14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23281 and previous config saved to /var/cache/conftool/dbconfig/20220327-143835-ladsgroup.json
  • 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23280 and previous config saved to /var/cache/conftool/dbconfig/20220327-142330-ladsgroup.json
  • 14:20 elukey: roll restart of wdqs-blazegraph-public codfw
  • 14:18 elukey: restart blazegraph on wdqs2003
  • 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23279 and previous config saved to /var/cache/conftool/dbconfig/20220327-140825-ladsgroup.json
  • 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23278 and previous config saved to /var/cache/conftool/dbconfig/20220327-134411-ladsgroup.json
  • 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23277 and previous config saved to /var/cache/conftool/dbconfig/20220327-134358-ladsgroup.json
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23276 and previous config saved to /var/cache/conftool/dbconfig/20220327-132852-ladsgroup.json
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23275 and previous config saved to /var/cache/conftool/dbconfig/20220327-131347-ladsgroup.json
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23274 and previous config saved to /var/cache/conftool/dbconfig/20220327-125842-ladsgroup.json
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23273 and previous config saved to /var/cache/conftool/dbconfig/20220327-125128-ladsgroup.json
  • 12:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 12:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23272 and previous config saved to /var/cache/conftool/dbconfig/20220327-125120-ladsgroup.json
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23271 and previous config saved to /var/cache/conftool/dbconfig/20220327-123615-ladsgroup.json
  • 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23270 and previous config saved to /var/cache/conftool/dbconfig/20220327-122110-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23269 and previous config saved to /var/cache/conftool/dbconfig/20220327-120604-ladsgroup.json
  • 11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23268 and previous config saved to /var/cache/conftool/dbconfig/20220327-114152-ladsgroup.json
  • 11:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 11:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23267 and previous config saved to /var/cache/conftool/dbconfig/20220327-112003-ladsgroup.json
  • 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23266 and previous config saved to /var/cache/conftool/dbconfig/20220327-110457-ladsgroup.json
  • 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23265 and previous config saved to /var/cache/conftool/dbconfig/20220327-104952-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23264 and previous config saved to /var/cache/conftool/dbconfig/20220327-103447-ladsgroup.json
  • 10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23263 and previous config saved to /var/cache/conftool/dbconfig/20220327-101022-ladsgroup.json
  • 10:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 10:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23262 and previous config saved to /var/cache/conftool/dbconfig/20220327-101014-ladsgroup.json
  • 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23261 and previous config saved to /var/cache/conftool/dbconfig/20220327-095509-ladsgroup.json
  • 09:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23260 and previous config saved to /var/cache/conftool/dbconfig/20220327-094004-ladsgroup.json
  • 09:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23259 and previous config saved to /var/cache/conftool/dbconfig/20220327-092459-ladsgroup.json
  • 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23258 and previous config saved to /var/cache/conftool/dbconfig/20220327-085741-ladsgroup.json
  • 08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23257 and previous config saved to /var/cache/conftool/dbconfig/20220327-085733-ladsgroup.json
  • 08:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23256 and previous config saved to /var/cache/conftool/dbconfig/20220327-084228-ladsgroup.json
  • 08:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23255 and previous config saved to /var/cache/conftool/dbconfig/20220327-082723-ladsgroup.json
  • 08:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23254 and previous config saved to /var/cache/conftool/dbconfig/20220327-081218-ladsgroup.json
  • 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23253 and previous config saved to /var/cache/conftool/dbconfig/20220327-071203-ladsgroup.json
  • 07:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 07:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23252 and previous config saved to /var/cache/conftool/dbconfig/20220327-071156-ladsgroup.json
  • 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23251 and previous config saved to /var/cache/conftool/dbconfig/20220327-065651-ladsgroup.json
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23250 and previous config saved to /var/cache/conftool/dbconfig/20220327-064146-ladsgroup.json
  • 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23249 and previous config saved to /var/cache/conftool/dbconfig/20220327-062641-ladsgroup.json
  • 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23248 and previous config saved to /var/cache/conftool/dbconfig/20220327-055108-ladsgroup.json
  • 05:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 05:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23247 and previous config saved to /var/cache/conftool/dbconfig/20220327-055100-ladsgroup.json
  • 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23246 and previous config saved to /var/cache/conftool/dbconfig/20220327-053555-ladsgroup.json
  • 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23245 and previous config saved to /var/cache/conftool/dbconfig/20220327-052050-ladsgroup.json
  • 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23244 and previous config saved to /var/cache/conftool/dbconfig/20220327-050545-ladsgroup.json
  • 04:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23243 and previous config saved to /var/cache/conftool/dbconfig/20220327-044235-ladsgroup.json
  • 04:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 04:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 04:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 04:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 04:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23242 and previous config saved to /var/cache/conftool/dbconfig/20220327-042041-ladsgroup.json
  • 04:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23241 and previous config saved to /var/cache/conftool/dbconfig/20220327-040536-ladsgroup.json
  • 03:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23240 and previous config saved to /var/cache/conftool/dbconfig/20220327-035031-ladsgroup.json
  • 03:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23239 and previous config saved to /var/cache/conftool/dbconfig/20220327-033526-ladsgroup.json
  • 03:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23238 and previous config saved to /var/cache/conftool/dbconfig/20220327-031115-ladsgroup.json
  • 03:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 03:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 03:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23237 and previous config saved to /var/cache/conftool/dbconfig/20220327-031108-ladsgroup.json
  • 02:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23236 and previous config saved to /var/cache/conftool/dbconfig/20220327-025603-ladsgroup.json
  • 02:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23235 and previous config saved to /var/cache/conftool/dbconfig/20220327-024057-ladsgroup.json
  • 02:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23234 and previous config saved to /var/cache/conftool/dbconfig/20220327-022552-ladsgroup.json
  • 01:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23233 and previous config saved to /var/cache/conftool/dbconfig/20220327-015848-ladsgroup.json
  • 01:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 01:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 01:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23232 and previous config saved to /var/cache/conftool/dbconfig/20220327-015840-ladsgroup.json
  • 01:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23231 and previous config saved to /var/cache/conftool/dbconfig/20220327-014335-ladsgroup.json
  • 01:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23230 and previous config saved to /var/cache/conftool/dbconfig/20220327-012829-ladsgroup.json
  • 01:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23229 and previous config saved to /var/cache/conftool/dbconfig/20220327-011324-ladsgroup.json
  • 00:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23228 and previous config saved to /var/cache/conftool/dbconfig/20220327-005010-ladsgroup.json
  • 00:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 00:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 00:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 00:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 00:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 00:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 00:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 00:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 00:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 00:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 00:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 00:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 00:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23227 and previous config saved to /var/cache/conftool/dbconfig/20220327-000023-ladsgroup.json

2022-03-26

  • 23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23226 and previous config saved to /var/cache/conftool/dbconfig/20220326-234517-ladsgroup.json
  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23225 and previous config saved to /var/cache/conftool/dbconfig/20220326-233012-ladsgroup.json
  • 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23224 and previous config saved to /var/cache/conftool/dbconfig/20220326-231507-ladsgroup.json
  • 22:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23223 and previous config saved to /var/cache/conftool/dbconfig/20220326-224955-ladsgroup.json
  • 22:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 22:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 22:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23222 and previous config saved to /var/cache/conftool/dbconfig/20220326-224947-ladsgroup.json
  • 22:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23221 and previous config saved to /var/cache/conftool/dbconfig/20220326-223442-ladsgroup.json
  • 22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23220 and previous config saved to /var/cache/conftool/dbconfig/20220326-221937-ladsgroup.json
  • 22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23219 and previous config saved to /var/cache/conftool/dbconfig/20220326-220432-ladsgroup.json
  • 21:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23218 and previous config saved to /var/cache/conftool/dbconfig/20220326-210417-ladsgroup.json
  • 21:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 21:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 21:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23217 and previous config saved to /var/cache/conftool/dbconfig/20220326-210409-ladsgroup.json
  • 20:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23216 and previous config saved to /var/cache/conftool/dbconfig/20220326-204904-ladsgroup.json
  • 20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23214 and previous config saved to /var/cache/conftool/dbconfig/20220326-203359-ladsgroup.json
  • 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23213 and previous config saved to /var/cache/conftool/dbconfig/20220326-201854-ladsgroup.json
  • 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23212 and previous config saved to /var/cache/conftool/dbconfig/20220326-195245-ladsgroup.json
  • 19:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 19:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 19:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 19:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 19:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 19:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 19:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 19:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23211 and previous config saved to /var/cache/conftool/dbconfig/20220326-190244-ladsgroup.json
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23210 and previous config saved to /var/cache/conftool/dbconfig/20220326-184739-ladsgroup.json
  • 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23209 and previous config saved to /var/cache/conftool/dbconfig/20220326-183234-ladsgroup.json
  • 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23208 and previous config saved to /var/cache/conftool/dbconfig/20220326-181729-ladsgroup.json
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23207 and previous config saved to /var/cache/conftool/dbconfig/20220326-175315-ladsgroup.json
  • 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23206 and previous config saved to /var/cache/conftool/dbconfig/20220326-175302-ladsgroup.json
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23205 and previous config saved to /var/cache/conftool/dbconfig/20220326-173757-ladsgroup.json
  • 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23204 and previous config saved to /var/cache/conftool/dbconfig/20220326-172250-ladsgroup.json
  • 17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23203 and previous config saved to /var/cache/conftool/dbconfig/20220326-170745-ladsgroup.json
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23202 and previous config saved to /var/cache/conftool/dbconfig/20220326-170047-ladsgroup.json
  • 17:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 17:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23201 and previous config saved to /var/cache/conftool/dbconfig/20220326-170039-ladsgroup.json
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23200 and previous config saved to /var/cache/conftool/dbconfig/20220326-164534-ladsgroup.json
  • 16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23199 and previous config saved to /var/cache/conftool/dbconfig/20220326-163029-ladsgroup.json
  • 16:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23198 and previous config saved to /var/cache/conftool/dbconfig/20220326-161523-ladsgroup.json
  • 16:00 Amir1: start of mwscript maintenance/migrateLinksTable.php --wiki enwiki --table templatelinks --sleep 2 on beta cluster (T299424)
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23197 and previous config saved to /var/cache/conftool/dbconfig/20220326-155025-ladsgroup.json
  • 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23196 and previous config saved to /var/cache/conftool/dbconfig/20220326-152835-ladsgroup.json
  • 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23195 and previous config saved to /var/cache/conftool/dbconfig/20220326-151330-ladsgroup.json
  • 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23194 and previous config saved to /var/cache/conftool/dbconfig/20220326-145825-ladsgroup.json
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23193 and previous config saved to /var/cache/conftool/dbconfig/20220326-144320-ladsgroup.json
  • 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23192 and previous config saved to /var/cache/conftool/dbconfig/20220326-141912-ladsgroup.json
  • 14:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 14:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23191 and previous config saved to /var/cache/conftool/dbconfig/20220326-141904-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23190 and previous config saved to /var/cache/conftool/dbconfig/20220326-140359-ladsgroup.json
  • 13:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23189 and previous config saved to /var/cache/conftool/dbconfig/20220326-134854-ladsgroup.json
  • 13:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23188 and previous config saved to /var/cache/conftool/dbconfig/20220326-133349-ladsgroup.json
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23187 and previous config saved to /var/cache/conftool/dbconfig/20220326-130701-ladsgroup.json
  • 13:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 13:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23186 and previous config saved to /var/cache/conftool/dbconfig/20220326-130653-ladsgroup.json
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23185 and previous config saved to /var/cache/conftool/dbconfig/20220326-125148-ladsgroup.json
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23184 and previous config saved to /var/cache/conftool/dbconfig/20220326-123643-ladsgroup.json
  • 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23183 and previous config saved to /var/cache/conftool/dbconfig/20220326-122136-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23182 and previous config saved to /var/cache/conftool/dbconfig/20220326-112122-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23181 and previous config saved to /var/cache/conftool/dbconfig/20220326-112114-ladsgroup.json
  • 11:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23180 and previous config saved to /var/cache/conftool/dbconfig/20220326-110609-ladsgroup.json
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23179 and previous config saved to /var/cache/conftool/dbconfig/20220326-105104-ladsgroup.json
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23178 and previous config saved to /var/cache/conftool/dbconfig/20220326-103559-ladsgroup.json
  • 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23177 and previous config saved to /var/cache/conftool/dbconfig/20220326-100918-ladsgroup.json
  • 10:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 10:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23176 and previous config saved to /var/cache/conftool/dbconfig/20220326-100911-ladsgroup.json
  • 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23175 and previous config saved to /var/cache/conftool/dbconfig/20220326-095405-ladsgroup.json
  • 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23174 and previous config saved to /var/cache/conftool/dbconfig/20220326-093900-ladsgroup.json
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23173 and previous config saved to /var/cache/conftool/dbconfig/20220326-092355-ladsgroup.json
  • 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23172 and previous config saved to /var/cache/conftool/dbconfig/20220326-085938-ladsgroup.json
  • 08:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 08:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 08:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 08:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 08:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23171 and previous config saved to /var/cache/conftool/dbconfig/20220326-083731-ladsgroup.json
  • 08:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23170 and previous config saved to /var/cache/conftool/dbconfig/20220326-082225-ladsgroup.json
  • 08:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23169 and previous config saved to /var/cache/conftool/dbconfig/20220326-080720-ladsgroup.json
  • 07:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23168 and previous config saved to /var/cache/conftool/dbconfig/20220326-075215-ladsgroup.json
  • 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23167 and previous config saved to /var/cache/conftool/dbconfig/20220326-072702-ladsgroup.json
  • 07:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 07:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23166 and previous config saved to /var/cache/conftool/dbconfig/20220326-072654-ladsgroup.json
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23165 and previous config saved to /var/cache/conftool/dbconfig/20220326-071149-ladsgroup.json
  • 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23164 and previous config saved to /var/cache/conftool/dbconfig/20220326-065644-ladsgroup.json
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23163 and previous config saved to /var/cache/conftool/dbconfig/20220326-064139-ladsgroup.json
  • 06:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23162 and previous config saved to /var/cache/conftool/dbconfig/20220326-062131-ladsgroup.json
  • 06:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 06:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 06:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23161 and previous config saved to /var/cache/conftool/dbconfig/20220326-062123-ladsgroup.json
  • 06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23160 and previous config saved to /var/cache/conftool/dbconfig/20220326-060618-ladsgroup.json
  • 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23159 and previous config saved to /var/cache/conftool/dbconfig/20220326-055113-ladsgroup.json
  • 05:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23158 and previous config saved to /var/cache/conftool/dbconfig/20220326-053607-ladsgroup.json
  • 05:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23157 and previous config saved to /var/cache/conftool/dbconfig/20220326-051140-ladsgroup.json
  • 05:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 05:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 04:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 04:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 04:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 04:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 04:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 04:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 04:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 04:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 04:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23156 and previous config saved to /var/cache/conftool/dbconfig/20220326-042136-ladsgroup.json
  • 04:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23155 and previous config saved to /var/cache/conftool/dbconfig/20220326-040631-ladsgroup.json
  • 03:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23154 and previous config saved to /var/cache/conftool/dbconfig/20220326-035126-ladsgroup.json
  • 03:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23153 and previous config saved to /var/cache/conftool/dbconfig/20220326-033621-ladsgroup.json
  • 02:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23152 and previous config saved to /var/cache/conftool/dbconfig/20220326-025754-ladsgroup.json
  • 02:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 02:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 02:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23151 and previous config saved to /var/cache/conftool/dbconfig/20220326-025746-ladsgroup.json
  • 02:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23150 and previous config saved to /var/cache/conftool/dbconfig/20220326-024241-ladsgroup.json
  • 02:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23149 and previous config saved to /var/cache/conftool/dbconfig/20220326-022736-ladsgroup.json
  • 02:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23148 and previous config saved to /var/cache/conftool/dbconfig/20220326-021231-ladsgroup.json
  • 01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23147 and previous config saved to /var/cache/conftool/dbconfig/20220326-011216-ladsgroup.json
  • 01:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 01:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23146 and previous config saved to /var/cache/conftool/dbconfig/20220326-011209-ladsgroup.json
  • 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23145 and previous config saved to /var/cache/conftool/dbconfig/20220326-005704-ladsgroup.json
  • 00:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23144 and previous config saved to /var/cache/conftool/dbconfig/20220326-004159-ladsgroup.json
  • 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23143 and previous config saved to /var/cache/conftool/dbconfig/20220326-002653-ladsgroup.json

2022-03-25

  • 23:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23142 and previous config saved to /var/cache/conftool/dbconfig/20220325-235855-ladsgroup.json
  • 23:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 23:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 23:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 23:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 23:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23141 and previous config saved to /var/cache/conftool/dbconfig/20220325-230540-ladsgroup.json
  • 22:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23140 and previous config saved to /var/cache/conftool/dbconfig/20220325-225035-ladsgroup.json
  • 22:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23139 and previous config saved to /var/cache/conftool/dbconfig/20220325-223530-ladsgroup.json
  • 22:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23138 and previous config saved to /var/cache/conftool/dbconfig/20220325-222025-ladsgroup.json
  • 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23137 and previous config saved to /var/cache/conftool/dbconfig/20220325-215400-ladsgroup.json
  • 21:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 21:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23136 and previous config saved to /var/cache/conftool/dbconfig/20220325-215346-ladsgroup.json
  • 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23135 and previous config saved to /var/cache/conftool/dbconfig/20220325-213841-ladsgroup.json
  • 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23134 and previous config saved to /var/cache/conftool/dbconfig/20220325-212336-ladsgroup.json
  • 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23133 and previous config saved to /var/cache/conftool/dbconfig/20220325-210831-ladsgroup.json
  • 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23132 and previous config saved to /var/cache/conftool/dbconfig/20220325-210136-ladsgroup.json
  • 21:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 21:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23131 and previous config saved to /var/cache/conftool/dbconfig/20220325-210128-ladsgroup.json
  • 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23130 and previous config saved to /var/cache/conftool/dbconfig/20220325-204623-ladsgroup.json
  • 20:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23129 and previous config saved to /var/cache/conftool/dbconfig/20220325-203118-ladsgroup.json
  • 20:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23128 and previous config saved to /var/cache/conftool/dbconfig/20220325-201613-ladsgroup.json
  • 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23127 and previous config saved to /var/cache/conftool/dbconfig/20220325-195137-ladsgroup.json
  • 19:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 19:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 19:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 19:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23126 and previous config saved to /var/cache/conftool/dbconfig/20220325-192923-ladsgroup.json
  • 19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23125 and previous config saved to /var/cache/conftool/dbconfig/20220325-191416-ladsgroup.json
  • 19:10 mutante: copying dump from deploy server to dumps server: scp -3 deploy1002.eqiad.wmnet:/srv/miscweb/static-bugzilla.tar.gz labstore1006.wikimedia.org:~ (T284193)
  • 18:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23124 and previous config saved to /var/cache/conftool/dbconfig/20220325-185911-ladsgroup.json
  • 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23123 and previous config saved to /var/cache/conftool/dbconfig/20220325-184406-ladsgroup.json
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23122 and previous config saved to /var/cache/conftool/dbconfig/20220325-181439-ladsgroup.json
  • 18:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 18:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23121 and previous config saved to /var/cache/conftool/dbconfig/20220325-181431-ladsgroup.json
  • 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23120 and previous config saved to /var/cache/conftool/dbconfig/20220325-175926-ladsgroup.json
  • 17:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23119 and previous config saved to /var/cache/conftool/dbconfig/20220325-174421-ladsgroup.json
  • 17:42 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 17:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23118 and previous config saved to /var/cache/conftool/dbconfig/20220325-172916-ladsgroup.json
  • 17:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23117 and previous config saved to /var/cache/conftool/dbconfig/20220325-170154-ladsgroup.json
  • 17:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 17:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23116 and previous config saved to /var/cache/conftool/dbconfig/20220325-170146-ladsgroup.json
  • 16:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:50 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23115 and previous config saved to /var/cache/conftool/dbconfig/20220325-164641-ladsgroup.json
  • 16:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:34 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23114 and previous config saved to /var/cache/conftool/dbconfig/20220325-163136-ladsgroup.json
  • 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23112 and previous config saved to /var/cache/conftool/dbconfig/20220325-161631-ladsgroup.json
  • 15:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23111 and previous config saved to /var/cache/conftool/dbconfig/20220325-154705-ladsgroup.json
  • 15:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 15:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23110 and previous config saved to /var/cache/conftool/dbconfig/20220325-154658-ladsgroup.json
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23109 and previous config saved to /var/cache/conftool/dbconfig/20220325-153152-ladsgroup.json
  • 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23108 and previous config saved to /var/cache/conftool/dbconfig/20220325-151647-ladsgroup.json
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23107 and previous config saved to /var/cache/conftool/dbconfig/20220325-150141-ladsgroup.json
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23101 and previous config saved to /var/cache/conftool/dbconfig/20220325-143545-ladsgroup.json
  • 14:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 14:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23100 and previous config saved to /var/cache/conftool/dbconfig/20220325-141301-ladsgroup.json
  • 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P23099 and previous config saved to /var/cache/conftool/dbconfig/20220325-140850-root.json
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23098 and previous config saved to /var/cache/conftool/dbconfig/20220325-135756-ladsgroup.json
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23097 and previous config saved to /var/cache/conftool/dbconfig/20220325-135346-root.json
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23096 and previous config saved to /var/cache/conftool/dbconfig/20220325-134251-ladsgroup.json
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23095 and previous config saved to /var/cache/conftool/dbconfig/20220325-133842-root.json
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23094 and previous config saved to /var/cache/conftool/dbconfig/20220325-132746-ladsgroup.json
  • 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23093 and previous config saved to /var/cache/conftool/dbconfig/20220325-132338-root.json
  • 13:22 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23092 and previous config saved to /var/cache/conftool/dbconfig/20220325-130834-root.json
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23091 and previous config saved to /var/cache/conftool/dbconfig/20220325-130146-ladsgroup.json
  • 13:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 13:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23090 and previous config saved to /var/cache/conftool/dbconfig/20220325-130138-ladsgroup.json
  • 12:49 hoo: Updated operations/dumps/dcat on snapshot10(08|09|11|12|13) from d4886f6 to a1f46e4
  • 12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23089 and previous config saved to /var/cache/conftool/dbconfig/20220325-124633-ladsgroup.json
  • 12:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23088 and previous config saved to /var/cache/conftool/dbconfig/20220325-123128-ladsgroup.json
  • 12:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23086 and previous config saved to /var/cache/conftool/dbconfig/20220325-121623-ladsgroup.json
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23085 and previous config saved to /var/cache/conftool/dbconfig/20220325-120708-ladsgroup.json
  • 12:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 12:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23084 and previous config saved to /var/cache/conftool/dbconfig/20220325-120701-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23083 and previous config saved to /var/cache/conftool/dbconfig/20220325-115156-ladsgroup.json
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23082 and previous config saved to /var/cache/conftool/dbconfig/20220325-113651-ladsgroup.json
  • 11:24 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23081 and previous config saved to /var/cache/conftool/dbconfig/20220325-112145-ladsgroup.json
  • 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T302658)', diff saved to https://phabricator.wikimedia.org/P23080 and previous config saved to /var/cache/conftool/dbconfig/20220325-110217-marostegui.json
  • 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23079 and previous config saved to /var/cache/conftool/dbconfig/20220325-104712-marostegui.json
  • 10:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1008.eqiad.wmnet
  • 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23078 and previous config saved to /var/cache/conftool/dbconfig/20220325-103310-ladsgroup.json
  • 10:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 10:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23077 and previous config saved to /var/cache/conftool/dbconfig/20220325-103207-marostegui.json
  • 10:22 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1008.eqiad.wmnet
  • 10:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1005.eqiad.wmnet
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T302658)', diff saved to https://phabricator.wikimedia.org/P23076 and previous config saved to /var/cache/conftool/dbconfig/20220325-101701-marostegui.json
  • 10:11 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1005.eqiad.wmnet
  • 10:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23075 and previous config saved to /var/cache/conftool/dbconfig/20220325-101016-ladsgroup.json
  • 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23074 and previous config saved to /var/cache/conftool/dbconfig/20220325-095511-ladsgroup.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T302658)', diff saved to https://phabricator.wikimedia.org/P23073 and previous config saved to /var/cache/conftool/dbconfig/20220325-094031-marostegui.json
  • 09:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 09:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T302658)', diff saved to https://phabricator.wikimedia.org/P23072 and previous config saved to /var/cache/conftool/dbconfig/20220325-094023-marostegui.json
  • 09:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23071 and previous config saved to /var/cache/conftool/dbconfig/20220325-094006-ladsgroup.json
  • 09:27 moritzm: updating libapache2-mod-auth-cas on moscovium/debmonitor1002
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23070 and previous config saved to /var/cache/conftool/dbconfig/20220325-092518-marostegui.json
  • 09:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23069 and previous config saved to /var/cache/conftool/dbconfig/20220325-092500-ladsgroup.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23068 and previous config saved to /var/cache/conftool/dbconfig/20220325-091013-marostegui.json
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T302658)', diff saved to https://phabricator.wikimedia.org/P23067 and previous config saved to /var/cache/conftool/dbconfig/20220325-085508-marostegui.json
  • 08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23066 and previous config saved to /var/cache/conftool/dbconfig/20220325-082446-ladsgroup.json
  • 08:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 08:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T302658)', diff saved to https://phabricator.wikimedia.org/P23065 and previous config saved to /var/cache/conftool/dbconfig/20220325-080403-marostegui.json
  • 08:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 08:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T302658)', diff saved to https://phabricator.wikimedia.org/P23064 and previous config saved to /var/cache/conftool/dbconfig/20220325-080355-marostegui.json
  • 08:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 08:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 07:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 07:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 07:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 07:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23063 and previous config saved to /var/cache/conftool/dbconfig/20220325-075610-ladsgroup.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23062 and previous config saved to /var/cache/conftool/dbconfig/20220325-074850-marostegui.json
  • 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23061 and previous config saved to /var/cache/conftool/dbconfig/20220325-074105-ladsgroup.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23060 and previous config saved to /var/cache/conftool/dbconfig/20220325-073345-marostegui.json
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23059 and previous config saved to /var/cache/conftool/dbconfig/20220325-072559-ladsgroup.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T302658)', diff saved to https://phabricator.wikimedia.org/P23058 and previous config saved to /var/cache/conftool/dbconfig/20220325-071840-marostegui.json
  • 07:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23057 and previous config saved to /var/cache/conftool/dbconfig/20220325-071054-ladsgroup.json
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23056 and previous config saved to /var/cache/conftool/dbconfig/20220325-064139-ladsgroup.json
  • 06:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 06:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 06:31 _joe_: deleting a couple of zotero pods with an excessive number of restarts
  • 06:29 marostegui: dbmaint s4@eqiad T300775
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P23055 and previous config saved to /var/cache/conftool/dbconfig/20220325-060723-marostegui.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T302658)', diff saved to https://phabricator.wikimedia.org/P23054 and previous config saved to /var/cache/conftool/dbconfig/20220325-054705-marostegui.json
  • 05:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 05:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 for testing', diff saved to https://phabricator.wikimedia.org/P23053 and previous config saved to /var/cache/conftool/dbconfig/20220325-053037-marostegui.json
  • 00:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2027.codfw.wmnet with OS buster

2022-03-24

  • 23:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2027.codfw.wmnet with OS buster
  • 23:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2027.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T302658)', diff saved to https://phabricator.wikimedia.org/P23050 and previous config saved to /var/cache/conftool/dbconfig/20220324-223031-marostegui.json
  • 22:19 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 22:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23049 and previous config saved to /var/cache/conftool/dbconfig/20220324-221526-marostegui.json
  • 22:14 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host restbase2027.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:10 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
  • 22:07 ebernhardson: restart wcqs-blazegraph on wcqs2001 to resolve intermittent BlazegraphFreeAllocatorsDecreasingRapidly
  • 22:06 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
  • 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23048 and previous config saved to /var/cache/conftool/dbconfig/20220324-220021-marostegui.json
  • 21:54 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 21:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T302658)', diff saved to https://phabricator.wikimedia.org/P23047 and previous config saved to /var/cache/conftool/dbconfig/20220324-214515-marostegui.json
  • 21:42 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 21:38 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:33 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 21:13 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:11 inflatador: bking@cumin1001 restarting blazegraph on wdqs[1003-1013].eqiad.wmnet for T293862
  • 21:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4338532: fawiki: Set celebration logo for new vector (T304314; 2/2) (duration: 00m 53s)
  • 21:07 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-fawiki-new-year.png: 4338532: fawiki: Set celebration logo for new vector (T304314; 1/2) (duration: 00m 50s)
  • 21:07 thcipriani@deploy1002: Finished deploy [releng/phatality@15f8ec0]: Deploying phatality updates for opensearch 1.2.0 (duration: 00m 13s)
  • 21:07 thcipriani@deploy1002: Started deploy [releng/phatality@15f8ec0]: Deploying phatality updates for opensearch 1.2.0
  • 21:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:03 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 00m 50s)
  • 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:43 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Start writing to $wmgAllServices the same value as to $wmfAllServices (T45956) (duration: 01m 17s)
  • 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:31 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Stop writing to certain $wmf* global variables (T45956) (part 3) (duration: 00m 55s)
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:29 thcipriani@deploy1002: Synchronized docroot/noc/db.php: Config: Stop writing to certain $wmf* global variables (T45956) (part II) (duration: 00m 51s)
  • 20:28 thcipriani@deploy1002: Synchronized tests: Config: Stop writing to certain $wmf* global variables (T45956) (part I) (duration: 00m 50s)
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:23 thcipriani@deploy1002: Synchronized portals: Config: Bumping portals to master (T282012) (duration: 00m 52s)
  • 20:22 thcipriani@deploy1002: Synchronized portals/wikipedia.org/assets: Config: Bumping portals to master (T282012) (duration: 00m 52s)
  • 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 20:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T302658)', diff saved to https://phabricator.wikimedia.org/P23045 and previous config saved to /var/cache/conftool/dbconfig/20220324-201305-marostegui.json
  • 20:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 20:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 20:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T302658)', diff saved to https://phabricator.wikimedia.org/P23044 and previous config saved to /var/cache/conftool/dbconfig/20220324-201257-marostegui.json
  • 20:08 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Use $wmgUseRestbaseVRS in comment (T45956) (duration: 01m 05s)
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:03 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 19:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23043 and previous config saved to /var/cache/conftool/dbconfig/20220324-195752-marostegui.json
  • 19:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23042 and previous config saved to /var/cache/conftool/dbconfig/20220324-194246-marostegui.json
  • 19:35 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 19:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T302658)', diff saved to https://phabricator.wikimedia.org/P23041 and previous config saved to /var/cache/conftool/dbconfig/20220324-192741-marostegui.json
  • 19:21 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1148.eqiad.wmnet with OS buster
  • 19:20 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1147.eqiad.wmnet with OS buster
  • 19:02 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1142.eqiad.wmnet with OS buster
  • 18:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1142.eqiad.wmnet with OS buster
  • 18:41 cstone: civicrm revision changed from b6ceb722 to 4e5b37c3
  • 18:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P23040 and previous config saved to /var/cache/conftool/dbconfig/20220324-183654-root.json
  • 18:36 razzi: razzi@deneb:~$ sudo docker system prune (reclaimed 33GB)
  • 18:35 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1146.eqiad.wmnet with OS buster
  • 18:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1144.eqiad.wmnet with OS buster
  • 18:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1145.eqiad.wmnet with OS buster
  • 18:26 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1143.eqiad.wmnet with OS buster
  • 18:26 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1142.eqiad.wmnet with OS buster
  • 18:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23039 and previous config saved to /var/cache/conftool/dbconfig/20220324-182150-root.json
  • 18:17 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 18:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1148.eqiad.wmnet with OS buster
  • 18:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1147.eqiad.wmnet with OS buster
  • 18:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1146.eqiad.wmnet with OS buster
  • 18:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23038 and previous config saved to /var/cache/conftool/dbconfig/20220324-180646-root.json
  • 18:05 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 17:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1145.eqiad.wmnet with OS buster
  • 17:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1144.eqiad.wmnet with OS buster
  • 17:58 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 17:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1143.eqiad.wmnet with OS buster
  • 17:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1142.eqiad.wmnet with OS buster
  • 17:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23037 and previous config saved to /var/cache/conftool/dbconfig/20220324-175142-root.json
  • 17:44 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 17:36 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 17:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23036 and previous config saved to /var/cache/conftool/dbconfig/20220324-173638-root.json
  • 17:36 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 17:36 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 17:36 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 17:35 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T302658)', diff saved to https://phabricator.wikimedia.org/P23035 and previous config saved to /var/cache/conftool/dbconfig/20220324-173450-marostegui.json
  • 17:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 17:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 17:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 17:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 17:34 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 17:32 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 17:32 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 17:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:12 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1143.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1143.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:10 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:07 urbanecm@deploy1002: Synchronized logos/config.yaml: 05d55a9: fawiki: Set new year celebration (T304314; 3/3) (duration: 00m 49s)
  • 17:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:06 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 05d55a9: fawiki: Set new year celebration (T304314; 2/3) (duration: 00m 49s)
  • 17:04 urbanecm@deploy1002: Synchronized static/images/project-logos/: 05d55a9: fawiki: Set new year celebration (T304314; 1/3) (duration: 00m 50s)
  • 17:03 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:03 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1145.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1146.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1003.eqiad.wmnet with OS bullseye
  • 16:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1146.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1144.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 16:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
  • 16:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1145.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
  • 16:29 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.4 refs T300203
  • 16:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1143.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:25 brennen@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.4 refs T300203 (duration: 01m 06s)
  • 16:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
  • 16:24 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.4 refs T300203
  • 16:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
  • 16:20 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.4 refs T300203
  • 16:19 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1144.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:19 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1143.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:19 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1143.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1003.eqiad.wmnet with OS bullseye
  • 16:13 brennen: trainsperiment (T300203): blockers clear, logs triaged, rolling 1.39.0-wmf.4 out to all wikis again
  • 16:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1143.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 16:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1142.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1001.eqiad.wmnet with OS bullseye
  • 15:56 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
  • 15:51 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
  • 15:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1142.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1001.eqiad.wmnet with OS bullseye
  • 15:24 XioNoX: codfw: disable BGP to DE-CIX for link move
  • 15:03 moritzm: installing openssl1.0 security updates on stretch
  • 14:39 moritzm: installing containerd updates on ml-serve*
  • 14:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T302658)', diff saved to https://phabricator.wikimedia.org/P23030 and previous config saved to /var/cache/conftool/dbconfig/20220324-143149-marostegui.json
  • 14:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 14:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 14:26 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P23029 and previous config saved to /var/cache/conftool/dbconfig/20220324-142233-root.json
  • 14:11 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw2002-dev.codfw.wmnet with OS bullseye
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23028 and previous config saved to /var/cache/conftool/dbconfig/20220324-140729-root.json
  • 14:00 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: host reimage
  • 13:57 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: host reimage
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23026 and previous config saved to /var/cache/conftool/dbconfig/20220324-135225-root.json
  • 13:43 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudgw2002-dev.codfw.wmnet with OS bullseye
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23025 and previous config saved to /var/cache/conftool/dbconfig/20220324-133721-root.json
  • 13:34 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudgw2001-dev.codfw.wmnet with OS bullseye
  • 13:26 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: T45956 (duration: 00m 49s)
  • 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:23 reedy@deploy1002: Synchronized multiversion/: T45956 (duration: 00m 50s)
  • 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23024 and previous config saved to /var/cache/conftool/dbconfig/20220324-132217-root.json
  • 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:21 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: host reimage
  • 13:18 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: host reimage
  • 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:15 reedy@deploy1002: Synchronized tests/: T45956 (duration: 00m 49s)
  • 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:10 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T292802 (duration: 00m 50s)
  • 12:54 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudgw2001-dev.codfw.wmnet with OS bullseye
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158 for schema change', diff saved to https://phabricator.wikimedia.org/P23023 and previous config saved to /var/cache/conftool/dbconfig/20220324-125225-marostegui.json
  • 11:47 jynus: updating eqiad swift-commonswiki backups of originals T299764
  • 11:26 mmandere: pool cp1076 with HAProxy as TLS termination layer - T290005
  • 11:22 jbond: puppet cert clean rendering.svc.eqiad.wmnet
  • 11:21 jbond: removing old api.svc.codfw.wmnet.pem and appservers.svc.codfw.wmnet.pem from root@puppetmaster1001:/var/lib/puppet/server/ssl/ca/signed#
  • 11:15 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1017.eqiad.wmnet with OS bullseye
  • 11:14 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 11:10 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1076.eqiad.wmnet with OS buster
  • 11:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1017.eqiad.wmnet with reason: host reimage
  • 11:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1017.eqiad.wmnet with reason: host reimage
  • 10:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1101.eqiad.wmnet
  • 10:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1101.eqiad.wmnet
  • 10:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1100.eqiad.wmnet
  • 10:46 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1076.eqiad.wmnet with reason: host reimage
  • 10:45 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1017.eqiad.wmnet with OS bullseye
  • 10:43 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1076.eqiad.wmnet with reason: host reimage
  • 10:42 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1100.eqiad.wmnet
  • 10:42 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1099.eqiad.wmnet
  • 10:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1014.eqiad.wmnet with OS bullseye
  • 10:34 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1099.eqiad.wmnet
  • 10:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1098.eqiad.wmnet
  • 10:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1014.eqiad.wmnet with reason: host reimage
  • 10:27 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1076.eqiad.wmnet with OS buster
  • 10:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1098.eqiad.wmnet
  • 10:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1014.eqiad.wmnet with reason: host reimage
  • 10:20 mmandere: depool cp1076 for reimage - T290005
  • 10:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1097.eqiad.wmnet
  • 10:09 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1014.eqiad.wmnet with OS bullseye
  • 10:01 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1097.eqiad.wmnet
  • 09:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1096.eqiad.wmnet
  • 09:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1096.eqiad.wmnet
  • 09:31 mmandere: pool cp1078 with HAProxy as TLS termination layer - T290005
  • 09:30 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1078.eqiad.wmnet with OS buster
  • 09:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:28 jnuche@deploy1002: Synchronized php-1.39.0-wmf.4/includes/Linker.php: (no justification provided) (duration: 00m 50s)
  • 09:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:08 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1078.eqiad.wmnet with reason: host reimage
  • 09:05 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1078.eqiad.wmnet with reason: host reimage
  • 09:00 oblivian@puppetmaster1001: conftool action : set/enabled=true; selector: name=parameter_q,cluster=cache-text
  • 08:48 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1078.eqiad.wmnet with OS buster
  • 08:45 oblivian@puppetmaster1001: conftool action : set/enabled=false; selector: name=parameter_q,cluster=cache-text
  • 08:44 marostegui: dbmaint s7@eqiad T302658
  • 08:43 oblivian@puppetmaster1001: conftool action : set/enabled=true; selector: name=parameter_q,cluster=cache-text
  • 08:43 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1013.eqiad.wmnet with OS bullseye
  • 08:36 mmandere: depool cp1078 for reimage - T290005
  • 08:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1013.eqiad.wmnet with reason: host reimage
  • 08:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1013.eqiad.wmnet with reason: host reimage
  • 08:12 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1013.eqiad.wmnet with OS bullseye
  • 08:11 marostegui: dbmaint s7@codfw T302658
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After testing', diff saved to https://phabricator.wikimedia.org/P23022 and previous config saved to /var/cache/conftool/dbconfig/20220324-080528-root.json
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After testing', diff saved to https://phabricator.wikimedia.org/P23021 and previous config saved to /var/cache/conftool/dbconfig/20220324-075024-root.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P23020 and previous config saved to /var/cache/conftool/dbconfig/20220324-074841-root.json
  • 07:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1012.eqiad.wmnet with OS bullseye
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After testing', diff saved to https://phabricator.wikimedia.org/P23019 and previous config saved to /var/cache/conftool/dbconfig/20220324-073520-root.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23018 and previous config saved to /var/cache/conftool/dbconfig/20220324-073337-root.json
  • 07:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1012.eqiad.wmnet with reason: host reimage
  • 07:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1012.eqiad.wmnet with reason: host reimage
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After testing', diff saved to https://phabricator.wikimedia.org/P23017 and previous config saved to /var/cache/conftool/dbconfig/20220324-072017-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23016 and previous config saved to /var/cache/conftool/dbconfig/20220324-071832-root.json
  • 07:08 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1012.eqiad.wmnet with OS bullseye
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After testing', diff saved to https://phabricator.wikimedia.org/P23015 and previous config saved to /var/cache/conftool/dbconfig/20220324-070513-root.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23014 and previous config saved to /var/cache/conftool/dbconfig/20220324-070327-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 for testing', diff saved to https://phabricator.wikimedia.org/P23013 and previous config saved to /var/cache/conftool/dbconfig/20220324-065940-marostegui.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23012 and previous config saved to /var/cache/conftool/dbconfig/20220324-064823-root.json
  • 06:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on 12 hosts with reason: Maintenance
  • 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on 12 hosts with reason: Maintenance
  • 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 01:45 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
  • 01:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:34 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
  • 00:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1046.eqiad.wmnet with OS bullseye
  • 00:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1044.eqiad.wmnet with OS bullseye
  • 00:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1045.eqiad.wmnet with OS bullseye
  • 00:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
  • 00:07 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
  • 00:05 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
  • 00:04 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
  • 00:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
  • 00:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
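
The cookbook entries above come in START / END pairs, so run durations and outcomes can be recovered from the log text alone. What follows is a minimal sketch, not part of any Wikimedia tooling: the names `ENTRY` and `cookbook_runs` are hypothetical, the pattern is inferred only from how entries look in this archive, and it assumes a single day's entries listed newest first. Runs that span midnight (for example a reimage started at 23:51 and finished at 00:33) are not paired, because the START falls under the previous day's heading.

```python
import re
from datetime import datetime

# Pattern inferred from the entries in this log (an assumption, not a spec), e.g.
#   "10:45 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host X with OS bullseye"
#   "11:15 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host X with OS bullseye"
ENTRY = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<who>\S+): "
    r"(?P<kind>START|END \((?P<status>\w+)\)) - "
    r"Cookbook (?P<cookbook>\S+)"
    r"(?: \(exit_code=\d+\))?"   # present on END entries only
    r"(?P<target>.*)"            # free text after the cookbook name
)


def cookbook_runs(lines):
    """Pair START/END cookbook entries from one day's log (newest first).

    Returns a list of (cookbook, target, status, minutes) tuples in the
    order the runs finished.
    """
    starts = {}
    runs = []
    for line in reversed(list(lines)):  # walk from oldest to newest
        m = ENTRY.search(line)
        if not m:
            continue  # not a cookbook entry (deploys, manual notes, ...)
        key = (m["who"], m["cookbook"], m["target"].strip())
        t = datetime.strptime(m["time"], "%H:%M")
        if m["kind"] == "START":
            starts[key] = t
        elif key in starts:
            minutes = int((t - starts.pop(key)).total_seconds() // 60)
            runs.append((key[1], key[2], m["status"], minutes))
    return runs
```

Fed the four kubernetes1017 entries from 10:45 to 11:15 above, this yields one `sre.hosts.downtime` run and one `sre.hosts.reimage` run, both with status `PASS`. Entries are keyed on user, cookbook name, and the trailing text, so parallel reimages of different hosts by the same user stay separate.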

2022-03-23

  • 23:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bullseye
  • 23:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bullseye
  • 23:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1045.eqiad.wmnet with OS bullseye
  • 23:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:38 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.3 refs T300203
  • 23:34 brennen: trainsperiment (T300203): reverting to 1.39.0-wmf.3 on all wikis for T304564; will move forward again after a fix.
  • 23:25 cwhite: remove openjdk-8-jre from codfw logstash nodes T301770
  • 23:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1043.eqiad.wmnet with OS bullseye
  • 22:54 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
  • 22:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
  • 22:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1042.eqiad.wmnet with OS bullseye
  • 22:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1041.eqiad.wmnet with OS bullseye
  • 22:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bullseye
  • 22:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
  • 22:23 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
  • 22:19 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
  • 22:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
  • 22:05 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bullseye
  • 22:05 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1041.eqiad.wmnet with OS bullseye
  • 21:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1040.eqiad.wmnet with OS bullseye
  • 21:42 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - T301956
  • 21:35 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
  • 21:31 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
  • 21:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:24 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Enable split A/B testing on beta cluster (T301584) (duration: 00m 50s)
  • 21:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1040.eqiad.wmnet with OS bullseye
  • 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:15 catrope@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Allow autoconfirmed users to view basic IP information (T303858) and Enable IPInfo on testwiki (T260598) (duration: 00m 50s)
  • 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1039.eqiad.wmnet with OS bullseye
  • 21:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:53 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1037.eqiad.wmnet with OS bullseye
  • 20:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1038.eqiad.wmnet with OS bullseye
  • 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
  • 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
  • 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:40 catrope@deploy1002: Synchronized wmf-config/extension-list: Config: DynamicSidebar: remove unused extension (T304006) (duration: 00m 49s)
  • 20:34 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: DynamicSidebar: remove from InitialiseSettings (duration: 00m 51s)
  • 20:33 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
  • 20:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
  • 20:32 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1039.eqiad.wmnet with OS bullseye
  • 20:28 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
  • 20:28 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
  • 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:18 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - T301956
  • 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:14 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1038.eqiad.wmnet with OS bullseye
  • 20:14 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1037.eqiad.wmnet with OS bullseye
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:13 catrope@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: DynamicSidebar: remove from CommonSettings (T304006) (duration: 00m 50s)
  • 20:10 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: wikitech: Remove DynamicSidebar (T304006) (duration: 00m 52s)
  • 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:01 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - T301956
  • 19:53 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - T301956
  • 19:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:37 brennen: trainsperiment (T300203): 1.39.0-wmf.4 on all wikis; logs seem clean - end of train deployment activities for the week, unless bugs emerge
  • 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:23 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.4 refs T300203
  • 19:23 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - T301956
  • 19:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1036.eqiad.wmnet with OS bullseye
  • 19:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1035.eqiad.wmnet with OS bullseye
  • 19:10 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - T301956
  • 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:09 brennen@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.4 refs T300203 (duration: 00m 52s)
  • 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:08 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.4 refs T300203
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:59 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.4 refs T300203
  • 18:56 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
  • 18:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
  • 18:53 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.4 refs T300203
  • 18:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
  • 18:50 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
  • 18:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:47 brennen: trainsperiment (T300203): 1.39.0-wmf.4 on testwikis; proceeding to groups 0-2 with 15 minute intervals for watching logs
  • 18:46 brennen@deploy1002: Pruned MediaWiki: 1.38.0-wmf.26 (duration: 02m 05s)
  • 18:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:42 brennen@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.4 refs T300203 (duration: 49m 41s)
  • 18:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1036.eqiad.wmnet with OS bullseye
  • 18:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1035.eqiad.wmnet with OS bullseye
  • 18:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:52 brennen@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.4 refs T300203
  • 17:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1034.eqiad.wmnet with OS bullseye
  • 17:48 brennen: trainsperiment (T300203): starting prep for 1.39.0-wmf.4
  • 17:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1033.eqiad.wmnet with OS bullseye
  • 17:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1028.eqiad.wmnet with OS bullseye
  • 17:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
  • 17:22 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
  • 17:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
  • 17:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
  • 17:13 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1028.eqiad.wmnet with reason: host reimage
  • 17:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1028.eqiad.wmnet with reason: host reimage
  • 17:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1034.eqiad.wmnet with OS bullseye
  • 16:59 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1033.eqiad.wmnet with OS bullseye
  • 16:58 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 16:58 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS bullseye
  • 16:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 16:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1011.eqiad.wmnet with OS bullseye
  • 16:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1032.eqiad.wmnet with OS bullseye
  • 16:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: host reimage
  • 16:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: host reimage
  • 16:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1011.eqiad.wmnet with OS bullseye
  • 16:07 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
  • 16:04 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
  • 15:50 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1032.eqiad.wmnet with OS bullseye
  • 15:39 urbanecm: foreachwikiindblist wikipedia extensions/WikimediaMaintenance/createExtensionTables.php growthexperiments # T304052
  • 15:38 urbanecm: Created shnwikivoyage and guwwiki
  • 15:31 mmandere: pool cp1080 with HAProxy as TLS termination layer - T290005
  • 15:28 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1080.eqiad.wmnet with OS buster
  • 15:27 urbanecm@deploy1002: Synchronized langlist: Creating guwwiki (T303727) (duration: 01m 04s)
  • 15:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating guwwiki (T303727) (duration: 01m 07s)
  • 15:25 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating guwwiki (T303727) (duration: 01m 05s)
  • 15:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1031.eqiad.wmnet with OS bullseye
  • 15:24 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating guwwiki (T303727) (duration: 01m 06s)
  • 15:23 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating guwwiki (T303727)
  • 15:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:21 urbanecm@deploy1002: Synchronized dblists: Creating guwwiki (T303727) (duration: 01m 10s)
  • 15:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:19 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating guwwiki (T303727) (duration: 01m 05s)
  • 15:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating shnwikivoyage (T302797) (duration: 01m 05s)
  • 15:14 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating shnwikivoyage (T302797) (duration: 01m 05s)
  • 15:13 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating shnwikivoyage (T302797) (duration: 01m 05s)
  • 15:12 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating shnwikivoyage (T302797)
  • 15:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:09 urbanecm@deploy1002: Synchronized dblists: Creating shnwikivoyage (T302797) (duration: 01m 05s)
  • 15:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:08 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating shnwikivoyage (T302797) (duration: 01m 05s)
  • 15:05 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1080.eqiad.wmnet with reason: host reimage
  • 15:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
  • 15:01 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1080.eqiad.wmnet with reason: host reimage
  • 15:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1030.eqiad.wmnet with OS bullseye
  • 14:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
  • 14:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 14:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1031.eqiad.wmnet with OS bullseye
  • 14:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1030.eqiad.wmnet with reason: host reimage
  • 14:44 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1030.eqiad.wmnet with reason: host reimage
  • 14:44 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1080.eqiad.wmnet with OS buster
  • 14:41 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.3/extensions/WikimediaMaintenance/addWiki.php: 9a0aed0: addWiki: Create GrowthExperiment tables for all new Wikipedias (T304052) (duration: 01m 06s)
  • 14:38 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1085.eqiad.wmnet
  • 14:37 mmandere: depool cp1080 for reimage - T290005
  • 14:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1030.eqiad.wmnet with OS bullseye
  • 14:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:28 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 14:27 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 14:23 bblack: reboot cp1085 (downtimed)
  • 14:20 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 14:19 bking@cumin1001: conftool action : set/pooled=yes; selector: name=wcqs1002.eqiad.wmnet
  • 14:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1029.eqiad.wmnet with OS bullseye
  • 14:11 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1027.eqiad.wmnet with OS bullseye
  • 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:06 mmandere: pool cp1082 with HAProxy as TLS termination layer - T290005
  • 14:04 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:04 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:04 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 14:04 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:04 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 14:00 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1082.eqiad.wmnet with OS buster
  • 14:00 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 13:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1029.eqiad.wmnet with reason: host reimage
  • 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:57 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 13:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1029.eqiad.wmnet with reason: host reimage
  • 13:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1010.eqiad.wmnet with OS bullseye
  • 13:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1027.eqiad.wmnet with reason: host reimage
  • 13:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:48 Lucas_WMDE: UTC afternoon backport window done
  • 13:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:47 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Enable Wikibase REST API on beta wikidata (T302959) (2/2, production no-op) (duration: 01m 05s)
  • 13:46 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Enable Wikibase REST API on beta wikidata (T302959) (1/2, production no-op) (duration: 01m 07s)
  • 13:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1027.eqiad.wmnet with reason: host reimage
  • 13:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1029.eqiad.wmnet with OS bullseye
  • 13:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T300775)', diff saved to https://phabricator.wikimedia.org/P23010 and previous config saved to /var/cache/conftool/dbconfig/20220323-134153-marostegui.json
  • 13:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 13:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T300775)', diff saved to https://phabricator.wikimedia.org/P23009 and previous config saved to /var/cache/conftool/dbconfig/20220323-134140-marostegui.json
  • 13:39 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Write "unexpectedUnconnectedPage" page prop on Test Wikidata clients (duration: 01m 10s)
  • 13:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: host reimage
  • 13:38 moritzm: restarting superset for OpenSSL update
  • 13:36 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1082.eqiad.wmnet with reason: host reimage
  • 13:35 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1027.eqiad.wmnet with OS bullseye
  • 13:34 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: host reimage
  • 13:33 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1082.eqiad.wmnet with reason: host reimage
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P23008 and previous config saved to /var/cache/conftool/dbconfig/20220323-132635-marostegui.json
  • 13:19 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1010.eqiad.wmnet with OS bullseye
  • 13:16 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1082.eqiad.wmnet with OS buster
  • 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P23005 and previous config saved to /var/cache/conftool/dbconfig/20220323-131130-marostegui.json
  • 13:07 mmandere: depool cp1082 for reimage - T290005
  • 12:58 moritzm: installing bind security updates
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T300775)', diff saved to https://phabricator.wikimedia.org/P23004 and previous config saved to /var/cache/conftool/dbconfig/20220323-125625-marostegui.json
  • 12:29 moritzm: restarting Turnilo for OpenSSL update
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132 after testing', diff saved to https://phabricator.wikimedia.org/P23003 and previous config saved to /var/cache/conftool/dbconfig/20220323-120749-marostegui.json
  • 11:34 jbond: upload new puppetboard_3.1.0-1+deb11u1_all.deb
  • 11:33 moritzm: installing apache security updates on stretch
  • 11:00 mmandere: pool cp1081 with HAProxy as TLS termination layer - T290005
  • 10:58 moritzm: restarting apache on matomo1002/piwik.wikimedia.org
  • 10:52 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1081.eqiad.wmnet with OS buster
  • 10:30 moritzm: restarting ntpd
  • 10:28 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1081.eqiad.wmnet with reason: host reimage
  • 10:24 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1081.eqiad.wmnet with reason: host reimage
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 some more weight T301879', diff saved to https://phabricator.wikimedia.org/P23002 and previous config saved to /var/cache/conftool/dbconfig/20220323-101816-marostegui.json
  • 10:07 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1081.eqiad.wmnet with OS buster
  • 09:56 mmandere: depool cp1081 for reimage - T290005
  • 09:43 mmandere: pool cp1079 with HAProxy as TLS termination layer - T290005
  • 09:36 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1079.eqiad.wmnet with OS buster
  • 09:24 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:17 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:15 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1079.eqiad.wmnet with reason: host reimage
  • 09:11 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1079.eqiad.wmnet with reason: host reimage
  • 09:06 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:54 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1079.eqiad.wmnet with OS buster
  • 08:54 moritzm: restarting spamassassin/clamav on otrs1001/ticket.wikimedia.org
  • 08:51 mmandere@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1079.eqiad.wmnet with OS buster
  • 08:47 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1079.eqiad.wmnet with OS buster
  • 08:43 moritzm: installing openssl security updates
  • 08:36 mmandere: depool cp1079 for reimage - T290005
  • 08:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1009.eqiad.wmnet with OS bullseye
  • 08:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: host reimage
  • 08:10 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: host reimage
  • 07:54 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1009.eqiad.wmnet with OS bullseye
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P23001 and previous config saved to /var/cache/conftool/dbconfig/20220323-074408-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P23000 and previous config saved to /var/cache/conftool/dbconfig/20220323-072904-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P22999 and previous config saved to /var/cache/conftool/dbconfig/20220323-071400-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P22998 and previous config saved to /var/cache/conftool/dbconfig/20220323-065856-root.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P22997 and previous config saved to /var/cache/conftool/dbconfig/20220323-064353-root.json
  • 06:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1112.eqiad.wmnet with OS bullseye
  • 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1112.eqiad.wmnet with reason: host reimage
  • 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1112.eqiad.wmnet with reason: host reimage
  • 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1112.eqiad.wmnet with OS bullseye
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for reimage', diff saved to https://phabricator.wikimedia.org/P22996 and previous config saved to /var/cache/conftool/dbconfig/20220323-060533-marostegui.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 with low weight T301879', diff saved to https://phabricator.wikimedia.org/P22995 and previous config saved to /var/cache/conftool/dbconfig/20220323-060351-marostegui.json
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:20 ejegg: updated payments-wiki from 3048f0aa to 28e24856
  • 00:11 cjming: end running skin preference update script T299104

2022-03-22

  • 23:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 23:39 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1024.eqiad.wmnet with reason: host reimage
  • 23:35 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1024.eqiad.wmnet with reason: host reimage
  • 23:23 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 23:11 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 22:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 22:41 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 22:41 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 22:27 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 22:26 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 22:25 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 22:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 22:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 22:24 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 22:24 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 22:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 22:21 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 22:20 ryankemper: T301511 Mutated cirrus codfw cluster settings to what [I think] they should be, see https://phabricator.wikimedia.org/T301511#7798415; forcing re-check
  • 22:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T300775)', diff saved to https://phabricator.wikimedia.org/P22993 and previous config saved to /var/cache/conftool/dbconfig/20220322-221503-marostegui.json
  • 22:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 22:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 22:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T300775)', diff saved to https://phabricator.wikimedia.org/P22992 and previous config saved to /var/cache/conftool/dbconfig/20220322-221455-marostegui.json
  • 22:09 ryankemper: T301511 Forcing recheck of codfw cirrus setting check
  • 22:04 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1025.eqiad.wmnet with reason: host reimage
  • 22:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1026.eqiad.wmnet with reason: host reimage
  • 21:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P22991 and previous config saved to /var/cache/conftool/dbconfig/20220322-215950-marostegui.json
  • 21:59 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 21:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1025.eqiad.wmnet with reason: host reimage
  • 21:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1026.eqiad.wmnet with reason: host reimage
  • 21:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 21:45 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P22990 and previous config saved to /var/cache/conftool/dbconfig/20220322-214445-marostegui.json
  • 21:39 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 21:35 ryankemper: T301511 Fixed elastic* eqiad cross-cluster search settings (see https://phabricator.wikimedia.org/T301511#7798267) to resolve the `ElasticSearch setting check` alerts in eqiad
  • 21:33 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 21:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T300775)', diff saved to https://phabricator.wikimedia.org/P22989 and previous config saved to /var/cache/conftool/dbconfig/20220322-212939-marostegui.json
  • 21:21 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:18 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:05 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:37 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:32 urbanecm: UTC late backport window done
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ce18d4e: testwiki: enable testing of topics match mode for GLAM events (T301825) (duration: 01m 06s)
  • 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 17caf03: Enable EventGate logging for WikipediaPortal schema (T271163) (duration: 01m 54s)
  • 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298557)', diff saved to https://phabricator.wikimedia.org/P22986 and previous config saved to /var/cache/conftool/dbconfig/20220322-191049-marostegui.json
  • 19:04 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 19:02 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 18:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22985 and previous config saved to /var/cache/conftool/dbconfig/20220322-185542-marostegui.json
  • 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22984 and previous config saved to /var/cache/conftool/dbconfig/20220322-184037-marostegui.json
  • 18:30 razzi: remove old karapace1001 known hosts following reimage: `razzi@puppetmaster1001:~$ ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "karapace1001.eqiad.wmnet"`
  • 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298557)', diff saved to https://phabricator.wikimedia.org/P22982 and previous config saved to /var/cache/conftool/dbconfig/20220322-182531-marostegui.json
  • 18:01 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@c4d0736]: (no justification provided) (duration: 05m 16s)
  • 17:55 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@c4d0736]: (no justification provided)
  • 17:50 dcausse@deploy1002: Started scap: (no justification provided)
  • 17:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1004.eqiad.wmnet with OS bullseye
  • 17:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T298557)', diff saved to https://phabricator.wikimedia.org/P22981 and previous config saved to /var/cache/conftool/dbconfig/20220322-173301-marostegui.json
  • 17:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298557)', diff saved to https://phabricator.wikimedia.org/P22980 and previous config saved to /var/cache/conftool/dbconfig/20220322-173253-marostegui.json
  • 17:25 brennen: trainsperiment (T300203): with 1.39.0-wmf.3 on all wikis, we're paused for a planned catchup window - nothing to do at the moment, we'll deploy 1.39.0-wmf.4 tomorrow (2022-03-23).
  • 17:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22979 and previous config saved to /var/cache/conftool/dbconfig/20220322-171748-marostegui.json
  • 17:15 taavi: deploy security patch for T304354
  • 17:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1004.eqiad.wmnet with reason: host reimage
  • 17:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1004.eqiad.wmnet with reason: host reimage
  • 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22978 and previous config saved to /var/cache/conftool/dbconfig/20220322-170243-marostegui.json
  • 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298557)', diff saved to https://phabricator.wikimedia.org/P22974 and previous config saved to /var/cache/conftool/dbconfig/20220322-164738-marostegui.json
  • 16:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1004.eqiad.wmnet with OS bullseye
  • 16:35 ebernhardson: T303548 start wikidatawiki reindexing on eqiad codfw and cloudelastic cirrus clusters
  • 16:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1003.eqiad.wmnet with OS bullseye
  • 16:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298557)', diff saved to https://phabricator.wikimedia.org/P22973 and previous config saved to /var/cache/conftool/dbconfig/20220322-162917-marostegui.json
  • 16:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 16:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 16:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298557)', diff saved to https://phabricator.wikimedia.org/P22972 and previous config saved to /var/cache/conftool/dbconfig/20220322-162904-marostegui.json
  • 16:27 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on karapace1001.eqiad.wmnet with reason: Setting up karapace for the first time
  • 16:27 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on karapace1001.eqiad.wmnet with reason: Setting up karapace for the first time
  • 16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
  • 16:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
  • 16:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1003.eqiad.wmnet with OS bullseye
  • 16:17 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:17 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1003.eqiad.wmnet with OS bullseye
  • 16:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
  • 16:16 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22971 and previous config saved to /var/cache/conftool/dbconfig/20220322-161359-marostegui.json
  • 16:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
  • 16:13 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:11 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:09 btullis@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1003.eqiad.wmnet with OS bullseye
  • 16:07 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudnet1003.eqiad.wmnet with OS bullseye
  • 16:00 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1003.eqiad.wmnet with OS bullseye
  • 15:59 moritzm: imported jvmquake 1.0.1 for stretch/buster (JDK8) and bullseye (JDK11)
  • 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22970 and previous config saved to /var/cache/conftool/dbconfig/20220322-155854-marostegui.json
  • 15:56 btullis@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:54 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1003.eqiad.wmnet with OS bullseye
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298557)', diff saved to https://phabricator.wikimedia.org/P22969 and previous config saved to /var/cache/conftool/dbconfig/20220322-154349-marostegui.json
  • 15:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
  • 15:29 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
  • 15:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298557)', diff saved to https://phabricator.wikimedia.org/P22968 and previous config saved to /var/cache/conftool/dbconfig/20220322-152508-marostegui.json
  • 15:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 15:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22967 and previous config saved to /var/cache/conftool/dbconfig/20220322-152247-root.json
  • 15:17 hashar: Gerrit 3.3.10 up and running T304226
  • 15:14 hashar: Stopping Gerrit for security update T304226
  • 15:13 hashar@deploy1002: Finished deploy [gerrit/gerrit@967b0d7]: Gerrit to 3.3.10 on gerrit1001 T304226 (duration: 00m 10s)
  • 15:13 hashar@deploy1002: Started deploy [gerrit/gerrit@967b0d7]: Gerrit to 3.3.10 on gerrit1001 T304226
  • 15:10 hashar: Upgrading and starting Gerrit on gerrit2001 (replica)
  • 15:06 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1003.eqiad.wmnet with OS bullseye
  • 15:06 hashar@deploy1002: Finished deploy [gerrit/gerrit@967b0d7]: Gerrit to 3.3.10 on gerrit2001 T304226 (duration: 00m 12s)
  • 15:06 hashar@deploy1002: Started deploy [gerrit/gerrit@967b0d7]: Gerrit to 3.3.10 on gerrit2001 T304226
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298557)', diff saved to https://phabricator.wikimedia.org/P22965 and previous config saved to /var/cache/conftool/dbconfig/20220322-144855-marostegui.json
  • 14:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 14:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298557)', diff saved to https://phabricator.wikimedia.org/P22964 and previous config saved to /var/cache/conftool/dbconfig/20220322-144847-marostegui.json
  • 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22963 and previous config saved to /var/cache/conftool/dbconfig/20220322-143341-marostegui.json
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22962 and previous config saved to /var/cache/conftool/dbconfig/20220322-141836-marostegui.json
  • 13:52 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
  • 13:46 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 13:44 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudmetrics1004.eqiad.wmnet
  • 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T298557)', diff saved to https://phabricator.wikimedia.org/P22960 and previous config saved to /var/cache/conftool/dbconfig/20220322-134148-marostegui.json
  • 13:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 13:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 13:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:40 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 13:40 jnuche@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.3 refs T300203
  • 13:36 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudmetrics1004.eqiad.wmnet
  • 13:35 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudmetrics1003.eqiad.wmnet
  • 13:33 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2001-dev.codfw.wmnet
  • 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:27 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudmetrics1003.eqiad.wmnet
  • 13:27 jnuche@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.3 refs T300203 (duration: 00m 52s)
  • 13:26 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.3 refs T300203
  • 13:26 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudgw2001-dev.codfw.wmnet
  • 13:25 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2002-dev.codfw.wmnet
  • 13:20 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudgw2002-dev.codfw.wmnet
  • 13:19 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudgw2002-dev.codfw.wmnet
  • 13:19 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudgw2002-dev.codfw.wmnet
  • 12:54 moritzm: installing 5.10.103 kernels on servers running a kernel from buster backports T303179
  • 12:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 12:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 12:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 12:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 12:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 12:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 12:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P22959 and previous config saved to /var/cache/conftool/dbconfig/20220322-124117-root.json
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P22958 and previous config saved to /var/cache/conftool/dbconfig/20220322-124109-root.json
  • 12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132 after testing', diff saved to https://phabricator.wikimedia.org/P22957 and previous config saved to /var/cache/conftool/dbconfig/20220322-123056-marostegui.json
  • 12:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P22956 and previous config saved to /var/cache/conftool/dbconfig/20220322-122613-root.json
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P22955 and previous config saved to /var/cache/conftool/dbconfig/20220322-122605-root.json
  • 12:24 marostegui: dbmaint s3@eqiad T300600
  • 12:24 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable WRITE BOTH on rest of s6 for templatelinks normalization (T299421) (duration: 00m 54s)
  • 12:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:21 marostegui: dbmaint s7@eqiad T300992
  • 12:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:18 marostegui: dbmaint s6@eqiad T300992
  • 12:17 marostegui: dbmaint s5@eqiad T300992
  • 12:16 marostegui: dbmaint s8@eqiad T300992
  • 12:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:12 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable WRITE BOTH for templatelinks normalization in wikitech (T299421) (duration: 01m 41s)
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P22954 and previous config saved to /var/cache/conftool/dbconfig/20220322-121110-root.json
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P22953 and previous config saved to /var/cache/conftool/dbconfig/20220322-121101-root.json
  • 12:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 12:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298557)', diff saved to https://phabricator.wikimedia.org/P22952 and previous config saved to /var/cache/conftool/dbconfig/20220322-120123-marostegui.json
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P22951 and previous config saved to /var/cache/conftool/dbconfig/20220322-115606-root.json
  • 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P22950 and previous config saved to /var/cache/conftool/dbconfig/20220322-115557-root.json
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22949 and previous config saved to /var/cache/conftool/dbconfig/20220322-114618-marostegui.json
  • 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22948 and previous config saved to /var/cache/conftool/dbconfig/20220322-114102-root.json
  • 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22946 and previous config saved to /var/cache/conftool/dbconfig/20220322-114051-root.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22945 and previous config saved to /var/cache/conftool/dbconfig/20220322-113113-marostegui.json
  • 11:31 marostegui: Reboot db1100 and db1123 for kernel upgrade before master swap
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123 for reboot', diff saved to https://phabricator.wikimedia.org/P22944 and previous config saved to /var/cache/conftool/dbconfig/20220322-113003-marostegui.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1100 for reboot', diff saved to https://phabricator.wikimedia.org/P22943 and previous config saved to /var/cache/conftool/dbconfig/20220322-112931-marostegui.json
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298557)', diff saved to https://phabricator.wikimedia.org/P22942 and previous config saved to /var/cache/conftool/dbconfig/20220322-111607-marostegui.json
  • 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:46 mmandere: pool cp1077 with HAProxy as TLS termination layer - T290005
  • 10:41 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1077.eqiad.wmnet with OS buster
  • 10:26 _joe_: running check-restart-php on api appservers
  • 10:22 _joe_: running check-and-restart on mw-eqiad-appservers
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T298557)', diff saved to https://phabricator.wikimedia.org/P22940 and previous config saved to /var/cache/conftool/dbconfig/20220322-101354-marostegui.json
  • 10:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 10:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298557)', diff saved to https://phabricator.wikimedia.org/P22939 and previous config saved to /var/cache/conftool/dbconfig/20220322-101346-marostegui.json
  • 10:03 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.3 refs T300203
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22938 and previous config saved to /var/cache/conftool/dbconfig/20220322-095841-marostegui.json
  • 09:54 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1077.eqiad.wmnet with reason: host reimage
  • 09:54 jnuche@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.3 refs T300203 (duration: 62m 07s)
  • 09:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1077.eqiad.wmnet with reason: host reimage
  • 09:46 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cloudcontrol1005.wikimedia.org with reason: dcaro testing backups
  • 09:46 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cloudcontrol1005.wikimedia.org with reason: dcaro testing backups
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22937 and previous config saved to /var/cache/conftool/dbconfig/20220322-094335-marostegui.json
  • 09:34 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1077.eqiad.wmnet with OS buster
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298557)', diff saved to https://phabricator.wikimedia.org/P22936 and previous config saved to /var/cache/conftool/dbconfig/20220322-092830-marostegui.json
  • 09:25 mmandere: depool cp1077 for reimage - T290005
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P22935 and previous config saved to /var/cache/conftool/dbconfig/20220322-091718-root.json
  • 09:11 dcausse: restarted blazegraph on wdqs2002 (deadlocked)
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P22934 and previous config saved to /var/cache/conftool/dbconfig/20220322-090214-root.json
  • 08:59 XioNoX: drmrs propagate LVS med to core routers
  • 08:52 jnuche@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.3 refs T300203
  • 08:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1008.eqiad.wmnet with OS bullseye
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P22933 and previous config saved to /var/cache/conftool/dbconfig/20220322-084710-root.json
  • 08:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: host reimage
  • 08:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: host reimage
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P22932 and previous config saved to /var/cache/conftool/dbconfig/20220322-083206-root.json
  • 08:19 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1008.eqiad.wmnet with OS bullseye
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T298557)', diff saved to https://phabricator.wikimedia.org/P22931 and previous config saved to /var/cache/conftool/dbconfig/20220322-081806-marostegui.json
  • 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298557)', diff saved to https://phabricator.wikimedia.org/P22930 and previous config saved to /var/cache/conftool/dbconfig/20220322-081758-marostegui.json
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P22929 and previous config saved to /var/cache/conftool/dbconfig/20220322-081702-root.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 some more weight T301879', diff saved to https://phabricator.wikimedia.org/P22928 and previous config saved to /var/cache/conftool/dbconfig/20220322-080713-marostegui.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22927 and previous config saved to /var/cache/conftool/dbconfig/20220322-080253-marostegui.json
  • 07:57 urbanecm: UTC morning backport window completed
  • 07:57 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.2/extensions/GrowthExperiments/modules/ext.growthExperiments.MentorDashboard/MenteeOverview/MenteeOverviewPresets.js: 84877bd: MenteeOverviewPresets.getUsersToShow: Fix typo (T304353) (duration: 00m 49s)
  • 07:53 elukey: restart php-fpm on mw1449 - opcache full after deployment
  • 07:49 elukey: restart php-fpm on mw1448 - high cpu usage right after yesterday's deployment at 21 UTC
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22925 and previous config saved to /var/cache/conftool/dbconfig/20220322-074748-marostegui.json
  • 07:47 elukey: depool mw1448 manually on the node (high cpu usage from php-fpm)
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298557)', diff saved to https://phabricator.wikimedia.org/P22924 and previous config saved to /var/cache/conftool/dbconfig/20220322-073243-marostegui.json
  • 07:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8151bf2: Allow flooders to remove the group from themselves in viwiki (T303578) (duration: 00m 50s)
  • 07:21 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1007.eqiad.wmnet with OS bullseye
  • 07:17 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: caad5a4: wgCrossSiteAJAXdomains: Add foundationwiki and {ee,ge,punjabi}wikimedia (T300978) (duration: 00m 49s)
  • 07:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b4a9935: Create "editautopatrolprotected" protection level for viwiki (T303579) (duration: 00m 57s)
  • 07:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: host reimage
  • 07:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: host reimage
  • 06:54 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1007.eqiad.wmnet with OS bullseye
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T300775)', diff saved to https://phabricator.wikimedia.org/P22923 and previous config saved to /var/cache/conftool/dbconfig/20220322-064230-marostegui.json
  • 06:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 06:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300775)', diff saved to https://phabricator.wikimedia.org/P22922 and previous config saved to /var/cache/conftool/dbconfig/20220322-064222-marostegui.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T298557)', diff saved to https://phabricator.wikimedia.org/P22921 and previous config saved to /var/cache/conftool/dbconfig/20220322-063223-marostegui.json
  • 06:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 06:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22920 and previous config saved to /var/cache/conftool/dbconfig/20220322-062717-marostegui.json
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 to s1 with minimal weight T301879', diff saved to https://phabricator.wikimedia.org/P22919 and previous config saved to /var/cache/conftool/dbconfig/20220322-062310-marostegui.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 to dbctl T301879', diff saved to https://phabricator.wikimedia.org/P22918 and previous config saved to /var/cache/conftool/dbconfig/20220322-062140-marostegui.json
  • 06:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1175.eqiad.wmnet with OS bullseye
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22917 and previous config saved to /var/cache/conftool/dbconfig/20220322-061212-marostegui.json
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300775)', diff saved to https://phabricator.wikimedia.org/P22916 and previous config saved to /var/cache/conftool/dbconfig/20220322-055707-marostegui.json
  • 05:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1175.eqiad.wmnet with reason: host reimage
  • 05:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1175.eqiad.wmnet with reason: host reimage
  • 05:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 05:41 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1175.eqiad.wmnet with OS bullseye
  • 03:47 eileen: civicrm revision changed from 457adec4 to b6ceb722
  • 02:56 eileen: civicrm revision changed from 30c55f51 to 457adec4
  • 02:56 eileen: revision changed from 30c55f51 to 457adec4
  • 02:16 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 02:03 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 01:35 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 00:35 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
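
The START/END cookbook entries above follow a regular shape, so runs that did not pass (such as the cloudnet1003 and cloudvirt reimages on this date) can be pulled out mechanically. Below is a minimal sketch in Python, assuming only the entry layout visible in this log. The regular expression and the function name `failed_cookbook_runs` are illustrative assumptions and are not part of any Wikimedia tooling.

```python
import re

# Assumed entry shape, inferred from the log entries themselves:
#   "HH:MM user@host: START - Cookbook <name> <details>"
#   "HH:MM user@host: END (<STATUS>) - Cookbook <name> (exit_code=N) <details>"
ENTRY = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<who>\S+): "
    r"(?:START|END \((?P<status>\w+)\)) - Cookbook (?P<cookbook>\S+)"
    r"(?: \(exit_code=(?P<code>\d+)\))?"
    r"(?P<rest>.*)"
)


def failed_cookbook_runs(lines):
    """Return one tuple per END entry whose status is not PASS.

    Each tuple is (time, who, cookbook, status, exit_code, details).
    START entries, PASS entries, and non-cookbook lines are skipped.
    """
    failures = []
    for line in lines:
        m = ENTRY.search(line)
        if m is None or m["status"] is None or m["status"] == "PASS":
            continue
        # exit_code is optional in the pattern, so guard against its absence.
        code = int(m["code"]) if m["code"] is not None else None
        failures.append(
            (m["time"], m["who"], m["cookbook"], m["status"], code, m["rest"].strip())
        )
    return failures
```

Passing the entries of a single day to `failed_cookbook_runs` returns the ERROR and FAIL runs in the order they appear in the log, which is newest first.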

2022-03-21

  • 23:52 eileen: civicrm revision changed from 52c45874 to 30c55f51
  • 22:29 ryankemper: T301955 Lifted downtime on relforge now that cluster upgrade is complete and cluster is back to green status
  • 22:26 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 22:04 reedy@deploy1002: Synchronized php-1.39.0-wmf.2/extensions/OATHAuth/: T304350 (duration: 00m 49s)
  • 22:03 reedy@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/OATHAuth/: T304350 (duration: 00m 49s)
  • 21:59 ryankemper: T301955 Downtimed relforge for 2 days; stuck in yellow status during upgrade b/c replica shards cannot be scheduled to a host of lower elasticsearch version than primary shards. Working on patch for our `rolling-operation` cookbook to disable replication during operation
  • 21:46 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 21:46 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 21:46 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 21:45 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 21:45 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 21:45 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 21:44 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 21:44 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 21:43 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 21:43 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 21:41 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 21:41 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 21:41 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 21:41 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 21:40 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 21:36 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 21:33 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 21:33 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 21:31 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 21:31 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 21:30 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 21:30 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 21:28 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 21:28 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 21:27 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 21:27 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 21:26 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 21:25 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
  • 21:25 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
  • 21:25 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 21:24 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply
  • 21:10 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 21:03 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 20:56 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.2 refs T300203
  • 20:52 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 20:49 dduvall@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.2 refs T300203 (duration: 00m 51s)
  • 20:49 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.2 refs T300203
  • 20:31 urbanecm: UTC late backport window completed
  • 20:29 urbanecm@deploy1002: Synchronized docroot/noc/db.php: 3bcccdc: Migrate away from $wmfDbconfigFromEtcd (T45956; 2/2) (duration: 00m 50s)
  • 20:29 urbanecm@deploy1002: Synchronized wmf-config/etcd.php: 3bcccdc: Migrate away from $wmfDbconfigFromEtcd (T45956; 1/2) (duration: 00m 50s)
  • 20:19 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 8347de5: ExtensionDistributor: Add REL1_38 (T304185) (duration: 00m 51s)
  • 19:48 brennen: mw1416: sudo -i /usr/local/sbin/restart-php7.2-fpm
  • 19:42 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.2 refs T300203
  • 19:26 brennen@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.2 refs T300203 (duration: 64m 33s)
  • 18:54 ebernhardson: T303548 start commonswiki reindexing on eqiad codfw and cloudelastic cirrus clusters
  • 18:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298557)', diff saved to https://phabricator.wikimedia.org/P22906 and previous config saved to /var/cache/conftool/dbconfig/20220321-185042-marostegui.json
  • 18:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22905 and previous config saved to /var/cache/conftool/dbconfig/20220321-183537-marostegui.json
  • 18:22 brennen@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.2 refs T300203
  • 18:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22904 and previous config saved to /var/cache/conftool/dbconfig/20220321-182032-marostegui.json
  • 18:19 otto@deploy1002: Finished deploy [analytics/refinery@2175d63]: gobblin prometheus metrics for all jobs - T294420 (duration: 04m 41s)
  • 18:19 brennen: trainsperiment (T300203): 1.39.0-wmf.1 on all wikis; starting prep of wmf.2, will abort if needed
  • 18:15 otto@deploy1002: Started deploy [analytics/refinery@2175d63]: gobblin prometheus metrics for all jobs - T294420
  • 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298557)', diff saved to https://phabricator.wikimedia.org/P22903 and previous config saved to /var/cache/conftool/dbconfig/20220321-180526-marostegui.json
  • 18:04 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.1 refs T300203
  • 18:03 otto@deploy1002: Finished deploy [analytics/refinery@2175d63] (hadoop-test): gobblin prometheus metrics for all jobs - T294420 (duration: 07m 19s)
  • 17:59 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1021 for T302233
  • 17:59 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1020 for T302233
  • 17:57 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1016 for T302233
  • 17:55 otto@deploy1002: Started deploy [analytics/refinery@2175d63] (hadoop-test): gobblin prometheus metrics for all jobs - T294420
  • 17:53 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 17:51 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1017 for T302233
  • 17:49 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1013 for T302233
  • 17:49 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1018 for T302233
  • 17:46 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1014
  • 17:41 ryankemper: [WCQS Deploy] Test query passed on commons-query.wikimedia.org; WCQS deploy complete
  • 17:40 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@2b67de7] (wcqs): Deploy 0.3.107 to WCQS (duration: 02m 12s)
  • 17:38 ryankemper: [WCQS Deploy] Tests look good following deploy of `0.3.107` to canary `wcqs1002.eqiad.wmnet`, proceeding to rest of fleet
  • 17:37 ryankemper@deploy1002: Started deploy [wdqs/wdqs@2b67de7] (wcqs): Deploy 0.3.107 to WCQS
  • 17:35 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1018 after same command without `--table` argument timed out waiting for `zhwiki_p.page`
  • 17:32 ryankemper: [Maps] Running puppet agent on rest of `maps*`: `ryankemper@cumin1001:~$ sudo -E cumin -b 4 'maps*' 'run-puppet-agent'`
  • 17:31 ryankemper: [Maps] Ran puppet agent on maps master `maps1009` to verify that the puppet patch works; osm import appears to have been disabled as intended: `Notice: /Stage[main]/Osm::Imposm3/Systemd::Service[imposm]/Service[imposm]/ensure: ensure changed 'running' to 'stopped'`
  • 17:26 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 17:25 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 17:25 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 17:22 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@2b67de7]: 0.3.107 (duration: 08m 26s)
  • 17:15 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.107` on canary `wdqs1003`; proceeding to rest of fleet
  • 17:14 ryankemper@deploy1002: Started deploy [wdqs/wdqs@2b67de7]: 0.3.107
  • 17:13 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.107`. Pre-deploy tests passing on canary `wdqs1003`
  • 17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298557)', diff saved to https://phabricator.wikimedia.org/P22902 and previous config saved to /var/cache/conftool/dbconfig/20220321-170731-marostegui.json
  • 17:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 16:58 brennen: trainsperiment (T300203): blockers currently cleared, will hold wmf.1 -> group2 until 18:00 UTC, per deployment calendar
  • 16:55 taavi@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/WikimediaEvents/includes/PageSplitter/PageSplitterHooks.php: Backport: PageSplitter: check for OutputPage::getTitle() returning null (T304331) (duration: 00m 50s)
  • 16:53 taavi@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/WikimediaEvents/includes/PageSplitter/PageSplitterHooks.php: Backport: PageSplitter: check for OutputPage::getTitle() returning null (T304331) (duration: 00m 51s)
  • 16:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 16:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 16:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 16:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 16:46 razzi@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
  • 16:44 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/Wikibase/repo/: Backport: Add display to wbsearchentities response even if empty (T104344) (duration: 00m 53s)
  • 16:15 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 16:14 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 16:13 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 16:13 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 16:12 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 16:12 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 16:11 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 16:11 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:11 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:11 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 16:11 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 16:10 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 16:10 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 16:10 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 16:10 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 16:09 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 16:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 16:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 16:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298557)', diff saved to https://phabricator.wikimedia.org/P22901 and previous config saved to /var/cache/conftool/dbconfig/20220321-160557-marostegui.json
  • 16:05 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 16:04 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 16:02 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 16:02 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 16:00 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 16:00 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 15:59 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 15:59 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:58 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 15:58 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 15:57 otto@deploy1002: Finished deploy [analytics/refinery@cd7bf7a] (hadoop-test): fix prometheus pushgateway url - T294420 (duration: 07m 18s)
  • 15:57 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 15:57 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 15:57 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 15:57 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
  • 15:56 reedy@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: T304111 (duration: 00m 50s)
  • 15:56 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
  • 15:56 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
  • 15:55 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply
  • 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 15:54 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 15:54 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 15:54 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 15:54 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply
  • 15:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P22900 and previous config saved to /var/cache/conftool/dbconfig/20220321-155052-marostegui.json
  • 15:50 otto@deploy1002: Started deploy [analytics/refinery@cd7bf7a] (hadoop-test): fix prometheus pushgateway url - T294420
  • 15:50 otto@deploy1002: Finished deploy [analytics/refinery@33f66db] (hadoop-test): fix prometheus pushgateway url - T294420 (duration: 00m 03s)
  • 15:50 otto@deploy1002: Started deploy [analytics/refinery@33f66db] (hadoop-test): fix prometheus pushgateway url - T294420
  • 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T300775)', diff saved to https://phabricator.wikimedia.org/P22899 and previous config saved to /var/cache/conftool/dbconfig/20220321-154607-marostegui.json
  • 15:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 15:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T300775)', diff saved to https://phabricator.wikimedia.org/P22898 and previous config saved to /var/cache/conftool/dbconfig/20220321-154559-marostegui.json
  • 15:44 reedy@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/Flow/maintenance: T304318 (duration: 00m 49s)
  • 15:44 razzi@cumin1001: START - Cookbook sre.wikireplicas.update-views
  • 15:43 reedy@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/Flow/maintenance: T304318 (duration: 00m 51s)
  • 15:40 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 53s)
  • 15:39 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 53s)
  • 15:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P22897 and previous config saved to /var/cache/conftool/dbconfig/20220321-153547-marostegui.json
  • 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22896 and previous config saved to /var/cache/conftool/dbconfig/20220321-153054-marostegui.json
  • 15:27 mmandere: pool cp1075 with HAProxy as TLS termination layer - T290005
  • 15:23 jnuche@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.1 refs T300203 (duration: 01m 54s)
  • 15:21 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.1 refs T300203
  • 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298557)', diff saved to https://phabricator.wikimedia.org/P22895 and previous config saved to /var/cache/conftool/dbconfig/20220321-152041-marostegui.json
  • 15:19 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1075.eqiad.wmnet with OS buster
  • 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22894 and previous config saved to /var/cache/conftool/dbconfig/20220321-151549-marostegui.json
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298557)', diff saved to https://phabricator.wikimedia.org/P22893 and previous config saved to /var/cache/conftool/dbconfig/20220321-150417-marostegui.json
  • 15:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 15:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 15:01 otto@deploy1002: Finished deploy [analytics/refinery@33f66db] (hadoop-test): gobblin-wmf-core-1.0.1 - T297939 (duration: 07m 10s)
  • 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T300775)', diff saved to https://phabricator.wikimedia.org/P22891 and previous config saved to /var/cache/conftool/dbconfig/20220321-150044-marostegui.json
  • 14:58 XioNoX: asw2-b-eqiad> request virtual-chassis vc-port delete pic-slot 1 port 2 member 5 - T304316
  • 14:54 otto@deploy1002: Started deploy [analytics/refinery@33f66db] (hadoop-test): gobblin-wmf-core-1.0.1 - T297939
  • 14:49 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1075.eqiad.wmnet with reason: host reimage
  • 14:47 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1075.eqiad.wmnet with reason: host reimage
  • 14:43 oblivian@puppetmaster1001: conftool action : set/enabled=false; selector: name=parameter_q,cluster=cache-text
  • 14:35 hashar: Restarting CI Zuul server
  • 14:31 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.1 refs T300203
  • 14:31 hashar: restarting Apache on gerrit2001 and gerrit1001
  • 14:30 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1075.eqiad.wmnet with OS buster
  • 14:28 hashar@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/Echo: Revert "Call IDatabase::timestamp before inserting rows" - T304307 (duration: 00m 52s)
  • 14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 14:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298557)', diff saved to https://phabricator.wikimedia.org/P22890 and previous config saved to /var/cache/conftool/dbconfig/20220321-141922-marostegui.json
  • 14:11 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 14:09 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 14:07 mmandere: depool cp1075 - T290005
  • 14:07 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 14:05 mmandere: depool cp1074 - T290005
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22889 and previous config saved to /var/cache/conftool/dbconfig/20220321-140417-marostegui.json
  • 14:04 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 14:03 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 14:03 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 14:02 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 14:01 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 13:55 otto@deploy1002: Finished deploy [analytics/refinery@11909fa] (hadoop-test): gobblin-wmf-core-1.0.1 - T297939 (duration: 00m 06s)
  • 13:55 otto@deploy1002: Started deploy [analytics/refinery@11909fa] (hadoop-test): gobblin-wmf-core-1.0.1 - T297939
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22888 and previous config saved to /var/cache/conftool/dbconfig/20220321-134912-marostegui.json
  • 13:34 Lucas_WMDE: UTC afternoon backport window done
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298557)', diff saved to https://phabricator.wikimedia.org/P22887 and previous config saved to /var/cache/conftool/dbconfig/20220321-133407-marostegui.json
  • 13:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "ptwiki: Disable Growth's image recommendation" (T304095) (duration: 00m 49s)
  • 13:29 otto@deploy1002: Finished deploy [analytics/refinery@11909fa] (hadoop-test): gobblin-wmf-core-1.0.1 - T297939 (duration: 08m 53s)
  • 13:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove unused wgWMESearchRelevancePages config variable (duration: 00m 50s)
  • 13:20 otto@deploy1002: Started deploy [analytics/refinery@11909fa] (hadoop-test): gobblin-wmf-core-1.0.1 - T297939
  • 13:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Create "pagemover" group at azwiki (T303752) (duration: 00m 50s)
  • 13:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove changetags right from users on wikidatawiki and testwikidatawiki (T303682) (while keeping applychangetags right) (duration: 00m 49s)
  • 12:58 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
  • 12:48 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
  • 12:42 jnuche@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.38.0-wmf.26"
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298557)', diff saved to https://phabricator.wikimedia.org/P22885 and previous config saved to /var/cache/conftool/dbconfig/20220321-123055-marostegui.json
  • 12:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 12:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298557)', diff saved to https://phabricator.wikimedia.org/P22884 and previous config saved to /var/cache/conftool/dbconfig/20220321-123042-marostegui.json
  • 12:22 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.1 refs T300203
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22883 and previous config saved to /var/cache/conftool/dbconfig/20220321-121537-marostegui.json
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22882 and previous config saved to /var/cache/conftool/dbconfig/20220321-120032-marostegui.json
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298557)', diff saved to https://phabricator.wikimedia.org/P22881 and previous config saved to /var/cache/conftool/dbconfig/20220321-114527-marostegui.json
  • 11:41 jnuche@deploy1002: Pruned MediaWiki: 1.38.0-wmf.25 (duration: 01m 32s)
  • 11:38 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@cbc85d3] (eqiad): Update kartotherian to 2ef5c2d (duration: 01m 40s)
  • 11:36 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@cbc85d3] (eqiad): Update kartotherian to 2ef5c2d
  • 11:36 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@cbc85d3] (codfw): Update kartotherian to 2ef5c2d (duration: 02m 51s)
  • 11:35 jnuche@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.1 refs T300203 (duration: 81m 15s)
  • 11:33 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@cbc85d3] (codfw): Update kartotherian to 2ef5c2d
  • 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298557)', diff saved to https://phabricator.wikimedia.org/P22880 and previous config saved to /var/cache/conftool/dbconfig/20220321-112217-marostegui.json
  • 11:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 11:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298557)', diff saved to https://phabricator.wikimedia.org/P22879 and previous config saved to /var/cache/conftool/dbconfig/20220321-112210-marostegui.json
  • 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22878 and previous config saved to /var/cache/conftool/dbconfig/20220321-110705-marostegui.json
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22877 and previous config saved to /var/cache/conftool/dbconfig/20220321-105159-marostegui.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298557)', diff saved to https://phabricator.wikimedia.org/P22876 and previous config saved to /var/cache/conftool/dbconfig/20220321-103654-marostegui.json
  • 10:13 jnuche@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.1 refs T300203
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298557)', diff saved to https://phabricator.wikimedia.org/P22875 and previous config saved to /var/cache/conftool/dbconfig/20220321-094614-marostegui.json
  • 09:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:32 hashar: 1.39.0-wmf.1 train is delayed due to a CI / npm build failure which is being resolved T300203
  • 09:08 dcausse: restarting blazegraph on wdqs2001 (stuck)
  • 09:07 moritzm: restarting FPM
  • 09:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 09:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298557)', diff saved to https://phabricator.wikimedia.org/P22874 and previous config saved to /var/cache/conftool/dbconfig/20220321-090250-marostegui.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22873 and previous config saved to /var/cache/conftool/dbconfig/20220321-084745-marostegui.json
  • 08:43 hashar: Train blocked due to an npm checksum mismatch preventing CI from merging the mediawiki/core 1.39.0-wmf.1 change which creates the branch. T304286
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22872 and previous config saved to /var/cache/conftool/dbconfig/20220321-083240-marostegui.json
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22871 and previous config saved to /var/cache/conftool/dbconfig/20220321-083050-root.json
  • 08:23 dcausse: restarting blazegraph on wdqs2003 (stuck for 16 hours)
  • 08:19 moritzm: installing openssl security updates
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298557)', diff saved to https://phabricator.wikimedia.org/P22870 and previous config saved to /var/cache/conftool/dbconfig/20220321-081735-marostegui.json
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22869 and previous config saved to /var/cache/conftool/dbconfig/20220321-081546-root.json
  • 08:09 dcausse: restarting blazegraph on wdqs2004 and wdqs2002 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22868 and previous config saved to /var/cache/conftool/dbconfig/20220321-080042-root.json
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22867 and previous config saved to /var/cache/conftool/dbconfig/20220321-074538-root.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P22866 and previous config saved to /var/cache/conftool/dbconfig/20220321-073033-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298557)', diff saved to https://phabricator.wikimedia.org/P22865 and previous config saved to /var/cache/conftool/dbconfig/20220321-072902-marostegui.json
  • 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 07:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298557)', diff saved to https://phabricator.wikimedia.org/P22864 and previous config saved to /var/cache/conftool/dbconfig/20220321-072854-marostegui.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22863 and previous config saved to /var/cache/conftool/dbconfig/20220321-071349-marostegui.json
  • 07:12 urbanecm: UTC morning B&C done
  • 07:08 urbanecm: Create `wikishared.cx_significant_edits` and `wikishared.cx_section_translation` at x1 (T302371; `mwscript sql.php --wiki=aawiki --wikidb=wikishared --cluster=extension1 /srv/mediawiki-staging/php-1.38.0-wmf.26/extensions/ContentTranslation/sql/{section-translations,significant-edits}.sql`)
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22862 and previous config saved to /var/cache/conftool/dbconfig/20220321-065844-marostegui.json
  • 06:43 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1175.eqiad.wmnet with OS bullseye
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298557)', diff saved to https://phabricator.wikimedia.org/P22861 and previous config saved to /var/cache/conftool/dbconfig/20220321-064339-marostegui.json
  • 06:19 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1175.eqiad.wmnet with OS bullseye
  • 06:19 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1175.eqiad.wmnet with OS bullseye
  • 05:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1175.eqiad.wmnet with OS bullseye
  • 05:52 marostegui: dbmaint s5@eqiad T300600
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175 reimage T300600', diff saved to https://phabricator.wikimedia.org/P22860 and previous config saved to /var/cache/conftool/dbconfig/20220321-055202-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298557)', diff saved to https://phabricator.wikimedia.org/P22859 and previous config saved to /var/cache/conftool/dbconfig/20220321-054838-marostegui.json
  • 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 05:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T297189)', diff saved to https://phabricator.wikimedia.org/P22858 and previous config saved to /var/cache/conftool/dbconfig/20220321-054358-marostegui.json
  • 05:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
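The db1096:3316 entries above show a staged repool after a schema change: 10%, 25%, 50%, 75%, then 100%, spaced 15 minutes apart starting at 07:30. A minimal sketch that reproduces only that timing pattern follows. The function name `repool_schedule` and its parameters are hypothetical; the log does not show the tooling that actually issued these commits.

```python
from datetime import datetime, timedelta

# Percentages and the 15-minute spacing are read off the db1096:3316 log
# entries; the function itself is a hypothetical illustration, not the
# tool that produced those entries.
def repool_schedule(start, steps=(10, 25, 50, 75, 100), interval_minutes=15):
    """Return (time, percentage) pairs for a staged repool."""
    return [
        (start + timedelta(minutes=i * interval_minutes), pct)
        for i, pct in enumerate(steps)
    ]

schedule = repool_schedule(datetime(2022, 3, 21, 7, 30))
for when, pct in schedule:
    print(f"{when:%H:%M} db1096:3316 (re)pooling @ {pct}%")
```

The printed times can be compared against the timestamps of the corresponding log entries to check that a repool followed the expected cadence.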

2022-03-20

  • 23:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T300775)', diff saved to https://phabricator.wikimedia.org/P22857 and previous config saved to /var/cache/conftool/dbconfig/20220320-234358-marostegui.json
  • 23:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 23:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 23:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T300775)', diff saved to https://phabricator.wikimedia.org/P22856 and previous config saved to /var/cache/conftool/dbconfig/20220320-234350-marostegui.json
  • 23:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22855 and previous config saved to /var/cache/conftool/dbconfig/20220320-232845-marostegui.json
  • 23:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22854 and previous config saved to /var/cache/conftool/dbconfig/20220320-231340-marostegui.json
  • 22:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T300775)', diff saved to https://phabricator.wikimedia.org/P22853 and previous config saved to /var/cache/conftool/dbconfig/20220320-225835-marostegui.json
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T300775)', diff saved to https://phabricator.wikimedia.org/P22850 and previous config saved to /var/cache/conftool/dbconfig/20220320-081713-marostegui.json
  • 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T300775)', diff saved to https://phabricator.wikimedia.org/P22849 and previous config saved to /var/cache/conftool/dbconfig/20220320-081705-marostegui.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22848 and previous config saved to /var/cache/conftool/dbconfig/20220320-080200-marostegui.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22847 and previous config saved to /var/cache/conftool/dbconfig/20220320-074655-marostegui.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T300775)', diff saved to https://phabricator.wikimedia.org/P22846 and previous config saved to /var/cache/conftool/dbconfig/20220320-073150-marostegui.json

2022-03-19

  • 17:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T300775)', diff saved to https://phabricator.wikimedia.org/P22845 and previous config saved to /var/cache/conftool/dbconfig/20220319-171757-marostegui.json
  • 17:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 17:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 17:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T300775)', diff saved to https://phabricator.wikimedia.org/P22844 and previous config saved to /var/cache/conftool/dbconfig/20220319-171749-marostegui.json
  • 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22843 and previous config saved to /var/cache/conftool/dbconfig/20220319-170244-marostegui.json
  • 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22842 and previous config saved to /var/cache/conftool/dbconfig/20220319-164739-marostegui.json
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T300775)', diff saved to https://phabricator.wikimedia.org/P22841 and previous config saved to /var/cache/conftool/dbconfig/20220319-163234-marostegui.json
  • 13:54 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 13:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 13:48 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 13:35 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 13:34 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 13:23 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=piwiki --move-talk --fix # T304201
  • 13:20 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 04:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 04:05 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
  • 04:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
  • 03:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 03:51 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 03:29 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 03:28 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 03:28 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 03:28 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 03:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 02:52 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 02:27 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 02:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 01:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T300775)', diff saved to https://phabricator.wikimedia.org/P22839 and previous config saved to /var/cache/conftool/dbconfig/20220319-015847-marostegui.json
  • 01:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 01:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 01:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T300775)', diff saved to https://phabricator.wikimedia.org/P22838 and previous config saved to /var/cache/conftool/dbconfig/20220319-015839-marostegui.json
  • 01:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
  • 01:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
  • 01:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22837 and previous config saved to /var/cache/conftool/dbconfig/20220319-014334-marostegui.json
  • 01:34 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 01:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22836 and previous config saved to /var/cache/conftool/dbconfig/20220319-012829-marostegui.json
  • 01:23 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 01:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T300775)', diff saved to https://phabricator.wikimedia.org/P22835 and previous config saved to /var/cache/conftool/dbconfig/20220319-011324-marostegui.json
  • 00:58 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 00:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1016.eqiad.wmnet with OS bullseye

2022-03-18

  • 21:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
  • 21:12 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
  • 21:02 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 15:38 jayme: powercycle kubernetes1002
  • 14:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:26 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/FlaggedRevs/backend/FlaggedRevs.php: Backport: Don't pass the revision to PO access service (T304127) (duration: 00m 49s)
  • 14:12 XioNoX: configure NAT for civi1002 - T304098
  • 14:02 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 14:02 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 14:01 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 14:01 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 13:59 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 13:59 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 13:08 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "test sync - jbond@cumin1001"
  • 13:07 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test sync - jbond@cumin1001"
  • 13:02 moritzm: imported python3.5 3.5.3-1+deb9u5+wmf1 to component/python35 T303801
  • 12:35 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 11:35 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 11:33 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 11:32 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 11:30 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 11:29 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 11:28 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 11:09 vgutierrez: rolling restart of nginx on ncredir instances to catch up on OpenSSL updates
  • 11:05 vgutierrez: restarting acme-chief and acme-chief API services to catch up on OpenSSL updates
  • 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 10:54 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 10:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 10:52 akosiaris: drain kubernetes200[1-4] T303045
  • 10:51 akosiaris: depool kubernetes200[1-4] T303045
  • 10:50 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2004.codfw.wmnet
  • 10:50 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2003.codfw.wmnet
  • 10:50 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2002.codfw.wmnet
  • 10:50 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2001.codfw.wmnet
  • 10:01 akosiaris: drain kubernetes100[1-4] T303044
  • 09:54 akosiaris: depool kubernetes100[1-4] from pybal T303044
  • 09:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1004.eqiad.wmnet
  • 09:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1003.eqiad.wmnet
  • 09:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1002.eqiad.wmnet
  • 09:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1001.eqiad.wmnet
  • 09:42 akosiaris: uncordon kubernetes1018-1022. T293728. Nodes are live, ready to receive workloads and traffic.
  • 09:37 akosiaris: pool kubernetes1018-1022 in pybal. T293728
  • 09:37 akosiaris: pool kubernetes1018-1022 in pybal.
  • 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1022.eqiad.wmnet
  • 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1021.eqiad.wmnet
  • 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1020.eqiad.wmnet
  • 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1019.eqiad.wmnet
  • 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1018.eqiad.wmnet
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T300775)', diff saved to https://phabricator.wikimedia.org/P22827 and previous config saved to /var/cache/conftool/dbconfig/20220318-093543-marostegui.json
  • 09:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 09:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1022.eqiad.wmnet
  • 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1021.eqiad.wmnet
  • 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1020.eqiad.wmnet
  • 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1019.eqiad.wmnet
  • 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1018.eqiad.wmnet
  • 09:10 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 09:08 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298557)', diff saved to https://phabricator.wikimedia.org/P22826 and previous config saved to /var/cache/conftool/dbconfig/20220318-085517-marostegui.json
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22825 and previous config saved to /var/cache/conftool/dbconfig/20220318-084012-marostegui.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22824 and previous config saved to /var/cache/conftool/dbconfig/20220318-082507-marostegui.json
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298557)', diff saved to https://phabricator.wikimedia.org/P22823 and previous config saved to /var/cache/conftool/dbconfig/20220318-081002-marostegui.json
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22822 and previous config saved to /var/cache/conftool/dbconfig/20220318-072852-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298557)', diff saved to https://phabricator.wikimedia.org/P22821 and previous config saved to /var/cache/conftool/dbconfig/20220318-071758-marostegui.json
  • 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298557)', diff saved to https://phabricator.wikimedia.org/P22820 and previous config saved to /var/cache/conftool/dbconfig/20220318-071750-marostegui.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22819 and previous config saved to /var/cache/conftool/dbconfig/20220318-071348-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22818 and previous config saved to /var/cache/conftool/dbconfig/20220318-070245-marostegui.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22817 and previous config saved to /var/cache/conftool/dbconfig/20220318-065844-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22816 and previous config saved to /var/cache/conftool/dbconfig/20220318-064740-marostegui.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22815 and previous config saved to /var/cache/conftool/dbconfig/20220318-064340-root.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22814 and previous config saved to /var/cache/conftool/dbconfig/20220318-063631-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22813 and previous config saved to /var/cache/conftool/dbconfig/20220318-063524-root.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298557)', diff saved to https://phabricator.wikimedia.org/P22812 and previous config saved to /var/cache/conftool/dbconfig/20220318-063235-marostegui.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22811 and previous config saved to /var/cache/conftool/dbconfig/20220318-062836-root.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22810 and previous config saved to /var/cache/conftool/dbconfig/20220318-062127-root.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22809 and previous config saved to /var/cache/conftool/dbconfig/20220318-062020-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22808 and previous config saved to /var/cache/conftool/dbconfig/20220318-061332-root.json
  • 06:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1179.eqiad.wmnet with OS bullseye
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22807 and previous config saved to /var/cache/conftool/dbconfig/20220318-060623-root.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22806 and previous config saved to /var/cache/conftool/dbconfig/20220318-060516-root.json
  • 05:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1179.eqiad.wmnet with reason: host reimage
  • 05:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1179.eqiad.wmnet with reason: host reimage
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22805 and previous config saved to /var/cache/conftool/dbconfig/20220318-055119-root.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22804 and previous config saved to /var/cache/conftool/dbconfig/20220318-055012-root.json
  • 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1179.eqiad.wmnet with OS bullseye
  • 05:39 marostegui: dbmaint on s3@eqiad T300600
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 reimage T300600', diff saved to https://phabricator.wikimedia.org/P22803 and previous config saved to /var/cache/conftool/dbconfig/20220318-053832-marostegui.json
  • 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 10%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22802 and previous config saved to /var/cache/conftool/dbconfig/20220318-053615-root.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22801 and previous config saved to /var/cache/conftool/dbconfig/20220318-053508-root.json
  • 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298557)', diff saved to https://phabricator.wikimedia.org/P22800 and previous config saved to /var/cache/conftool/dbconfig/20220318-053443-marostegui.json
  • 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 05:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 01:23 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 01:16 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 01:14 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye

2022-03-17

  • 22:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:36 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.26 refs T300202
  • 22:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:28 derick@deploy1002: Synchronized wmf-config/MetaContactPages.php: Config: Add new field to capture application URL link on Meta (duration: 00m 50s)
  • 22:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:17 derick@deploy1002: Finished scap: Backport: Add & improve message for the chapter/thorg application contact form (duration: 11m 37s)
  • 22:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:05 derick@deploy1002: Started scap: Backport: Add & improve message for the chapter/thorg application contact form
  • 22:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:00 brennen@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Revert "Revert "Revert "Enable Parsoid API everywhere""" (duration: 00m 51s)
  • 21:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:48 brennen@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Revert "Revert "Enable Parsoid API everywhere"" (duration: 00m 51s)
  • 21:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:45 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 21:44 rzl@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 21:44 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 21:44 rzl@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 21:44 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 21:44 rzl@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 21:44 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 21:42 rzl@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 21:42 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 21:42 rzl@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 21:42 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 21:41 rzl@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 21:41 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 21:41 rzl@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 21:41 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 21:40 rzl@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 21:35 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 21:26 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 21:26 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 21:26 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:26 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 21:25 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 21:25 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:25 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 21:25 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 21:24 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 21:24 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 21:24 rzl@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 21:24 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 21:24 rzl@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 21:24 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 21:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:23 rzl@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply
  • 21:21 cjming@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/WikimediaMaintenance/T299104.php: Backport: Update invalid skin preference update script (T299104) (duration: 00m 51s)
  • 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:11 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.26 refs T300202 (duration: 00m 50s)
  • 21:10 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.26 refs T300202
  • 20:57 ladsgroup@deploy1002: Finished scap: Revert "rdbms: Followups to automatic connection recovery patch" (duration: 11m 50s)
  • 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:45 ladsgroup@deploy1002: Started scap: Revert "rdbms: Followups to automatic connection recovery patch"
  • 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 20:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300775)', diff saved to https://phabricator.wikimedia.org/P22798 and previous config saved to /var/cache/conftool/dbconfig/20220317-204128-marostegui.json
  • 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-cache1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-cache1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-cache1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:35 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:29 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Revert "Enable Parsoid API everywhere" (T302081) (duration: 00m 50s)
  • 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:28 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P22797 and previous config saved to /var/cache/conftool/dbconfig/20220317-202623-marostegui.json
  • 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P22796 and previous config saved to /var/cache/conftool/dbconfig/20220317-201118-marostegui.json
  • 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300775)', diff saved to https://phabricator.wikimedia.org/P22795 and previous config saved to /var/cache/conftool/dbconfig/20220317-195613-marostegui.json
  • 19:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:55 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 18:53 jhuneidi@deploy1002: Synchronized php-1.38.0-wmf.26/skins/Vector/includes/Hooks.php: Backport: Fix updateUserLinksDropdownItems not being called (T304002) (duration: 00m 50s)
  • 18:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:27 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 18:18 akosiaris: cordon kubernetes10{18..22} T293728
  • 18:12 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 18:01 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 17:50 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:47 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 17:46 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:41 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:41 arturo: uploaded prometheus-openstack-exporter 0.0.8-4~wmf1 to bullseye-wikimedia (T302178)
  • 17:37 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1022.eqiad.wmnet with OS bullseye
  • 17:36 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1021.eqiad.wmnet with OS bullseye
  • 17:35 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1019.eqiad.wmnet with OS bullseye
  • 17:34 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcephmon1003.eqiad.wmnet on all recursors
  • 17:34 dcaro@cumin1001: START - Cookbook sre.dns.wipe-cache cloudcephmon1003.eqiad.wmnet on all recursors
  • 17:33 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1020.eqiad.wmnet with OS bullseye
  • 17:30 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1018.eqiad.wmnet with OS bullseye
  • 17:28 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 17:28 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 17:28 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync
  • 17:28 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: sync
  • 17:27 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:25 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1022.eqiad.wmnet with reason: host reimage
  • 17:25 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1021.eqiad.wmnet with reason: host reimage
  • 17:25 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:24 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:24 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:23 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: host reimage
  • 17:22 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: host reimage
  • 17:22 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:21 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:21 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:21 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: host reimage
  • 17:21 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:21 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:20 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:20 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1022.eqiad.wmnet with reason: host reimage
  • 17:20 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: host reimage
  • 17:20 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1021.eqiad.wmnet with reason: host reimage
  • 17:18 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: host reimage
  • 17:18 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:18 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 17:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: host reimage
  • 17:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:15 dancy@deploy1002: Synchronized README: testing mediawiki image build (duration: 02m 11s)
  • 17:11 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:10 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 17:09 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1020.eqiad.wmnet with OS bullseye
  • 17:09 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1020.eqiad.wmnet with OS bullseye
  • 17:09 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1022.eqiad.wmnet with OS bullseye
  • 17:08 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1022.eqiad.wmnet with OS bullseye
  • 17:08 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS bullseye
  • 17:08 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1021.eqiad.wmnet with OS bullseye
  • 17:07 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1019.eqiad.wmnet with OS bullseye
  • 17:06 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1019.eqiad.wmnet with OS bullseye
  • 17:06 bblack: geodns - Cyprus routed to new drmrs edge DC (first live users!) - will phase in over the standard 10 minute DNS TTL
  • 17:05 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1018.eqiad.wmnet with OS bullseye
  • 17:04 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1018.eqiad.wmnet with OS bullseye
  • 17:03 volans: restart atftp on install1003
  • 17:01 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:00 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:00 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:50 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:48 XioNoX: disable BGP to Lumen in codfw for fiber move
  • 16:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298557)', diff saved to https://phabricator.wikimedia.org/P22794 and previous config saved to /var/cache/conftool/dbconfig/20220317-164228-marostegui.json
  • 16:42 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 16:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:40 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:36 moritzm: restarting LDAP replicas for openssl update
  • 16:35 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcephmon1003.eqiad.wmnet on all recursors
  • 16:35 dcaro@cumin1001: START - Cookbook sre.dns.wipe-cache cloudcephmon1003.eqiad.wmnet on all recursors
  • 16:35 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcephmon1003.eqiad.wmnet on all recursors
  • 16:35 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache cloudcephmon1003.eqiad.wmnet on all recursors
  • 16:34 ryankemper: [WDQS] Pooled `wdqs2001` (caught up on lag)
  • 16:31 andrewbogott: sudo service networking restart on puppetmaster1003
  • 16:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22793 and previous config saved to /var/cache/conftool/dbconfig/20220317-162723-marostegui.json
  • 16:15 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22792 and previous config saved to /var/cache/conftool/dbconfig/20220317-161218-marostegui.json
  • 16:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:10 XioNoX: pfw3-codfw move traffic to cr2 uplink
  • 16:05 oblivian@puppetmaster1001: conftool action : edit; selector: name=random_q
  • 16:04 ryankemper: [WDQS] Depooled `wdqs2001` (~4.85 hours of lag to catch up)
  • 16:03 ryankemper: [WDQS] `ryankemper@wdqs2001:~$ sudo systemctl restart wdqs-blazegraph.service`
  • 16:03 ryankemper: [WDQS] Pooled `wdqs2003` (caught up on lag)
  • 16:00 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:00 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:00 moritzm: restarting apache on logstash*
  • 15:57 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 60980ce: ptwiki: Disable Growth image recommendation (T302828) (duration: 00m 53s)
  • 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298557)', diff saved to https://phabricator.wikimedia.org/P22790 and previous config saved to /var/cache/conftool/dbconfig/20220317-155713-marostegui.json
  • 15:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:46 XioNoX: cr1-codfw move xe-5/2/0 to xe-1/0/1:1 - T289241
  • 15:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:34 moritzm: restarting FPM on mw canaries
  • 15:31 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1022.eqiad.wmnet with OS bullseye
  • 15:31 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS bullseye
  • 15:30 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1020.eqiad.wmnet with OS bullseye
  • 15:07 XioNoX: disable BGP to Telia in codfw for fiber move - T289241
  • 15:00 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1019.eqiad.wmnet with OS bullseye
  • 15:00 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1018.eqiad.wmnet with OS bullseye
  • 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T298557)', diff saved to https://phabricator.wikimedia.org/P22789 and previous config saved to /var/cache/conftool/dbconfig/20220317-145716-marostegui.json
  • 14:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 14:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298557)', diff saved to https://phabricator.wikimedia.org/P22788 and previous config saved to /var/cache/conftool/dbconfig/20220317-145708-marostegui.json
  • 14:46 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P22785 and previous config saved to /var/cache/conftool/dbconfig/20220317-144203-marostegui.json
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P22784 and previous config saved to /var/cache/conftool/dbconfig/20220317-142658-marostegui.json
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298557)', diff saved to https://phabricator.wikimedia.org/P22783 and previous config saved to /var/cache/conftool/dbconfig/20220317-141152-marostegui.json
  • 14:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1067.eqiad.wmnet with reason: T303151
  • 14:05 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1067.eqiad.wmnet with reason: T303151
  • 14:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1063.eqiad.wmnet with reason: T303151
  • 14:05 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1063.eqiad.wmnet with reason: T303151
  • 13:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 13:46 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 13:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 13:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 13:44 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 13:43 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 13:34 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 13:17 Lucas_WMDE: UTC afternoon backport window done
  • 13:16 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: commonswiki: Add pictures.snsb.info to wgCopyUploadsDomains allowlist (T303929) (duration: 00m 50s)
  • 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T298557)', diff saved to https://phabricator.wikimedia.org/P22782 and previous config saved to /var/cache/conftool/dbconfig/20220317-131227-marostegui.json
  • 13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 13:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298557)', diff saved to https://phabricator.wikimedia.org/P22781 and previous config saved to /var/cache/conftool/dbconfig/20220317-131220-marostegui.json
  • 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Write "unexpectedUnconnectedPage" page prop on Beta – no expected behavior change in production (3/3) (duration: 00m 49s)
  • 13:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Write "unexpectedUnconnectedPage" page prop on Beta – no expected behavior change in production (2/3) (duration: 00m 49s)
  • 13:07 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Write "unexpectedUnconnectedPage" page prop on Beta – no expected behavior change in production (1/3) (duration: 00m 53s)
  • 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22780 and previous config saved to /var/cache/conftool/dbconfig/20220317-125715-marostegui.json
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22779 and previous config saved to /var/cache/conftool/dbconfig/20220317-124209-marostegui.json
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298557)', diff saved to https://phabricator.wikimedia.org/P22778 and previous config saved to /var/cache/conftool/dbconfig/20220317-122704-marostegui.json
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22777 and previous config saved to /var/cache/conftool/dbconfig/20220317-120700-root.json
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22776 and previous config saved to /var/cache/conftool/dbconfig/20220317-115156-root.json
  • 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22775 and previous config saved to /var/cache/conftool/dbconfig/20220317-115012-root.json
  • 11:42 volans: upgraded spicerack on cumin hosts to v2.3.3
  • 11:41 volans: uploaded spicerack_2.3.3 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22774 and previous config saved to /var/cache/conftool/dbconfig/20220317-113652-root.json
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22773 and previous config saved to /var/cache/conftool/dbconfig/20220317-113508-root.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T298557)', diff saved to https://phabricator.wikimedia.org/P22772 and previous config saved to /var/cache/conftool/dbconfig/20220317-112921-marostegui.json
  • 11:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 11:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298557)', diff saved to https://phabricator.wikimedia.org/P22771 and previous config saved to /var/cache/conftool/dbconfig/20220317-112913-marostegui.json
  • 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22770 and previous config saved to /var/cache/conftool/dbconfig/20220317-112148-root.json
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22769 and previous config saved to /var/cache/conftool/dbconfig/20220317-112004-root.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P22768 and previous config saved to /var/cache/conftool/dbconfig/20220317-111408-marostegui.json
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P22767 and previous config saved to /var/cache/conftool/dbconfig/20220317-110645-root.json
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298556)', diff saved to https://phabricator.wikimedia.org/P22766 and previous config saved to /var/cache/conftool/dbconfig/20220317-110536-marostegui.json
  • 11:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 11:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P22765 and previous config saved to /var/cache/conftool/dbconfig/20220317-105903-marostegui.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22764 and previous config saved to /var/cache/conftool/dbconfig/20220317-105349-marostegui.json
  • 10:50 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-fe[1005-1008].eqiad.wmnet
  • 10:47 marostegui: dbmaint on s3@eqiad T298556
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298557)', diff saved to https://phabricator.wikimedia.org/P22763 and previous config saved to /var/cache/conftool/dbconfig/20220317-104358-marostegui.json
  • 10:40 mvernon@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298556)', diff saved to https://phabricator.wikimedia.org/P22762 and previous config saved to /var/cache/conftool/dbconfig/20220317-103844-marostegui.json
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298556)', diff saved to https://phabricator.wikimedia.org/P22761 and previous config saved to /var/cache/conftool/dbconfig/20220317-103726-marostegui.json
  • 10:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 10:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298556)', diff saved to https://phabricator.wikimedia.org/P22760 and previous config saved to /var/cache/conftool/dbconfig/20220317-103719-marostegui.json
  • 10:31 mvernon@cumin1001: START - Cookbook sre.dns.netbox
  • 10:26 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-fe[1005-1008].eqiad.wmnet
  • 10:24 marostegui: dbmaint on s3@codfw T298556
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22759 and previous config saved to /var/cache/conftool/dbconfig/20220317-102214-marostegui.json
  • 10:10 marostegui: dbmaint on s7@eqiad T298556
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22758 and previous config saved to /var/cache/conftool/dbconfig/20220317-100709-marostegui.json
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298556)', diff saved to https://phabricator.wikimedia.org/P22757 and previous config saved to /var/cache/conftool/dbconfig/20220317-095204-marostegui.json
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298556)', diff saved to https://phabricator.wikimedia.org/P22756 and previous config saved to /var/cache/conftool/dbconfig/20220317-095044-marostegui.json
  • 09:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 09:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298557)', diff saved to https://phabricator.wikimedia.org/P22755 and previous config saved to /var/cache/conftool/dbconfig/20220317-094025-marostegui.json
  • 09:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 09:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298557)', diff saved to https://phabricator.wikimedia.org/P22754 and previous config saved to /var/cache/conftool/dbconfig/20220317-094017-marostegui.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22752 and previous config saved to /var/cache/conftool/dbconfig/20220317-092512-marostegui.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T297189)', diff saved to https://phabricator.wikimedia.org/P22751 and previous config saved to /var/cache/conftool/dbconfig/20220317-091911-marostegui.json
  • 09:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 09:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22750 and previous config saved to /var/cache/conftool/dbconfig/20220317-091007-marostegui.json
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298557)', diff saved to https://phabricator.wikimedia.org/P22749 and previous config saved to /var/cache/conftool/dbconfig/20220317-085502-marostegui.json
  • 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clarakosi out of all services on: 1881 hosts
  • 08:51 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Clarakosi out of all services on: 1881 hosts
  • 08:24 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: 0da40c2: throttle: Remove expired rules (duration: 00m 50s)
  • 08:23 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: 980ea35: Throttle: Increase limit for English Wikipedia (T304016) (duration: 00m 51s)
  • 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ppchelko out of all services on: 1881 hosts
  • 08:12 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ppchelko out of all services on: 1881 hosts
  • 08:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Accraze out of all services on: 1881 hosts
  • 08:08 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Accraze out of all services on: 1881 hosts
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22748 and previous config saved to /var/cache/conftool/dbconfig/20220317-080705-root.json
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298557)', diff saved to https://phabricator.wikimedia.org/P22747 and previous config saved to /var/cache/conftool/dbconfig/20220317-075350-marostegui.json
  • 07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22746 and previous config saved to /var/cache/conftool/dbconfig/20220317-075201-root.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22745 and previous config saved to /var/cache/conftool/dbconfig/20220317-073658-root.json
  • 07:31 marostegui: dbmaint on s5@eqiad T297189
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22744 and previous config saved to /var/cache/conftool/dbconfig/20220317-072154-root.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22743 and previous config saved to /var/cache/conftool/dbconfig/20220317-071200-root.json
  • 07:11 ryankemper: [WDQS] Depooled `wdqs2003` (8 hours of lag to catch up on)
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P22742 and previous config saved to /var/cache/conftool/dbconfig/20220317-070650-root.json
  • 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 07:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 07:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 06:57 ryankemper: [WDQS] Also of note is the spiking thread counts on the affected hosts: https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1647457172391&to=1647500081971&viewPanel=22
  • 06:57 ryankemper: [WDQS] Note that per https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1647457172391&to=1647500081971&viewPanel=7 `wdqs2003` has been offline for ~6 hours, `wdqs2001` for 1.5 hours and `wdqs2004` just recently.
  • 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22741 and previous config saved to /var/cache/conftool/dbconfig/20220317-065656-root.json
  • 06:54 ryankemper: [WDQS] `ryankemper@wdqs2003:~$ sudo systemctl restart wdqs-blazegraph.service`
  • 06:53 ryankemper: [WDQS] `ryankemper@wdqs2001:~$ sudo systemctl restart wdqs-blazegraph.service`
  • 06:50 elukey: restart blazegraph on wdqs2004
  • 06:46 elukey: kill remaining hanging processes for ppche*lko and accra*ze on an-test-client1001 to allow the users to be offboarded (puppet broken)
  • 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22740 and previous config saved to /var/cache/conftool/dbconfig/20220317-064152-root.json
  • 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22739 and previous config saved to /var/cache/conftool/dbconfig/20220317-062648-root.json
  • 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22738 and previous config saved to /var/cache/conftool/dbconfig/20220317-061144-root.json
  • 04:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T300775)', diff saved to https://phabricator.wikimedia.org/P22737 and previous config saved to /var/cache/conftool/dbconfig/20220317-040634-marostegui.json
  • 04:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 04:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 02:57 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 02:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 02:07 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 01:11 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye

2022-03-16

  • 23:52 tzatziki: Removing two files for legal compliance
  • 21:17 cjming: end running skin update preference maintenance script
  • 20:52 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [no-op] 8efa537: GrowthExperiments: Set GEWelcomeSurveyShowMailingListQuestion (T303240) (duration: 00m 53s)
  • 20:38 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:35 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/WikimediaMaintenance/: 9ba157b: Add insert option for update skin preferences script (T299104) (duration: 00m 50s)
  • 20:34 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/WikimediaMaintenance/: ebfc516: Add script to update vector skin preferences (T299104) (duration: 00m 51s)
  • 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:24 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 20:13 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:13 urbanecm@deploy1002: Synchronized docroot/noc/db.php: f649199: Migrate wmfDatacenter(s) to wmgDatacenter(s) (T45956; 3/3) (duration: 00m 49s)
  • 20:12 urbanecm@deploy1002: Synchronized multiversion/: f649199: Migrate wmfDatacenter(s) to wmgDatacenter(s) (T45956; 2/3) (duration: 00m 50s)
  • 20:11 urbanecm@deploy1002: Synchronized wmf-config/: f649199: Migrate wmfDatacenter(s) to wmgDatacenter(s) (T45956; 1/3) (duration: 00m 50s)
  • 19:22 otto@deploy1002: Finished deploy [analytics/refinery@2d2056a] (hadoop-test): (no justification provided) (duration: 07m 50s)
  • 19:14 otto@deploy1002: Started deploy [analytics/refinery@2d2056a] (hadoop-test): (no justification provided)
  • 18:32 sukhe: running: homer "cr*-drmrs*" commit "Gerrit 771359: Set up BGP peering in drmrs for Wikidough."
  • 18:09 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@257960f]: Migrate session_length/daily from Oozie to Airflow [airflow-dags/analytics_test@257960f] (duration: 00m 08s)
  • 18:09 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@257960f]: Migrate session_length/daily from Oozie to Airflow [airflow-dags/analytics_test@257960f]
  • 18:02 aqu@deploy1002: Finished deploy [airflow-dags/analytics@257960f]: Migrate session_length/daily from Oozie to Airflow [airflow-dags/analytics@257960f] (duration: 00m 08s)
  • 18:02 aqu@deploy1002: Started deploy [airflow-dags/analytics@257960f]: Migrate session_length/daily from Oozie to Airflow [airflow-dags/analytics@257960f]
  • 18:00 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on karapace1001.eqiad.wmnet with reason: Setting up karapace for the first time
  • 18:00 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on karapace1001.eqiad.wmnet with reason: Setting up karapace for the first time
  • 17:36 dancy@deploy1002: Synchronized multiversion/MWMultiVersion.php: Config: mwscript: Support --force-version flag (T303878) (duration: 00m 57s)
  • 17:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
  • 17:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
  • 17:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
  • 17:13 aqu@deploy1002: Finished deploy [analytics/refinery@d039471] (hadoop-test): Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471] (duration: 07m 23s)
  • 17:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6016.drmrs.wmnet with OS buster
  • 17:06 aqu@deploy1002: Started deploy [analytics/refinery@d039471] (hadoop-test): Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471]
  • 17:06 aqu@deploy1002: Finished deploy [analytics/refinery@d039471] (thin): Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471] (duration: 00m 07s)
  • 17:06 aqu@deploy1002: Started deploy [analytics/refinery@d039471] (thin): Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471]
  • 17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 17:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:48 aqu@deploy1002: Finished deploy [analytics/refinery@d039471]: Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471] (duration: 25m 49s)
  • 16:45 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 16:37 Emperor: rolling restart of ms-fe10[09-12] so they know about removal of older proxies T303733
  • 16:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
  • 16:28 Emperor: moving swiftrepl and stats reporter host from ms-fe1005 to ms-fe1009 T303733
  • 16:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
  • 16:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T297189)', diff saved to https://phabricator.wikimedia.org/P22734 and previous config saved to /var/cache/conftool/dbconfig/20220316-162721-marostegui.json
  • 16:22 aqu@deploy1002: Started deploy [analytics/refinery@d039471]: Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471]
  • 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P22733 and previous config saved to /var/cache/conftool/dbconfig/20220316-161216-marostegui.json
  • 16:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6016.drmrs.wmnet with OS buster
  • 16:02 aqu: analytics/refinery - scap deploy "Migrate session_length/daily from Oozie to Airflow"
  • 15:59 dancy@deploy1002: Synchronized README: testing mediawiki image build (duration: 02m 11s)
  • 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P22732 and previous config saved to /var/cache/conftool/dbconfig/20220316-155711-marostegui.json
  • 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:53 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
  • 15:53 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
  • 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298557)', diff saved to https://phabricator.wikimedia.org/P22731 and previous config saved to /var/cache/conftool/dbconfig/20220316-155300-marostegui.json
  • 15:52 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
  • 15:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 15:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 15:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6015.drmrs.wmnet with OS buster
  • 15:51 moritzm: restarting exim/spamassassin on MXes to pick up new OpenSSL
  • 15:49 urbanecm@deploy1002: Synchronized wmf-config/logos.php: cswiki celebration logo (duration: 00m 49s)
  • 15:46 urbanecm@deploy1002: Synchronized static/images/project-logos/: cswiki celebration logos (duration: 00m 50s)
  • 15:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:43 dancy@deploy1002: scap failed: RuntimeError dictionary changed size during iteration (duration: 25m 55s)
  • 15:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T297189)', diff saved to https://phabricator.wikimedia.org/P22730 and previous config saved to /var/cache/conftool/dbconfig/20220316-154206-marostegui.json
  • 15:38 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 15:37 ryankemper: [WCQS] Restarted updater across fleet to get out jvm sec upgrades: `ryankemper@cumin1001:~$ sudo -E cumin 'wcqs*' 'systemctl restart wcqs-updater.service'`
  • 15:35 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 15:35 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 15:17 dancy@deploy1002: Started scap: testing mediawiki image build
  • 15:15 dancy@deploy1002: scap failed: CalledProcessError Command 'sudo -u mwbuilder /usr/bin/make -C /srv/mwbuilder/release/make-container-image -f Makefile build-and-push-all-images http_proxy=http://webproxy.eqiad.wmnet:8080 https_proxy=http://webproxy.eqiad.wmnet:8080 GIT_BASE=https://gerrit.wikimedia.org/r/ BRANCH=master workdir_volume=/srv/mediawiki-staging mv_image_name=docker-registry.discovery.wmnet/restricted/mediaw
  • 15:12 dancy@deploy1002: Started scap: (no justification provided)
  • 15:11 dancy: Testing mediawiki image build on deploy server again
  • 15:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
  • 15:08 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
  • 15:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 15:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 15:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 15:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298557)', diff saved to https://phabricator.wikimedia.org/P22729 and previous config saved to /var/cache/conftool/dbconfig/20220316-150433-marostegui.json
  • 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T297189)', diff saved to https://phabricator.wikimedia.org/P22728 and previous config saved to /var/cache/conftool/dbconfig/20220316-145946-marostegui.json
  • 14:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 14:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 14:55 sukhe: rolling restart of nginx.service on durum* hosts for OpenSSL updates
  • 14:55 cjming@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/WikimediaMaintenance/T299104.php: Backport: Add script to update vector skin preferences (T299104) (duration: 00m 51s)
  • 14:53 moritzm: restarting nginx/dhcpd on install/apt servers
  • 14:53 sukhe: rolling restart of pdns-recursor.service and dnsdist.service on doh* hosts for OpenSSL updates
  • 14:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:52 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 14:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P22727 and previous config saved to /var/cache/conftool/dbconfig/20220316-144928-marostegui.json
  • 14:47 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 14:46 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6015.drmrs.wmnet with OS buster
  • 14:45 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
  • 14:45 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
  • 14:45 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
  • 14:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6014.drmrs.wmnet with OS buster
  • 14:35 XioNoX: add anycast6 peers in drmrs
  • 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P22726 and previous config saved to /var/cache/conftool/dbconfig/20220316-143423-marostegui.json
  • 14:25 Emperor: depooling ms-fe100[5-8] prior to decommissioning T303733
  • 14:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298557)', diff saved to https://phabricator.wikimedia.org/P22725 and previous config saved to /var/cache/conftool/dbconfig/20220316-141918-marostegui.json
  • 14:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
  • 14:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 8 hosts with reason: Maintenance
  • 14:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 8 hosts with reason: Maintenance
  • 14:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 14:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T297189)', diff saved to https://phabricator.wikimedia.org/P22724 and previous config saved to /var/cache/conftool/dbconfig/20220316-141708-marostegui.json
  • 14:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:12 taavi@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/CentralAuth/includes/User/CentralAuthUser.php: Backport: Replace use of deprecated RecentChange::getEngine (T303861) (duration: 00m 51s)
  • 14:10 herron: grafana1002:~# systemctl restart grafana-ldap-users-sync.service T303064
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P22723 and previous config saved to /var/cache/conftool/dbconfig/20220316-140203-marostegui.json
  • 13:57 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6014.drmrs.wmnet with OS buster
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P22722 and previous config saved to /var/cache/conftool/dbconfig/20220316-134658-marostegui.json
  • 13:44 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
  • 13:44 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
  • 13:44 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298294)', diff saved to https://phabricator.wikimedia.org/P22721 and previous config saved to /var/cache/conftool/dbconfig/20220316-133458-marostegui.json
  • 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T297189)', diff saved to https://phabricator.wikimedia.org/P22720 and previous config saved to /var/cache/conftool/dbconfig/20220316-133153-marostegui.json
  • 13:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6013.drmrs.wmnet with OS buster
  • 13:25 krinkle@deploy1002: Synchronized w/static.php: 159dfd21d (duration: 00m 50s)
  • 13:24 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS buster
  • 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22718 and previous config saved to /var/cache/conftool/dbconfig/20220316-131953-marostegui.json
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298557)', diff saved to https://phabricator.wikimedia.org/P22717 and previous config saved to /var/cache/conftool/dbconfig/20220316-131429-marostegui.json
  • 13:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 13:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298557)', diff saved to https://phabricator.wikimedia.org/P22716 and previous config saved to /var/cache/conftool/dbconfig/20220316-131421-marostegui.json
  • 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:07 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Deploy template features to enwiki (T302857) (duration: 00m 50s)
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22715 and previous config saved to /var/cache/conftool/dbconfig/20220316-130448-marostegui.json
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P22714 and previous config saved to /var/cache/conftool/dbconfig/20220316-125916-marostegui.json
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T297189)', diff saved to https://phabricator.wikimedia.org/P22713 and previous config saved to /var/cache/conftool/dbconfig/20220316-125803-marostegui.json
  • 12:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 12:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T297189)', diff saved to https://phabricator.wikimedia.org/P22712 and previous config saved to /var/cache/conftool/dbconfig/20220316-125755-marostegui.json
  • 12:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298294)', diff saved to https://phabricator.wikimedia.org/P22711 and previous config saved to /var/cache/conftool/dbconfig/20220316-124943-marostegui.json
  • 12:49 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T298294)', diff saved to https://phabricator.wikimedia.org/P22710 and previous config saved to /var/cache/conftool/dbconfig/20220316-124742-marostegui.json
  • 12:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 12:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298294)', diff saved to https://phabricator.wikimedia.org/P22709 and previous config saved to /var/cache/conftool/dbconfig/20220316-124734-marostegui.json
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P22708 and previous config saved to /var/cache/conftool/dbconfig/20220316-124411-marostegui.json
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P22707 and previous config saved to /var/cache/conftool/dbconfig/20220316-124250-marostegui.json
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22705 and previous config saved to /var/cache/conftool/dbconfig/20220316-123229-marostegui.json
  • 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298557)', diff saved to https://phabricator.wikimedia.org/P22704 and previous config saved to /var/cache/conftool/dbconfig/20220316-122906-marostegui.json
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P22703 and previous config saved to /var/cache/conftool/dbconfig/20220316-122745-marostegui.json
  • 12:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6013.drmrs.wmnet with OS buster
  • 12:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
  • 12:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
  • 12:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
  • 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22702 and previous config saved to /var/cache/conftool/dbconfig/20220316-121724-marostegui.json
  • 12:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6012.drmrs.wmnet with OS buster
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T297189)', diff saved to https://phabricator.wikimedia.org/P22701 and previous config saved to /var/cache/conftool/dbconfig/20220316-121240-marostegui.json
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298294)', diff saved to https://phabricator.wikimedia.org/P22700 and previous config saved to /var/cache/conftool/dbconfig/20220316-120219-marostegui.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298294)', diff saved to https://phabricator.wikimedia.org/P22699 and previous config saved to /var/cache/conftool/dbconfig/20220316-120100-marostegui.json
  • 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298294)', diff saved to https://phabricator.wikimedia.org/P22698 and previous config saved to /var/cache/conftool/dbconfig/20220316-120047-marostegui.json
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22697 and previous config saved to /var/cache/conftool/dbconfig/20220316-114542-marostegui.json
  • 11:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
  • 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T297189)', diff saved to https://phabricator.wikimedia.org/P22695 and previous config saved to /var/cache/conftool/dbconfig/20220316-113200-marostegui.json
  • 11:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 11:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T297189)', diff saved to https://phabricator.wikimedia.org/P22694 and previous config saved to /var/cache/conftool/dbconfig/20220316-113152-marostegui.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298557)', diff saved to https://phabricator.wikimedia.org/P22693 and previous config saved to /var/cache/conftool/dbconfig/20220316-113057-marostegui.json
  • 11:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 11:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22692 and previous config saved to /var/cache/conftool/dbconfig/20220316-113037-marostegui.json
  • 11:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P22691 and previous config saved to /var/cache/conftool/dbconfig/20220316-111647-marostegui.json
  • 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298294)', diff saved to https://phabricator.wikimedia.org/P22690 and previous config saved to /var/cache/conftool/dbconfig/20220316-111532-marostegui.json
  • 11:09 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6012.drmrs.wmnet with OS buster
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298294)', diff saved to https://phabricator.wikimedia.org/P22689 and previous config saved to /var/cache/conftool/dbconfig/20220316-110411-marostegui.json
  • 11:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298294)', diff saved to https://phabricator.wikimedia.org/P22688 and previous config saved to /var/cache/conftool/dbconfig/20220316-110403-marostegui.json
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P22687 and previous config saved to /var/cache/conftool/dbconfig/20220316-110142-marostegui.json
  • 10:55 vgutierrez: rolling upgrade to HAProxy 2.4.15 on cache nodes
  • 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22686 and previous config saved to /var/cache/conftool/dbconfig/20220316-104858-marostegui.json
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T297189)', diff saved to https://phabricator.wikimedia.org/P22685 and previous config saved to /var/cache/conftool/dbconfig/20220316-104637-marostegui.json
  • 10:42 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22684 and previous config saved to /var/cache/conftool/dbconfig/20220316-103353-marostegui.json
  • 10:28 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298294)', diff saved to https://phabricator.wikimedia.org/P22683 and previous config saved to /var/cache/conftool/dbconfig/20220316-101848-marostegui.json
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T298294)', diff saved to https://phabricator.wikimedia.org/P22682 and previous config saved to /var/cache/conftool/dbconfig/20220316-101729-marostegui.json
  • 10:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 10:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 10:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 10 hosts with reason: Maintenance
  • 10:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 10 hosts with reason: Maintenance
  • 10:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 10:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 10:15 vgutierrez: rolling restart of ats-tls and ats-backend to catch up on OpenSSL updates
  • 10:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298294)', diff saved to https://phabricator.wikimedia.org/P22681 and previous config saved to /var/cache/conftool/dbconfig/20220316-101502-marostegui.json
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T297189)', diff saved to https://phabricator.wikimedia.org/P22680 and previous config saved to /var/cache/conftool/dbconfig/20220316-100527-marostegui.json
  • 10:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T297189)', diff saved to https://phabricator.wikimedia.org/P22679 and previous config saved to /var/cache/conftool/dbconfig/20220316-100519-marostegui.json
  • 10:04 vgutierrez: vgutierrez@apt1001:~$ sudo -i reprepro --component thirdparty/haproxy24 update buster-wikimedia
  • 10:01 moritzm: installing openssl security updates
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22678 and previous config saved to /var/cache/conftool/dbconfig/20220316-095957-marostegui.json
  • 09:56 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1069.eqiad.wmnet with OS stretch
  • 09:55 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1070.eqiad.wmnet with OS stretch
  • 09:55 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1071.eqiad.wmnet with OS buster
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P22677 and previous config saved to /var/cache/conftool/dbconfig/20220316-095014-marostegui.json
  • 09:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 09:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22676 and previous config saved to /var/cache/conftool/dbconfig/20220316-094452-marostegui.json
  • 09:36 dcausse: T293862: manually restarted blazegraph on wdqs1010 with "-agentpath:/usr/lib/libjvmquake.so=1000,1,0,warn=30,touch=/tmp/jvmquake"
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P22675 and previous config saved to /var/cache/conftool/dbconfig/20220316-093509-marostegui.json
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298294)', diff saved to https://phabricator.wikimedia.org/P22674 and previous config saved to /var/cache/conftool/dbconfig/20220316-092947-marostegui.json
  • 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T298294)', diff saved to https://phabricator.wikimedia.org/P22673 and previous config saved to /var/cache/conftool/dbconfig/20220316-092742-marostegui.json
  • 09:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 09:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298294)', diff saved to https://phabricator.wikimedia.org/P22672 and previous config saved to /var/cache/conftool/dbconfig/20220316-092735-marostegui.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T297189)', diff saved to https://phabricator.wikimedia.org/P22671 and previous config saved to /var/cache/conftool/dbconfig/20220316-092004-marostegui.json
  • 09:16 moritzm: revert mx1001/mx2001 to the Bullseye version of Exim T303738
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 T303498', diff saved to https://phabricator.wikimedia.org/P22670 and previous config saved to /var/cache/conftool/dbconfig/20220316-091533-marostegui.json
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22669 and previous config saved to /var/cache/conftool/dbconfig/20220316-091229-marostegui.json
  • 09:09 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 08:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 08:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22668 and previous config saved to /var/cache/conftool/dbconfig/20220316-085724-marostegui.json
  • 08:55 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 08:52 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298294)', diff saved to https://phabricator.wikimedia.org/P22667 and previous config saved to /var/cache/conftool/dbconfig/20220316-084219-marostegui.json
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T297189)', diff saved to https://phabricator.wikimedia.org/P22666 and previous config saved to /var/cache/conftool/dbconfig/20220316-084140-marostegui.json
  • 08:41 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T297189)', diff saved to https://phabricator.wikimedia.org/P22665 and previous config saved to /var/cache/conftool/dbconfig/20220316-084127-marostegui.json
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T298294)', diff saved to https://phabricator.wikimedia.org/P22664 and previous config saved to /var/cache/conftool/dbconfig/20220316-084011-marostegui.json
  • 08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298294)', diff saved to https://phabricator.wikimedia.org/P22663 and previous config saved to /var/cache/conftool/dbconfig/20220316-084003-marostegui.json
  • 08:35 hashar: Restarting CI Jenkins
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P22662 and previous config saved to /var/cache/conftool/dbconfig/20220316-082622-marostegui.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22661 and previous config saved to /var/cache/conftool/dbconfig/20220316-082458-marostegui.json
  • 08:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:11 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Change A/V player to videojs in the first batch of production wiki (T248418) (duration: 00m 49s)
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P22660 and previous config saved to /var/cache/conftool/dbconfig/20220316-081117-marostegui.json
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22659 and previous config saved to /var/cache/conftool/dbconfig/20220316-080953-marostegui.json
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T297189)', diff saved to https://phabricator.wikimedia.org/P22658 and previous config saved to /var/cache/conftool/dbconfig/20220316-075612-marostegui.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T297189)', diff saved to https://phabricator.wikimedia.org/P22657 and previous config saved to /var/cache/conftool/dbconfig/20220316-075502-marostegui.json
  • 07:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298294)', diff saved to https://phabricator.wikimedia.org/P22656 and previous config saved to /var/cache/conftool/dbconfig/20220316-075448-marostegui.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T298294)', diff saved to https://phabricator.wikimedia.org/P22655 and previous config saved to /var/cache/conftool/dbconfig/20220316-075248-marostegui.json
  • 07:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298563)', diff saved to https://phabricator.wikimedia.org/P22654 and previous config saved to /var/cache/conftool/dbconfig/20220316-075007-marostegui.json
  • 07:49 Amir1: dbmaint on master of s4@eqiad (T298743)
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22653 and previous config saved to /var/cache/conftool/dbconfig/20220316-073502-marostegui.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22652 and previous config saved to /var/cache/conftool/dbconfig/20220316-071957-marostegui.json
  • 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
  • 07:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
  • 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 07:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298557)', diff saved to https://phabricator.wikimedia.org/P22651 and previous config saved to /var/cache/conftool/dbconfig/20220316-071859-marostegui.json
  • 07:18 urbanecm: UTC morning B&C window done
  • 07:15 urbanecm: Create `testwiki.cx_significant_edits` and `testwiki.cx_section_translation` at s3 (T302371; `mwscript sql.php --wiki=testwiki /srv/mediawiki-staging/php-1.38.0-wmf.26/extensions/ContentTranslation/sql/{section-translations,significant-edits}.sql`)
  • 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4558951: Disable ContentTranslation for non-extended confirmed users on viwiki (T299636) (duration: 00m 51s)
  • 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298563)', diff saved to https://phabricator.wikimedia.org/P22650 and previous config saved to /var/cache/conftool/dbconfig/20220316-070452-marostegui.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22649 and previous config saved to /var/cache/conftool/dbconfig/20220316-070354-marostegui.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312', diff saved to https://phabricator.wikimedia.org/P22648 and previous config saved to /var/cache/conftool/dbconfig/20220316-070033-marostegui.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22647 and previous config saved to /var/cache/conftool/dbconfig/20220316-065918-marostegui.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22646 and previous config saved to /var/cache/conftool/dbconfig/20220316-064849-marostegui.json
  • 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298557)', diff saved to https://phabricator.wikimedia.org/P22644 and previous config saved to /var/cache/conftool/dbconfig/20220316-063344-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T298563)', diff saved to https://phabricator.wikimedia.org/P22643 and previous config saved to /var/cache/conftool/dbconfig/20220316-060008-marostegui.json
  • 06:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 06:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22642 and previous config saved to /var/cache/conftool/dbconfig/20220316-055903-marostegui.json
  • 05:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 05:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T298557)', diff saved to https://phabricator.wikimedia.org/P22641 and previous config saved to /var/cache/conftool/dbconfig/20220316-055805-marostegui.json
  • 05:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 05:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 05:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 05:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 05:36 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
  • 05:34 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1068.eqiad.wmnet with OS stretch
  • 05:14 ryankemper: [WCQS Deploy] Test query passed on commons-query.wikimedia.org; WCQS deploy complete
  • 05:13 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@38de611] (wcqs): Deploy 0.3.106 to WCQS (duration: 01m 53s)
  • 05:12 ryankemper: [WCQS Deploy] Tests look good following deploy of `0.3.106` to canary `wcqs1002.eqiad.wmnet`, proceeding to rest of fleet
  • 05:11 ryankemper@deploy1002: Started deploy [wdqs/wdqs@38de611] (wcqs): Deploy 0.3.106 to WCQS
  • 05:11 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 05:11 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 05:11 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 05:09 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@38de611]: 0.3.106 (duration: 06m 36s)
  • 05:03 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.106` on canary `wdqs1003`; proceeding to rest of fleet
  • 05:02 ryankemper@deploy1002: Started deploy [wdqs/wdqs@38de611]: 0.3.106
  • 05:01 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.106`. Pre-deploy tests passing on canary `wdqs1003`
  • 02:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22640 and previous config saved to /var/cache/conftool/dbconfig/20220316-025347-marostegui.json
  • 02:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22639 and previous config saved to /var/cache/conftool/dbconfig/20220316-023842-marostegui.json
  • 02:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22638 and previous config saved to /var/cache/conftool/dbconfig/20220316-022336-marostegui.json
  • 02:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22637 and previous config saved to /var/cache/conftool/dbconfig/20220316-020831-marostegui.json
  • 01:43 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 01:37 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 01:37 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 01:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
  • 01:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
  • 01:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
  • 01:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6011.drmrs.wmnet with OS buster
  • 00:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
  • 00:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
  • 00:12 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6011.drmrs.wmnet with OS buster
  • 00:03 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye

2022-03-15

  • 22:17 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 22:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 22:07 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 22:06 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 22:05 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 22:04 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 22:03 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 22:02 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 22:01 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 22:00 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
  • 22:00 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
  • 21:59 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
  • 21:56 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 21:55 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 21:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 21:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22635 and previous config saved to /var/cache/conftool/dbconfig/20220315-214729-marostegui.json
  • 21:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 21:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 21:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T300775)', diff saved to https://phabricator.wikimedia.org/P22634 and previous config saved to /var/cache/conftool/dbconfig/20220315-214721-marostegui.json
  • 21:47 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298743)', diff saved to https://phabricator.wikimedia.org/P22633 and previous config saved to /var/cache/conftool/dbconfig/20220315-214133-ladsgroup.json
  • 21:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 21:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6010.drmrs.wmnet with OS buster
  • 21:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22632 and previous config saved to /var/cache/conftool/dbconfig/20220315-213216-marostegui.json
  • 21:27 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.26/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Backport: rdbms: provide $owner argument in LoadBalancer::flushPrimarySessions() (T303885) (duration: 00m 53s)
  • 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P22631 and previous config saved to /var/cache/conftool/dbconfig/20220315-212628-ladsgroup.json
  • 21:17 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 21:17 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22630 and previous config saved to /var/cache/conftool/dbconfig/20220315-211711-marostegui.json
  • 21:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P22629 and previous config saved to /var/cache/conftool/dbconfig/20220315-211123-ladsgroup.json
  • 21:11 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 21:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T300775)', diff saved to https://phabricator.wikimedia.org/P22628 and previous config saved to /var/cache/conftool/dbconfig/20220315-210204-marostegui.json
  • 20:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298563)', diff saved to https://phabricator.wikimedia.org/P22627 and previous config saved to /var/cache/conftool/dbconfig/20220315-205702-marostegui.json
  • 20:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298743)', diff saved to https://phabricator.wikimedia.org/P22626 and previous config saved to /var/cache/conftool/dbconfig/20220315-205618-ladsgroup.json
  • 20:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
  • 20:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 20:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 20:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298557)', diff saved to https://phabricator.wikimedia.org/P22625 and previous config saved to /var/cache/conftool/dbconfig/20220315-204912-marostegui.json
  • 20:47 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
  • 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22624 and previous config saved to /var/cache/conftool/dbconfig/20220315-204157-marostegui.json
  • 20:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22623 and previous config saved to /var/cache/conftool/dbconfig/20220315-203407-marostegui.json
  • 20:27 bd808: Toolhub: running post-deploy database migrations
  • 20:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6010.drmrs.wmnet with OS buster
  • 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22622 and previous config saved to /var/cache/conftool/dbconfig/20220315-202652-marostegui.json
  • 20:26 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:21 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 20:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22621 and previous config saved to /var/cache/conftool/dbconfig/20220315-201902-marostegui.json
  • 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:12 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Add another entry to GECampaignPatterns (T302738) (duration: 02m 22s)
  • 20:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298563)', diff saved to https://phabricator.wikimedia.org/P22620 and previous config saved to /var/cache/conftool/dbconfig/20220315-201147-marostegui.json
  • 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298557)', diff saved to https://phabricator.wikimedia.org/P22619 and previous config saved to /var/cache/conftool/dbconfig/20220315-200357-marostegui.json
  • 19:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T298743)', diff saved to https://phabricator.wikimedia.org/P22618 and previous config saved to /var/cache/conftool/dbconfig/20220315-195934-ladsgroup.json
  • 19:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 19:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298743)', diff saved to https://phabricator.wikimedia.org/P22617 and previous config saved to /var/cache/conftool/dbconfig/20220315-195657-ladsgroup.json
  • 19:52 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
  • 19:52 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
  • 19:52 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
  • 19:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22616 and previous config saved to /var/cache/conftool/dbconfig/20220315-194152-ladsgroup.json
  • 19:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T298557)', diff saved to https://phabricator.wikimedia.org/P22615 and previous config saved to /var/cache/conftool/dbconfig/20220315-193029-marostegui.json
  • 19:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 19:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 19:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22614 and previous config saved to /var/cache/conftool/dbconfig/20220315-192647-ladsgroup.json
  • 19:24 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 19:22 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 19:19 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 19:18 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 19:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T298563)', diff saved to https://phabricator.wikimedia.org/P22612 and previous config saved to /var/cache/conftool/dbconfig/20220315-191234-marostegui.json
  • 19:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 19:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 19:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298563)', diff saved to https://phabricator.wikimedia.org/P22611 and previous config saved to /var/cache/conftool/dbconfig/20220315-191226-marostegui.json
  • 19:12 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6009.drmrs.wmnet with OS buster
  • 19:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298743)', diff saved to https://phabricator.wikimedia.org/P22610 and previous config saved to /var/cache/conftool/dbconfig/20220315-191140-ladsgroup.json
  • 19:01 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
  • 19:00 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1070.eqiad.wmnet with reason: host reimage
  • 18:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1071.eqiad.wmnet with reason: host reimage
  • 18:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22609 and previous config saved to /var/cache/conftool/dbconfig/20220315-185721-marostegui.json
  • 18:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
  • 18:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1070.eqiad.wmnet with reason: host reimage
  • 18:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1071.eqiad.wmnet with reason: host reimage
  • 18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T298743)', diff saved to https://phabricator.wikimedia.org/P22608 and previous config saved to /var/cache/conftool/dbconfig/20220315-185413-ladsgroup.json
  • 18:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 18:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298743)', diff saved to https://phabricator.wikimedia.org/P22607 and previous config saved to /var/cache/conftool/dbconfig/20220315-185405-ladsgroup.json
  • 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1069.eqiad.wmnet with OS stretch
  • 18:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1070.eqiad.wmnet with OS stretch
  • 18:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1071.eqiad.wmnet with OS buster
  • 18:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22606 and previous config saved to /var/cache/conftool/dbconfig/20220315-184216-marostegui.json
  • 18:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
  • 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22605 and previous config saved to /var/cache/conftool/dbconfig/20220315-183900-ladsgroup.json
  • 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
  • 18:32 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1071.eqiad.wmnet with OS buster
  • 18:30 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1069.eqiad.wmnet with OS buster
  • 18:29 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1070.eqiad.wmnet with OS buster
  • 18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298563)', diff saved to https://phabricator.wikimedia.org/P22604 and previous config saved to /var/cache/conftool/dbconfig/20220315-182711-marostegui.json
  • 18:27 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1071.eqiad.wmnet with reason: host reimage
  • 18:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
  • 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22603 and previous config saved to /var/cache/conftool/dbconfig/20220315-182355-ladsgroup.json
  • 18:23 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1070.eqiad.wmnet with reason: host reimage
  • 18:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1071.eqiad.wmnet with reason: host reimage
  • 18:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1070.eqiad.wmnet with reason: host reimage
  • 18:20 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
  • 18:13 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6009.drmrs.wmnet with OS buster
  • 18:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1071.eqiad.wmnet with OS buster
  • 18:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1070.eqiad.wmnet with OS buster
  • 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298743)', diff saved to https://phabricator.wikimedia.org/P22602 and previous config saved to /var/cache/conftool/dbconfig/20220315-180850-ladsgroup.json
  • 18:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1069.eqiad.wmnet with OS buster
  • 18:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 18:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298557)', diff saved to https://phabricator.wikimedia.org/P22601 and previous config saved to /var/cache/conftool/dbconfig/20220315-180542-marostegui.json
  • 18:04 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.26 refs T300202
  • 17:57 XioNoX: power down mr1-ulsfo for replacement
  • 17:52 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 17:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298563)', diff saved to https://phabricator.wikimedia.org/P22600 and previous config saved to /var/cache/conftool/dbconfig/20220315-175143-marostegui.json
  • 17:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 17:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 17:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298563)', diff saved to https://phabricator.wikimedia.org/P22599 and previous config saved to /var/cache/conftool/dbconfig/20220315-175130-marostegui.json
  • 17:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22598 and previous config saved to /var/cache/conftool/dbconfig/20220315-175037-marostegui.json
  • 17:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22597 and previous config saved to /var/cache/conftool/dbconfig/20220315-173625-marostegui.json
  • 17:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22596 and previous config saved to /var/cache/conftool/dbconfig/20220315-173532-marostegui.json
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T298743)', diff saved to https://phabricator.wikimedia.org/P22595 and previous config saved to /var/cache/conftool/dbconfig/20220315-172616-ladsgroup.json
  • 17:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 17:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298743)', diff saved to https://phabricator.wikimedia.org/P22594 and previous config saved to /var/cache/conftool/dbconfig/20220315-172608-ladsgroup.json
  • 17:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:25 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 17:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22593 and previous config saved to /var/cache/conftool/dbconfig/20220315-172119-marostegui.json
  • 17:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298557)', diff saved to https://phabricator.wikimedia.org/P22592 and previous config saved to /var/cache/conftool/dbconfig/20220315-172027-marostegui.json
  • 17:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:12 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22591 and previous config saved to /var/cache/conftool/dbconfig/20220315-171103-ladsgroup.json
  • 17:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298563)', diff saved to https://phabricator.wikimedia.org/P22590 and previous config saved to /var/cache/conftool/dbconfig/20220315-170614-marostegui.json
  • 17:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T300775)', diff saved to https://phabricator.wikimedia.org/P22589 and previous config saved to /var/cache/conftool/dbconfig/20220315-170201-marostegui.json
  • 17:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 17:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 17:01 jhuneidi@deploy1002: Pruned MediaWiki: 1.38.0-wmf.24 (duration: 01m 32s)
  • 16:59 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.26 refs T300202 (duration: 38m 54s)
  • 16:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22588 and previous config saved to /var/cache/conftool/dbconfig/20220315-165558-ladsgroup.json
  • 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T298557)', diff saved to https://phabricator.wikimedia.org/P22587 and previous config saved to /var/cache/conftool/dbconfig/20220315-164751-marostegui.json
  • 16:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 16:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T298557)', diff saved to https://phabricator.wikimedia.org/P22586 and previous config saved to /var/cache/conftool/dbconfig/20220315-164743-marostegui.json
  • 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298743)', diff saved to https://phabricator.wikimedia.org/P22585 and previous config saved to /var/cache/conftool/dbconfig/20220315-164053-ladsgroup.json
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T298743)', diff saved to https://phabricator.wikimedia.org/P22584 and previous config saved to /var/cache/conftool/dbconfig/20220315-163626-ladsgroup.json
  • 16:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 16:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298743)', diff saved to https://phabricator.wikimedia.org/P22583 and previous config saved to /var/cache/conftool/dbconfig/20220315-163618-ladsgroup.json
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22582 and previous config saved to /var/cache/conftool/dbconfig/20220315-163238-marostegui.json
  • 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298563)', diff saved to https://phabricator.wikimedia.org/P22581 and previous config saved to /var/cache/conftool/dbconfig/20220315-163134-marostegui.json
  • 16:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 16:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298563)', diff saved to https://phabricator.wikimedia.org/P22580 and previous config saved to /var/cache/conftool/dbconfig/20220315-163126-marostegui.json
  • 16:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22579 and previous config saved to /var/cache/conftool/dbconfig/20220315-162113-ladsgroup.json
  • 16:20 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.26 refs T300202
  • 16:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22578 and previous config saved to /var/cache/conftool/dbconfig/20220315-161732-marostegui.json
  • 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22577 and previous config saved to /var/cache/conftool/dbconfig/20220315-161621-marostegui.json
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22576 and previous config saved to /var/cache/conftool/dbconfig/20220315-160607-ladsgroup.json
  • 16:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T298557)', diff saved to https://phabricator.wikimedia.org/P22575 and previous config saved to /var/cache/conftool/dbconfig/20220315-160226-marostegui.json
  • 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22574 and previous config saved to /var/cache/conftool/dbconfig/20220315-160116-marostegui.json
  • 15:53 moritzm: updating Exim on mx1001 T303738
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298743)', diff saved to https://phabricator.wikimedia.org/P22573 and previous config saved to /var/cache/conftool/dbconfig/20220315-155102-ladsgroup.json
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T298743)', diff saved to https://phabricator.wikimedia.org/P22572 and previous config saved to /var/cache/conftool/dbconfig/20220315-154639-ladsgroup.json
  • 15:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 15:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298743)', diff saved to https://phabricator.wikimedia.org/P22571 and previous config saved to /var/cache/conftool/dbconfig/20220315-154631-ladsgroup.json
  • 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298563)', diff saved to https://phabricator.wikimedia.org/P22570 and previous config saved to /var/cache/conftool/dbconfig/20220315-154610-marostegui.json
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22569 and previous config saved to /var/cache/conftool/dbconfig/20220315-153126-ladsgroup.json
  • 15:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T298557)', diff saved to https://phabricator.wikimedia.org/P22568 and previous config saved to /var/cache/conftool/dbconfig/20220315-152916-marostegui.json
  • 15:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 15:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 15:18 moritzm: installing Java updates on wcqs*/wdqs* hosts
  • 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22567 and previous config saved to /var/cache/conftool/dbconfig/20220315-151621-ladsgroup.json
  • 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T298563)', diff saved to https://phabricator.wikimedia.org/P22566 and previous config saved to /var/cache/conftool/dbconfig/20220315-151206-marostegui.json
  • 15:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 15:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 15:09 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@f01214c]: (no justification provided) (duration: 00m 07s)
  • 15:09 ebysans@deploy1002: Started deploy [airflow-dags/analytics@f01214c]: (no justification provided)
  • 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22565 and previous config saved to /var/cache/conftool/dbconfig/20220315-150649-root.json
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298743)', diff saved to https://phabricator.wikimedia.org/P22564 and previous config saved to /var/cache/conftool/dbconfig/20220315-150116-ladsgroup.json
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T298743)', diff saved to https://phabricator.wikimedia.org/P22563 and previous config saved to /var/cache/conftool/dbconfig/20220315-145246-ladsgroup.json
  • 14:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 14:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298743)', diff saved to https://phabricator.wikimedia.org/P22562 and previous config saved to /var/cache/conftool/dbconfig/20220315-145238-ladsgroup.json
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22561 and previous config saved to /var/cache/conftool/dbconfig/20220315-145146-root.json
  • 14:50 moritzm: installing postgresql-11 security updates
  • 14:49 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@88d5618]: (no justification provided) (duration: 00m 07s)
  • 14:49 ntsako@deploy1002: Started deploy [airflow-dags/analytics@88d5618]: (no justification provided)
  • 14:43 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 14:42 ottomata: I read the cumin output wrong: kafka-jumbo1001 and 1002 restarted successfully before the accidental ctrl-c on the cumin command. Restarting the full jumbo roll-restart to thoroughly do them all - T303324
  • 14:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1068.eqiad.wmnet with reason: host reimage
  • 14:39 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:38 ottomata: all brokers except kafka-jumbo1001 were successfully roll restarted, doing kafka-jumbo1001 manually - T303324
  • 14:37 ottomata: accidental cancel of roll restart brokers, re-doing - T303324
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22560 and previous config saved to /var/cache/conftool/dbconfig/20220315-143733-ladsgroup.json
  • 14:37 otto@cumin1001: END (ERROR) - Cookbook sre.kafka.roll-restart-brokers (exit_code=97) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 14:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1068.eqiad.wmnet with reason: host reimage
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22559 and previous config saved to /var/cache/conftool/dbconfig/20220315-143642-root.json
  • 14:32 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@2924232]: (no justification provided) (duration: 00m 08s)
  • 14:32 ntsako@deploy1002: Started deploy [airflow-dags/analytics@2924232]: (no justification provided)
  • 14:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1068.eqiad.wmnet with OS stretch
  • 14:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 14:22 inflatador: T303256 bking@cumin1001 restarting wdqs services `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-blazegraph'`
  • 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22558 and previous config saved to /var/cache/conftool/dbconfig/20220315-142228-ladsgroup.json
  • 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22557 and previous config saved to /var/cache/conftool/dbconfig/20220315-142138-root.json
  • 14:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 10 hosts with reason: Maintenance
  • 14:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 10 hosts with reason: Maintenance
  • 14:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 14:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298743)', diff saved to https://phabricator.wikimedia.org/P22556 and previous config saved to /var/cache/conftool/dbconfig/20220315-140723-ladsgroup.json
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22555 and previous config saved to /var/cache/conftool/dbconfig/20220315-140634-root.json
  • 14:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 14:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298557)', diff saved to https://phabricator.wikimedia.org/P22554 and previous config saved to /var/cache/conftool/dbconfig/20220315-140520-marostegui.json
  • 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T298743)', diff saved to https://phabricator.wikimedia.org/P22553 and previous config saved to /var/cache/conftool/dbconfig/20220315-140259-ladsgroup.json
  • 14:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298743)', diff saved to https://phabricator.wikimedia.org/P22552 and previous config saved to /var/cache/conftool/dbconfig/20220315-140252-ladsgroup.json
  • 14:01 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:00 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 13:59 ottomata: roll restarting kafka jumbo brokers to set max.incremental.fetch.session.cache.slots=2000 - T303324
  • 13:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1023.eqiad.wmnet with reason: host reimage
  • 13:54 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1023.eqiad.wmnet with reason: host reimage
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22551 and previous config saved to /var/cache/conftool/dbconfig/20220315-135015-marostegui.json
  • 13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P22550 and previous config saved to /var/cache/conftool/dbconfig/20220315-134747-ladsgroup.json
  • 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:41 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 13:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:37 awight: EU deployment complete
  • 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22549 and previous config saved to /var/cache/conftool/dbconfig/20220315-133510-marostegui.json
  • 13:34 awight@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [beta] Disable improved template search (T286991, T302857) (take 2) (duration: 00m 50s)
  • 13:32 awight@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [beta] Disable improved template search (T286991, T302857) (duration: 00m 48s)
  • 13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P22548 and previous config saved to /var/cache/conftool/dbconfig/20220315-133241-ladsgroup.json
  • 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:31 awight@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [beta] Remove unused config overrides (duration: 00m 49s)
  • 13:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22547 and previous config saved to /var/cache/conftool/dbconfig/20220315-132857-marostegui.json
  • 13:20 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298557)', diff saved to https://phabricator.wikimedia.org/P22546 and previous config saved to /var/cache/conftool/dbconfig/20220315-132005-marostegui.json
  • 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298743)', diff saved to https://phabricator.wikimedia.org/P22545 and previous config saved to /var/cache/conftool/dbconfig/20220315-131736-ladsgroup.json
  • 13:15 awight@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/TemplateWizard/resources/ext.TemplateWizard.SearchField.js: Backport: Fix copy-paste mistake in template search widget (T303524) (duration: 00m 49s)
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1095:3315', diff saved to https://phabricator.wikimedia.org/P22544 and previous config saved to /var/cache/conftool/dbconfig/20220315-131436-marostegui.json
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22543 and previous config saved to /var/cache/conftool/dbconfig/20220315-131352-marostegui.json
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T298743)', diff saved to https://phabricator.wikimedia.org/P22542 and previous config saved to /var/cache/conftool/dbconfig/20220315-131311-ladsgroup.json
  • 13:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 13:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298743)', diff saved to https://phabricator.wikimedia.org/P22541 and previous config saved to /var/cache/conftool/dbconfig/20220315-131303-ladsgroup.json
  • 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298563)', diff saved to https://phabricator.wikimedia.org/P22540 and previous config saved to /var/cache/conftool/dbconfig/20220315-130936-marostegui.json
  • 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:07 Amir1: removed 440 more corrupt rows in flaggedtemplates in dewiki (T297189)
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22539 and previous config saved to /var/cache/conftool/dbconfig/20220315-125847-marostegui.json
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22538 and previous config saved to /var/cache/conftool/dbconfig/20220315-125758-ladsgroup.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22537 and previous config saved to /var/cache/conftool/dbconfig/20220315-125431-marostegui.json
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T298557)', diff saved to https://phabricator.wikimedia.org/P22536 and previous config saved to /var/cache/conftool/dbconfig/20220315-125228-marostegui.json
  • 12:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:48 Amir1: removed 170 corrupt rows in flaggedtemplates in dewiki (T297189)
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22535 and previous config saved to /var/cache/conftool/dbconfig/20220315-124342-marostegui.json
  • 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22534 and previous config saved to /var/cache/conftool/dbconfig/20220315-124253-ladsgroup.json
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22533 and previous config saved to /var/cache/conftool/dbconfig/20220315-123926-marostegui.json
  • 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298743)', diff saved to https://phabricator.wikimedia.org/P22532 and previous config saved to /var/cache/conftool/dbconfig/20220315-122748-ladsgroup.json
  • 12:24 moritzm: updating Exim on mx2001 T303738
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298563)', diff saved to https://phabricator.wikimedia.org/P22531 and previous config saved to /var/cache/conftool/dbconfig/20220315-122421-marostegui.json
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T298743)', diff saved to https://phabricator.wikimedia.org/P22530 and previous config saved to /var/cache/conftool/dbconfig/20220315-121317-ladsgroup.json
  • 12:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 12:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298743)', diff saved to https://phabricator.wikimedia.org/P22529 and previous config saved to /var/cache/conftool/dbconfig/20220315-121309-ladsgroup.json
  • 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22528 and previous config saved to /var/cache/conftool/dbconfig/20220315-115804-ladsgroup.json
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22527 and previous config saved to /var/cache/conftool/dbconfig/20220315-114259-ladsgroup.json
  • 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298743)', diff saved to https://phabricator.wikimedia.org/P22526 and previous config saved to /var/cache/conftool/dbconfig/20220315-112754-ladsgroup.json
  • 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T298743)', diff saved to https://phabricator.wikimedia.org/P22525 and previous config saved to /var/cache/conftool/dbconfig/20220315-112308-ladsgroup.json
  • 11:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 11:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 11:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T298563)', diff saved to https://phabricator.wikimedia.org/P22524 and previous config saved to /var/cache/conftool/dbconfig/20220315-110423-marostegui.json
  • 11:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 11:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298563)', diff saved to https://phabricator.wikimedia.org/P22523 and previous config saved to /var/cache/conftool/dbconfig/20220315-110416-marostegui.json
  • 10:50 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22522 and previous config saved to /var/cache/conftool/dbconfig/20220315-104910-marostegui.json
  • 10:49 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22521 and previous config saved to /var/cache/conftool/dbconfig/20220315-103405-marostegui.json
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22520 and previous config saved to /var/cache/conftool/dbconfig/20220315-101922-root.json
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298563)', diff saved to https://phabricator.wikimedia.org/P22519 and previous config saved to /var/cache/conftool/dbconfig/20220315-101900-marostegui.json
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22518 and previous config saved to /var/cache/conftool/dbconfig/20220315-101449-root.json
  • 10:13 Amir1: start of foreachwikiindblist all maintenance/refreshImageMetadata.php --force --verbose --mediatype=AUDIO --sleep 2 --oldimage (T226311)
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22517 and previous config saved to /var/cache/conftool/dbconfig/20220315-100418-root.json
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22516 and previous config saved to /var/cache/conftool/dbconfig/20220315-095945-root.json
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22515 and previous config saved to /var/cache/conftool/dbconfig/20220315-094914-root.json
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22514 and previous config saved to /var/cache/conftool/dbconfig/20220315-094441-root.json
  • 09:38 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22513 and previous config saved to /var/cache/conftool/dbconfig/20220315-093410-root.json
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22512 and previous config saved to /var/cache/conftool/dbconfig/20220315-092937-root.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P22511 and previous config saved to /var/cache/conftool/dbconfig/20220315-091906-root.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P22510 and previous config saved to /var/cache/conftool/dbconfig/20220315-091850-root.json
  • 09:14 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22509 and previous config saved to /var/cache/conftool/dbconfig/20220315-091433-root.json
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P22507 and previous config saved to /var/cache/conftool/dbconfig/20220315-090346-root.json
  • 09:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22506 and previous config saved to /var/cache/conftool/dbconfig/20220315-085929-root.json
  • 08:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:57 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@a85cf25] (eqiad): Switchover to eqiad tegola on eqiad env (duration: 01m 55s)
  • 08:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@a85cf25] (eqiad): Switchover to eqiad tegola on eqiad env
  • 08:53 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@a85cf25] (codfw): Switchover to eqiad tegola on eqiad env (duration: 03m 22s)
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T298563)', diff saved to https://phabricator.wikimedia.org/P22505 and previous config saved to /var/cache/conftool/dbconfig/20220315-085214-marostegui.json
  • 08:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298563)', diff saved to https://phabricator.wikimedia.org/P22504 and previous config saved to /var/cache/conftool/dbconfig/20220315-085206-marostegui.json
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1095:3315', diff saved to https://phabricator.wikimedia.org/P22503 and previous config saved to /var/cache/conftool/dbconfig/20220315-085026-marostegui.json
  • 08:50 marostegui: dbmaint on s5@eqiad T297189
  • 08:49 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@a85cf25] (codfw): Switchover to eqiad tegola on eqiad env
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P22502 and previous config saved to /var/cache/conftool/dbconfig/20220315-084842-root.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22501 and previous config saved to /var/cache/conftool/dbconfig/20220315-084425-root.json
  • 08:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1161.eqiad.wmnet with OS bullseye
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22500 and previous config saved to /var/cache/conftool/dbconfig/20220315-083925-marostegui.json
  • 08:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T300775)', diff saved to https://phabricator.wikimedia.org/P22499 and previous config saved to /var/cache/conftool/dbconfig/20220315-083917-marostegui.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22498 and previous config saved to /var/cache/conftool/dbconfig/20220315-083701-marostegui.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P22497 and previous config saved to /var/cache/conftool/dbconfig/20220315-083338-root.json
  • 08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1161.eqiad.wmnet with reason: host reimage
  • 08:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1161.eqiad.wmnet with reason: host reimage
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22496 and previous config saved to /var/cache/conftool/dbconfig/20220315-082412-marostegui.json
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22495 and previous config saved to /var/cache/conftool/dbconfig/20220315-082401-root.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22493 and previous config saved to /var/cache/conftool/dbconfig/20220315-082157-marostegui.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22492 and previous config saved to /var/cache/conftool/dbconfig/20220315-081835-root.json
  • 08:13 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1161.eqiad.wmnet with OS bullseye
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22491 and previous config saved to /var/cache/conftool/dbconfig/20220315-080907-marostegui.json
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22490 and previous config saved to /var/cache/conftool/dbconfig/20220315-080857-root.json
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298563)', diff saved to https://phabricator.wikimedia.org/P22489 and previous config saved to /var/cache/conftool/dbconfig/20220315-080651-marostegui.json
  • 08:05 marostegui: dbmaint on s5@eqiad T300473
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 5%: After reboot', diff saved to https://phabricator.wikimedia.org/P22488 and previous config saved to /var/cache/conftool/dbconfig/20220315-080329-root.json
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161', diff saved to https://phabricator.wikimedia.org/P22487 and previous config saved to /var/cache/conftool/dbconfig/20220315-080128-marostegui.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T300775)', diff saved to https://phabricator.wikimedia.org/P22486 and previous config saved to /var/cache/conftool/dbconfig/20220315-075402-marostegui.json
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22485 and previous config saved to /var/cache/conftool/dbconfig/20220315-075353-root.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 1%: After reboot', diff saved to https://phabricator.wikimedia.org/P22484 and previous config saved to /var/cache/conftool/dbconfig/20220315-074825-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P22483 and previous config saved to /var/cache/conftool/dbconfig/20220315-074650-root.json
  • 07:43 elukey: restart kube-api server on ml-serve-ctrl2002 - 504 responses registered, corresponding to high custom resource definition requests
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22482 and previous config saved to /var/cache/conftool/dbconfig/20220315-073849-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P22481 and previous config saved to /var/cache/conftool/dbconfig/20220315-073146-root.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22480 and previous config saved to /var/cache/conftool/dbconfig/20220315-072345-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P22479 and previous config saved to /var/cache/conftool/dbconfig/20220315-071642-root.json
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22478 and previous config saved to /var/cache/conftool/dbconfig/20220315-070841-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T298563)', diff saved to https://phabricator.wikimedia.org/P22477 and previous config saved to /var/cache/conftool/dbconfig/20220315-070635-marostegui.json
  • 07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P22476 and previous config saved to /var/cache/conftool/dbconfig/20220315-070138-root.json
  • 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1166.eqiad.wmnet with OS bullseye
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22475 and previous config saved to /var/cache/conftool/dbconfig/20220315-065337-root.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22474 and previous config saved to /var/cache/conftool/dbconfig/20220315-064634-root.json
  • 06:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1166.eqiad.wmnet with reason: host reimage
  • 06:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1166.eqiad.wmnet with reason: host reimage
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 5%: After reboot', diff saved to https://phabricator.wikimedia.org/P22473 and previous config saved to /var/cache/conftool/dbconfig/20220315-063130-root.json
  • 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1166.eqiad.wmnet with OS bullseye
  • 06:26 marostegui: dbmaint on s3@eqiad T300600
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P22472 and previous config saved to /var/cache/conftool/dbconfig/20220315-062543-marostegui.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 1%: After reboot', diff saved to https://phabricator.wikimedia.org/P22471 and previous config saved to /var/cache/conftool/dbconfig/20220315-061626-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T300775)', diff saved to https://phabricator.wikimedia.org/P22470 and previous config saved to /var/cache/conftool/dbconfig/20220315-061458-marostegui.json
  • 06:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T300775)', diff saved to https://phabricator.wikimedia.org/P22469 and previous config saved to /var/cache/conftool/dbconfig/20220315-061450-marostegui.json
  • 06:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22468 and previous config saved to /var/cache/conftool/dbconfig/20220315-055945-marostegui.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22467 and previous config saved to /var/cache/conftool/dbconfig/20220315-054440-marostegui.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T300775)', diff saved to https://phabricator.wikimedia.org/P22466 and previous config saved to /var/cache/conftool/dbconfig/20220315-052935-marostegui.json
  • 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T300775)', diff saved to https://phabricator.wikimedia.org/P22465 and previous config saved to /var/cache/conftool/dbconfig/20220315-013013-marostegui.json
  • 01:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 01:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 01:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22464 and previous config saved to /var/cache/conftool/dbconfig/20220315-013000-marostegui.json
  • 01:26 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/maintenance/populateGlobalEditCount.php: fix script bug gerrit 770058 (duration: 00m 50s)
  • 01:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22463 and previous config saved to /var/cache/conftool/dbconfig/20220315-011455-marostegui.json
  • 00:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22462 and previous config saved to /var/cache/conftool/dbconfig/20220315-005950-marostegui.json
  • 00:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22461 and previous config saved to /var/cache/conftool/dbconfig/20220315-004445-marostegui.json
  • 00:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)

2022-03-14

  • 23:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22460 and previous config saved to /var/cache/conftool/dbconfig/20220314-234430-marostegui.json
  • 23:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 23:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:28 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:28 ryankemper: T301108 `ryankemper@cumin1001:~$ sudo cookbook sre.wdqs.data-transfer --source wdqs1009.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "moving away from legacy updater" --blazegraph_instance wikidata --without-lvs --task-id T301108` on tmux `wdqs`
  • 22:27 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:19 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1023.eqiad.wmnet with reason: host reimage
  • 22:16 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1023.eqiad.wmnet with reason: host reimage
  • 22:04 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 22:03 bking@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs-internal,name=eqiad
  • 22:03 bking@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
  • 22:03 inflatador: T302494 bking@puppetmaster1001 depooling eqiad in DNS-discovery for wdqs and wdqs-internal services
  • 21:47 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:39 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:39 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:38 inflatador: T302494 bking@puppetmaster1001 conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal,name=codfw
  • 21:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:37 bking@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=codfw
  • 21:36 inflatador: bking@cumin pooling codfw in DNS-discovery for wdqs and wdqs-internal services
  • 21:31 sbassett: Deployed security fix for T160800
  • 21:30 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:07 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 20:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:54 urbanecm: UTC late B&C completed
  • 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: bca9c94: liwiktionary: Change timezone to CET/CEST (T303734) (duration: 00m 49s)
  • 20:45 ebernhardson@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CirrusSearch/profiles/SaneitizeProfiles.config.php: Backport: Cut saneitizer re-indexing rate in half (T302733) (duration: 00m 49s)
  • 20:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1023.eqiad.wmnet with reason: host reimage
  • 20:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1023.eqiad.wmnet with reason: host reimage
  • 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:31 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 20:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 20:31 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 20:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 20:30 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 20:22 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 20:22 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 19:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 19:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 19:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T300775)', diff saved to https://phabricator.wikimedia.org/P22457 and previous config saved to /var/cache/conftool/dbconfig/20220314-194404-marostegui.json
  • 19:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P22456 and previous config saved to /var/cache/conftool/dbconfig/20220314-192859-marostegui.json
  • 19:24 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1022.eqiad.wmnet with OS bullseye
  • 19:22 ejegg: updated civicrm from 252269c8 to 52c45874
  • 19:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P22455 and previous config saved to /var/cache/conftool/dbconfig/20220314-191354-marostegui.json
  • 19:07 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1022.eqiad.wmnet with reason: host reimage
  • 19:04 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1022.eqiad.wmnet with reason: host reimage
  • 19:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298294)', diff saved to https://phabricator.wikimedia.org/P22454 and previous config saved to /var/cache/conftool/dbconfig/20220314-190224-marostegui.json
  • 18:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T300775)', diff saved to https://phabricator.wikimedia.org/P22453 and previous config saved to /var/cache/conftool/dbconfig/20220314-185849-marostegui.json
  • 18:54 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1022.eqiad.wmnet with OS bullseye
  • 18:51 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1021.eqiad.wmnet with OS bullseye
  • 18:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P22452 and previous config saved to /var/cache/conftool/dbconfig/20220314-184719-marostegui.json
  • 18:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P22451 and previous config saved to /var/cache/conftool/dbconfig/20220314-183214-marostegui.json
  • 18:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1021.eqiad.wmnet with reason: host reimage
  • 18:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1021.eqiad.wmnet with reason: host reimage
  • 18:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298294)', diff saved to https://phabricator.wikimedia.org/P22450 and previous config saved to /var/cache/conftool/dbconfig/20220314-181709-marostegui.json
  • 18:14 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1021.eqiad.wmnet with OS bullseye
  • 17:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T298294)', diff saved to https://phabricator.wikimedia.org/P22449 and previous config saved to /var/cache/conftool/dbconfig/20220314-175352-marostegui.json
  • 17:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:47 Amir1: start of foreachwikiindblist all maintenance/refreshImageMetadata.php --force --verbose --mediatype=AUDIO --sleep 2 (T226311)
  • 17:45 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@63af538] (eqiad): Enable 100% traffic mirroring on eqiad (duration: 01m 04s)
  • 17:44 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@63af538] (eqiad): Enable 100% traffic mirroring on eqiad
  • 17:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 17:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298294)', diff saved to https://phabricator.wikimedia.org/P22448 and previous config saved to /var/cache/conftool/dbconfig/20220314-173442-marostegui.json
  • 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22446 and previous config saved to /var/cache/conftool/dbconfig/20220314-171937-marostegui.json
  • 17:18 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22445 and previous config saved to /var/cache/conftool/dbconfig/20220314-170432-marostegui.json
  • 16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298294)', diff saved to https://phabricator.wikimedia.org/P22444 and previous config saved to /var/cache/conftool/dbconfig/20220314-164927-marostegui.json
  • 16:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T298294)', diff saved to https://phabricator.wikimedia.org/P22442 and previous config saved to /var/cache/conftool/dbconfig/20220314-162509-marostegui.json
  • 16:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 16:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 16:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298294)', diff saved to https://phabricator.wikimedia.org/P22441 and previous config saved to /var/cache/conftool/dbconfig/20220314-162501-marostegui.json
  • 16:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 6 hosts with reason: Maintenance
  • 16:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: Maintenance
  • 16:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 16:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 16:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298563)', diff saved to https://phabricator.wikimedia.org/P22440 and previous config saved to /var/cache/conftool/dbconfig/20220314-161943-marostegui.json
  • 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22439 and previous config saved to /var/cache/conftool/dbconfig/20220314-160955-marostegui.json
  • 16:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22438 and previous config saved to /var/cache/conftool/dbconfig/20220314-160438-marostegui.json
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22437 and previous config saved to /var/cache/conftool/dbconfig/20220314-155450-marostegui.json
  • 15:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22436 and previous config saved to /var/cache/conftool/dbconfig/20220314-154933-marostegui.json
  • 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298294)', diff saved to https://phabricator.wikimedia.org/P22435 and previous config saved to /var/cache/conftool/dbconfig/20220314-153945-marostegui.json
  • 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298563)', diff saved to https://phabricator.wikimedia.org/P22434 and previous config saved to /var/cache/conftool/dbconfig/20220314-153428-marostegui.json
  • 15:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T298294)', diff saved to https://phabricator.wikimedia.org/P22432 and previous config saved to /var/cache/conftool/dbconfig/20220314-151025-marostegui.json
  • 15:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 15:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 15:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298294)', diff saved to https://phabricator.wikimedia.org/P22431 and previous config saved to /var/cache/conftool/dbconfig/20220314-151017-marostegui.json
  • 14:57 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22430 and previous config saved to /var/cache/conftool/dbconfig/20220314-145512-marostegui.json
  • 14:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T300775)', diff saved to https://phabricator.wikimedia.org/P22429 and previous config saved to /var/cache/conftool/dbconfig/20220314-145345-marostegui.json
  • 14:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 14:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T298563)', diff saved to https://phabricator.wikimedia.org/P22428 and previous config saved to /var/cache/conftool/dbconfig/20220314-145109-marostegui.json
  • 14:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 14:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 14:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22427 and previous config saved to /var/cache/conftool/dbconfig/20220314-144007-marostegui.json
  • 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298294)', diff saved to https://phabricator.wikimedia.org/P22426 and previous config saved to /var/cache/conftool/dbconfig/20220314-142502-marostegui.json
  • 14:01 mvernon@cumin1001: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1010.eqiad.wmnet
  • 14:01 mvernon@cumin1001: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1010.eqiad.wmnet
  • 14:01 mvernon@cumin1001: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1010.eqiad.wmnet
  • 14:01 mvernon@cumin1001: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1010.eqiad.wmnet
  • 13:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1017.eqiad.wmnet with reason: host reimage
  • 13:58 herron: grafana1002:~# systemctl restart grafana-ldap-users-sync.service T303064
  • 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T298294)', diff saved to https://phabricator.wikimedia.org/P22425 and previous config saved to /var/cache/conftool/dbconfig/20220314-135744-marostegui.json
  • 13:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 13:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298294)', diff saved to https://phabricator.wikimedia.org/P22424 and previous config saved to /var/cache/conftool/dbconfig/20220314-135736-marostegui.json
  • 13:57 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
  • 13:57 mvernon@cumin1001: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1011.eqiad.wmnet
  • 13:57 mvernon@cumin1001: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1011.eqiad.wmnet
  • 13:56 mvernon@cumin1001: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1011.eqiad.wmnet
  • 13:56 mvernon@cumin1001: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1011.eqiad.wmnet
  • 13:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1017.eqiad.wmnet with reason: host reimage
  • 13:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:53 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
  • 13:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:50 mvernon@cumin1001: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1012.eqiad.wmnet
  • 13:50 mvernon@cumin1001: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1012.eqiad.wmnet
  • 13:50 mvernon@cumin1001: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1012.eqiad.wmnet
  • 13:49 mvernon@cumin1001: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1012.eqiad.wmnet
  • 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:45 mvernon@cumin1001: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1009.eqiad.wmnet
  • 13:45 mvernon@cumin1001: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1009.eqiad.wmnet
  • 13:45 mvernon@cumin1001: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1009.eqiad.wmnet
  • 13:45 mvernon@cumin1001: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1009.eqiad.wmnet
  • 13:44 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 13:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298563)', diff saved to https://phabricator.wikimedia.org/P22423 and previous config saved to /var/cache/conftool/dbconfig/20220314-134356-marostegui.json
  • 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:43 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 13:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22422 and previous config saved to /var/cache/conftool/dbconfig/20220314-134231-marostegui.json
  • 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:36 Emperor: restarting swift-proxy on ms-fe100[5-8] to update config to know about new eqiad frontends T303698
  • 13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22421 and previous config saved to /var/cache/conftool/dbconfig/20220314-132849-marostegui.json
  • 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22420 and previous config saved to /var/cache/conftool/dbconfig/20220314-132726-marostegui.json
  • 13:25 dcausse: restarting blazegraph on wdqs1006 (jvm stuck for 10 hours)
  • 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:20 urbanecm@deploy1002: Synchronized static/images/project-logos/: 3fa9683: Delete huwiki 500k milestone logo files (T301923) (duration: 00m 49s)
  • 13:18 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 3c2c8b0: Stop using huwiki 500k milestone logo (T301923) (duration: 00m 48s)
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22419 and previous config saved to /var/cache/conftool/dbconfig/20220314-131344-marostegui.json
  • 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298294)', diff saved to https://phabricator.wikimedia.org/P22418 and previous config saved to /var/cache/conftool/dbconfig/20220314-131220-marostegui.json
  • 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:10 urbanecm@deploy1002: Synchronized wmf-config/wikitech.php: 95f376a: wikitech: migrate wmf* to wmg* (T45956) (duration: 00m 48s)
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298563)', diff saved to https://phabricator.wikimedia.org/P22417 and previous config saved to /var/cache/conftool/dbconfig/20220314-125839-marostegui.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T298294)', diff saved to https://phabricator.wikimedia.org/P22416 and previous config saved to /var/cache/conftool/dbconfig/20220314-124911-marostegui.json
  • 12:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 12:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298294)', diff saved to https://phabricator.wikimedia.org/P22415 and previous config saved to /var/cache/conftool/dbconfig/20220314-124902-marostegui.json
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22414 and previous config saved to /var/cache/conftool/dbconfig/20220314-123357-marostegui.json
  • 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T298563)', diff saved to https://phabricator.wikimedia.org/P22413 and previous config saved to /var/cache/conftool/dbconfig/20220314-121937-marostegui.json
  • 12:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 12:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22412 and previous config saved to /var/cache/conftool/dbconfig/20220314-121852-marostegui.json
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298294)', diff saved to https://phabricator.wikimedia.org/P22411 and previous config saved to /var/cache/conftool/dbconfig/20220314-120347-marostegui.json
  • 11:55 moritzm: restarting nginx on archiva1002 to pick up security updates
  • 11:53 moritzm: restarting apache2 on matomo1002 to pick up security updates
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T298294)', diff saved to https://phabricator.wikimedia.org/P22410 and previous config saved to /var/cache/conftool/dbconfig/20220314-114312-marostegui.json
  • 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 11:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298294)', diff saved to https://phabricator.wikimedia.org/P22409 and previous config saved to /var/cache/conftool/dbconfig/20220314-114305-marostegui.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22408 and previous config saved to /var/cache/conftool/dbconfig/20220314-112759-marostegui.json
  • 11:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298563)', diff saved to https://phabricator.wikimedia.org/P22407 and previous config saved to /var/cache/conftool/dbconfig/20220314-112117-marostegui.json
  • 11:18 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c8a9efd] (eqiad): Enable mirroring on eqiad with 50% of the traffic (duration: 02m 38s)
  • 11:15 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c8a9efd] (eqiad): Enable mirroring on eqiad with 50% of the traffic
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22406 and previous config saved to /var/cache/conftool/dbconfig/20220314-111255-marostegui.json
  • 11:12 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@594f1d5] (eqiad): Revert "Revert "Mirror 100% of request to tegola in eqiad"" (duration: 07m 01s)
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22405 and previous config saved to /var/cache/conftool/dbconfig/20220314-110612-marostegui.json
  • 11:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@594f1d5] (eqiad): Revert "Revert "Mirror 100% of request to tegola in eqiad""
  • 11:04 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@594f1d5] (codfw): Revert "Revert "Mirror 100% of request to tegola in eqiad"" (duration: 01m 30s)
  • 11:03 mbsantos@deploy1002: Started deploy [kartotherian/deploy@594f1d5] (codfw): Revert "Revert "Mirror 100% of request to tegola in eqiad""
  • 10:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298294)', diff saved to https://phabricator.wikimedia.org/P22404 and previous config saved to /var/cache/conftool/dbconfig/20220314-105749-marostegui.json
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22403 and previous config saved to /var/cache/conftool/dbconfig/20220314-105107-marostegui.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298563)', diff saved to https://phabricator.wikimedia.org/P22402 and previous config saved to /var/cache/conftool/dbconfig/20220314-103602-marostegui.json
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T298294)', diff saved to https://phabricator.wikimedia.org/P22401 and previous config saved to /var/cache/conftool/dbconfig/20220314-103532-marostegui.json
  • 10:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 10:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298294)', diff saved to https://phabricator.wikimedia.org/P22400 and previous config saved to /var/cache/conftool/dbconfig/20220314-103525-marostegui.json
  • 10:29 _joe_: running puppet on all cp hosts, to introduce the cloud netmapping
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P22399 and previous config saved to /var/cache/conftool/dbconfig/20220314-102020-marostegui.json
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P22398 and previous config saved to /var/cache/conftool/dbconfig/20220314-100515-marostegui.json
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T298563)', diff saved to https://phabricator.wikimedia.org/P22397 and previous config saved to /var/cache/conftool/dbconfig/20220314-095353-marostegui.json
  • 09:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 09:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298563)', diff saved to https://phabricator.wikimedia.org/P22396 and previous config saved to /var/cache/conftool/dbconfig/20220314-095346-marostegui.json
  • 09:53 Emperor: rebooting ms-fe10[09-12] as part of bringing into service T303698
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298294)', diff saved to https://phabricator.wikimedia.org/P22395 and previous config saved to /var/cache/conftool/dbconfig/20220314-095009-marostegui.json
  • 09:48 Amir1: dbmaint on s2@eqiad (T298743)
  • 09:46 Amir1: dbmaint on s8@eqiad (T298743)
  • 09:46 Amir1: dbmaint on s1@eqiad (T298743)
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22394 and previous config saved to /var/cache/conftool/dbconfig/20220314-093840-marostegui.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T298294)', diff saved to https://phabricator.wikimedia.org/P22393 and previous config saved to /var/cache/conftool/dbconfig/20220314-092559-marostegui.json
  • 09:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 09:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298294)', diff saved to https://phabricator.wikimedia.org/P22392 and previous config saved to /var/cache/conftool/dbconfig/20220314-092551-marostegui.json
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22391 and previous config saved to /var/cache/conftool/dbconfig/20220314-092335-marostegui.json
  • 09:18 moritzm: installing vim security updates
  • 09:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2017.codfw.wmnet with OS bullseye
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22390 and previous config saved to /var/cache/conftool/dbconfig/20220314-091046-marostegui.json
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298563)', diff saved to https://phabricator.wikimedia.org/P22389 and previous config saved to /var/cache/conftool/dbconfig/20220314-090830-marostegui.json
  • 09:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2017.codfw.wmnet with reason: host reimage
  • 09:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2017.codfw.wmnet with reason: host reimage
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22388 and previous config saved to /var/cache/conftool/dbconfig/20220314-085541-marostegui.json
  • 08:46 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2017.codfw.wmnet with OS bullseye
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298294)', diff saved to https://phabricator.wikimedia.org/P22387 and previous config saved to /var/cache/conftool/dbconfig/20220314-084036-marostegui.json
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T298563)', diff saved to https://phabricator.wikimedia.org/P22386 and previous config saved to /var/cache/conftool/dbconfig/20220314-082846-marostegui.json
  • 08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T298563)', diff saved to https://phabricator.wikimedia.org/P22385 and previous config saved to /var/cache/conftool/dbconfig/20220314-082838-marostegui.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T298294)', diff saved to https://phabricator.wikimedia.org/P22384 and previous config saved to /var/cache/conftool/dbconfig/20220314-081836-marostegui.json
  • 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 08:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298294)', diff saved to https://phabricator.wikimedia.org/P22383 and previous config saved to /var/cache/conftool/dbconfig/20220314-081828-marostegui.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22382 and previous config saved to /var/cache/conftool/dbconfig/20220314-081333-marostegui.json
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22381 and previous config saved to /var/cache/conftool/dbconfig/20220314-080323-marostegui.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22380 and previous config saved to /var/cache/conftool/dbconfig/20220314-075828-marostegui.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22379 and previous config saved to /var/cache/conftool/dbconfig/20220314-074818-marostegui.json
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T298563)', diff saved to https://phabricator.wikimedia.org/P22378 and previous config saved to /var/cache/conftool/dbconfig/20220314-074323-marostegui.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298294)', diff saved to https://phabricator.wikimedia.org/P22377 and previous config saved to /var/cache/conftool/dbconfig/20220314-073313-marostegui.json
  • 07:18 elukey: restart varnishkafka-webrequest on cp6001 to test a metric issue
  • 07:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 8 hosts with reason: Maintenance
  • 07:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 8 hosts with reason: Maintenance
  • 07:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 07:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 07:11 marostegui: dbmaint on s7@eqiad T300775
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T298563)', diff saved to https://phabricator.wikimedia.org/P22376 and previous config saved to /var/cache/conftool/dbconfig/20220314-070721-marostegui.json
  • 07:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 07:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T298294)', diff saved to https://phabricator.wikimedia.org/P22375 and previous config saved to /var/cache/conftool/dbconfig/20220314-070404-marostegui.json
  • 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 06:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 12 hosts with reason: Maintenance
  • 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 12 hosts with reason: Maintenance
  • 06:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 06:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 06:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
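
The dbctl entries above share a regular shape: time, user@host, a quoted commit message, a paste URL for the diff, and the path of the saved previous config. A minimal Python sketch for pulling those fields out of one entry, assuming exactly the wording seen in this log. The names `DBCTL_ENTRY` and `parse_dbctl_entry` and the field names are hypothetical and not part of dbctl or conftool:

```python
import re

# Pattern assumes the entry wording seen in this log; it is not an official format.
DBCTL_ENTRY = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<user>[^@\s]+)@(?P<host>[^:\s]+): "
    r"dbctl commit \(dc=(?P<dc>[^)]+)\): '(?P<message>.*)', "
    r"diff saved to (?P<diff_url>\S+) and previous config saved to (?P<config_path>\S+)"
)


def parse_dbctl_entry(line):
    """Return a dict of fields for a dbctl commit entry, or None if it does not match."""
    match = DBCTL_ENTRY.search(line)
    return match.groupdict() if match else None


# Example entry copied from the 2022-03-14 log above.
entry = parse_dbctl_entry(
    "11:12 marostegui@cumin1001: dbctl commit (dc=all): "
    "'Repooling after maintenance db1143', diff saved to "
    "https://phabricator.wikimedia.org/P22406 and previous config saved to "
    "/var/cache/conftool/dbconfig/20220314-111255-marostegui.json"
)
```

Entries that are not dbctl commits (for example cookbook START/END lines) return `None`, so the function can be used as a filter over the whole day.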

2022-03-11

  • 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2014.codfw.wmnet with OS bullseye
  • 15:44 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2014.codfw.wmnet with reason: host reimage
  • 15:42 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2014.codfw.wmnet with reason: host reimage
  • 15:39 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 15:38 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 15:37 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 15:36 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 15:36 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 15:35 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 15:33 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 15:33 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 15:27 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2014.codfw.wmnet with OS bullseye
  • 15:07 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host kubernetes2013.codfw.wmnet with OS bullseye
  • 15:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P22374 and previous config saved to /var/cache/conftool/dbconfig/20220311-150702-root.json
  • 15:02 XioNoX: cr1/2-eqiad AVOID-PATHS as-path TI "6762 .*"
  • 15:02 XioNoX: cr2-esams AVOID-PATHS as-path TI "6762 .*" <- rolled back
  • 14:57 XioNoX: cr2-esams AVOID-PATHS as-path TI "6762 .*"
  • 14:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2013.codfw.wmnet with reason: host reimage
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P22373 and previous config saved to /var/cache/conftool/dbconfig/20220311-145159-root.json
  • 14:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2013.codfw.wmnet with reason: host reimage
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P22372 and previous config saved to /var/cache/conftool/dbconfig/20220311-143652-root.json
  • 14:35 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2013.codfw.wmnet with OS bullseye
  • 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P22371 and previous config saved to /var/cache/conftool/dbconfig/20220311-142147-root.json
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22370 and previous config saved to /var/cache/conftool/dbconfig/20220311-140641-root.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P22369 and previous config saved to /var/cache/conftool/dbconfig/20220311-140549-marostegui.json
  • 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 5%: After reboot', diff saved to https://phabricator.wikimedia.org/P22368 and previous config saved to /var/cache/conftool/dbconfig/20220311-135137-root.json
  • 13:49 marostegui: dbmaint on s8@eqiad T300775
  • 13:49 marostegui: dbmaint on s1@eqiad T298294
  • 13:43 jelto: update pcc facts
  • 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 1%: After reboot', diff saved to https://phabricator.wikimedia.org/P22367 and previous config saved to /var/cache/conftool/dbconfig/20220311-133633-root.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123', diff saved to https://phabricator.wikimedia.org/P22366 and previous config saved to /var/cache/conftool/dbconfig/20220311-133407-marostegui.json
  • 12:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cumin2001.codfw.wmnet
  • 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:55 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:51 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts cumin2001.codfw.wmnet
  • 11:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2012.codfw.wmnet with OS bullseye
  • 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 11:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 11:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2012.codfw.wmnet with reason: host reimage
  • 11:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2012.codfw.wmnet with reason: host reimage
  • 10:59 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons.
  • 10:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2110.codfw.wmnet with OS bullseye
  • 10:46 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2012.codfw.wmnet with OS bullseye
  • 10:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2011.codfw.wmnet with OS bullseye
  • 10:39 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons.
  • 10:35 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2110.codfw.wmnet with reason: host reimage
  • 10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2110.codfw.wmnet with reason: host reimage
  • 10:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2011.codfw.wmnet with reason: host reimage
  • 10:25 vgutierrez: disable certspotter - T303593
  • 10:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2011.codfw.wmnet with reason: host reimage
  • 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 10:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 10:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2110.codfw.wmnet with OS bullseye
  • 10:16 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 10:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 10:09 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2011.codfw.wmnet with OS bullseye
  • 10:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
  • 10:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
  • 10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 10:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 10:03 dcausse: manually installed jvmquake to wdqs1010 (test machine) from https://people.wikimedia.org/~jmm/jvmquake/
  • 09:54 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 09:47 vgutierrez: stopping certspotter on alert1001
  • 09:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 09:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 09:36 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 09:35 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 09:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 09:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 09:15 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 09:15 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 09:00 jayme: kubernetes2011:~# systemctl restart rsyslog.service - T289766
  • 08:52 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 08:51 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudvirt1017.eqiad.wmnet
  • 08:43 dcausse: restarting blazegraph on wdqs1012 (JVM stuck for 5 hours)
  • 08:42 jynus: upgrade and restart db2139
  • 08:41 ayounsi@cumin1001: START - Cookbook sre.hosts.dhcp for host cloudvirt1017.eqiad.wmnet
  • 08:30 jynus: upgrade and restart db1145
  • 08:23 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudvirt1017.eqiad.wmnet
  • 08:21 ayounsi@cumin1001: START - Cookbook sre.hosts.dhcp for host cloudvirt1017.eqiad.wmnet
  • 08:19 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22364 and previous config saved to /var/cache/conftool/dbconfig/20220311-063921-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22363 and previous config saved to /var/cache/conftool/dbconfig/20220311-062417-root.json
  • 06:13 marostegui: Reboot dbproxy1014 T303174
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22362 and previous config saved to /var/cache/conftool/dbconfig/20220311-060913-root.json
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22361 and previous config saved to /var/cache/conftool/dbconfig/20220311-055409-root.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P22360 and previous config saved to /var/cache/conftool/dbconfig/20220311-054514-marostegui.json
  • 02:54 eileen: civicrm revision changed from 9fb68b24 to 252269c8
  • 01:56 eileen: civicrm revision changed from 8501c38c to 9fb68b24
  • 01:31 eileen: civicrm revision changed from 4cb2bdbc to 8501c38c
  • 00:33 TimStarling: on mwmaint1002 running populateGlobalEditCount.php
  • 00:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 00:01 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
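
The db1123 entries on this day show a staged repool: 1%, 5%, 10%, 25%, 50%, 75%, then 100%, spaced roughly 15 minutes apart. A small Python sketch of that schedule, assuming a fixed 15-minute interval. The interval is an approximation of the logged timestamps rather than a documented setting, and `REPOOL_STEPS` and `repool_schedule` are hypothetical names:

```python
from datetime import datetime, timedelta

# Percentages taken from the db1123 entries above; the 15-minute gap is an
# approximation of the observed spacing, not a documented dbctl setting.
REPOOL_STEPS = [1, 5, 10, 25, 50, 75, 100]


def repool_schedule(start, steps=REPOOL_STEPS, interval_minutes=15):
    """Return (time, percentage) pairs for a staged repool starting at `start`."""
    return [
        (start + timedelta(minutes=interval_minutes * i), pct)
        for i, pct in enumerate(steps)
    ]


# Start time taken from the first "(re)pooling @ 1%" entry for db1123.
schedule = repool_schedule(datetime(2022, 3, 11, 13, 36))
for when, pct in schedule:
    print(f"{when:%H:%M} pooled at {pct}%")
```

The computed times only approximate the log, since the real commits drift by a few seconds per step (the final 100% step was logged at 15:07).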

2022-03-10

  • 23:58 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 23:55 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 23:08 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 23:07 rzl@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 22:42 tstarling@deploy1002: Finished scap: global_edit_count gerrit 769561 (duration: 15m 12s)
  • 22:27 tstarling@deploy1002: Started scap: global_edit_count gerrit 769561
  • 22:24 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/User/CentralAuthUser.php: global_edit_count gerrit 769561 (duration: 00m 47s)
  • 22:24 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/Hooks/Handlers/UserEditCountUpdateHookHandler.php: global_edit_count gerrit 769561 (duration: 00m 47s)
  • 22:23 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/CentralAuthServices.php: global_edit_count gerrit 769561 (duration: 00m 47s)
  • 22:22 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/ServiceWiring.php: global_edit_count gerrit 769561 (duration: 00m 48s)
  • 22:21 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/CentralAuthEditCounter.php: global_edit_count gerrit 769561 (duration: 00m 48s)
  • 22:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:08 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 22:05 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 22:04 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 22:04 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 22:02 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 22:02 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:41 rzl: UTC late B&C training window done
  • 21:39 rzl@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: CommonSettings: Update comment about Image Suggestions API (T294362) (duration: 00m 48s)
  • 21:34 rzl@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/DiscussionTools/modules/controller.js: Backport: Fix highlighting of comments when reloading (T303261) (duration: 00m 47s)
  • 21:33 rzl@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/VisualEditor/modules/ve-mw: Backport: Preserve classes on media wrapper links (T292657 T303469) (duration: 00m 49s)
  • 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:18 cstone: Donation Interface revision changed from ca37a93e to 5db12b21
  • 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:13 rzl@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove centralauth-oversight from the config (T302675) (duration: 00m 49s)
  • 21:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T300775)', diff saved to https://phabricator.wikimedia.org/P22356 and previous config saved to /var/cache/conftool/dbconfig/20220310-205114-marostegui.json
  • 20:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P22355 and previous config saved to /var/cache/conftool/dbconfig/20220310-203608-marostegui.json
  • 20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P22354 and previous config saved to /var/cache/conftool/dbconfig/20220310-202103-marostegui.json
  • 20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T300775)', diff saved to https://phabricator.wikimedia.org/P22353 and previous config saved to /var/cache/conftool/dbconfig/20220310-200558-marostegui.json
  • 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:47 volans: installed spicerack v2.3.2 on the cumin hosts
  • 19:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:46 volans@cumin2002: END (PASS) - Cookbook sre.misc-clusters.sretest (exit_code=0) rolling restart_daemons on A:sretest
  • 19:46 volans@cumin2002: START - Cookbook sre.misc-clusters.sretest rolling restart_daemons on A:sretest
  • 19:44 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.38.0-wmf.25 refs T300201
  • 19:44 volans: uploaded spicerack_2.3.2 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 19:33 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
  • 19:32 dduvall@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
  • 19:32 dduvall@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
  • 19:31 dduvall@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
  • 19:29 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 19:29 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:07 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 19:06 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 19:06 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 19:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T300775)', diff saved to https://phabricator.wikimedia.org/P22352 and previous config saved to /var/cache/conftool/dbconfig/20220310-190544-marostegui.json
  • 19:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 19:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 19:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T300775)', diff saved to https://phabricator.wikimedia.org/P22351 and previous config saved to /var/cache/conftool/dbconfig/20220310-190530-marostegui.json
  • 19:04 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 19:04 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 19:02 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 19:02 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 19:01 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 19:00 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 18:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 18:59 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:58 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 18:58 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:57 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:57 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:56 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 18:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P22350 and previous config saved to /var/cache/conftool/dbconfig/20220310-185025-marostegui.json
  • 18:46 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 18:43 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 18:43 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:41 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 18:41 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:40 moritzm: restarting thumbor to pick up tiff security updates
  • 18:40 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:40 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:39 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 18:36 moritzm: installing tiff security updates
  • 18:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P22349 and previous config saved to /var/cache/conftool/dbconfig/20220310-183520-marostegui.json
  • 18:33 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 18:30 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 18:29 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 18:28 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 18:27 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 18:26 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 18:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T300775)', diff saved to https://phabricator.wikimedia.org/P22348 and previous config saved to /var/cache/conftool/dbconfig/20220310-182015-marostegui.json
  • 18:20 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 18:19 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 18:19 razzi: cumin 'C:elasticsearch' 'systemctl restart prometheus-wmf-elasticsearch-exporter-9200.service'
  • 18:15 razzi: systemctl restart prometheus-wmf-elasticsearch-exporter-9200.service on elastic2042 for T300295
  • 18:13 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 18:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 18:11 moritzm: installing cyrus-sasl2 security updates
  • 18:08 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 18:08 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 17:51 herron: repool thanos-fe1001
  • 17:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:43 herron: depooling thanos-fe1001 for envoy upgrade
  • 17:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:41 dancy@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: wmf-config: Use __DIR__ instead of "$IP/../wmf-config" (T45956) (duration: 00m 50s)
  • 17:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1070.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1068.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1071.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1069.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1008.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1070.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1069.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1068.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T300775)', diff saved to https://phabricator.wikimedia.org/P22347 and previous config saved to /var/cache/conftool/dbconfig/20220310-172001-marostegui.json
  • 17:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 17:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T300775)', diff saved to https://phabricator.wikimedia.org/P22346 and previous config saved to /var/cache/conftool/dbconfig/20220310-171953-marostegui.json
  • 17:19 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1071.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P22345 and previous config saved to /var/cache/conftool/dbconfig/20220310-170448-marostegui.json
  • 16:57 damilare: civicrm change revision from 9b5aafbc to 4cb2bdbc
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22344 and previous config saved to /var/cache/conftool/dbconfig/20220310-165014-ladsgroup.json
  • 16:50 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.mgmt with reason: Testing alertmanager downtime
  • 16:50 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.mgmt with reason: Testing alertmanager downtime
  • 16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P22343 and previous config saved to /var/cache/conftool/dbconfig/20220310-164943-marostegui.json
  • 16:49 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:05:00 on D{cumin1001.mgmt} with reason: Testing alertmanager downtime
  • 16:49 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on D{cumin1001.mgmt} with reason: Testing alertmanager downtime
  • 16:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Testing alertmanager downtime
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22342 and previous config saved to /var/cache/conftool/dbconfig/20220310-163509-ladsgroup.json
  • 16:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T300775)', diff saved to https://phabricator.wikimedia.org/P22341 and previous config saved to /var/cache/conftool/dbconfig/20220310-163438-marostegui.json
  • 16:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on doh1002.wikimedia.org with reason: testing eBPF filtering
  • 16:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on doh1002.wikimedia.org with reason: testing eBPF filtering
  • 16:30 sukhe: depool doh1002 for testing eBPF
  • 16:21 volans: uploaded spicerack_2.3.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22340 and previous config saved to /var/cache/conftool/dbconfig/20220310-162004-ladsgroup.json
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22339 and previous config saved to /var/cache/conftool/dbconfig/20220310-160457-ladsgroup.json
  • 15:57 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 15:56 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 15:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1121.eqiad.wmnet with OS bullseye
  • 15:47 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 15:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1121.eqiad.wmnet with reason: host reimage
  • 15:37 moritzm: rolling restart of thumbor to pick up expat security updates
  • 15:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1121.eqiad.wmnet with reason: host reimage
  • 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298294)', diff saved to https://phabricator.wikimedia.org/P22338 and previous config saved to /var/cache/conftool/dbconfig/20220310-153428-marostegui.json
  • 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T300775)', diff saved to https://phabricator.wikimedia.org/P22337 and previous config saved to /var/cache/conftool/dbconfig/20220310-153424-marostegui.json
  • 15:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 15:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T300775)', diff saved to https://phabricator.wikimedia.org/P22336 and previous config saved to /var/cache/conftool/dbconfig/20220310-153416-marostegui.json
  • 15:33 sukhe: upload certspotter 0.10-1wm1 to apt.wm.o - T204993
  • 15:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1121.eqiad.wmnet with OS bullseye
  • 15:21 moritzm: installing expat security updates on stretch
  • 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22335 and previous config saved to /var/cache/conftool/dbconfig/20220310-151923-marostegui.json
  • 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P22334 and previous config saved to /var/cache/conftool/dbconfig/20220310-151910-marostegui.json
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22333 and previous config saved to /var/cache/conftool/dbconfig/20220310-150839-ladsgroup.json
  • 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22332 and previous config saved to /var/cache/conftool/dbconfig/20220310-150803-ladsgroup.json
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22331 and previous config saved to /var/cache/conftool/dbconfig/20220310-150417-marostegui.json
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P22330 and previous config saved to /var/cache/conftool/dbconfig/20220310-150405-marostegui.json
  • 14:55 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:54 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22329 and previous config saved to /var/cache/conftool/dbconfig/20220310-145258-ladsgroup.json
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298294)', diff saved to https://phabricator.wikimedia.org/P22328 and previous config saved to /var/cache/conftool/dbconfig/20220310-144911-marostegui.json
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T300775)', diff saved to https://phabricator.wikimedia.org/P22327 and previous config saved to /var/cache/conftool/dbconfig/20220310-144900-marostegui.json
  • 14:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298294)', diff saved to https://phabricator.wikimedia.org/P22326 and previous config saved to /var/cache/conftool/dbconfig/20220310-144222-marostegui.json
  • 14:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 14:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298294)', diff saved to https://phabricator.wikimedia.org/P22325 and previous config saved to /var/cache/conftool/dbconfig/20220310-144214-marostegui.json
  • 14:41 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
  • 14:41 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22324 and previous config saved to /var/cache/conftool/dbconfig/20220310-143753-ladsgroup.json
  • 14:30 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22323 and previous config saved to /var/cache/conftool/dbconfig/20220310-142709-marostegui.json
  • 14:26 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:25 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22322 and previous config saved to /var/cache/conftool/dbconfig/20220310-142248-ladsgroup.json
  • 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22321 and previous config saved to /var/cache/conftool/dbconfig/20220310-141204-marostegui.json
  • 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:08 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
  • 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:08 akosiaris: repool ores in eqiad in discovery records
  • 14:06 urbanecm: UTC afternoon B&C done
  • 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298294)', diff saved to https://phabricator.wikimedia.org/P22320 and previous config saved to /var/cache/conftool/dbconfig/20220310-135659-marostegui.json
  • 13:55 akosiaris: depool ores in eqiad from discovery records to initiate reboot of rdb1011
  • 13:55 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=eqiad
  • 13:51 akosiaris: repool ores in codfw in discovery records
  • 13:50 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=codfw
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298294)', diff saved to https://phabricator.wikimedia.org/P22319 and previous config saved to /var/cache/conftool/dbconfig/20220310-135047-marostegui.json
  • 13:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 13:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298294)', diff saved to https://phabricator.wikimedia.org/P22318 and previous config saved to /var/cache/conftool/dbconfig/20220310-135039-marostegui.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T300775)', diff saved to https://phabricator.wikimedia.org/P22317 and previous config saved to /var/cache/conftool/dbconfig/20220310-134807-marostegui.json
  • 13:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 13:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T300775)', diff saved to https://phabricator.wikimedia.org/P22316 and previous config saved to /var/cache/conftool/dbconfig/20220310-134759-marostegui.json
  • 13:43 akosiaris: reboot rdb2007 for upgrades
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P22315 and previous config saved to /var/cache/conftool/dbconfig/20220310-133534-marostegui.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P22314 and previous config saved to /var/cache/conftool/dbconfig/20220310-133254-marostegui.json
  • 13:27 akosiaris: depool ores in codfw from discovery records to initiate reboot of rdb2007
  • 13:26 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=codfw
  • 13:22 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22313 and previous config saved to /var/cache/conftool/dbconfig/20220310-132234-ladsgroup.json
  • 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 13:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 13:20 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P22311 and previous config saved to /var/cache/conftool/dbconfig/20220310-132029-marostegui.json
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P22310 and previous config saved to /var/cache/conftool/dbconfig/20220310-131748-marostegui.json
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T302950)', diff saved to https://phabricator.wikimedia.org/P22309 and previous config saved to /var/cache/conftool/dbconfig/20220310-131214-ladsgroup.json
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298294)', diff saved to https://phabricator.wikimedia.org/P22308 and previous config saved to /var/cache/conftool/dbconfig/20220310-130523-marostegui.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T300775)', diff saved to https://phabricator.wikimedia.org/P22307 and previous config saved to /var/cache/conftool/dbconfig/20220310-130243-marostegui.json
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298294)', diff saved to https://phabricator.wikimedia.org/P22306 and previous config saved to /var/cache/conftool/dbconfig/20220310-125909-marostegui.json
  • 12:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298294)', diff saved to https://phabricator.wikimedia.org/P22305 and previous config saved to /var/cache/conftool/dbconfig/20220310-125901-marostegui.json
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22304 and previous config saved to /var/cache/conftool/dbconfig/20220310-125709-ladsgroup.json
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22303 and previous config saved to /var/cache/conftool/dbconfig/20220310-124355-marostegui.json
  • 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22302 and previous config saved to /var/cache/conftool/dbconfig/20220310-124204-ladsgroup.json
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22301 and previous config saved to /var/cache/conftool/dbconfig/20220310-122850-marostegui.json
  • 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T302950)', diff saved to https://phabricator.wikimedia.org/P22300 and previous config saved to /var/cache/conftool/dbconfig/20220310-122659-ladsgroup.json
  • 12:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1141.eqiad.wmnet with OS bullseye
  • 12:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Reboots
  • 12:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: Reboots
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298294)', diff saved to https://phabricator.wikimedia.org/P22299 and previous config saved to /var/cache/conftool/dbconfig/20220310-121344-marostegui.json
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T300775)', diff saved to https://phabricator.wikimedia.org/P22298 and previous config saved to /var/cache/conftool/dbconfig/20220310-120228-marostegui.json
  • 12:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 12:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T300775)', diff saved to https://phabricator.wikimedia.org/P22297 and previous config saved to /var/cache/conftool/dbconfig/20220310-120221-marostegui.json
  • 12:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1141.eqiad.wmnet with reason: host reimage
  • 11:58 marostegui: Failover m1 master
  • 11:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1141.eqiad.wmnet with reason: host reimage
  • 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Reboots
  • 11:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: Reboots
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P22296 and previous config saved to /var/cache/conftool/dbconfig/20220310-114715-marostegui.json
  • 11:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1141.eqiad.wmnet with OS bullseye
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T302950)', diff saved to https://phabricator.wikimedia.org/P22294 and previous config saved to /var/cache/conftool/dbconfig/20220310-113638-ladsgroup.json
  • 11:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 11:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P22293 and previous config saved to /var/cache/conftool/dbconfig/20220310-113210-marostegui.json
  • 11:29 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@b681376]: (no justification provided) (duration: 00m 07s)
  • 11:29 ebysans@deploy1002: Started deploy [airflow-dags/analytics@b681376]: (no justification provided)
  • 11:26 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 11:26 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 11:25 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 11:25 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1093.eqiad.wmnet
  • 11:24 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 11:24 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:24 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 11:18 volans: rolled out python3-wmflib v1.1.2 to the entire fleet (buster+ only)
  • 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T300775)', diff saved to https://phabricator.wikimedia.org/P22292 and previous config saved to /var/cache/conftool/dbconfig/20220310-111705-marostegui.json
  • 11:16 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic1093.eqiad.wmnet
  • 11:14 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1001.wikimedia.org
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298294)', diff saved to https://phabricator.wikimedia.org/P22291 and previous config saved to /var/cache/conftool/dbconfig/20220310-111330-marostegui.json
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T300775)', diff saved to https://phabricator.wikimedia.org/P22290 and previous config saved to /var/cache/conftool/dbconfig/20220310-111320-marostegui.json
  • 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T300775)', diff saved to https://phabricator.wikimedia.org/P22289 and previous config saved to /var/cache/conftool/dbconfig/20220310-111313-marostegui.json
  • 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 11:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:10 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:10 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp-test1001.wikimedia.org
  • 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
  • 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 14 hosts with reason: Maintenance
  • 11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 14 hosts with reason: Maintenance
  • 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 11:06 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:04 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298294)', diff saved to https://phabricator.wikimedia.org/P22287 and previous config saved to /var/cache/conftool/dbconfig/20220310-110253-marostegui.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P22286 and previous config saved to /var/cache/conftool/dbconfig/20220310-105807-marostegui.json
  • 10:48 jbond: re-enable puppet fleet wide
  • 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22285 and previous config saved to /var/cache/conftool/dbconfig/20220310-104748-marostegui.json
  • 10:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
  • 10:44 akosiaris: reboot rdb2009 for upgrades
  • 10:44 jbond: disable puppet fleet wide
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P22284 and previous config saved to /var/cache/conftool/dbconfig/20220310-104302-marostegui.json
  • 10:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2010.codfw.wmnet with OS bullseye
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22283 and previous config saved to /var/cache/conftool/dbconfig/20220310-103243-marostegui.json
  • 10:30 moritzm: failover ganeti master for drmrs/B13 to ganeti6004
  • 10:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2010.codfw.wmnet with reason: host reimage
  • 10:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T300775)', diff saved to https://phabricator.wikimedia.org/P22282 and previous config saved to /var/cache/conftool/dbconfig/20220310-102757-marostegui.json
  • 10:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2010.codfw.wmnet with reason: host reimage
  • 10:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298294)', diff saved to https://phabricator.wikimedia.org/P22281 and previous config saved to /var/cache/conftool/dbconfig/20220310-101738-marostegui.json
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T298294)', diff saved to https://phabricator.wikimedia.org/P22280 and previous config saved to /var/cache/conftool/dbconfig/20220310-101133-marostegui.json
  • 10:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 10:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298294)', diff saved to https://phabricator.wikimedia.org/P22279 and previous config saved to /var/cache/conftool/dbconfig/20220310-101125-marostegui.json
  • 10:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2010.codfw.wmnet with OS bullseye
  • 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
  • 10:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22278 and previous config saved to /var/cache/conftool/dbconfig/20220310-095620-marostegui.json
  • 09:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2009.codfw.wmnet with OS bullseye
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22277 and previous config saved to /var/cache/conftool/dbconfig/20220310-094115-marostegui.json
  • 09:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2009.codfw.wmnet with reason: host reimage
  • 09:38 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2009.codfw.wmnet with reason: host reimage
  • 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T300775)', diff saved to https://phabricator.wikimedia.org/P22276 and previous config saved to /var/cache/conftool/dbconfig/20220310-092742-marostegui.json
  • 09:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 09:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T300775)', diff saved to https://phabricator.wikimedia.org/P22275 and previous config saved to /var/cache/conftool/dbconfig/20220310-092735-marostegui.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298294)', diff saved to https://phabricator.wikimedia.org/P22274 and previous config saved to /var/cache/conftool/dbconfig/20220310-092610-marostegui.json
  • 09:22 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2009.codfw.wmnet with OS bullseye
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T298294)', diff saved to https://phabricator.wikimedia.org/P22273 and previous config saved to /var/cache/conftool/dbconfig/20220310-091807-marostegui.json
  • 09:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298294)', diff saved to https://phabricator.wikimedia.org/P22272 and previous config saved to /var/cache/conftool/dbconfig/20220310-091759-marostegui.json
  • 09:16 moritzm: failover ganeti master for drmrs/B12 to ganeti6003
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P22271 and previous config saved to /var/cache/conftool/dbconfig/20220310-091230-marostegui.json
  • 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
  • 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P22270 and previous config saved to /var/cache/conftool/dbconfig/20220310-090254-marostegui.json
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P22269 and previous config saved to /var/cache/conftool/dbconfig/20220310-085724-marostegui.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P22268 and previous config saved to /var/cache/conftool/dbconfig/20220310-084749-marostegui.json
  • 08:43 apergos: UTC morning backport and config window completed
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T300775)', diff saved to https://phabricator.wikimedia.org/P22267 and previous config saved to /var/cache/conftool/dbconfig/20220310-084219-marostegui.json
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318', diff saved to https://phabricator.wikimedia.org/P22266 and previous config saved to /var/cache/conftool/dbconfig/20220310-084139-marostegui.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After reboot5', diff saved to https://phabricator.wikimedia.org/P22265 and previous config saved to /var/cache/conftool/dbconfig/20220310-083732-root.json
  • 08:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298294)', diff saved to https://phabricator.wikimedia.org/P22264 and previous config saved to /var/cache/conftool/dbconfig/20220310-083244-marostegui.json
  • 08:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318', diff saved to https://phabricator.wikimedia.org/P22263 and previous config saved to /var/cache/conftool/dbconfig/20220310-082737-marostegui.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T298294)', diff saved to https://phabricator.wikimedia.org/P22262 and previous config saved to /var/cache/conftool/dbconfig/20220310-082642-marostegui.json
  • 08:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 08:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298294)', diff saved to https://phabricator.wikimedia.org/P22261 and previous config saved to /var/cache/conftool/dbconfig/20220310-082634-marostegui.json
  • 08:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:24 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Part 2: SectionTranslation: Also add languages to target (T298237) (duration: 00m 49s)
  • 08:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T300775)', diff saved to https://phabricator.wikimedia.org/P22260 and previous config saved to /var/cache/conftool/dbconfig/20220310-082234-marostegui.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After reboot5', diff saved to https://phabricator.wikimedia.org/P22259 and previous config saved to /var/cache/conftool/dbconfig/20220310-082227-root.json
  • 08:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 08:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T300775)', diff saved to https://phabricator.wikimedia.org/P22258 and previous config saved to /var/cache/conftool/dbconfig/20220310-082223-marostegui.json
  • 08:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:19 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Part 1: Enable SectionTranslation on Javanese, Tagalog, Mongolian, Telugu WPs (T298237) (duration: 00m 50s)
  • 08:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099 (s1, s8) for reboot', diff saved to https://phabricator.wikimedia.org/P22256 and previous config saved to /var/cache/conftool/dbconfig/20220310-081244-marostegui.json
  • 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P22255 and previous config saved to /var/cache/conftool/dbconfig/20220310-081129-marostegui.json
  • 08:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P22254 and previous config saved to /var/cache/conftool/dbconfig/20220310-080718-marostegui.json
  • 08:03 marostegui: Reboot dbproxy1017, 1016 T303174
  • 08:00 marostegui: Reboot dbproxy1012, 1015, 1016 T303174
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P22253 and previous config saved to /var/cache/conftool/dbconfig/20220310-075623-marostegui.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P22252 and previous config saved to /var/cache/conftool/dbconfig/20220310-075213-marostegui.json
  • 07:43 marostegui: Reboot dbproxy2001, 2002, 2003, 2004 T303174
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298294)', diff saved to https://phabricator.wikimedia.org/P22251 and previous config saved to /var/cache/conftool/dbconfig/20220310-074118-marostegui.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T300775)', diff saved to https://phabricator.wikimedia.org/P22250 and previous config saved to /var/cache/conftool/dbconfig/20220310-073708-marostegui.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298294)', diff saved to https://phabricator.wikimedia.org/P22249 and previous config saved to /var/cache/conftool/dbconfig/20220310-073523-marostegui.json
  • 07:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 07:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298294)', diff saved to https://phabricator.wikimedia.org/P22248 and previous config saved to /var/cache/conftool/dbconfig/20220310-073022-marostegui.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T300775)', diff saved to https://phabricator.wikimedia.org/P22247 and previous config saved to /var/cache/conftool/dbconfig/20220310-072124-marostegui.json
  • 07:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 07:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 07:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 12 hosts with reason: Maintenance
  • 07:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 12 hosts with reason: Maintenance
  • 07:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 07:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T300775)', diff saved to https://phabricator.wikimedia.org/P22246 and previous config saved to /var/cache/conftool/dbconfig/20220310-072019-marostegui.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P22245 and previous config saved to /var/cache/conftool/dbconfig/20220310-071516-marostegui.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P22244 and previous config saved to /var/cache/conftool/dbconfig/20220310-070514-marostegui.json
  • 07:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1132.eqiad.wmnet with OS bullseye
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P22243 and previous config saved to /var/cache/conftool/dbconfig/20220310-070011-marostegui.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P22242 and previous config saved to /var/cache/conftool/dbconfig/20220310-065009-marostegui.json
  • 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1132.eqiad.wmnet with reason: host reimage
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298294)', diff saved to https://phabricator.wikimedia.org/P22241 and previous config saved to /var/cache/conftool/dbconfig/20220310-064506-marostegui.json
  • 06:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1132.eqiad.wmnet with reason: host reimage
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298294)', diff saved to https://phabricator.wikimedia.org/P22240 and previous config saved to /var/cache/conftool/dbconfig/20220310-063858-marostegui.json
  • 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 06:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298294)', diff saved to https://phabricator.wikimedia.org/P22239 and previous config saved to /var/cache/conftool/dbconfig/20220310-063850-marostegui.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T300775)', diff saved to https://phabricator.wikimedia.org/P22238 and previous config saved to /var/cache/conftool/dbconfig/20220310-063503-marostegui.json
  • 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1132.eqiad.wmnet with OS bullseye
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T300775)', diff saved to https://phabricator.wikimedia.org/P22237 and previous config saved to /var/cache/conftool/dbconfig/20220310-063017-marostegui.json
  • 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 06:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 06:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 06:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P22236 and previous config saved to /var/cache/conftool/dbconfig/20220310-062345-marostegui.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P22235 and previous config saved to /var/cache/conftool/dbconfig/20220310-060840-marostegui.json
  • 06:07 marostegui: dbmaint on s3@eqiad T272512
  • 06:05 marostegui: dbmaint on s7@eqiad T272512
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298294)', diff saved to https://phabricator.wikimedia.org/P22234 and previous config saved to /var/cache/conftool/dbconfig/20220310-055335-marostegui.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298294)', diff saved to https://phabricator.wikimedia.org/P22233 and previous config saved to /var/cache/conftool/dbconfig/20220310-054701-marostegui.json
  • 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 05:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 05:46 marostegui: dbmaint on s5@eqiad T272512
  • 05:46 marostegui: dbmaint on s4@eqiad T272512
  • 05:46 marostegui: dbmaint on pc3@eqiad T272512
  • 05:45 marostegui: dbmaint on pc2@eqiad T272512
  • 05:45 marostegui: dbmaint on pc1@eqiad T272512
  • 05:45 marostegui: dbmaint on s2@eqiad T272512
  • 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T300775)', diff saved to https://phabricator.wikimedia.org/P22232 and previous config saved to /var/cache/conftool/dbconfig/20220310-053950-marostegui.json
  • 05:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 05:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 05:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 05:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 00:26 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@7975c27]: (no justification provided) (duration: 00m 08s)
  • 00:26 ebysans@deploy1002: Started deploy [airflow-dags/analytics@7975c27]: (no justification provided)
  • 00:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-03-09

  • 23:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:09 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.25 refs T300201 (duration: 00m 49s)
  • 23:08 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 23:08 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 23:08 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.25 refs T300201
  • 23:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt1047.eqiad.wmnet
  • 22:59 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudvirt1047.eqiad.wmnet
  • 22:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudvirt1047.eqiad.wmnet
  • 22:54 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudvirt1047.eqiad.wmnet
  • 22:35 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 22:35 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 22:31 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300775)', diff saved to https://phabricator.wikimedia.org/P22229 and previous config saved to /var/cache/conftool/dbconfig/20220309-223130-marostegui.json
  • 22:15 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22228 and previous config saved to /var/cache/conftool/dbconfig/20220309-221555-marostegui.json
  • 22:00 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22226 and previous config saved to /var/cache/conftool/dbconfig/20220309-220020-marostegui.json
  • 21:57 reedy@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/Gadgets: T303455 (duration: 00m 50s)
  • 21:54 volans: uploaded python3-wmflib_1.1.2 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 21:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:50 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 21:44 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300775)', diff saved to https://phabricator.wikimedia.org/P22225 and previous config saved to /var/cache/conftool/dbconfig/20220309-214445-marostegui.json
  • 21:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - T301955
  • 21:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - T301955
  • 21:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:06 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - T301955
  • 20:51 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - T301955
  • 20:49 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - T301955
  • 20:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - T301955
  • 20:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 20:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1047.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt1047.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 19:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 19:47 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1047.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt1047.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:21 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.24 refs T300201 (duration: 00m 50s)
  • 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:20 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.24 refs T300201
  • 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:07 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.25 refs T300201 (duration: 00m 49s)
  • 19:06 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.25 refs T300201
  • 18:23 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1127 (T300775)', diff saved to https://phabricator.wikimedia.org/P22222 and previous config saved to /var/cache/conftool/dbconfig/20220309-182355-marostegui.json
  • 18:23 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 18:23 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 18:23 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300775)', diff saved to https://phabricator.wikimedia.org/P22221 and previous config saved to /var/cache/conftool/dbconfig/20220309-182316-marostegui.json
  • 18:07 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22220 and previous config saved to /var/cache/conftool/dbconfig/20220309-180741-marostegui.json
  • 17:52 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22219 and previous config saved to /var/cache/conftool/dbconfig/20220309-175205-marostegui.json
  • 17:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:41 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 17:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:36 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300775)', diff saved to https://phabricator.wikimedia.org/P22217 and previous config saved to /var/cache/conftool/dbconfig/20220309-173630-marostegui.json
  • 17:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 17:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:29 reedy@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/WebAuthn/: T303404 (duration: 00m 53s)
  • 17:29 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:28 reedy@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/WebAuthn/: T303404 (duration: 00m 51s)
  • 17:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2008.codfw.wmnet with OS bullseye
  • 17:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2008.codfw.wmnet with reason: host reimage
  • 17:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2008.codfw.wmnet with reason: host reimage
  • 16:56 akosiaris: reboot rdb[2008,2010].codfw.wmnet,rdb[1010,1012].eqiad.wmnet for upgrades
  • 16:49 akosiaris: reboot rdb2008 for upgrades
  • 16:45 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2008.codfw.wmnet with OS bullseye
  • 16:22 moritzm: installing 5.10.103 kernels on bullseye hosts
  • 16:10 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host karapace1001.eqiad.wmnet
  • 16:00 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:57 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.25/includes/parser/Sanitizer.php: 31189c6: Ensure that the recognizedTagData static cache is properly initialized (T303360) (duration: 00m 51s)
  • 15:56 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 15:56 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host karapace1001.eqiad.wmnet
  • 15:33 jbond: deploy gerrit:740818 to add more general rate limits for crawling cached and upload pages
  • 15:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2007.codfw.wmnet with OS bullseye
  • 15:28 volans: uploaded spicerack_2.3.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 15:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2007.codfw.wmnet with reason: host reimage
  • 15:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2007.codfw.wmnet with reason: host reimage
  • 15:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:06 taavi: UTC afternoon deploys done
  • 15:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:06 awight@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/VisualEditor/modules/ve-mw/ui/styles/pages/ve.ui.MWParameterPage.css: Backport: Fix missing padding on inline descriptions (T303386) (duration: 00m 49s)
  • 15:05 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 6 hosts with reason: Maintenance
  • 15:05 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 16:00:00 on 6 hosts with reason: Maintenance
  • 15:05 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 15:05 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 15:05 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298294)', diff saved to https://phabricator.wikimedia.org/P22215 and previous config saved to /var/cache/conftool/dbconfig/20220309-150523-marostegui.json
  • 15:03 awight@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/VisualEditor/modules/ve-mw/ui/styles/pages/ve.ui.MWParameterPage.css: Backport: Fix missing padding on inline descriptions (T303386) (duration: 00m 49s)
  • 15:01 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2007.codfw.wmnet with OS bullseye
  • 15:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:58 taavi@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/Gadgets/extension.json: Backport: wmf.24 HACK: Add forward class alias for Gadget (T303391) (2/2) (duration: 00m 49s)
  • 14:57 taavi@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/Gadgets/includes: Backport: wmf.24 HACK: Add forward class alias for Gadget (T303391) (1/2) (duration: 00m 50s)
  • 14:55 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1001.eqiad.wmnet with reason: Release v0.4.0 to reimaged cumin1001 - volans@cumin1001
  • 14:54 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin1001.eqiad.wmnet with reason: Release v0.4.0 to reimaged cumin1001 - volans@cumin1001
  • 14:49 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22213 and previous config saved to /var/cache/conftool/dbconfig/20220309-144948-marostegui.json
  • 14:34 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22212 and previous config saved to /var/cache/conftool/dbconfig/20220309-143413-marostegui.json
  • 14:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:27 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Add IPInfo viewing rights for certain groups (T296499) (no-op on prod) (duration: 00m 50s)
  • 14:18 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298294)', diff saved to https://phabricator.wikimedia.org/P22211 and previous config saved to /var/cache/conftool/dbconfig/20220309-141837-marostegui.json
  • 14:13 damilare: civicrm revision changed from cb0605ed to 9b5aafbc
  • 14:02 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1112 (T298294)', diff saved to https://phabricator.wikimedia.org/P22210 and previous config saved to /var/cache/conftool/dbconfig/20220309-140158-marostegui.json
  • 14:01 marostegui: Failover m5 from db1132 to db1107 - T302190
  • 14:01 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:01 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:01 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 14:01 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 13:59 btullis: restarting pybal on lvs1019 T301458
  • 13:51 btullis: restarting pybal on lvs102 T301458
  • 13:47 marostegui: dbmaint on s8@eqiad T272512
  • 13:46 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1101:3317 (T300775)', diff saved to https://phabricator.wikimedia.org/P22209 and previous config saved to /var/cache/conftool/dbconfig/20220309-134631-marostegui.json
  • 13:45 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 13:45 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 13:45 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300775)', diff saved to https://phabricator.wikimedia.org/P22208 and previous config saved to /var/cache/conftool/dbconfig/20220309-134552-marostegui.json
  • 13:42 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:42 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:42 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298294)', diff saved to https://phabricator.wikimedia.org/P22207 and previous config saved to /var/cache/conftool/dbconfig/20220309-134235-marostegui.json
  • 13:30 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22206 and previous config saved to /var/cache/conftool/dbconfig/20220309-133017-marostegui.json
  • 13:27 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22205 and previous config saved to /var/cache/conftool/dbconfig/20220309-132700-marostegui.json
  • 13:14 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22204 and previous config saved to /var/cache/conftool/dbconfig/20220309-131442-marostegui.json
  • 13:11 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22203 and previous config saved to /var/cache/conftool/dbconfig/20220309-131124-marostegui.json
  • 12:59 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300775)', diff saved to https://phabricator.wikimedia.org/P22202 and previous config saved to /var/cache/conftool/dbconfig/20220309-125907-marostegui.json
  • 12:56 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on sretest[1001-1002].eqiad.wmnet with reason: just a test
  • 12:56 jmm@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on sretest[1001-1002].eqiad.wmnet with reason: just a test
  • 12:55 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298294)', diff saved to https://phabricator.wikimedia.org/P22201 and previous config saved to /var/cache/conftool/dbconfig/20220309-125549-marostegui.json
  • 12:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cumin1001.eqiad.wmnet with OS bullseye
  • 12:26 btullis@cumin2002: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 12:25 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1179 (T298294)', diff saved to https://phabricator.wikimedia.org/P22200 and previous config saved to /var/cache/conftool/dbconfig/20220309-122536-marostegui.json
  • 12:25 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 12:24 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 12:06 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:06 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:05 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298294)', diff saved to https://phabricator.wikimedia.org/P22199 and previous config saved to /var/cache/conftool/dbconfig/20220309-120554-marostegui.json
  • 11:50 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22198 and previous config saved to /var/cache/conftool/dbconfig/20220309-115019-marostegui.json
  • 11:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:43 awight: sketchy EU deployment complete.
  • 11:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:42 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Syntax highlighting color scheme update on all wikis except enwiki (T280024) (duration: 00m 50s)
  • 11:41 btullis@cumin2002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 11:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:37 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Bracket matching on all wikis except enwiki (T280023) (duration: 00m 49s)
  • 11:34 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22197 and previous config saved to /var/cache/conftool/dbconfig/20220309-113442-marostegui.json
  • 11:32 awight@deploy1002: Synchronized wmf-config/: Config: VE template expanded sidebar and inline descriptions on all wikis except enwiki (T286991) (duration: 00m 51s)
  • 11:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cumin1001.eqiad.wmnet with reason: host reimage
  • 11:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin1001.eqiad.wmnet with reason: host reimage
  • 11:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:19 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298294)', diff saved to https://phabricator.wikimedia.org/P22195 and previous config saved to /var/cache/conftool/dbconfig/20220309-111907-marostegui.json
  • 11:17 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: VE template back and delete button on all wikis except enwiki (T286990) (duration: 00m 50s)
  • 11:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:13 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cumin1001.eqiad.wmnet with OS bullseye
  • 11:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:11 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Template search improvements to all wikis except enwiki (T286990) (duration: 00m 51s)
  • 11:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt1016.eqiad.wmnet
  • 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:51 btullis@cumin2002: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 10:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2001.wikimedia.org
  • 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
  • 10:39 btullis@cumin2002: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
  • 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
  • 10:32 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1175 (T298294)', diff saved to https://phabricator.wikimedia.org/P22194 and previous config saved to /var/cache/conftool/dbconfig/20220309-103226-marostegui.json
  • 10:31 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:31 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:31 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298294)', diff saved to https://phabricator.wikimedia.org/P22193 and previous config saved to /var/cache/conftool/dbconfig/20220309-103146-marostegui.json
  • 10:29 marostegui: dbmaint on s6@eqiad T272512
  • 10:29 marostegui: dbmaint on s3@eqiad T298295
  • 10:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 10:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 10:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 10:16 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22192 and previous config saved to /var/cache/conftool/dbconfig/20220309-101610-marostegui.json
  • 10:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 10:08 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: reenable DPL on nowikimedia (duration: 00m 51s)
  • 10:00 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22191 and previous config saved to /var/cache/conftool/dbconfig/20220309-100036-marostegui.json
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2147', diff saved to https://phabricator.wikimedia.org/P22190 and previous config saved to /var/cache/conftool/dbconfig/20220309-094704-marostegui.json
  • 09:45 marostegui: dbmaint on s7@eqiad T298295
  • 09:45 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298294)', diff saved to https://phabricator.wikimedia.org/P22189 and previous config saved to /var/cache/conftool/dbconfig/20220309-094501-marostegui.json
  • 09:31 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1098:3317 (T300775)', diff saved to https://phabricator.wikimedia.org/P22188 and previous config saved to /var/cache/conftool/dbconfig/20220309-093119-marostegui.json
  • 09:30 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 09:30 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 09:27 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1166 (T298294)', diff saved to https://phabricator.wikimedia.org/P22187 and previous config saved to /var/cache/conftool/dbconfig/20220309-092731-marostegui.json
  • 09:26 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:26 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:23 marostegui: dbmaint on s2@eqiad T298295
  • 09:18 marostegui: dbmaint on s1@eqiad T298295
  • 09:16 marostegui: dbmaint on s4@eqiad T298295
  • 09:07 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 09:07 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 09:07 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T298294)', diff saved to https://phabricator.wikimedia.org/P22186 and previous config saved to /var/cache/conftool/dbconfig/20220309-090737-marostegui.json
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 08:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 08:53 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host dumpsdata1007.eqiad.wmnet
  • 08:52 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22184 and previous config saved to /var/cache/conftool/dbconfig/20220309-085201-marostegui.json
  • 08:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 08:49 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host dumpsdata1007.eqiad.wmnet
  • 08:46 XioNoX: Redirect one of Microsoft's ranges to codfw - T282861
  • 08:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 08:43 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host dumpsdata1007.eqiad.wmnet
  • 08:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 08:36 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22183 and previous config saved to /var/cache/conftool/dbconfig/20220309-083626-marostegui.json
  • 08:21 marostegui: dbmaint on s3@eqiad T300380
  • 08:20 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T298294)', diff saved to https://phabricator.wikimedia.org/P22182 and previous config saved to /var/cache/conftool/dbconfig/20220309-082051-marostegui.json
  • 08:11 marostegui: dbmaint on s7@eqiad T300380
  • 08:03 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1123 (T298294)', diff saved to https://phabricator.wikimedia.org/P22181 and previous config saved to /var/cache/conftool/dbconfig/20220309-080307-marostegui.json
  • 08:02 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 08:02 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 40%: After schema change', diff saved to https://phabricator.wikimedia.org/P22180 and previous config saved to /var/cache/conftool/dbconfig/20220309-075704-root.json
  • 07:55 marostegui: dbmaint on s2@eqiad T300380
  • 07:49 marostegui: dbmaint on s8@eqiad T300380
  • 07:49 marostegui: dbmaint on s4@eqiad T300380
  • 07:42 marostegui: dbmaint on s1@eqiad T300380
  • 07:42 marostegui: dbmaint on s6@eqiad T300380
  • 07:42 marostegui: dbmaint on s5@eqiad T300380
  • 07:42 marostegui: dbmaint on s5 T300380
  • 07:42 marostegui: dbmaint on s6 T300380
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22179 and previous config saved to /var/cache/conftool/dbconfig/20220309-074200-root.json
  • 07:41 marostegui: dbmaint on s1 T300380
  • 07:41 marostegui@cumin2002: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22178 and previous config saved to /var/cache/conftool/dbconfig/20220309-074107-root.json
  • 07:34 marostegui: dbmaint on s7@eqiad T300775
  • 07:33 marostegui: dbmaint on db1123 s3@eqiad T300600
  • 07:31 elukey: manually sync pcc facts following https://wikitech.wikimedia.org/wiki/Help:Puppet-compiler#Manually_update_production
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 15%: After schema change', diff saved to https://phabricator.wikimedia.org/P22177 and previous config saved to /var/cache/conftool/dbconfig/20220309-072656-root.json
  • 07:25 marostegui@cumin2002: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22176 and previous config saved to /var/cache/conftool/dbconfig/20220309-072540-root.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P22175 and previous config saved to /var/cache/conftool/dbconfig/20220309-071153-root.json
  • 07:10 marostegui@cumin2002: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22174 and previous config saved to /var/cache/conftool/dbconfig/20220309-071014-root.json
  • 07:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1123.eqiad.wmnet with OS bullseye
  • 06:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1123.eqiad.wmnet with reason: host reimage
  • 06:54 marostegui@cumin2002: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22173 and previous config saved to /var/cache/conftool/dbconfig/20220309-065447-root.json
  • 06:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1123.eqiad.wmnet with reason: host reimage
  • 06:43 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1123.eqiad.wmnet with OS bullseye
  • 06:20 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1146:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22172 and previous config saved to /var/cache/conftool/dbconfig/20220309-062010-marostegui.json
  • 06:19 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 06:19 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 06:06 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:06 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 01:48 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22171 and previous config saved to /var/cache/conftool/dbconfig/20220309-014831-marostegui.json
  • 01:32 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22170 and previous config saved to /var/cache/conftool/dbconfig/20220309-013256-marostegui.json
  • 01:17 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22169 and previous config saved to /var/cache/conftool/dbconfig/20220309-011721-marostegui.json
  • 01:01 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22168 and previous config saved to /var/cache/conftool/dbconfig/20220309-010146-marostegui.json
  • 00:53 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1105:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22167 and previous config saved to /var/cache/conftool/dbconfig/20220309-005325-marostegui.json
  • 00:52 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 00:52 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 00:52 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298294)', diff saved to https://phabricator.wikimedia.org/P22166 and previous config saved to /var/cache/conftool/dbconfig/20220309-005245-marostegui.json
  • 00:37 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22165 and previous config saved to /var/cache/conftool/dbconfig/20220309-003710-marostegui.json
  • 00:21 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22164 and previous config saved to /var/cache/conftool/dbconfig/20220309-002135-marostegui.json
  • 00:06 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298294)', diff saved to https://phabricator.wikimedia.org/P22163 and previous config saved to /var/cache/conftool/dbconfig/20220309-000600-marostegui.json
  • 00:02 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1182 (T298294)', diff saved to https://phabricator.wikimedia.org/P22162 and previous config saved to /var/cache/conftool/dbconfig/20220309-000250-marostegui.json
  • 00:02 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 00:02 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 00:00 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 00:00 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 00:00 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22161 and previous config saved to /var/cache/conftool/dbconfig/20220309-000025-marostegui.json

2022-03-08

  • 23:44 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22160 and previous config saved to /var/cache/conftool/dbconfig/20220308-234450-marostegui.json
  • 23:29 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22159 and previous config saved to /var/cache/conftool/dbconfig/20220308-232915-marostegui.json
  • 23:13 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22158 and previous config saved to /var/cache/conftool/dbconfig/20220308-231340-marostegui.json
  • 23:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:10 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1170:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22157 and previous config saved to /var/cache/conftool/dbconfig/20220308-231028-marostegui.json
  • 23:09 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 23:09 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 23:09 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298294)', diff saved to https://phabricator.wikimedia.org/P22156 and previous config saved to /var/cache/conftool/dbconfig/20220308-230949-marostegui.json
  • 23:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:54 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22155 and previous config saved to /var/cache/conftool/dbconfig/20220308-225413-marostegui.json
  • 22:38 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22153 and previous config saved to /var/cache/conftool/dbconfig/20220308-223838-marostegui.json
  • 22:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:24 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.25 refs T300201
  • 22:23 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298294)', diff saved to https://phabricator.wikimedia.org/P22152 and previous config saved to /var/cache/conftool/dbconfig/20220308-222303-marostegui.json
  • 22:20 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1162 (T298294)', diff saved to https://phabricator.wikimedia.org/P22151 and previous config saved to /var/cache/conftool/dbconfig/20220308-222055-marostegui.json
  • 22:20 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 22:20 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 22:20 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298294)', diff saved to https://phabricator.wikimedia.org/P22150 and previous config saved to /var/cache/conftool/dbconfig/20220308-222016-marostegui.json
  • 22:04 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22149 and previous config saved to /var/cache/conftool/dbconfig/20220308-220441-marostegui.json
  • 21:49 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22148 and previous config saved to /var/cache/conftool/dbconfig/20220308-214906-marostegui.json
  • 21:40 andrew@cumin1001: START - Cookbook sre.hosts.dhcp for host cloudvirt1016.eqiad.wmnet
  • 21:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:33 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298294)', diff saved to https://phabricator.wikimedia.org/P22147 and previous config saved to /var/cache/conftool/dbconfig/20220308-213331-marostegui.json
  • 21:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:33 urbanecm: UTC early B&C window done
  • 21:32 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 21:30 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1156 (T298294)', diff saved to https://phabricator.wikimedia.org/P22146 and previous config saved to /var/cache/conftool/dbconfig/20220308-213024-marostegui.json
  • 21:29 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1155.eqiad.wmnet with reason: Maintenance
  • 21:29 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 16:00:00 on db1155.eqiad.wmnet with reason: Maintenance
  • 21:29 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 21:29 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 21:29 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22145 and previous config saved to /var/cache/conftool/dbconfig/20220308-212939-marostegui.json
  • 21:28 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/DiscussionTools/includes/ApiDiscussionToolsEdit.php: cc5acc2: Fix handling of disabled mobileformat (T303262) (duration: 00m 49s)
  • 21:26 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: a5c6d06: Fix handling of disabled mobileformat (T303262) (duration: 00m 49s)
  • 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:14 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22144 and previous config saved to /var/cache/conftool/dbconfig/20220308-211404-marostegui.json
  • 21:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3132fca: Enable DiscussionTools autotopicsub on MediaWiki.org (T302256) (duration: 00m 49s)
  • 21:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:03 dancy@deploy1002: Pruned MediaWiki: 1.38.0-wmf.22 (duration: 01m 28s)
  • 21:01 dancy@deploy1002: Pruned MediaWiki: 1.38.0-wmf.23 (duration: 01m 46s)
  • 20:59 dancy@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.25 refs T300201 (duration: 32m 13s)
  • 20:58 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22143 and previous config saved to /var/cache/conftool/dbconfig/20220308-205829-marostegui.json
  • 20:42 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22142 and previous config saved to /var/cache/conftool/dbconfig/20220308-204254-marostegui.json
  • 20:36 rzl: rzl@apt1001:~$ sudo -i reprepro copy bullseye-wikimedia buster-wikimedia envoyproxy # T300324
  • 20:36 rzl: rzl@apt1001:~$ sudo -i reprepro copy stretch-wikimedia buster-wikimedia envoyproxy # T300324
  • 20:27 dancy@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.25 refs T300201
  • 20:21 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1021.eqiad.wmnet with OS bullseye
  • 19:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1021.eqiad.wmnet with OS bullseye
  • 19:52 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 19:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 19:43 XioNoX: push DHCP term to labs-in filters on eqiad cr
  • 19:42 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1146:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22139 and previous config saved to /var/cache/conftool/dbconfig/20220308-194159-marostegui.json
  • 19:41 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:41 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:39 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 19:39 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 19:39 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298294)', diff saved to https://phabricator.wikimedia.org/P22138 and previous config saved to /var/cache/conftool/dbconfig/20220308-193930-marostegui.json
  • 19:36 cstone: updated donorwiki from 73de4731 to ca37a93e
  • 19:32 dancy@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.25 refs T300201
  • 19:27 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:23 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P22137 and previous config saved to /var/cache/conftool/dbconfig/20220308-192354-marostegui.json
  • 19:21 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:08 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P22136 and previous config saved to /var/cache/conftool/dbconfig/20220308-190818-marostegui.json
  • 18:55 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:53 ejegg: updated payments-wiki from 3dfac3b2 to ca37a93e
  • 18:52 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298294)', diff saved to https://phabricator.wikimedia.org/P22135 and previous config saved to /var/cache/conftool/dbconfig/20220308-185242-marostegui.json
  • 18:50 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1129 (T298294)', diff saved to https://phabricator.wikimedia.org/P22134 and previous config saved to /var/cache/conftool/dbconfig/20220308-185033-marostegui.json
  • 18:49 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 18:49 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 18:49 vgutierrez@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=cp5004.eqsin.wmnet
  • 18:49 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1021.eqiad.wmnet with OS bullseye
  • 18:48 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 18:48 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 18:47 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 8 hosts with reason: Maintenance
  • 18:47 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 16:00:00 on 8 hosts with reason: Maintenance
  • 18:47 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 18:47 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 18:47 vgutierrez@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=cp1085.eqiad.wmnet
  • 18:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:35 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:35 bblack: cp10[3579] - restarting varnish-fe
  • 18:29 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1021.eqiad.wmnet with OS bullseye
  • 18:27 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1032.eqiad.wmnet with OS buster
  • 18:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1012.eqiad.wmnet with OS stretch
  • 18:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1032.eqiad.wmnet with reason: host reimage
  • 18:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1012.eqiad.wmnet with reason: host reimage
  • 18:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1032.eqiad.wmnet with reason: host reimage
  • 18:07 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1012.eqiad.wmnet with reason: host reimage
  • 17:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1032.eqiad.wmnet with OS buster
  • 17:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1012.eqiad.wmnet with OS stretch
  • 17:48 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298294)', diff saved to https://phabricator.wikimedia.org/P22133 and previous config saved to /var/cache/conftool/dbconfig/20220308-174838-marostegui.json
  • 17:33 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 17:33 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P22132 and previous config saved to /var/cache/conftool/dbconfig/20220308-173302-marostegui.json
  • 17:27 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 17:17 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P22131 and previous config saved to /var/cache/conftool/dbconfig/20220308-171728-marostegui.json
  • 17:07 jbond: deploy minor clean up of puppetmaster classes gerrit:769072
  • 17:01 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298294)', diff saved to https://phabricator.wikimedia.org/P22130 and previous config saved to /var/cache/conftool/dbconfig/20220308-170153-marostegui.json
  • 17:01 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 16:58 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1161 (T298294)', diff saved to https://phabricator.wikimedia.org/P22129 and previous config saved to /var/cache/conftool/dbconfig/20220308-165843-marostegui.json
  • 16:58 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:58 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:58 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 16:57 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 16:56 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:56 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:54 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 16:54 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:54 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:54 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22128 and previous config saved to /var/cache/conftool/dbconfig/20220308-165436-marostegui.json
  • 16:54 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 16:54 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on 10 hosts with reason: Maintenance
  • 16:53 inflatador: bking@deneb manually installed tox for T293862. moritzm will add puppet patch for this
  • 16:53 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on 10 hosts with reason: Maintenance
  • 16:53 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 16:53 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 16:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 16:39 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P22127 and previous config saved to /var/cache/conftool/dbconfig/20220308-163901-marostegui.json
  • 16:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22126 and previous config saved to /var/cache/conftool/dbconfig/20220308-163835-root.json
  • 16:34 rzl: rzl@apt1001:~$ sudo -i reprepro -C main includedeb buster-wikimedia /home/rzl/envoyproxy_1.18.3-1_amd64.deb # reimporting from component/envoy-future into main, for T300324
  • 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22125 and previous config saved to /var/cache/conftool/dbconfig/20220308-162331-root.json
  • 16:23 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P22124 and previous config saved to /var/cache/conftool/dbconfig/20220308-162326-marostegui.json
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22123 and previous config saved to /var/cache/conftool/dbconfig/20220308-160815-root.json
  • 16:07 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22122 and previous config saved to /var/cache/conftool/dbconfig/20220308-160751-marostegui.json
  • 16:05 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1113:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22121 and previous config saved to /var/cache/conftool/dbconfig/20220308-160542-marostegui.json
  • 16:05 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 16:05 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 16:04 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 8 hosts with reason: Maintenance
  • 16:04 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 16:00:00 on 8 hosts with reason: Maintenance
  • 16:04 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 16:04 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 16:04 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T298294)', diff saved to https://phabricator.wikimedia.org/P22120 and previous config saved to /var/cache/conftool/dbconfig/20220308-160416-marostegui.json
  • 16:02 inflatador: bking@deneb manually installed openjdk-11-jdk for T293862. moritzm will add puppet patch for this
  • 15:55 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 15:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22119 and previous config saved to /var/cache/conftool/dbconfig/20220308-155312-root.json
  • 15:51 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 15:48 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P22118 and previous config saved to /var/cache/conftool/dbconfig/20220308-154841-marostegui.json
  • 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P22117 and previous config saved to /var/cache/conftool/dbconfig/20220308-154507-marostegui.json
  • 15:42 XioNoX: update capirca hosts definitions
  • 15:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22116 and previous config saved to /var/cache/conftool/dbconfig/20220308-154232-root.json
  • 15:40 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 15:39 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 15:33 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P22115 and previous config saved to /var/cache/conftool/dbconfig/20220308-153306-marostegui.json
  • 15:29 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 15:17 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T298294)', diff saved to https://phabricator.wikimedia.org/P22114 and previous config saved to /var/cache/conftool/dbconfig/20220308-151731-marostegui.json
  • 15:14 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1100 (T298294)', diff saved to https://phabricator.wikimedia.org/P22113 and previous config saved to /var/cache/conftool/dbconfig/20220308-151446-marostegui.json
  • 15:14 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 15:14 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 15:14 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T298294)', diff saved to https://phabricator.wikimedia.org/P22112 and previous config saved to /var/cache/conftool/dbconfig/20220308-151406-marostegui.json
  • 14:58 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P22111 and previous config saved to /var/cache/conftool/dbconfig/20220308-145831-marostegui.json
  • 14:42 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P22110 and previous config saved to /var/cache/conftool/dbconfig/20220308-144256-marostegui.json
  • 14:33 urbanecm: UTC afternoon B&C window done
  • 14:32 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/DiscussionTools/includes/Notifications/DiscussionToolsEventTrait.php: 23939c7: Fix logic for finding the oldest comment in a bundle (T302014) (duration: 00m 50s)
  • 14:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:27 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T298294)', diff saved to https://phabricator.wikimedia.org/P22109 and previous config saved to /var/cache/conftool/dbconfig/20220308-142721-marostegui.json
  • 14:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:24 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1110 (T298294)', diff saved to https://phabricator.wikimedia.org/P22108 and previous config saved to /var/cache/conftool/dbconfig/20220308-142412-marostegui.json
  • 14:23 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 14:23 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 14:23 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22107 and previous config saved to /var/cache/conftool/dbconfig/20220308-142332-marostegui.json
  • 14:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:07 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P22104 and previous config saved to /var/cache/conftool/dbconfig/20220308-140758-marostegui.json
  • 14:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:06 dcaro@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1003.wikimedia.org
  • 14:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 75465dd: fawiki: Add patrolmarks right to autopatrolled group (T303269) (duration: 00m 49s)
  • 13:56 aqu@deploy1002: Finished deploy [airflow-dags/analytics@d1c8ae0]: Fix wikidata_item_page_link destination table after tests (duration: 00m 07s)
  • 13:56 aqu@deploy1002: Started deploy [airflow-dags/analytics@d1c8ae0]: Fix wikidata_item_page_link destination table after tests
  • 13:52 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P22103 and previous config saved to /var/cache/conftool/dbconfig/20220308-135223-marostegui.json
  • 13:48 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
  • 13:46 dcaro@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1003.wikimedia.org
  • 13:40 aqu@deploy1002: Finished deploy [airflow-dags/analytics@725f528]: Set wikidata/item_page_link/weekly start date in production (duration: 00m 07s)
  • 13:40 aqu@deploy1002: Started deploy [airflow-dags/analytics@725f528]: Set wikidata/item_page_link/weekly start date in production
  • 13:40 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
  • 13:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
  • 13:36 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22102 and previous config saved to /var/cache/conftool/dbconfig/20220308-133647-marostegui.json
  • 13:34 btullis@cumin2002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 13:33 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1144:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22101 and previous config saved to /var/cache/conftool/dbconfig/20220308-133335-marostegui.json
  • 13:33 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 13:33 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 13:32 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22100 and previous config saved to /var/cache/conftool/dbconfig/20220308-133255-marostegui.json
  • 13:31 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
  • 13:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
  • 13:17 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1004.wikimedia.org
  • 13:17 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
  • 13:17 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P22099 and previous config saved to /var/cache/conftool/dbconfig/20220308-131720-marostegui.json
  • 13:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T300775)', diff saved to https://phabricator.wikimedia.org/P22098 and previous config saved to /var/cache/conftool/dbconfig/20220308-131420-marostegui.json
  • 13:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 13:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22097 and previous config saved to /var/cache/conftool/dbconfig/20220308-131309-root.json
  • 13:09 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
  • 13:07 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1004.wikimedia.org
  • 13:07 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1005.wikimedia.org
  • 13:01 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P22096 and previous config saved to /var/cache/conftool/dbconfig/20220308-130145-marostegui.json
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22095 and previous config saved to /var/cache/conftool/dbconfig/20220308-125806-root.json
  • 12:57 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
  • 12:56 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1021.eqiad.wmnet
  • 12:51 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1021.eqiad.wmnet
  • 12:51 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudcephosd1021.eqiad.wmnet
  • 12:51 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1021.eqiad.wmnet
  • 12:46 btullis@cumin2002: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 12:46 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22094 and previous config saved to /var/cache/conftool/dbconfig/20220308-124610-marostegui.json
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22093 and previous config saved to /var/cache/conftool/dbconfig/20220308-124302-root.json
  • 12:42 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1096:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22092 and previous config saved to /var/cache/conftool/dbconfig/20220308-124257-marostegui.json
  • 12:42 btullis@cumin2002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 12:42 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 12:42 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22091 and previous config saved to /var/cache/conftool/dbconfig/20220308-122752-root.json
  • 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 8 hosts with reason: Maintenance
  • 12:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 8 hosts with reason: Maintenance
  • 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 12:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 12:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 12:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 12:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T298294)', diff saved to https://phabricator.wikimedia.org/P22090 and previous config saved to /var/cache/conftool/dbconfig/20220308-121443-marostegui.json
  • 12:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P22089 and previous config saved to /var/cache/conftool/dbconfig/20220308-115938-marostegui.json
  • 11:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:58 volans: uploaded spicerack_2.2.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 11:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:55 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Use namespaced ApiFeatureUsageQueryEngineElastica T302907 (duration: 00m 49s)
  • 11:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Maintenance
  • 11:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Maintenance
  • 11:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 11:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 11:51 btullis@cumin2002: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
  • 11:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1083.eqiad.wmnet with OS buster
  • 11:48 vgutierrez: pool cp1083 with HAProxy as TLS termination layer - T290005
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P22088 and previous config saved to /var/cache/conftool/dbconfig/20220308-114434-marostegui.json
  • 11:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 11:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P22086 and previous config saved to /var/cache/conftool/dbconfig/20220308-113424-root.json
  • 11:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2008.codfw.wmnet
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T300381)', diff saved to https://phabricator.wikimedia.org/P22085 and previous config saved to /var/cache/conftool/dbconfig/20220308-113110-marostegui.json
  • 11:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 11:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T300381)', diff saved to https://phabricator.wikimedia.org/P22084 and previous config saved to /var/cache/conftool/dbconfig/20220308-113102-marostegui.json
  • 11:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1083.eqiad.wmnet with reason: host reimage
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T298294)', diff saved to https://phabricator.wikimedia.org/P22083 and previous config saved to /var/cache/conftool/dbconfig/20220308-112929-marostegui.json
  • 11:29 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons.
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T298294)', diff saved to https://phabricator.wikimedia.org/P22082 and previous config saved to /var/cache/conftool/dbconfig/20220308-112811-marostegui.json
  • 11:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 11:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22081 and previous config saved to /var/cache/conftool/dbconfig/20220308-112804-marostegui.json
  • 11:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1083.eqiad.wmnet with reason: host reimage
  • 11:25 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2008.codfw.wmnet
  • 11:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2007.codfw.wmnet
  • 11:20 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons.
  • 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P22080 and previous config saved to /var/cache/conftool/dbconfig/20220308-111920-root.json
  • 11:18 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2007.codfw.wmnet
  • 11:17 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22079 and previous config saved to /var/cache/conftool/dbconfig/20220308-111558-marostegui.json
  • 11:15 XioNoX: Cleanup transport-in filters for codfw/eqiad (CR747551)
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P22078 and previous config saved to /var/cache/conftool/dbconfig/20220308-111259-marostegui.json
  • 11:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2006.codfw.wmnet
  • 11:11 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 11:11 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1083.eqiad.wmnet with OS buster
  • 11:10 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1083.eqiad.wmnet with OS buster
  • 11:09 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 11:08 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2003.codfw.wmnet
  • 11:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1003.eqiad.wmnet
  • 11:06 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1083.eqiad.wmnet with reason: host reimage
  • 11:05 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2006.codfw.wmnet
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P22077 and previous config saved to /var/cache/conftool/dbconfig/20220308-110416-root.json
  • 11:03 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2003.codfw.wmnet
  • 11:03 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2002.codfw.wmnet
  • 11:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1083.eqiad.wmnet with reason: host reimage
  • 11:02 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 11:02 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2005.codfw.wmnet
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22076 and previous config saved to /var/cache/conftool/dbconfig/20220308-110053-marostegui.json
  • 10:59 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 10:59 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 10:59 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2002.codfw.wmnet
  • 10:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1003.eqiad.wmnet
  • 10:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1002.eqiad.wmnet
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P22075 and previous config saved to /var/cache/conftool/dbconfig/20220308-105754-marostegui.json
  • 10:57 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2002.codfw.wmnet
  • 10:54 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2005.codfw.wmnet
  • 10:52 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2002.codfw.wmnet
  • 10:52 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2001.codfw.wmnet
  • 10:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1002.eqiad.wmnet
  • 10:51 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1001.eqiad.wmnet
  • 10:51 btullis: btullis@datahubsearch1001:~$ sudo systemctl reset-failed ifup@ens13.service T273026
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P22074 and previous config saved to /var/cache/conftool/dbconfig/20220308-104913-root.json
  • 10:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
  • 10:46 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1083.eqiad.wmnet with OS buster
  • 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T300381)', diff saved to https://phabricator.wikimedia.org/P22073 and previous config saved to /var/cache/conftool/dbconfig/20220308-104548-marostegui.json
  • 10:45 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2001.codfw.wmnet
  • 10:43 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1001.eqiad.wmnet
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22072 and previous config saved to /var/cache/conftool/dbconfig/20220308-104250-marostegui.json
  • 10:39 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2035.codfw.wmnet with OS buster
  • 10:39 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
  • 10:36 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2002.codfw.wmnet
  • 10:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
  • 10:34 vgutierrez: pool cp2035 with HAProxy as TLS termination layer - T290005
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P22070 and previous config saved to /var/cache/conftool/dbconfig/20220308-103409-root.json
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22069 and previous config saved to /var/cache/conftool/dbconfig/20220308-103251-marostegui.json
  • 10:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 10:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T298294)', diff saved to https://phabricator.wikimedia.org/P22068 and previous config saved to /var/cache/conftool/dbconfig/20220308-103243-marostegui.json
  • 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T300381)', diff saved to https://phabricator.wikimedia.org/P22067 and previous config saved to /var/cache/conftool/dbconfig/20220308-103017-marostegui.json
  • 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 10:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
  • 10:27 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-staging2002.codfw.wmnet
  • 10:27 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
  • 10:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
  • 10:22 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
  • 10:19 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
  • 10:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P22066 and previous config saved to /var/cache/conftool/dbconfig/20220308-101739-marostegui.json
  • 10:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2035.codfw.wmnet with reason: host reimage
  • 10:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2035.codfw.wmnet with reason: host reimage
  • 10:12 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2002.codfw.wmnet
  • 10:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
  • 10:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T300381)', diff saved to https://phabricator.wikimedia.org/P22065 and previous config saved to /var/cache/conftool/dbconfig/20220308-100559-marostegui.json
  • 10:03 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
  • 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P22064 and previous config saved to /var/cache/conftool/dbconfig/20220308-100234-marostegui.json
  • 09:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2035.codfw.wmnet with OS buster
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22063 and previous config saved to /var/cache/conftool/dbconfig/20220308-095055-marostegui.json
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T298294)', diff saved to https://phabricator.wikimedia.org/P22062 and previous config saved to /var/cache/conftool/dbconfig/20220308-094730-marostegui.json
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T298294)', diff saved to https://phabricator.wikimedia.org/P22061 and previous config saved to /var/cache/conftool/dbconfig/20220308-094613-marostegui.json
  • 09:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 09:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T298294)', diff saved to https://phabricator.wikimedia.org/P22060 and previous config saved to /var/cache/conftool/dbconfig/20220308-094605-marostegui.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T300775)', diff saved to https://phabricator.wikimedia.org/P22059 and previous config saved to /var/cache/conftool/dbconfig/20220308-094354-marostegui.json
  • 09:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 09:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22058 and previous config saved to /var/cache/conftool/dbconfig/20220308-094155-root.json
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22057 and previous config saved to /var/cache/conftool/dbconfig/20220308-093550-marostegui.json
  • 09:34 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2022.codfw.wmnet
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P22056 and previous config saved to /var/cache/conftool/dbconfig/20220308-093101-marostegui.json
  • 09:27 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2022.codfw.wmnet
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22055 and previous config saved to /var/cache/conftool/dbconfig/20220308-092651-root.json
  • 09:26 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2021.codfw.wmnet
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T300381)', diff saved to https://phabricator.wikimedia.org/P22054 and previous config saved to /var/cache/conftool/dbconfig/20220308-092045-marostegui.json
  • 09:18 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2021.codfw.wmnet
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P22053 and previous config saved to /var/cache/conftool/dbconfig/20220308-091556-marostegui.json
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22052 and previous config saved to /var/cache/conftool/dbconfig/20220308-091147-root.json
  • 09:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2020.codfw.wmnet
  • 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T300381)', diff saved to https://phabricator.wikimedia.org/P22051 and previous config saved to /var/cache/conftool/dbconfig/20220308-090531-marostegui.json
  • 09:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2020.codfw.wmnet
  • 09:00 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2019.codfw.wmnet
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T298294)', diff saved to https://phabricator.wikimedia.org/P22050 and previous config saved to /var/cache/conftool/dbconfig/20220308-090051-marostegui.json
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T298294)', diff saved to https://phabricator.wikimedia.org/P22049 and previous config saved to /var/cache/conftool/dbconfig/20220308-085934-marostegui.json
  • 08:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T298294)', diff saved to https://phabricator.wikimedia.org/P22048 and previous config saved to /var/cache/conftool/dbconfig/20220308-085921-marostegui.json
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22047 and previous config saved to /var/cache/conftool/dbconfig/20220308-085644-root.json
  • 08:54 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2019.codfw.wmnet
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P22046 and previous config saved to /var/cache/conftool/dbconfig/20220308-084416-marostegui.json
  • 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T300381)', diff saved to https://phabricator.wikimedia.org/P22045 and previous config saved to /var/cache/conftool/dbconfig/20220308-084148-marostegui.json
  • 08:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2018.codfw.wmnet
  • 08:32 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2018.codfw.wmnet
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P22044 and previous config saved to /var/cache/conftool/dbconfig/20220308-082912-marostegui.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22043 and previous config saved to /var/cache/conftool/dbconfig/20220308-082643-marostegui.json
  • 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:14 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Add image experiment for fa/fr/pt/trwiki (T302828) (duration: 00m 49s)
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T298294)', diff saved to https://phabricator.wikimedia.org/P22042 and previous config saved to /var/cache/conftool/dbconfig/20220308-081407-marostegui.json
  • 08:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22041 and previous config saved to /var/cache/conftool/dbconfig/20220308-081138-marostegui.json
  • 08:11 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage1004.eqiad.wmnet
  • 08:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage1004.eqiad.wmnet
  • 08:01 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage1003.eqiad.wmnet
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T300381)', diff saved to https://phabricator.wikimedia.org/P22040 and previous config saved to /var/cache/conftool/dbconfig/20220308-075634-marostegui.json
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T298294)', diff saved to https://phabricator.wikimedia.org/P22039 and previous config saved to /var/cache/conftool/dbconfig/20220308-075345-marostegui.json
  • 07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22038 and previous config saved to /var/cache/conftool/dbconfig/20220308-075338-marostegui.json
  • 07:53 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage1003.eqiad.wmnet
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T300381)', diff saved to https://phabricator.wikimedia.org/P22037 and previous config saved to /var/cache/conftool/dbconfig/20220308-074136-marostegui.json
  • 07:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 07:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P22036 and previous config saved to /var/cache/conftool/dbconfig/20220308-073833-marostegui.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P22035 and previous config saved to /var/cache/conftool/dbconfig/20220308-072329-marostegui.json
  • 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T300381)', diff saved to https://phabricator.wikimedia.org/P22034 and previous config saved to /var/cache/conftool/dbconfig/20220308-071724-marostegui.json
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22033 and previous config saved to /var/cache/conftool/dbconfig/20220308-070824-marostegui.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22032 and previous config saved to /var/cache/conftool/dbconfig/20220308-070728-marostegui.json
  • 07:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22031 and previous config saved to /var/cache/conftool/dbconfig/20220308-070721-marostegui.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22030 and previous config saved to /var/cache/conftool/dbconfig/20220308-070219-marostegui.json
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P22029 and previous config saved to /var/cache/conftool/dbconfig/20220308-065216-marostegui.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22028 and previous config saved to /var/cache/conftool/dbconfig/20220308-064714-marostegui.json
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P22027 and previous config saved to /var/cache/conftool/dbconfig/20220308-063711-marostegui.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T300381)', diff saved to https://phabricator.wikimedia.org/P22026 and previous config saved to /var/cache/conftool/dbconfig/20220308-063210-marostegui.json
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22025 and previous config saved to /var/cache/conftool/dbconfig/20220308-062206-marostegui.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22024 and previous config saved to /var/cache/conftool/dbconfig/20220308-062100-marostegui.json
  • 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T300775)', diff saved to https://phabricator.wikimedia.org/P22023 and previous config saved to /var/cache/conftool/dbconfig/20220308-061842-marostegui.json
  • 06:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T300381)', diff saved to https://phabricator.wikimedia.org/P22022 and previous config saved to /var/cache/conftool/dbconfig/20220308-061700-marostegui.json
  • 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22021 and previous config saved to /var/cache/conftool/dbconfig/20220308-061609-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22020 and previous config saved to /var/cache/conftool/dbconfig/20220308-060106-root.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22019 and previous config saved to /var/cache/conftool/dbconfig/20220308-054602-root.json
  • 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:57 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@1c598f5]: (no justification provided) (duration: 00m 04s)
  • 01:57 ebysans@deploy1002: Started deploy [airflow-dags/analytics@1c598f5]: (no justification provided)
  • 01:32 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@1c598f5]: (no justification provided) (duration: 00m 08s)
  • 01:31 ebysans@deploy1002: Started deploy [airflow-dags/analytics@1c598f5]: (no justification provided)
  • 01:22 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@21af07c]: (no justification provided) (duration: 00m 07s)
  • 01:22 ebysans@deploy1002: Started deploy [airflow-dags/analytics@21af07c]: (no justification provided)
  • 01:11 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@c47e886]: (no justification provided) (duration: 00m 04s)
  • 01:11 ebysans@deploy1002: Started deploy [airflow-dags/analytics@c47e886]: (no justification provided)
  • 01:07 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@c47e886]: (no justification provided) (duration: 00m 08s)
  • 01:07 ebysans@deploy1002: Started deploy [airflow-dags/analytics@c47e886]: (no justification provided)
  • 00:34 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@c8a753b]: (no justification provided) (duration: 00m 07s)
  • 00:34 ebysans@deploy1002: Started deploy [airflow-dags/analytics@c8a753b]: (no justification provided)
  • 00:08 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@b5f7840]: (no justification provided) (duration: 00m 08s)
  • 00:08 ebysans@deploy1002: Started deploy [airflow-dags/analytics@b5f7840]: (no justification provided)

2022-03-07

  • 23:50 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on mx2001.wikimedia.org with reason: reboot
  • 23:50 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on mx2001.wikimedia.org with reason: reboot
  • 23:49 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on mx1001.wikimedia.org with reason: reboot
  • 23:49 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on mx1001.wikimedia.org with reason: reboot
  • 23:40 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on mirror1001.wikimedia.org with reason: reboot
  • 23:40 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on mirror1001.wikimedia.org with reason: reboot
  • 22:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1003.wikimedia.org with OS bullseye
  • 22:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
  • 22:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
  • 22:25 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1003.wikimedia.org with OS bullseye
  • 22:21 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1003.wikimedia.org with OS bullseye
  • 22:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
  • 22:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
  • 21:49 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1003.wikimedia.org with OS bullseye
  • 21:38 urbanecm: UTC late B&C window done
  • 21:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:37 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.24/skins/Vector/includes/SkinVector.php: eac551c: Fix language alert regression (T302018) (duration: 00m 50s)
  • 21:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:22 eileen: config aa7dcd88 -> 16fa8e1c
  • 20:39 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1003.wikimedia.org with OS bullseye
  • 20:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
  • 20:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
  • 19:49 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1003.wikimedia.org with OS bullseye
  • 18:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22016 and previous config saved to /var/cache/conftool/dbconfig/20220307-181310-marostegui.json
  • 17:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22015 and previous config saved to /var/cache/conftool/dbconfig/20220307-175805-marostegui.json
  • 17:55 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2002.codfw.wmnet
  • 17:49 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage2002.codfw.wmnet
  • 17:47 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kubestage2002.codfw.wmnet
  • 17:47 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage2002.codfw.wmnet
  • 17:44 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2001.codfw.wmnet
  • 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22014 and previous config saved to /var/cache/conftool/dbconfig/20220307-174300-marostegui.json
  • 17:36 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage2001.codfw.wmnet
  • 17:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1004.wikimedia.org with OS bullseye
  • 17:29 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1022.eqiad.wmnet
  • 17:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22013 and previous config saved to /var/cache/conftool/dbconfig/20220307-172755-marostegui.json
  • 17:24 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1022.eqiad.wmnet
  • 17:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22012 and previous config saved to /var/cache/conftool/dbconfig/20220307-172134-marostegui.json
  • 17:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22011 and previous config saved to /var/cache/conftool/dbconfig/20220307-172126-marostegui.json
  • 17:20 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cp5004.eqsin.wmnet with reason: HW issues see T303043
  • 17:20 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cp5004.eqsin.wmnet with reason: HW issues see T303043
  • 17:09 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1004.wikimedia.org with reason: host reimage
  • 17:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3058.esams.wmnet with OS buster
  • 17:07 vgutierrez: pool cp3058 with HAProxy as TLS termination layer - T290005
  • 17:06 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1004.wikimedia.org with reason: host reimage
  • 17:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22010 and previous config saved to /var/cache/conftool/dbconfig/20220307-170622-marostegui.json
  • 17:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
  • 16:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
  • 16:58 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
  • 16:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
  • 16:54 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1004.wikimedia.org with OS bullseye
  • 16:52 vgutierrez: depool cp5004 - T303043
  • 16:51 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
  • 16:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22009 and previous config saved to /var/cache/conftool/dbconfig/20220307-165117-marostegui.json
  • 16:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3058.esams.wmnet with reason: host reimage
  • 16:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
  • 16:46 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
  • 16:46 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
  • 16:45 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
  • 16:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3058.esams.wmnet with reason: host reimage
  • 16:44 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
  • 16:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1004.eqiad.wmnet
  • 16:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5010.eqsin.wmnet with OS buster
  • 16:41 vgutierrez: pool cp5010 with HAProxy as TLS termination layer - T290005
  • 16:38 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
  • 16:36 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
  • 16:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22008 and previous config saved to /var/cache/conftool/dbconfig/20220307-163612-marostegui.json
  • 16:36 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
  • 16:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2003.codfw.wmnet
  • 16:29 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2003.codfw.wmnet
  • 16:29 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host graphite2003.codfw.wmnet
  • 16:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2003.codfw.wmnet
  • 16:29 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2003.codfw.wmnet
  • 16:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
  • 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22007 and previous config saved to /var/cache/conftool/dbconfig/20220307-162821-marostegui.json
  • 16:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 16:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 16:27 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
  • 16:24 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2003.codfw.wmnet
  • 16:22 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
  • 16:22 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
  • 16:22 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
  • 16:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 16:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 16:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 16:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 16:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T300381)', diff saved to https://phabricator.wikimedia.org/P22006 and previous config saved to /var/cache/conftool/dbconfig/20220307-162157-marostegui.json
  • 16:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
  • 16:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit2002.wikimedia.org
  • 16:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet
  • 16:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3058.esams.wmnet with OS buster
  • 16:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5010.eqsin.wmnet with reason: host reimage
  • 16:17 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
  • 16:16 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 16:16 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
  • 16:16 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 16:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host gerrit2002.wikimedia.org
  • 16:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5010.eqsin.wmnet with reason: host reimage
  • 16:14 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
  • 16:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet
  • 16:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
  • 16:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2002.codfw.wmnet
  • 16:09 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
  • 16:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
  • 16:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22005 and previous config saved to /var/cache/conftool/dbconfig/20220307-160650-marostegui.json
  • 16:06 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
  • 16:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2002.codfw.wmnet
  • 16:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2002.codfw.wmnet
  • 16:04 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
  • 16:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
  • 16:04 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
  • 16:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
  • 16:03 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
  • 16:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
  • 15:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
  • 15:58 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
  • 15:56 jayme: eqiad: kubectl -n istio-system delete po istiod-69d679d8b5-hm64j - T303184
  • 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22004 and previous config saved to /var/cache/conftool/dbconfig/20220307-155146-marostegui.json
  • 15:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5010.eqsin.wmnet with OS buster
  • 15:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cp1085.eqiad.wmnet with reason: HW issues see T303183
  • 15:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cp1085.eqiad.wmnet with reason: HW issues see T303183
  • 15:38 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1085.eqiad.wmnet with OS buster
  • 15:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T300381)', diff saved to https://phabricator.wikimedia.org/P22003 and previous config saved to /var/cache/conftool/dbconfig/20220307-153641-marostegui.json
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T300381)', diff saved to https://phabricator.wikimedia.org/P22002 and previous config saved to /var/cache/conftool/dbconfig/20220307-153357-marostegui.json
  • 15:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 15:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300381)', diff saved to https://phabricator.wikimedia.org/P22001 and previous config saved to /var/cache/conftool/dbconfig/20220307-153343-marostegui.json
  • 15:20 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@7642d65]: (no justification provided) (duration: 00m 07s)
  • 15:20 ntsako@deploy1002: Started deploy [airflow-dags/analytics@7642d65]: (no justification provided)
  • 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P22000 and previous config saved to /var/cache/conftool/dbconfig/20220307-151929-root.json
  • 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21999 and previous config saved to /var/cache/conftool/dbconfig/20220307-151839-marostegui.json
  • 15:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 15:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 15:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 15:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 15:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 15:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 15:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 15:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 15:09 ntsako@deploy1002: Finished deploy [airflow-dags/analytics_test@7642d65]: (no justification provided) (duration: 00m 09s)
  • 15:09 ntsako@deploy1002: Started deploy [airflow-dags/analytics_test@7642d65]: (no justification provided)
  • 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 15:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 15:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21998 and previous config saved to /var/cache/conftool/dbconfig/20220307-150426-root.json
  • 15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
  • 15:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
  • 15:03 vgutierrez: pool cp4030 with HAProxy as TLS termination layer - T290005
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21997 and previous config saved to /var/cache/conftool/dbconfig/20220307-150334-marostegui.json
  • 15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 15:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 15:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 15:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
  • 15:02 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4030.ulsfo.wmnet with OS buster
  • 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 15:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
  • 15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
  • 14:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
  • 14:56 vgutierrez: depool cp1085
  • 14:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21996 and previous config saved to /var/cache/conftool/dbconfig/20220307-144922-root.json
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300381)', diff saved to https://phabricator.wikimedia.org/P21995 and previous config saved to /var/cache/conftool/dbconfig/20220307-144829-marostegui.json
  • 14:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 14:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 14:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 14:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 14:45 vgutierrez: pool cp1085 with HAProxy as TLS termination layer - T290005
  • 14:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2001.codfw.wmnet
  • 14:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
  • 14:37 urbanecm@deploy1002: Synchronized static/images/project-logos/: f50c474: Revert "Change temporary logo for slwiki" (T302661; 2/2) (duration: 00m 48s)
  • 14:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:36 urbanecm@deploy1002: Synchronized wmf-config/logos.php: f50c474: Revert "Change temporary logo for slwiki" (T302661; 1/2) (duration: 00m 49s)
  • 14:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:35 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@46d88a2]: (no justification provided) (duration: 00m 04s)
  • 14:35 ntsako@deploy1002: Started deploy [airflow-dags/analytics@46d88a2]: (no justification provided)
  • 14:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
  • 14:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4030.ulsfo.wmnet with reason: host reimage
  • 14:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21994 and previous config saved to /var/cache/conftool/dbconfig/20220307-143419-root.json
  • 14:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host etherpad1003.eqiad.wmnet
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T302950)', diff saved to https://phabricator.wikimedia.org/P21993 and previous config saved to /var/cache/conftool/dbconfig/20220307-143229-ladsgroup.json
  • 14:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4030.ulsfo.wmnet with reason: host reimage
  • 14:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host etherpad1003.eqiad.wmnet
  • 14:30 moritzm: rebooting etherpad1003 (running etherpad1003) for kernel update
  • 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
  • 14:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1085.eqiad.wmnet with reason: host reimage
  • 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1004.wikimedia.org
  • 14:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
  • 14:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1004.wikimedia.org
  • 14:25 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1085.eqiad.wmnet with reason: host reimage
  • 14:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet
  • 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
  • 14:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet
  • 14:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21992 and previous config saved to /var/cache/conftool/dbconfig/20220307-141915-root.json
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21991 and previous config saved to /var/cache/conftool/dbconfig/20220307-141911-root.json
  • 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 64b1284: Enable reply tool by default on enwiki (T296645) (duration: 00m 49s)
  • 14:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21990 and previous config saved to /var/cache/conftool/dbconfig/20220307-141724-ladsgroup.json
  • 14:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8f20ec9: fawiki: Disable creating community books and remove "Create a book" link from sidebar (T303173) (duration: 00m 49s)
  • 14:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4030.ulsfo.wmnet with OS buster
  • 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2006.wikimedia.org
  • 14:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2006.wikimedia.org
  • 14:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2005.wikimedia.org
  • 14:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1085.eqiad.wmnet with OS buster
  • 14:08 urbanecm@deploy1002: Synchronized logos/config.yaml: 8619f59: etwikiquote: Update logo (T302683; 3/3) (duration: 00m 49s)
  • 14:07 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 8619f59: etwikiquote: Update logo (T302683; 2/3) (duration: 00m 49s)
  • 14:07 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/etwikiquote.png (T302683)
  • 14:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: 8619f59: etwikiquote: Update logo (T302683; 1/3) (duration: 00m 50s)
  • 14:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2005.wikimedia.org
  • 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21989 and previous config saved to /var/cache/conftool/dbconfig/20220307-140408-root.json
  • 14:02 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2037.codfw.wmnet with OS buster
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21988 and previous config saved to /var/cache/conftool/dbconfig/20220307-140219-ladsgroup.json
  • 14:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2002.codfw.wmnet
  • 14:00 kormat: removing cumin2001 grants from all db sections T276589
  • 14:00 vgutierrez: pool cp2037 with HAProxy as TLS termination layer - T290005
  • 14:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet
  • 13:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2002.codfw.wmnet
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21987 and previous config saved to /var/cache/conftool/dbconfig/20220307-135614-ladsgroup.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21986 and previous config saved to /var/cache/conftool/dbconfig/20220307-134904-root.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T300381)', diff saved to https://phabricator.wikimedia.org/P21985 and previous config saved to /var/cache/conftool/dbconfig/20220307-134848-marostegui.json
  • 13:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 13:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T300381)', diff saved to https://phabricator.wikimedia.org/P21984 and previous config saved to /var/cache/conftool/dbconfig/20220307-134840-marostegui.json
  • 13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T302950)', diff saved to https://phabricator.wikimedia.org/P21983 and previous config saved to /var/cache/conftool/dbconfig/20220307-134715-ladsgroup.json
  • 13:47 aqu@deploy1002: Finished deploy [analytics/refinery@51d074b] (hadoop-test): Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b] (duration: 07m 17s)
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P21982 and previous config saved to /var/cache/conftool/dbconfig/20220307-134109-ladsgroup.json
  • 13:39 aqu@deploy1002: Started deploy [analytics/refinery@51d074b] (hadoop-test): Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b]
  • 13:39 aqu@deploy1002: Finished deploy [analytics/refinery@51d074b] (thin): Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b] (duration: 00m 08s)
  • 13:39 aqu@deploy1002: Started deploy [analytics/refinery@51d074b] (thin): Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b]
  • 13:37 aqu@deploy1002: Finished deploy [analytics/refinery@51d074b]: Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b] (duration: 25m 04s)
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21981 and previous config saved to /var/cache/conftool/dbconfig/20220307-133400-root.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21980 and previous config saved to /var/cache/conftool/dbconfig/20220307-133335-marostegui.json
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P21979 and previous config saved to /var/cache/conftool/dbconfig/20220307-132605-ladsgroup.json
  • 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1142.eqiad.wmnet with OS bullseye
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21978 and previous config saved to /var/cache/conftool/dbconfig/20220307-131857-root.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21977 and previous config saved to /var/cache/conftool/dbconfig/20220307-131830-marostegui.json
  • 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096 (s5,s6)', diff saved to https://phabricator.wikimedia.org/P21976 and previous config saved to /var/cache/conftool/dbconfig/20220307-131606-marostegui.json
  • 13:12 aqu@deploy1002: Started deploy [analytics/refinery@51d074b]: Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b]
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21975 and previous config saved to /var/cache/conftool/dbconfig/20220307-131100-ladsgroup.json
  • 13:09 aqu_: About to deploy analytics/refinery - Migrate wikidata/item_page_link/weekly from Oozie to Airflow
  • 13:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1142.eqiad.wmnet with reason: host reimage
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21974 and previous config saved to /var/cache/conftool/dbconfig/20220307-130520-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21973 and previous config saved to /var/cache/conftool/dbconfig/20220307-130512-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1142.eqiad.wmnet with reason: host reimage
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T300381)', diff saved to https://phabricator.wikimedia.org/P21972 and previous config saved to /var/cache/conftool/dbconfig/20220307-130326-marostegui.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T300381)', diff saved to https://phabricator.wikimedia.org/P21971 and previous config saved to /var/cache/conftool/dbconfig/20220307-125540-marostegui.json
  • 12:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 12:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T300381)', diff saved to https://phabricator.wikimedia.org/P21970 and previous config saved to /var/cache/conftool/dbconfig/20220307-125532-marostegui.json
  • 12:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1142.eqiad.wmnet with OS bullseye
  • 12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P21969 and previous config saved to /var/cache/conftool/dbconfig/20220307-125007-ladsgroup.json
  • 12:49 aqu@deploy1002: Finished deploy [airflow-dags/analytics@46d88a2]: Migrate wikidata/item_page_link/weekly (duration: 00m 07s)
  • 12:49 aqu@deploy1002: Started deploy [airflow-dags/analytics@46d88a2]: Migrate wikidata/item_page_link/weekly
  • 12:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T302950)', diff saved to https://phabricator.wikimedia.org/P21968 and previous config saved to /var/cache/conftool/dbconfig/20220307-124815-ladsgroup.json
  • 12:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 12:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 12:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2037.codfw.wmnet with reason: host reimage
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21967 and previous config saved to /var/cache/conftool/dbconfig/20220307-124028-marostegui.json
  • 12:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2037.codfw.wmnet with reason: host reimage
  • 12:37 XioNoX: restart cr1-drmrs for software upgrade
  • 12:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P21966 and previous config saved to /var/cache/conftool/dbconfig/20220307-123503-ladsgroup.json
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21965 and previous config saved to /var/cache/conftool/dbconfig/20220307-122523-marostegui.json
  • 12:20 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2037.codfw.wmnet with OS buster
  • 12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21964 and previous config saved to /var/cache/conftool/dbconfig/20220307-121958-ladsgroup.json
  • 12:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3060.esams.wmnet with OS buster
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21963 and previous config saved to /var/cache/conftool/dbconfig/20220307-121443-ladsgroup.json
  • 12:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 12:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 12:13 vgutierrez: pool cp3060 with HAProxy as TLS termination layer - T290005
  • 12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 12:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T300775)', diff saved to https://phabricator.wikimedia.org/P21962 and previous config saved to /var/cache/conftool/dbconfig/20220307-121122-marostegui.json
  • 12:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T300381)', diff saved to https://phabricator.wikimedia.org/P21961 and previous config saved to /var/cache/conftool/dbconfig/20220307-121018-marostegui.json
  • 12:10 XioNoX: reboot cr2-drmrs for software upgrade
  • 12:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T300992)', diff saved to https://phabricator.wikimedia.org/P21960 and previous config saved to /var/cache/conftool/dbconfig/20220307-120821-ladsgroup.json
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T302950)', diff saved to https://phabricator.wikimedia.org/P21959 and previous config saved to /var/cache/conftool/dbconfig/20220307-120722-ladsgroup.json
  • 12:07 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 12:06 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T300381)', diff saved to https://phabricator.wikimedia.org/P21958 and previous config saved to /var/cache/conftool/dbconfig/20220307-120532-marostegui.json
  • 12:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 12:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 12:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5016.eqsin.wmnet with OS buster
  • 12:03 vgutierrez: pool cp5016 with HAProxy as TLS termination layer - T290005
  • 11:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3060.esams.wmnet with reason: host reimage
  • 11:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P21957 and previous config saved to /var/cache/conftool/dbconfig/20220307-115337-marostegui.json
  • 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P21956 and previous config saved to /var/cache/conftool/dbconfig/20220307-115316-ladsgroup.json
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21955 and previous config saved to /var/cache/conftool/dbconfig/20220307-115217-ladsgroup.json
  • 11:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3060.esams.wmnet with reason: host reimage
  • 11:45 XioNoX: remove MTU1400 on drmrs GTT links
  • 11:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5016.eqsin.wmnet with reason: host reimage
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P21954 and previous config saved to /var/cache/conftool/dbconfig/20220307-113833-marostegui.json
  • 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P21953 and previous config saved to /var/cache/conftool/dbconfig/20220307-113811-ladsgroup.json
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21952 and previous config saved to /var/cache/conftool/dbconfig/20220307-113712-ladsgroup.json
  • 11:36 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: host reimage
  • 11:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 11:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P21951 and previous config saved to /var/cache/conftool/dbconfig/20220307-112328-marostegui.json
  • 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T300992)', diff saved to https://phabricator.wikimedia.org/P21950 and previous config saved to /var/cache/conftool/dbconfig/20220307-112307-ladsgroup.json
  • 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T302950)', diff saved to https://phabricator.wikimedia.org/P21949 and previous config saved to /var/cache/conftool/dbconfig/20220307-112207-ladsgroup.json
  • 11:20 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3060.esams.wmnet with OS buster
  • 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21948 and previous config saved to /var/cache/conftool/dbconfig/20220307-111834-root.json
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T300992)', diff saved to https://phabricator.wikimedia.org/P21947 and previous config saved to /var/cache/conftool/dbconfig/20220307-111816-ladsgroup.json
  • 11:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 11:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T300992)', diff saved to https://phabricator.wikimedia.org/P21946 and previous config saved to /var/cache/conftool/dbconfig/20220307-111809-ladsgroup.json
  • 11:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4036.ulsfo.wmnet with OS buster
  • 11:12 vgutierrez: pool cp4036 with HAProxy as TLS termination layer - T290005
  • 11:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5016.eqsin.wmnet with OS buster
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P21945 and previous config saved to /var/cache/conftool/dbconfig/20220307-110823-marostegui.json
  • 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21944 and previous config saved to /var/cache/conftool/dbconfig/20220307-110330-root.json
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P21943 and previous config saved to /var/cache/conftool/dbconfig/20220307-110304-ladsgroup.json
  • 11:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1143.eqiad.wmnet with OS bullseye
  • 11:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1084.eqiad.wmnet with OS buster
  • 10:59 vgutierrez: pool cp1084 with HAProxy as TLS termination layer - T290005
  • 10:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4036.ulsfo.wmnet with reason: host reimage
  • 10:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4036.ulsfo.wmnet with reason: host reimage
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P21942 and previous config saved to /var/cache/conftool/dbconfig/20220307-104906-marostegui.json
  • 10:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 10:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21941 and previous config saved to /var/cache/conftool/dbconfig/20220307-104826-root.json
  • 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P21940 and previous config saved to /var/cache/conftool/dbconfig/20220307-104759-ladsgroup.json
  • 10:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1143.eqiad.wmnet with reason: host reimage
  • 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1143.eqiad.wmnet with reason: host reimage
  • 10:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1084.eqiad.wmnet with reason: host reimage
  • 10:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4036.ulsfo.wmnet with OS buster
  • 10:34 jayme: (re)started ferm on kubernetes1001
  • 10:34 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1084.eqiad.wmnet with reason: host reimage
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21939 and previous config saved to /var/cache/conftool/dbconfig/20220307-103323-root.json
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T300992)', diff saved to https://phabricator.wikimedia.org/P21938 and previous config saved to /var/cache/conftool/dbconfig/20220307-103253-ladsgroup.json
  • 10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1143.eqiad.wmnet with OS bullseye
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T300992)', diff saved to https://phabricator.wikimedia.org/P21937 and previous config saved to /var/cache/conftool/dbconfig/20220307-102737-ladsgroup.json
  • 10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T300992)', diff saved to https://phabricator.wikimedia.org/P21936 and previous config saved to /var/cache/conftool/dbconfig/20220307-102730-ladsgroup.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312', diff saved to https://phabricator.wikimedia.org/P21935 and previous config saved to /var/cache/conftool/dbconfig/20220307-102209-marostegui.json
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T302950)', diff saved to https://phabricator.wikimedia.org/P21934 and previous config saved to /var/cache/conftool/dbconfig/20220307-102158-ladsgroup.json
  • 10:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 10:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T302950)', diff saved to https://phabricator.wikimedia.org/P21933 and previous config saved to /var/cache/conftool/dbconfig/20220307-102129-ladsgroup.json
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T300381)', diff saved to https://phabricator.wikimedia.org/P21932 and previous config saved to /var/cache/conftool/dbconfig/20220307-102054-marostegui.json
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162', diff saved to https://phabricator.wikimedia.org/P21931 and previous config saved to /var/cache/conftool/dbconfig/20220307-101824-marostegui.json
  • 10:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1084.eqiad.wmnet with OS buster
  • 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21930 and previous config saved to /var/cache/conftool/dbconfig/20220307-101657-root.json
  • 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P21929 and previous config saved to /var/cache/conftool/dbconfig/20220307-101225-ladsgroup.json
  • 10:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2036.codfw.wmnet with OS buster
  • 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21928 and previous config saved to /var/cache/conftool/dbconfig/20220307-100624-ladsgroup.json
  • 10:04 vgutierrez: pool cp2036 with HAProxy as TLS termination layer - T290005
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21927 and previous config saved to /var/cache/conftool/dbconfig/20220307-100153-root.json
  • 10:00 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 09:58 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P21926 and previous config saved to /var/cache/conftool/dbconfig/20220307-095720-ladsgroup.json
  • 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21925 and previous config saved to /var/cache/conftool/dbconfig/20220307-095120-ladsgroup.json
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21924 and previous config saved to /var/cache/conftool/dbconfig/20220307-095111-root.json
  • 09:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2036.codfw.wmnet with reason: host reimage
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21923 and previous config saved to /var/cache/conftool/dbconfig/20220307-094649-root.json
  • 09:46 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2036.codfw.wmnet with reason: host reimage
  • 09:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T300992)', diff saved to https://phabricator.wikimedia.org/P21922 and previous config saved to /var/cache/conftool/dbconfig/20220307-094216-ladsgroup.json
  • 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T300992)', diff saved to https://phabricator.wikimedia.org/P21921 and previous config saved to /var/cache/conftool/dbconfig/20220307-093701-ladsgroup.json
  • 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T300992)', diff saved to https://phabricator.wikimedia.org/P21920 and previous config saved to /var/cache/conftool/dbconfig/20220307-093653-ladsgroup.json
  • 09:36 jynus: updated non-A wikipedia.org DNS records T302617
  • 09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T302950)', diff saved to https://phabricator.wikimedia.org/P21919 and previous config saved to /var/cache/conftool/dbconfig/20220307-093615-ladsgroup.json
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21918 and previous config saved to /var/cache/conftool/dbconfig/20220307-093607-root.json
  • 09:35 jynus: updated non-A wikipedia.org DNS records
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21917 and previous config saved to /var/cache/conftool/dbconfig/20220307-093146-root.json
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T302950)', diff saved to https://phabricator.wikimedia.org/P21916 and previous config saved to /var/cache/conftool/dbconfig/20220307-093032-ladsgroup.json
  • 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123', diff saved to https://phabricator.wikimedia.org/P21915 and previous config saved to /var/cache/conftool/dbconfig/20220307-093013-marostegui.json
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21914 and previous config saved to /var/cache/conftool/dbconfig/20220307-092924-root.json
  • 09:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2036.codfw.wmnet with OS buster
  • 09:22 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@19520c1]: (no justification provided) (duration: 00m 04s)
  • 09:22 ebysans@deploy1002: Started deploy [airflow-dags/analytics@19520c1]: (no justification provided)
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P21913 and previous config saved to /var/cache/conftool/dbconfig/20220307-092148-ladsgroup.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 60%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21912 and previous config saved to /var/cache/conftool/dbconfig/20220307-092103-root.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T300381)', diff saved to https://phabricator.wikimedia.org/P21911 and previous config saved to /var/cache/conftool/dbconfig/20220307-092034-marostegui.json
  • 09:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 09:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21910 and previous config saved to /var/cache/conftool/dbconfig/20220307-091527-ladsgroup.json
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21909 and previous config saved to /var/cache/conftool/dbconfig/20220307-091421-root.json
  • 09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P21908 and previous config saved to /var/cache/conftool/dbconfig/20220307-090644-ladsgroup.json
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21907 and previous config saved to /var/cache/conftool/dbconfig/20220307-090600-root.json
  • 09:01 dcausse: restarting blazegraph on wdqs1013 (JVM stuck for 6 hours)
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21906 and previous config saved to /var/cache/conftool/dbconfig/20220307-090021-ladsgroup.json
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21905 and previous config saved to /var/cache/conftool/dbconfig/20220307-085917-root.json
  • 08:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T300992)', diff saved to https://phabricator.wikimedia.org/P21904 and previous config saved to /var/cache/conftool/dbconfig/20220307-085139-ladsgroup.json
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 40%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21903 and previous config saved to /var/cache/conftool/dbconfig/20220307-085056-root.json
  • 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:46 elukey: `kafka configs --alter --entity-type topics --entity-name udp_localhost-info --add-config retention.bytes=300000000000` on kafka-logging to reduce the size of the biggest topic partitions
  • 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T300992)', diff saved to https://phabricator.wikimedia.org/P21902 and previous config saved to /var/cache/conftool/dbconfig/20220307-084641-ladsgroup.json
  • 08:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T302950)', diff saved to https://phabricator.wikimedia.org/P21901 and previous config saved to /var/cache/conftool/dbconfig/20220307-084516-ladsgroup.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21900 and previous config saved to /var/cache/conftool/dbconfig/20220307-084413-root.json
  • 08:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 08:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 08:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e3f70f6: enwiki: Deploy Growth features to 100% of users (T302846) (duration: 00m 50s)
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P21899 and previous config saved to /var/cache/conftool/dbconfig/20220307-084235-marostegui.json
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21898 and previous config saved to /var/cache/conftool/dbconfig/20220307-084219-root.json
  • 08:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 08:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 08:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 08:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T300992)', diff saved to https://phabricator.wikimedia.org/P21897 and previous config saved to /var/cache/conftool/dbconfig/20220307-083948-ladsgroup.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21896 and previous config saved to /var/cache/conftool/dbconfig/20220307-083553-root.json
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21895 and previous config saved to /var/cache/conftool/dbconfig/20220307-082716-root.json
  • 08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P21894 and previous config saved to /var/cache/conftool/dbconfig/20220307-082443-ladsgroup.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 20%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21893 and previous config saved to /var/cache/conftool/dbconfig/20220307-082049-root.json
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21892 and previous config saved to /var/cache/conftool/dbconfig/20220307-081212-root.json
  • 08:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P21891 and previous config saved to /var/cache/conftool/dbconfig/20220307-080938-ladsgroup.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21890 and previous config saved to /var/cache/conftool/dbconfig/20220307-080545-root.json
  • 08:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1144.eqiad.wmnet with OS bullseye
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21889 and previous config saved to /var/cache/conftool/dbconfig/20220307-075708-root.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175', diff saved to https://phabricator.wikimedia.org/P21888 and previous config saved to /var/cache/conftool/dbconfig/20220307-075523-marostegui.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21887 and previous config saved to /var/cache/conftool/dbconfig/20220307-075504-root.json
  • 07:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T300992)', diff saved to https://phabricator.wikimedia.org/P21886 and previous config saved to /var/cache/conftool/dbconfig/20220307-075433-ladsgroup.json
  • 07:53 marostegui: dbmaint on db1181 s7@eqiad T276150
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P21885 and previous config saved to /var/cache/conftool/dbconfig/20220307-075120-marostegui.json
  • 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T300992)', diff saved to https://phabricator.wikimedia.org/P21884 and previous config saved to /var/cache/conftool/dbconfig/20220307-074923-ladsgroup.json
  • 07:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 07:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300992)', diff saved to https://phabricator.wikimedia.org/P21883 and previous config saved to /var/cache/conftool/dbconfig/20220307-074909-ladsgroup.json
  • 07:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1144.eqiad.wmnet with reason: host reimage
  • 07:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1144.eqiad.wmnet with reason: host reimage
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21882 and previous config saved to /var/cache/conftool/dbconfig/20220307-074001-root.json
  • 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P21881 and previous config saved to /var/cache/conftool/dbconfig/20220307-073405-ladsgroup.json
  • 07:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1144.eqiad.wmnet with OS bullseye
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T302950)', diff saved to https://phabricator.wikimedia.org/P21880 and previous config saved to /var/cache/conftool/dbconfig/20220307-072624-ladsgroup.json
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21879 and previous config saved to /var/cache/conftool/dbconfig/20220307-072457-root.json
  • 07:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T302950)', diff saved to https://phabricator.wikimedia.org/P21878 and previous config saved to /var/cache/conftool/dbconfig/20220307-072453-ladsgroup.json
  • 07:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 07:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P21877 and previous config saved to /var/cache/conftool/dbconfig/20220307-071900-ladsgroup.json
  • 07:15 elukey: `elukey@ml-staging-ctrl2002:~$ sudo systemctl reset-failed ifup@ens13.service`
  • 07:14 elukey: kill tmux sessions of user 'zpapierski' on wdqs[1004,2002,2003] (puppet broken, offboarded user)
  • 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T302950)', diff saved to https://phabricator.wikimedia.org/P21876 and previous config saved to /var/cache/conftool/dbconfig/20220307-071227-ladsgroup.json
  • 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21875 and previous config saved to /var/cache/conftool/dbconfig/20220307-070953-root.json
  • 07:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:06 marostegui: dbmaint on db1179 s3@eqiad T302222
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P21874 and previous config saved to /var/cache/conftool/dbconfig/20220307-070537-marostegui.json
  • 07:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300992)', diff saved to https://phabricator.wikimedia.org/P21873 and previous config saved to /var/cache/conftool/dbconfig/20220307-070355-ladsgroup.json
  • 07:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T300992)', diff saved to https://phabricator.wikimedia.org/P21872 and previous config saved to /var/cache/conftool/dbconfig/20220307-065839-ladsgroup.json
  • 06:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 06:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300992)', diff saved to https://phabricator.wikimedia.org/P21871 and previous config saved to /var/cache/conftool/dbconfig/20220307-065832-ladsgroup.json
  • 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21870 and previous config saved to /var/cache/conftool/dbconfig/20220307-065722-ladsgroup.json
  • 06:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 06:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 06:52 urbanecm: Reset authentication throttle for 217.23.37.10 via resetAuthenticationThrottle.php (T302973)
  • 06:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:49 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: 2e9fdd4: 867bb7b: Add throttle rules (T302973; T303002) (duration: 00m 49s)
  • 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P21869 and previous config saved to /var/cache/conftool/dbconfig/20220307-064327-ladsgroup.json
  • 06:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21868 and previous config saved to /var/cache/conftool/dbconfig/20220307-064217-ladsgroup.json
  • 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P21867 and previous config saved to /var/cache/conftool/dbconfig/20220307-062823-ladsgroup.json
  • 06:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T302950)', diff saved to https://phabricator.wikimedia.org/P21866 and previous config saved to /var/cache/conftool/dbconfig/20220307-062713-ladsgroup.json
  • 06:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300992)', diff saved to https://phabricator.wikimedia.org/P21865 and previous config saved to /var/cache/conftool/dbconfig/20220307-061318-ladsgroup.json
  • 06:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T300992)', diff saved to https://phabricator.wikimedia.org/P21864 and previous config saved to /var/cache/conftool/dbconfig/20220307-060819-ladsgroup.json
  • 06:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 06:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 06:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T300992)', diff saved to https://phabricator.wikimedia.org/P21863 and previous config saved to /var/cache/conftool/dbconfig/20220307-060811-ladsgroup.json
  • 05:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P21862 and previous config saved to /var/cache/conftool/dbconfig/20220307-055307-ladsgroup.json
  • 05:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1147.eqiad.wmnet with OS bullseye
  • 05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P21861 and previous config saved to /var/cache/conftool/dbconfig/20220307-053802-ladsgroup.json
  • 05:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1147.eqiad.wmnet with reason: host reimage
  • 05:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1147.eqiad.wmnet with reason: host reimage
  • 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T300992)', diff saved to https://phabricator.wikimedia.org/P21860 and previous config saved to /var/cache/conftool/dbconfig/20220307-052257-ladsgroup.json
  • 05:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1147.eqiad.wmnet with OS bullseye
  • 05:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T300992)', diff saved to https://phabricator.wikimedia.org/P21859 and previous config saved to /var/cache/conftool/dbconfig/20220307-051807-ladsgroup.json
  • 05:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 05:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 05:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T302950)', diff saved to https://phabricator.wikimedia.org/P21858 and previous config saved to /var/cache/conftool/dbconfig/20220307-051537-ladsgroup.json
  • 05:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 05:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 05:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 05:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance

2022-03-04

  • 17:59 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:57 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 17:57 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:48 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 17:46 mforns@deploy1002: Finished deploy [airflow-dags/analytics@19520c1]: (no justification provided) (duration: 00m 07s)
  • 17:46 mforns@deploy1002: Started deploy [airflow-dags/analytics@19520c1]: (no justification provided)
  • 17:39 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@19520c1]: (no justification provided) (duration: 00m 08s)
  • 17:39 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@19520c1]: (no justification provided)
  • 17:09 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 08s)
  • 17:09 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
  • 16:35 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 07s)
  • 16:35 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
  • 16:13 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 10s)
  • 16:13 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
  • 16:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 16:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 16:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 16:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T300992)', diff saved to https://phabricator.wikimedia.org/P21856 and previous config saved to /var/cache/conftool/dbconfig/20220304-160629-ladsgroup.json
  • 16:03 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 03s)
  • 16:03 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
  • 15:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1086.eqiad.wmnet with OS buster
  • 15:58 vgutierrez: pool cp1086 with HAProxy as TLS termination layer - T290005
  • 15:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2038.codfw.wmnet with OS buster
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P21854 and previous config saved to /var/cache/conftool/dbconfig/20220304-155124-ladsgroup.json
  • 15:51 vgutierrez: pool cp2038 with HAProxy as TLS termination layer - T290005
  • 15:49 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 07s)
  • 15:49 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
  • 15:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1086.eqiad.wmnet with reason: host reimage
  • 15:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1086.eqiad.wmnet with reason: host reimage
  • 15:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P21852 and previous config saved to /var/cache/conftool/dbconfig/20220304-153619-ladsgroup.json
  • 15:34 XioNoX: blackhole IPs - T303055
  • 15:34 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
  • 15:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1086.eqiad.wmnet with OS buster
  • 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T300992)', diff saved to https://phabricator.wikimedia.org/P21851 and previous config saved to /var/cache/conftool/dbconfig/20220304-152114-ladsgroup.json
  • 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T300992)', diff saved to https://phabricator.wikimedia.org/P21850 and previous config saved to /var/cache/conftool/dbconfig/20220304-152007-ladsgroup.json
  • 15:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 15:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T300992)', diff saved to https://phabricator.wikimedia.org/P21849 and previous config saved to /var/cache/conftool/dbconfig/20220304-151937-ladsgroup.json
  • 15:16 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2038.codfw.wmnet with OS buster
  • 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P21848 and previous config saved to /var/cache/conftool/dbconfig/20220304-150433-ladsgroup.json
  • 14:59 ebernhardson: restart elasticsearch_6@production-search-psi-eqiad.service on elastic1049 to resolve CirrusSearchJVMGCOldPoolFlatlined alert
  • 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P21847 and previous config saved to /var/cache/conftool/dbconfig/20220304-144926-ladsgroup.json
  • 14:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3059.esams.wmnet with OS buster
  • 14:43 vgutierrez: pool cp3059 with HAProxy as TLS termination layer - T290005
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T300992)', diff saved to https://phabricator.wikimedia.org/P21846 and previous config saved to /var/cache/conftool/dbconfig/20220304-143421-ladsgroup.json
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T300992)', diff saved to https://phabricator.wikimedia.org/P21845 and previous config saved to /var/cache/conftool/dbconfig/20220304-143214-ladsgroup.json
  • 14:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 14:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T300992)', diff saved to https://phabricator.wikimedia.org/P21844 and previous config saved to /var/cache/conftool/dbconfig/20220304-143206-ladsgroup.json
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P21842 and previous config saved to /var/cache/conftool/dbconfig/20220304-141701-ladsgroup.json
  • 14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P21841 and previous config saved to /var/cache/conftool/dbconfig/20220304-140156-ladsgroup.json
  • 13:49 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1302-1306].eqiad.wmnet
  • 13:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T300992)', diff saved to https://phabricator.wikimedia.org/P21840 and previous config saved to /var/cache/conftool/dbconfig/20220304-134651-ladsgroup.json
  • 13:45 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T300992)', diff saved to https://phabricator.wikimedia.org/P21839 and previous config saved to /var/cache/conftool/dbconfig/20220304-134443-ladsgroup.json
  • 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T300992)', diff saved to https://phabricator.wikimedia.org/P21838 and previous config saved to /var/cache/conftool/dbconfig/20220304-134436-ladsgroup.json
  • 13:38 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P21837 and previous config saved to /var/cache/conftool/dbconfig/20220304-132931-ladsgroup.json
  • 13:19 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1302-1306].eqiad.wmnet
  • 13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P21836 and previous config saved to /var/cache/conftool/dbconfig/20220304-131426-ladsgroup.json
  • 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T300992)', diff saved to https://phabricator.wikimedia.org/P21835 and previous config saved to /var/cache/conftool/dbconfig/20220304-125921-ladsgroup.json
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T300992)', diff saved to https://phabricator.wikimedia.org/P21834 and previous config saved to /var/cache/conftool/dbconfig/20220304-125714-ladsgroup.json
  • 12:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 12:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T300992)', diff saved to https://phabricator.wikimedia.org/P21833 and previous config saved to /var/cache/conftool/dbconfig/20220304-125706-ladsgroup.json
  • 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P21832 and previous config saved to /var/cache/conftool/dbconfig/20220304-124201-ladsgroup.json
  • 12:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P21831 and previous config saved to /var/cache/conftool/dbconfig/20220304-122656-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T300992)', diff saved to https://phabricator.wikimedia.org/P21830 and previous config saved to /var/cache/conftool/dbconfig/20220304-121152-ladsgroup.json
  • 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T300992)', diff saved to https://phabricator.wikimedia.org/P21829 and previous config saved to /var/cache/conftool/dbconfig/20220304-120944-ladsgroup.json
  • 12:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 12:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T300992)', diff saved to https://phabricator.wikimedia.org/P21828 and previous config saved to /var/cache/conftool/dbconfig/20220304-120937-ladsgroup.json
  • 12:04 jbond: enable SameSite=Strict on idp
  • 11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P21827 and previous config saved to /var/cache/conftool/dbconfig/20220304-115432-ladsgroup.json
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P21826 and previous config saved to /var/cache/conftool/dbconfig/20220304-113927-ladsgroup.json
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T300992)', diff saved to https://phabricator.wikimedia.org/P21825 and previous config saved to /var/cache/conftool/dbconfig/20220304-112422-ladsgroup.json
  • 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T300992)', diff saved to https://phabricator.wikimedia.org/P21824 and previous config saved to /var/cache/conftool/dbconfig/20220304-112214-ladsgroup.json
  • 11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 11:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3059.esams.wmnet with reason: host reimage
  • 11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T300992)', diff saved to https://phabricator.wikimedia.org/P21823 and previous config saved to /var/cache/conftool/dbconfig/20220304-112207-ladsgroup.json
  • 11:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3059.esams.wmnet with reason: host reimage
  • 11:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4024.ulsfo.wmnet with OS buster
  • 11:09 vgutierrez: pool cp4024 with HAProxy as TLS termination layer - T290005
  • 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P21822 and previous config saved to /var/cache/conftool/dbconfig/20220304-110702-ladsgroup.json
  • 10:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4024.ulsfo.wmnet with reason: host reimage
  • 10:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4024.ulsfo.wmnet with reason: host reimage
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P21821 and previous config saved to /var/cache/conftool/dbconfig/20220304-105157-ladsgroup.json
  • 10:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3059.esams.wmnet with OS buster
  • 10:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4024.ulsfo.wmnet with OS buster
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T300992)', diff saved to https://phabricator.wikimedia.org/P21820 and previous config saved to /var/cache/conftool/dbconfig/20220304-103652-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T300992)', diff saved to https://phabricator.wikimedia.org/P21819 and previous config saved to /var/cache/conftool/dbconfig/20220304-103444-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T300992)', diff saved to https://phabricator.wikimedia.org/P21818 and previous config saved to /var/cache/conftool/dbconfig/20220304-103437-ladsgroup.json
  • 10:29 vgutierrez: pool cp5004 with HAProxy as TLS termination layer - T290005
  • 10:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5004.eqsin.wmnet with OS buster
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21817 and previous config saved to /var/cache/conftool/dbconfig/20220304-101932-ladsgroup.json
  • 10:08 aqu@deploy1002: Finished deploy [airflow-dags/analytics@1c8384f]: AF //tion default args (duration: 00m 07s)
  • 10:08 aqu@deploy1002: Started deploy [airflow-dags/analytics@1c8384f]: AF //tion default args
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21816 and previous config saved to /var/cache/conftool/dbconfig/20220304-100427-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T300992)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20220304-094918-ladsgroup.json
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T300992)', diff saved to https://phabricator.wikimedia.org/P21815 and previous config saved to /var/cache/conftool/dbconfig/20220304-094710-ladsgroup.json
  • 09:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T300992)', diff saved to https://phabricator.wikimedia.org/P21814 and previous config saved to /var/cache/conftool/dbconfig/20220304-094702-ladsgroup.json
  • 09:43 vgutierrez: restart varnish on cp3056
  • 09:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5004.eqsin.wmnet with reason: host reimage
  • 09:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5004.eqsin.wmnet with reason: host reimage
  • 09:37 vgutierrez: restart varnish on cp3058
  • 09:33 vgutierrez: restart varnish on cp3060
  • 09:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21813 and previous config saved to /var/cache/conftool/dbconfig/20220304-093157-ladsgroup.json
  • 09:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21812 and previous config saved to /var/cache/conftool/dbconfig/20220304-091652-ladsgroup.json
  • 09:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5004.eqsin.wmnet with OS buster
  • 09:12 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[1005-1006].eqiad.wmnet
  • 09:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T300992)', diff saved to https://phabricator.wikimedia.org/P21811 and previous config saved to /var/cache/conftool/dbconfig/20220304-090147-ladsgroup.json
  • 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T300992)', diff saved to https://phabricator.wikimedia.org/P21810 and previous config saved to /var/cache/conftool/dbconfig/20220304-085939-ladsgroup.json
  • 08:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T300992)', diff saved to https://phabricator.wikimedia.org/P21809 and previous config saved to /var/cache/conftool/dbconfig/20220304-085932-ladsgroup.json
  • 08:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21808 and previous config saved to /var/cache/conftool/dbconfig/20220304-084427-ladsgroup.json
  • 08:34 akosiaris: T303027 depool mw130[2-6]. Old jobrunners/videoscalers, being decommissioned
  • 08:33 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=mw130[2-6].eqiad.wmnet
  • 08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21807 and previous config saved to /var/cache/conftool/dbconfig/20220304-082922-ladsgroup.json
  • 08:23 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 08:19 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[1005-1006].eqiad.wmnet
  • 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T300992)', diff saved to https://phabricator.wikimedia.org/P21806 and previous config saved to /var/cache/conftool/dbconfig/20220304-081417-ladsgroup.json
  • 08:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T300992)', diff saved to https://phabricator.wikimedia.org/P21805 and previous config saved to /var/cache/conftool/dbconfig/20220304-081210-ladsgroup.json
  • 08:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:27 XioNoX: push pfw policies - T303003
  • 01:35 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 01:34 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 01:34 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 01:33 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 01:33 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 01:32 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 01:32 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 01:31 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 01:31 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 01:31 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 01:31 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 01:30 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 01:30 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 01:29 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 01:29 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 01:27 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 01:27 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 01:25 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 01:25 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 01:24 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 01:24 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 01:24 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 01:24 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
  • 01:23 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
  • 01:23 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 01:22 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply

2022-03-03

  • 21:35 brennen: end of UTC late backport & config window / training
  • 21:30 brennen@deploy1002: Finished scap: Config: Write the same value to $wmgDatacenter(s) as to $wmfDatacenter(s) (T45956) (duration: 01m 33s)
  • 21:28 brennen@deploy1002: Started scap: Config: Write the same value to $wmgDatacenter(s) as to $wmfDatacenter(s) (T45956)
  • 21:28 brennen@deploy1002: Synchronized multiversion/MWRealm.php: Config: Write the same value to $wmgDatacenter(s) as to $wmfDatacenter(s) (T45956) (duration: 00m 48s)
  • 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:35 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.24 refs T300200
  • 19:32 brennen: 1.38.0-wmf.24 train (T300200): no current blockers; proceeding to all wikis
  • 19:30 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/skins/Vector/includes/SkinVector.php: Backport: Unset data-toc in SkinVector (T302461) (duration: 00m 49s)
  • 19:23 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/skins/MinervaNeue/resources/skins.minerva.base.styles/userMenu.less: Backport: Remove user navigation min width and width (T302753) (duration: 00m 51s)
  • 19:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 18:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
  • 18:50 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
  • 18:39 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 18:32 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:29 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 18:11 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: updating wmf-puppet-dashboard (duration: 09m 12s)
  • 18:02 otto@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 18:02 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: updating wmf-puppet-dashboard
  • 17:59 otto@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 17:58 krinkle@deploy1002: Synchronized wmf-config/: Idf7b21159423 (duration: 00m 51s)
  • 17:49 otto@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 17:49 otto@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 17:48 otto@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 17:47 otto@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 17:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T302950)', diff saved to https://phabricator.wikimedia.org/P21802 and previous config saved to /var/cache/conftool/dbconfig/20220303-173630-ladsgroup.json
  • 17:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P21801 and previous config saved to /var/cache/conftool/dbconfig/20220303-172125-ladsgroup.json
  • 17:06 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P21800 and previous config saved to /var/cache/conftool/dbconfig/20220303-170621-ladsgroup.json
  • 17:05 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 17:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 17:03 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 16:53 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 16:53 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 16:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T302950)', diff saved to https://phabricator.wikimedia.org/P21799 and previous config saved to /var/cache/conftool/dbconfig/20220303-165116-ladsgroup.json
  • 16:30 godog: roll-restart logstash to pick up config changes - T291946
  • 16:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1148.eqiad.wmnet with OS bullseye
  • 16:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1148.eqiad.wmnet with reason: host reimage
  • 15:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1148.eqiad.wmnet with reason: host reimage
  • 15:53 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:47 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1148.eqiad.wmnet with OS bullseye
  • 15:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T302950)', diff saved to https://phabricator.wikimedia.org/P21798 and previous config saved to /var/cache/conftool/dbconfig/20220303-152242-ladsgroup.json
  • 15:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 15:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 15:21 moritzm: restarting FPM/Apache on mw job runners to pick up expat security updates
  • 15:08 mutante: T296022 - phabricator - disabled git cloning over ssh for 'stewardscripts' repo - stewards have been asked via mailing list
  • 14:48 godog: force a puppet run on cp6011 to unblock icinga and disable puppet again, cc bblack
  • 14:48 Lucas_WMDE: UTC afternoon backport window done
  • 14:46 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport: GLAM event: Update landing page content (T301097) (full sync because of i18n change) (duration: 09m 45s)
  • 14:37 lucaswerkmeister-wmde@deploy1002: Started scap: Backport: GLAM event: Update landing page content (T301097) (full sync because of i18n change)
  • 14:26 XioNoX: merge Icinga: use parent switch shortname
  • 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
  • 14:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
  • 14:04 volans: upgraded spicerack to v2.1.0 on cumin1001/cumin2002
  • 14:03 akosiaris@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T302950)', diff saved to https://phabricator.wikimedia.org/P21794 and previous config saved to /var/cache/conftool/dbconfig/20220303-135737-ladsgroup.json
  • 13:54 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:54 akosiaris: switch changeprop, changeprop-jobqueue to use rdb1011. T281217
  • 13:53 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:53 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 13:53 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 13:53 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 13:52 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:52 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:52 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 13:51 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 13:45 akosiaris: roll restart ores uwsgi and celery for rdb1005 decommissioning. T281217
  • 13:44 akosiaris@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P21793 and previous config saved to /var/cache/conftool/dbconfig/20220303-134232-ladsgroup.json
  • 13:20 moritzm: restarting FPM/Apache on mw app servers to pick up expat security updates
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T302950)', diff saved to https://phabricator.wikimedia.org/P21791 and previous config saved to /var/cache/conftool/dbconfig/20220303-131223-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1149.eqiad.wmnet with OS bullseye
  • 12:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1149.eqiad.wmnet with reason: host reimage
  • 12:47 hashar: Upgrading Quibble on CI Jenkins jobs from 1.3.0 to 1.4.3 https://gerrit.wikimedia.org/r/c/integration/config/+/767749/
  • 12:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1149.eqiad.wmnet with reason: host reimage
  • 12:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1149.eqiad.wmnet with OS bullseye
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T302950)', diff saved to https://phabricator.wikimedia.org/P21790 and previous config saved to /var/cache/conftool/dbconfig/20220303-123030-ladsgroup.json
  • 12:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 12:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 11:49 volans: uploaded spicerack_2.1.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 11:33 kormat@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Repooling to 100% after incident', diff saved to https://phabricator.wikimedia.org/P21789 and previous config saved to /var/cache/conftool/dbconfig/20220303-113304-kormat.json
  • 11:18 kormat@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Repooling to 100% after incident', diff saved to https://phabricator.wikimedia.org/P21788 and previous config saved to /var/cache/conftool/dbconfig/20220303-111801-kormat.json
  • 11:02 kormat@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Repooling to 100% after incident', diff saved to https://phabricator.wikimedia.org/P21787 and previous config saved to /var/cache/conftool/dbconfig/20220303-110257-kormat.json
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T302950)', diff saved to https://phabricator.wikimedia.org/P21786 and previous config saved to /var/cache/conftool/dbconfig/20220303-110224-ladsgroup.json
  • 11:02 kormat@cumin1001: dbctl commit (dc=all): 'Start repooling db1126 to full weight', diff saved to https://phabricator.wikimedia.org/P21785 and previous config saved to /var/cache/conftool/dbconfig/20220303-110220-kormat.json
  • 10:58 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.23/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Backport: rdbms: Change getConnectionRef to return with getLazyConnectionRef (T255493) (duration: 00m 50s)
  • 10:50 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.24/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Backport: rdbms: Change getConnectionRef to return with getLazyConnectionRef (T255493) (duration: 00m 51s)
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P21784 and previous config saved to /var/cache/conftool/dbconfig/20220303-104713-ladsgroup.json
  • 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21783 and previous config saved to /var/cache/conftool/dbconfig/20220303-103659-ladsgroup.json
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P21782 and previous config saved to /var/cache/conftool/dbconfig/20220303-103209-ladsgroup.json
  • 10:30 XioNoX: repool ulsfo
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P21781 and previous config saved to /var/cache/conftool/dbconfig/20220303-102154-ladsgroup.json
  • 10:18 elukey: kubectl cordon kubernetes200[1-4] to avoid scheduling pods on nodes that will be decommed during the next weeks - T302208
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T302950)', diff saved to https://phabricator.wikimedia.org/P21780 and previous config saved to /var/cache/conftool/dbconfig/20220303-101704-ladsgroup.json
  • 10:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1160.eqiad.wmnet with OS bullseye
  • 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P21779 and previous config saved to /var/cache/conftool/dbconfig/20220303-100649-ladsgroup.json
  • 09:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1160.eqiad.wmnet with reason: host reimage
  • 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21778 and previous config saved to /var/cache/conftool/dbconfig/20220303-095145-ladsgroup.json
  • 09:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: host reimage
  • 09:37 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@1c8384f]: AF //tion default args (duration: 00m 09s)
  • 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1160.eqiad.wmnet with OS bullseye
  • 09:37 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@1c8384f]: AF //tion default args
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T302950)', diff saved to https://phabricator.wikimedia.org/P21777 and previous config saved to /var/cache/conftool/dbconfig/20220303-093306-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 09:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2073 (T302950)', diff saved to https://phabricator.wikimedia.org/P21775 and previous config saved to /var/cache/conftool/dbconfig/20220303-091340-ladsgroup.json
  • 09:12 moritzm: restarting FPM/Apache on mw API servers to pick up expat security updates
  • 09:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2073.codfw.wmnet with OS bullseye
  • 09:01 moritzm: restarting superset on an-tool1010 to pick up expat security updates
  • 08:52 taavi: UTC morning deploys done
  • 08:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21774 and previous config saved to /var/cache/conftool/dbconfig/20220303-085125-ladsgroup.json
  • 08:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21773 and previous config saved to /var/cache/conftool/dbconfig/20220303-085118-ladsgroup.json
  • 08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2073.codfw.wmnet with reason: host reimage
  • 08:48 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GLAM event: Update wgGECampaigns and wgGECampaignTopics (T301029) (duration: 00m 51s)
  • 08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2073.codfw.wmnet with reason: host reimage
  • 08:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P21772 and previous config saved to /var/cache/conftool/dbconfig/20220303-083613-ladsgroup.json
  • 08:34 moritzm: installing expat security updates
  • 08:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2073.codfw.wmnet with OS bullseye
  • 08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2073 (T302950)', diff saved to https://phabricator.wikimedia.org/P21771 and previous config saved to /var/cache/conftool/dbconfig/20220303-082842-ladsgroup.json
  • 08:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 08:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 08:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2073.codfw.wmnet with reason: Maintenance
  • 08:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2073.codfw.wmnet with reason: Maintenance
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2090 (T302950)', diff saved to https://phabricator.wikimedia.org/P21770 and previous config saved to /var/cache/conftool/dbconfig/20220303-082656-ladsgroup.json
  • 08:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P21768 and previous config saved to /var/cache/conftool/dbconfig/20220303-082108-ladsgroup.json
  • 08:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4034.ulsfo.wmnet with OS buster
  • 08:18 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add centralauth-suppress to steward and wmf-supportsafety at metawiki (T302675) (duration: 00m 50s)
  • 08:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2090.codfw.wmnet with OS bullseye
  • 08:13 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: fawiki: Remove the Book namespace (T302957) (duration: 00m 51s)
  • 08:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21767 and previous config saved to /var/cache/conftool/dbconfig/20220303-080603-ladsgroup.json
  • 08:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2090.codfw.wmnet with reason: host reimage
  • 07:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2090.codfw.wmnet with reason: host reimage
  • 07:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4034.ulsfo.wmnet with reason: host reimage
  • 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21766 and previous config saved to /var/cache/conftool/dbconfig/20220303-075534-ladsgroup.json
  • 07:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 07:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 07:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4034.ulsfo.wmnet with reason: host reimage
  • 07:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2090.codfw.wmnet with OS bullseye
  • 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2090 (T302950)', diff saved to https://phabricator.wikimedia.org/P21765 and previous config saved to /var/cache/conftool/dbconfig/20220303-074209-ladsgroup.json
  • 07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2090.codfw.wmnet with reason: Maintenance
  • 07:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2090.codfw.wmnet with reason: Maintenance
  • 07:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 07:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 07:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T300992)', diff saved to https://phabricator.wikimedia.org/P21764 and previous config saved to /var/cache/conftool/dbconfig/20220303-073920-ladsgroup.json
  • 07:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
  • 07:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21763 and previous config saved to /var/cache/conftool/dbconfig/20220303-072415-ladsgroup.json
  • 07:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T302950)', diff saved to https://phabricator.wikimedia.org/P21762 and previous config saved to /var/cache/conftool/dbconfig/20220303-071800-ladsgroup.json
  • 07:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2106.codfw.wmnet with OS bullseye
  • 07:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21761 and previous config saved to /var/cache/conftool/dbconfig/20220303-070910-ladsgroup.json
  • 06:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2106.codfw.wmnet with reason: host reimage
  • 06:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T300992)', diff saved to https://phabricator.wikimedia.org/P21760 and previous config saved to /var/cache/conftool/dbconfig/20220303-065405-ladsgroup.json
  • 06:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2106.codfw.wmnet with reason: host reimage
  • 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T300992)', diff saved to https://phabricator.wikimedia.org/P21759 and previous config saved to /var/cache/conftool/dbconfig/20220303-064945-ladsgroup.json
  • 06:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 06:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T300992)', diff saved to https://phabricator.wikimedia.org/P21758 and previous config saved to /var/cache/conftool/dbconfig/20220303-064937-ladsgroup.json
  • 06:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2106.codfw.wmnet with OS bullseye
  • 06:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T302950)', diff saved to https://phabricator.wikimedia.org/P21757 and previous config saved to /var/cache/conftool/dbconfig/20220303-063514-ladsgroup.json
  • 06:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 06:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 06:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21756 and previous config saved to /var/cache/conftool/dbconfig/20220303-063433-ladsgroup.json
  • 06:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T302950)', diff saved to https://phabricator.wikimedia.org/P21755 and previous config saved to /var/cache/conftool/dbconfig/20220303-063350-ladsgroup.json
  • 06:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2119.codfw.wmnet with OS bullseye
  • 06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21754 and previous config saved to /var/cache/conftool/dbconfig/20220303-061928-ladsgroup.json
  • 06:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2119.codfw.wmnet with reason: host reimage
  • 06:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2119.codfw.wmnet with reason: host reimage
  • 06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T300992)', diff saved to https://phabricator.wikimedia.org/P21753 and previous config saved to /var/cache/conftool/dbconfig/20220303-060423-ladsgroup.json
  • 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T300992)', diff saved to https://phabricator.wikimedia.org/P21752 and previous config saved to /var/cache/conftool/dbconfig/20220303-060006-ladsgroup.json
  • 06:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 06:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300992)', diff saved to https://phabricator.wikimedia.org/P21751 and previous config saved to /var/cache/conftool/dbconfig/20220303-055959-ladsgroup.json
  • 05:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2119.codfw.wmnet with OS bullseye
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T302950)', diff saved to https://phabricator.wikimedia.org/P21750 and previous config saved to /var/cache/conftool/dbconfig/20220303-054657-ladsgroup.json
  • 05:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 05:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 05:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21749 and previous config saved to /var/cache/conftool/dbconfig/20220303-054454-ladsgroup.json
  • 05:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T302950)', diff saved to https://phabricator.wikimedia.org/P21748 and previous config saved to /var/cache/conftool/dbconfig/20220303-053324-ladsgroup.json
  • 05:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21747 and previous config saved to /var/cache/conftool/dbconfig/20220303-052949-ladsgroup.json
  • 05:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2136.codfw.wmnet with OS bullseye
  • 05:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300992)', diff saved to https://phabricator.wikimedia.org/P21746 and previous config saved to /var/cache/conftool/dbconfig/20220303-051444-ladsgroup.json
  • 04:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2136.codfw.wmnet with reason: host reimage
  • 04:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2136.codfw.wmnet with reason: host reimage
  • 04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T300992)', diff saved to https://phabricator.wikimedia.org/P21745 and previous config saved to /var/cache/conftool/dbconfig/20220303-044933-ladsgroup.json
  • 04:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 04:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 04:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T300992)', diff saved to https://phabricator.wikimedia.org/P21744 and previous config saved to /var/cache/conftool/dbconfig/20220303-044926-ladsgroup.json
  • 04:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2136.codfw.wmnet with OS bullseye
  • 04:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T302950)', diff saved to https://phabricator.wikimedia.org/P21743 and previous config saved to /var/cache/conftool/dbconfig/20220303-043942-ladsgroup.json
  • 04:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 04:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 04:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T302950)', diff saved to https://phabricator.wikimedia.org/P21742 and previous config saved to /var/cache/conftool/dbconfig/20220303-043759-ladsgroup.json
  • 04:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P21741 and previous config saved to /var/cache/conftool/dbconfig/20220303-043421-ladsgroup.json
  • 04:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2140.codfw.wmnet with OS bullseye
  • 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P21740 and previous config saved to /var/cache/conftool/dbconfig/20220303-041916-ladsgroup.json
  • 04:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2140.codfw.wmnet with reason: host reimage
  • 04:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2140.codfw.wmnet with reason: host reimage
  • 04:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T300992)', diff saved to https://phabricator.wikimedia.org/P21739 and previous config saved to /var/cache/conftool/dbconfig/20220303-040412-ladsgroup.json
  • 03:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T300992)', diff saved to https://phabricator.wikimedia.org/P21738 and previous config saved to /var/cache/conftool/dbconfig/20220303-035954-ladsgroup.json
  • 03:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 03:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 03:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 03:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2140.codfw.wmnet with OS bullseye
  • 03:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2140 (T302950)', diff saved to https://phabricator.wikimedia.org/P21737 and previous config saved to /var/cache/conftool/dbconfig/20220303-035328-ladsgroup.json
  • 03:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 03:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 03:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 03:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 03:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 03:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 03:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21736 and previous config saved to /var/cache/conftool/dbconfig/20220303-035134-ladsgroup.json
  • 03:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2147.codfw.wmnet with OS bullseye
  • 03:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21735 and previous config saved to /var/cache/conftool/dbconfig/20220303-033628-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2147.codfw.wmnet with reason: host reimage
  • 03:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2147.codfw.wmnet with reason: host reimage
  • 03:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21734 and previous config saved to /var/cache/conftool/dbconfig/20220303-032123-ladsgroup.json
  • 03:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2147.codfw.wmnet with OS bullseye
  • 03:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21733 and previous config saved to /var/cache/conftool/dbconfig/20220303-030618-ladsgroup.json
  • 03:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T302950)', diff saved to https://phabricator.wikimedia.org/P21732 and previous config saved to /var/cache/conftool/dbconfig/20220303-030518-ladsgroup.json
  • 03:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 03:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 02:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21731 and previous config saved to /var/cache/conftool/dbconfig/20220303-025500-ladsgroup.json
  • 02:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 02:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 01:42 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on datahubsearch[1001-1003].eqiad.wmnet with reason: Still having errors setting up opensearch
  • 01:42 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on datahubsearch[1001-1003].eqiad.wmnet with reason: Still having errors setting up opensearch
  • 00:31 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dumpsdata1007.eqiad.wmnet
  • 00:31 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:25 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 00:21 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts dumpsdata1007.eqiad.wmnet

2022-03-02

  • 23:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 23:37 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
  • 23:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
  • 23:25 ryankemper: T276198 Re-enabled puppet across fleet: `ryankemper@cumin1001:~$ sudo -E cumin 'R:Elasticsearch::instance' 'enable-puppet "deploy fix from T276198"'`
  • 23:21 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 23:21 ryankemper: T276198 https://gerrit.wikimedia.org/r/c/operations/puppet/+/767600 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/767603/ fixed all the problems. Re-enabling puppet on elastic*, cloudelastic*, and relforge* shortly
  • 23:15 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 23:08 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 22:56 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 22:55 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 22:55 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 22:54 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 22:54 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 22:52 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 22:52 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 22:51 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 22:51 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 22:50 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 22:50 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 22:49 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 22:49 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 22:48 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 22:48 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 22:47 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 22:47 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 22:46 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 22:46 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 22:45 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 22:45 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 22:43 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 22:43 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 22:43 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 22:43 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
  • 22:42 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
  • 22:42 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
  • 22:41 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply
  • 22:21 ryankemper: T276198 Downtimed `elastic1052` for 2 hours while troubleshooting
  • 22:16 ryankemper: T276198 Testing https://gerrit.wikimedia.org/r/c/operations/puppet/+/766876/ on `elastic1052`; elasticsearch service fails to start. It's expecting to find `/etc/tmpfiles.d/elasticsearch-production-search-psi-eqiad.conf` but the actual filename is `elasticsearch-production-search-psi-eqiad-conf.conf`. Not sure why that trailing `-conf` is there in the filename. It doesn't look like something `systemd::tmpfile` is doing.
  • 22:05 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 21:59 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/Linter/includes/Hooks.php: Backport: Hooks.php: Check for non-array $tags (T302918) (duration: 00m 50s)
  • 21:53 ryankemper: T276198 Disabled puppet across all of elastic*, cloudelastic*, and relforge* to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/766876/ on a single elastic host
  • 21:44 mutante: rolling out scap 4.4.2 on 'all' T302919
  • 21:36 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 21:19 dancy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: wmf-config: Undeploy the fawiki test survey from production (T300291) (duration: 00m 50s)
  • 21:13 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 21:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: testing scap 4.4.2
  • 21:05 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 21:00 mutante: deploy1002 - upgraded scap to 4.4.2-1 T302919
  • 20:48 mutante: running test-deploy to devcluster (restbase) to test new scap version, successful and then rolled back, as the docs say T302919
  • 20:48 dzahn@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 00m 41s)
  • 20:47 dzahn@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
  • 20:44 mutante: tested that 'scap pull' still worked on mwdebug1001; rolling out scap 4.4.2 to A:restbase-canary (T302919)
  • 20:38 mutante: rolling out scap 4.4.2 to A:mw-canary or A:parsoid-canary or A:mw-jobrunner-canary (T302919)
  • 20:20 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dumpsdata1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:11 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:07 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:03 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 19:57 brennen@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
  • 19:53 brennen@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
  • 19:47 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:46 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/ApiFeatureUsage: Backport: Add a non-namespaced alias for ApiFeatureUsageQueryEngineElastica (T302907) (duration: 00m 50s)
  • 19:45 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:36 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:33 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 19:30 mutante: stopped icinga-wm
  • 19:14 brennen@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.24 refs T300200 (duration: 00m 50s)
  • 19:13 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.24 refs T300200
  • 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T300992)', diff saved to https://phabricator.wikimedia.org/P21729 and previous config saved to /var/cache/conftool/dbconfig/20220302-191323-ladsgroup.json
  • 19:10 brennen: 1.38.0-wmf.24 train (T300200): no current blockers; proceeding to group1
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P21728 and previous config saved to /var/cache/conftool/dbconfig/20220302-185819-ladsgroup.json
  • 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P21727 and previous config saved to /var/cache/conftool/dbconfig/20220302-184314-ladsgroup.json
  • 18:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T300992)', diff saved to https://phabricator.wikimedia.org/P21726 and previous config saved to /var/cache/conftool/dbconfig/20220302-182809-ladsgroup.json
  • 18:26 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T300992)', diff saved to https://phabricator.wikimedia.org/P21725 and previous config saved to /var/cache/conftool/dbconfig/20220302-182153-ladsgroup.json
  • 18:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 18:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T300992)', diff saved to https://phabricator.wikimedia.org/P21724 and previous config saved to /var/cache/conftool/dbconfig/20220302-182145-ladsgroup.json
  • 18:14 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 18:14 rzl@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 18:14 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:13 rzl@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:13 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 18:13 rzl@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 18:13 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 18:12 rzl@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 18:12 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 18:12 rzl@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 18:12 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 18:12 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 18:12 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 18:11 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 18:11 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 18:11 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 18:11 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:10 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 18:10 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:10 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 18:10 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:10 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:10 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 18:09 rzl@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 18:09 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 18:09 rzl@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 18:09 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 18:09 rzl@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply
  • 18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P21723 and previous config saved to /var/cache/conftool/dbconfig/20220302-180640-ladsgroup.json
  • 17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P21722 and previous config saved to /var/cache/conftool/dbconfig/20220302-175136-ladsgroup.json
  • 17:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T300992)', diff saved to https://phabricator.wikimedia.org/P21721 and previous config saved to /var/cache/conftool/dbconfig/20220302-173631-ladsgroup.json
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T300992)', diff saved to https://phabricator.wikimedia.org/P21720 and previous config saved to /var/cache/conftool/dbconfig/20220302-173112-ladsgroup.json
  • 17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T300992)', diff saved to https://phabricator.wikimedia.org/P21719 and previous config saved to /var/cache/conftool/dbconfig/20220302-173104-ladsgroup.json
  • 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P21718 and previous config saved to /var/cache/conftool/dbconfig/20220302-171559-ladsgroup.json
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P21717 and previous config saved to /var/cache/conftool/dbconfig/20220302-170055-ladsgroup.json
  • 16:51 vgutierrez: pool cp3061 running HAProxy as TLS termination layer - T290005 T271421
  • 16:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3061.esams.wmnet with OS buster
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T300992)', diff saved to https://phabricator.wikimedia.org/P21716 and previous config saved to /var/cache/conftool/dbconfig/20220302-164550-ladsgroup.json
  • 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T300992)', diff saved to https://phabricator.wikimedia.org/P21715 and previous config saved to /var/cache/conftool/dbconfig/20220302-163329-ladsgroup.json
  • 16:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 16:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T300992)', diff saved to https://phabricator.wikimedia.org/P21714 and previous config saved to /var/cache/conftool/dbconfig/20220302-163322-ladsgroup.json
  • 16:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3061.esams.wmnet with reason: host reimage
  • 16:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3061.esams.wmnet with reason: host reimage
  • 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P21713 and previous config saved to /var/cache/conftool/dbconfig/20220302-161817-ladsgroup.json
  • 16:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P21711 and previous config saved to /var/cache/conftool/dbconfig/20220302-160312-ladsgroup.json
  • 15:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3061.esams.wmnet with OS buster
  • 15:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5014.eqsin.wmnet with OS buster
  • 15:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T300992)', diff saved to https://phabricator.wikimedia.org/P21710 and previous config saved to /var/cache/conftool/dbconfig/20220302-154807-ladsgroup.json
  • 15:47 vgutierrez: pool cp5014 running HAProxy as TLS termination layer - T290005 T271421
  • 15:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 15:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T300992)', diff saved to https://phabricator.wikimedia.org/P21709 and previous config saved to /var/cache/conftool/dbconfig/20220302-154039-ladsgroup.json
  • 15:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300992)', diff saved to https://phabricator.wikimedia.org/P21708 and previous config saved to /var/cache/conftool/dbconfig/20220302-154026-ladsgroup.json
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21707 and previous config saved to /var/cache/conftool/dbconfig/20220302-152519-ladsgroup.json
  • 15:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: host reimage
  • 15:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: host reimage
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21706 and previous config saved to /var/cache/conftool/dbconfig/20220302-151015-ladsgroup.json
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300992)', diff saved to https://phabricator.wikimedia.org/P21705 and previous config saved to /var/cache/conftool/dbconfig/20220302-145510-ladsgroup.json
  • 14:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5014.eqsin.wmnet with OS buster
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T300992)', diff saved to https://phabricator.wikimedia.org/P21704 and previous config saved to /var/cache/conftool/dbconfig/20220302-145054-ladsgroup.json
  • 14:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 14:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21703 and previous config saved to /var/cache/conftool/dbconfig/20220302-145046-ladsgroup.json
  • 14:41 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster
  • 14:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
  • 14:38 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster
  • 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21702 and previous config saved to /var/cache/conftool/dbconfig/20220302-143541-ladsgroup.json
  • 14:34 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4034.ulsfo.wmnet with OS buster
  • 14:27 moritzm: rebalance VMs in Ganeti row A after adding new servers (and decommissioning old ones)
  • 14:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
  • 14:24 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster
  • 14:21 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.23/extensions/FlaggedRevs/modules/ext.flaggedRevs.review/review.js: Backport: ext.flaggedRevs.review: Restore tolerance when setting "disabled" prop (duration: 00m 52s)
  • 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21701 and previous config saved to /var/cache/conftool/dbconfig/20220302-142037-ladsgroup.json
  • 14:13 mmandere: pool cp6013
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21700 and previous config saved to /var/cache/conftool/dbconfig/20220302-140532-ladsgroup.json
  • 14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21699 and previous config saved to /var/cache/conftool/dbconfig/20220302-140112-ladsgroup.json
  • 14:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 14:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T300992)', diff saved to https://phabricator.wikimedia.org/P21698 and previous config saved to /var/cache/conftool/dbconfig/20220302-140105-ladsgroup.json
  • 13:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
  • 13:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P21697 and previous config saved to /var/cache/conftool/dbconfig/20220302-134600-ladsgroup.json
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P21696 and previous config saved to /var/cache/conftool/dbconfig/20220302-133055-ladsgroup.json
  • 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T300992)', diff saved to https://phabricator.wikimedia.org/P21695 and previous config saved to /var/cache/conftool/dbconfig/20220302-131550-ladsgroup.json
  • 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T300992)', diff saved to https://phabricator.wikimedia.org/P21694 and previous config saved to /var/cache/conftool/dbconfig/20220302-131032-ladsgroup.json
  • 13:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 13:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T300992)', diff saved to https://phabricator.wikimedia.org/P21693 and previous config saved to /var/cache/conftool/dbconfig/20220302-131024-ladsgroup.json
  • 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21692 and previous config saved to /var/cache/conftool/dbconfig/20220302-125519-ladsgroup.json
  • 12:47 reedy@deploy1002: Finished scap: Fix MassMessage translations T302840 (duration: 01m 50s)
  • 12:45 reedy@deploy1002: Started scap: Fix MassMessage translations T302840
  • 12:43 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster
  • 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21690 and previous config saved to /var/cache/conftool/dbconfig/20220302-124014-ladsgroup.json
  • 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T300992)', diff saved to https://phabricator.wikimedia.org/P21689 and previous config saved to /var/cache/conftool/dbconfig/20220302-122510-ladsgroup.json
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T300992)', diff saved to https://phabricator.wikimedia.org/P21688 and previous config saved to /var/cache/conftool/dbconfig/20220302-122049-ladsgroup.json
  • 12:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 12:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T300992)', diff saved to https://phabricator.wikimedia.org/P21687 and previous config saved to /var/cache/conftool/dbconfig/20220302-121754-ladsgroup.json
  • 12:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
  • 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21686 and previous config saved to /var/cache/conftool/dbconfig/20220302-120250-ladsgroup.json
  • 11:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21685 and previous config saved to /var/cache/conftool/dbconfig/20220302-114745-ladsgroup.json
  • 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T300992)', diff saved to https://phabricator.wikimedia.org/P21684 and previous config saved to /var/cache/conftool/dbconfig/20220302-113240-ladsgroup.json
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T300992)', diff saved to https://phabricator.wikimedia.org/P21683 and previous config saved to /var/cache/conftool/dbconfig/20220302-112824-ladsgroup.json
  • 11:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 11:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21682 and previous config saved to /var/cache/conftool/dbconfig/20220302-112347-ladsgroup.json
  • 11:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@3dc404c] (eqiad): Merge "Update kartotherian-package to f239c6e" (duration: 01m 29s)
  • 11:22 mbsantos: rollback maps eqiad to a previous working state to mitigate geoshape errors
  • 11:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@3dc404c] (eqiad): Merge "Update kartotherian-package to f239c6e"
  • 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P21681 and previous config saved to /var/cache/conftool/dbconfig/20220302-110842-ladsgroup.json
  • 11:05 moritzm: installing expat security updates
  • 10:56 moritzm: restarting apache2 and mailman3-web on lists.wikimedia.org for expat security update
  • 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P21680 and previous config saved to /var/cache/conftool/dbconfig/20220302-105336-ladsgroup.json
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21678 and previous config saved to /var/cache/conftool/dbconfig/20220302-103832-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21677 and previous config saved to /var/cache/conftool/dbconfig/20220302-103407-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 10:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:20 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 10:18 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 10:15 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 10:15 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@d049589] (eqiad): Revert "Temporarily increase poolsize for debugging" (duration: 01m 45s)
  • 10:14 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 10:13 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@d049589] (eqiad): Revert "Temporarily increase poolsize for debugging"
  • 10:13 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@d049589] (codfw): Revert "Temporarily increase poolsize for debugging" (duration: 01m 36s)
  • 10:11 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@d049589] (codfw): Revert "Temporarily increase poolsize for debugging"
  • 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300992)', diff saved to https://phabricator.wikimedia.org/P21676 and previous config saved to /var/cache/conftool/dbconfig/20220302-100903-ladsgroup.json
  • 10:04 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-ctrl2002.codfw.wmnet
  • 09:56 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 09:55 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 09:55 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P21675 and previous config saved to /var/cache/conftool/dbconfig/20220302-095358-ladsgroup.json
  • 09:51 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@fd6bc59] (codfw): Temporarily increase poolsize for debugging (duration: 04m 26s)
  • 09:49 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 09:49 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-ctrl2002.codfw.wmnet
  • 09:48 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-ctrl2001.codfw.wmnet
  • 09:47 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@fd6bc59] (codfw): Temporarily increase poolsize for debugging
  • 09:46 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@fd6bc59] (eqiad): Temporarily increase poolsize for debugging (duration: 02m 13s)
  • 09:44 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@fd6bc59] (eqiad): Temporarily increase poolsize for debugging
  • 09:39 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P21674 and previous config saved to /var/cache/conftool/dbconfig/20220302-093853-ladsgroup.json
  • 09:35 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 09:35 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-ctrl2001.codfw.wmnet
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T302185)', diff saved to https://phabricator.wikimedia.org/P21673 and previous config saved to /var/cache/conftool/dbconfig/20220302-093027-ladsgroup.json
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300992)', diff saved to https://phabricator.wikimedia.org/P21672 and previous config saved to /var/cache/conftool/dbconfig/20220302-092348-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T300992)', diff saved to https://phabricator.wikimedia.org/P21671 and previous config saved to /var/cache/conftool/dbconfig/20220302-092128-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 09:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T300992)', diff saved to https://phabricator.wikimedia.org/P21670 and previous config saved to /var/cache/conftool/dbconfig/20220302-092120-ladsgroup.json
  • 09:16 mmandere: rolling restart of varnishkafka-* on cp6*
  • 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21669 and previous config saved to /var/cache/conftool/dbconfig/20220302-091523-ladsgroup.json
  • 09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21668 and previous config saved to /var/cache/conftool/dbconfig/20220302-090615-ladsgroup.json
  • 09:05 XioNoX: push Capirca managed labs-in firewall filter to eqiad routers
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21667 and previous config saved to /var/cache/conftool/dbconfig/20220302-090018-ladsgroup.json
  • 08:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21666 and previous config saved to /var/cache/conftool/dbconfig/20220302-085111-ladsgroup.json
  • 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T302185)', diff saved to https://phabricator.wikimedia.org/P21665 and previous config saved to /var/cache/conftool/dbconfig/20220302-084513-ladsgroup.json
  • 08:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1167.eqiad.wmnet with OS bullseye
  • 08:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T300992)', diff saved to https://phabricator.wikimedia.org/P21664 and previous config saved to /var/cache/conftool/dbconfig/20220302-083606-ladsgroup.json
  • 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T300992)', diff saved to https://phabricator.wikimedia.org/P21663 and previous config saved to /var/cache/conftool/dbconfig/20220302-083345-ladsgroup.json
  • 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 08:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21662 and previous config saved to /var/cache/conftool/dbconfig/20220302-083338-ladsgroup.json
  • 08:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1167.eqiad.wmnet with reason: host reimage
  • 08:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1167.eqiad.wmnet with reason: host reimage
  • 08:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P21661 and previous config saved to /var/cache/conftool/dbconfig/20220302-081832-ladsgroup.json
  • 08:09 godog: test thanos 0.24.0 on thanos-fe2001 to check if https://github.com/thanos-io/thanos/issues/4531 is fixed
  • 08:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1167.eqiad.wmnet with OS bullseye
  • 08:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P21660 and previous config saved to /var/cache/conftool/dbconfig/20220302-080327-ladsgroup.json
  • 08:02 Amir1: killing all entity dumpers of wikidata in snapshot1008 (T300255)
  • 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21659 and previous config saved to /var/cache/conftool/dbconfig/20220302-074822-ladsgroup.json
  • 07:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21658 and previous config saved to /var/cache/conftool/dbconfig/20220302-074602-ladsgroup.json
  • 07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T302185)', diff saved to https://phabricator.wikimedia.org/P21657 and previous config saved to /var/cache/conftool/dbconfig/20220302-074210-ladsgroup.json
  • 07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 07:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21656 and previous config saved to /var/cache/conftool/dbconfig/20220302-073610-ladsgroup.json
  • 07:35 _joe_: filling request patterns in etcd
  • 07:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21655 and previous config saved to /var/cache/conftool/dbconfig/20220302-072105-ladsgroup.json
  • 07:09 _joe_: installing scap 4.4.1 everywhere T302464
  • 07:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21654 and previous config saved to /var/cache/conftool/dbconfig/20220302-070601-ladsgroup.json
  • 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21653 and previous config saved to /var/cache/conftool/dbconfig/20220302-065056-ladsgroup.json
  • 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21652 and previous config saved to /var/cache/conftool/dbconfig/20220302-063933-ladsgroup.json
  • 06:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21651 and previous config saved to /var/cache/conftool/dbconfig/20220302-062428-ladsgroup.json
  • 06:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21650 and previous config saved to /var/cache/conftool/dbconfig/20220302-060924-ladsgroup.json
  • 05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21649 and previous config saved to /var/cache/conftool/dbconfig/20220302-055419-ladsgroup.json
  • 05:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1101.eqiad.wmnet with OS bullseye
  • 05:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1101.eqiad.wmnet with reason: host reimage
  • 05:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1101.eqiad.wmnet with reason: host reimage
  • 05:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1101.eqiad.wmnet with OS bullseye
  • 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21648 and previous config saved to /var/cache/conftool/dbconfig/20220302-052033-ladsgroup.json
  • 05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21647 and previous config saved to /var/cache/conftool/dbconfig/20220302-051947-ladsgroup.json
  • 05:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21646 and previous config saved to /var/cache/conftool/dbconfig/20220302-051853-ladsgroup.json
  • 05:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 05:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21645 and previous config saved to /var/cache/conftool/dbconfig/20220302-050526-ladsgroup.json
  • 05:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P21644 and previous config saved to /var/cache/conftool/dbconfig/20220302-050442-ladsgroup.json
  • 04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21643 and previous config saved to /var/cache/conftool/dbconfig/20220302-045021-ladsgroup.json
  • 04:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P21642 and previous config saved to /var/cache/conftool/dbconfig/20220302-044938-ladsgroup.json
  • 04:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21641 and previous config saved to /var/cache/conftool/dbconfig/20220302-043516-ladsgroup.json
  • 04:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21640 and previous config saved to /var/cache/conftool/dbconfig/20220302-043433-ladsgroup.json
  • 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21639 and previous config saved to /var/cache/conftool/dbconfig/20220302-043313-ladsgroup.json
  • 04:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 04:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 04:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 04:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 04:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300992)', diff saved to https://phabricator.wikimedia.org/P21638 and previous config saved to /var/cache/conftool/dbconfig/20220302-043229-ladsgroup.json
  • 04:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21637 and previous config saved to /var/cache/conftool/dbconfig/20220302-042012-ladsgroup.json
  • 04:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P21636 and previous config saved to /var/cache/conftool/dbconfig/20220302-041725-ladsgroup.json
  • 04:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1104.eqiad.wmnet with OS bullseye
  • 04:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P21635 and previous config saved to /var/cache/conftool/dbconfig/20220302-040220-ladsgroup.json
  • 04:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1104.eqiad.wmnet with reason: host reimage
  • 03:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1104.eqiad.wmnet with reason: host reimage
  • 03:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1104.eqiad.wmnet with OS bullseye
  • 03:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300992)', diff saved to https://phabricator.wikimedia.org/P21634 and previous config saved to /var/cache/conftool/dbconfig/20220302-034715-ladsgroup.json
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21633 and previous config saved to /var/cache/conftool/dbconfig/20220302-034502-ladsgroup.json
  • 03:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 03:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 03:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T300992)', diff saved to https://phabricator.wikimedia.org/P21632 and previous config saved to /var/cache/conftool/dbconfig/20220302-034454-ladsgroup.json
  • 03:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 03:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 03:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 03:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 03:43 ejegg: updated CiviCRM from e9f0eff5 to cb0605ed
  • 02:13 ejegg: Fundraising CiviCRM updated from 2874d623 to e9f0eff5
  • 00:15 topranks: Re-enabling Lumen AS3356 BGP session over IPv4 on cr3-ulsfo to assess effect on currently broken routing to ulsfo.
  • 00:07 topranks: disabling Lumen AS3356 BGP session over IPv4 on cr3-ulsfo to assess effect on currently broken routing to ulsfo.

2022-03-01

  • 22:51 inflatador: T276198 reenabled puppet on elastic1052.eqiad.wmnet
  • 22:37 inflatador: T276198 rebooting elastic1052.eqiad.wmnet to test failure condition
  • 22:33 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp6016.drmrs.wmnet with reason: debugging till we find the root cause of the purged OOM issue; no traffic served
  • 22:33 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp6016.drmrs.wmnet with reason: debugging till we find the root cause of the purged OOM issue; no traffic served
  • 22:32 inflatador: T276198 disabling puppet on elastic1052.eqiad.wmnet to test failure condition (rebooting shortly)
  • 21:53 dancy@deploy1002: Finished scap: Resync to try to clear alerts (duration: 12m 08s)
  • 21:41 dancy@deploy1002: Started scap: Resync to try to clear alerts
  • 21:36 dancy@deploy1002: Started scap: Resync to try to clear alerts
  • 20:36 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.24 refs T300200
  • 20:33 brennen: 1.38.0-wmf.24 train (T300200): no current blockers; proceeding to group0; note this may briefly trigger some version alerts
  • 20:30 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/includes: Backport: Revert "preferences: Use a faster and simpler form descriptor when validating" (T302643) (duration: 00m 55s)
  • 20:05 mutante: alert1001 - re-enabled puppet
  • 20:05 brennen@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.24 refs T300200 (duration: 53m 17s)
  • 19:45 mutante: alert1001 - disable puppet, systemctl stop ircecho - to stop bot spam, caused somehow by new scap version breaking "mw versions mismatch" alerting - affects labtestwiki,testwiki,testwikidatawiki
  • 19:38 mutante: mw1449 - scap pull
  • 19:36 mutante: mw1414 - scap pull
  • 19:11 brennen@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.24 refs T300200
  • 19:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2008.codfw.wmnet
  • 19:01 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:58 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 18:57 brennen: 1.38.0-wmf.24 train (T300200): there's currently a single blocker at T302643; staging to testwikis and holding there until backport's available
  • 18:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2008.codfw.wmnet
  • 18:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2008.codfw.wmnet with reason: Remove from Ganeti cluster for decom
  • 18:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2008.codfw.wmnet with reason: Remove from Ganeti cluster for decom
  • 18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21626 and previous config saved to /var/cache/conftool/dbconfig/20220301-180216-ladsgroup.json
  • 17:52 cwhite: completed grafana upgrade in eqiad T282863
  • 17:50 herron: re-enabling puppet and ircecho on alert1001
  • 17:47 cwhite: upgrade grafana in eqiad T282863
  • 17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21625 and previous config saved to /var/cache/conftool/dbconfig/20220301-174711-ladsgroup.json
  • 17:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21624 and previous config saved to /var/cache/conftool/dbconfig/20220301-173206-ladsgroup.json
  • 17:24 dancy@deploy1002: Finished scap: testing container image build (duration: 28m 39s)
  • 17:17 herron: stopped ircecho on alert1001 due to systemd unit alert shower
  • 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21622 and previous config saved to /var/cache/conftool/dbconfig/20220301-171701-ladsgroup.json
  • 17:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21621 and previous config saved to /var/cache/conftool/dbconfig/20220301-171441-ladsgroup.json
  • 17:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 17:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 16:55 dancy@deploy1002: Started scap: testing container image build
  • 16:24 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@cac16e8]: (no justification provided) (duration: 00m 03s)
  • 16:23 ebysans@deploy1002: Started deploy [airflow-dags/analytics@cac16e8]: (no justification provided)
  • 16:12 moritzm: restarting apache on logstash nodes to pick up expat update
  • 16:11 elukey@deploy1002: Finished deploy [ores/deploy@29de1cc]: ORES Winter deployment - T300195 (duration: 36m 13s)
  • 16:05 moritzm: restarting nginx on wcqs* nodes to pick up expat update
  • 15:35 elukey@deploy1002: Started deploy [ores/deploy@29de1cc]: ORES Winter deployment - T300195
  • 15:21 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@cac16e8]: (no justification provided) (duration: 00m 07s)
  • 15:21 ntsako@deploy1002: Started deploy [airflow-dags/analytics@cac16e8]: (no justification provided)
  • 15:06 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-etcd2003.codfw.wmnet
  • 14:57 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:52 elukey: elukey@deploy1002:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the node)
  • 14:51 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 14:51 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet
  • 14:48 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-etcd2002.codfw.wmnet
  • 14:42 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 14:41 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 14:38 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:36 vgutierrez: pool cp1087 running HAProxy as TLS termination layer - T290005 T271421
  • 14:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1087.eqiad.wmnet with OS buster
  • 14:35 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 14:35 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet
  • 14:32 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2003.codfw.wmnet
  • 14:32 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:28 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-etcd2001.codfw.wmnet
  • 14:19 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 14:19 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:15 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 14:14 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet
  • 14:09 moritzm: restarting nginx on wdqs* nodes to pick up expat update
  • 14:03 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2002.codfw.wmnet
  • 14:03 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:57 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 13:57 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:53 mmandere: restart purged on cp60[15-16]
  • 13:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage
  • 13:48 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 13:48 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet
  • 13:48 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2002.codfw.wmnet
  • 13:48 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:47 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage
  • 13:44 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2003.codfw.wmnet
  • 13:43 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:43 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 13:43 klausman@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:40 kormat: Deploying wmfmariadbpy 0.9 T302796
  • 13:40 kormat: uploaded wmfmariadbpy 0.9 to apt.wm.o T302796
  • 13:39 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 13:39 klausman@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 13:39 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 13:39 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet
  • 13:39 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 13:39 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet
  • 13:32 moritzm: restarting nginx on registry* nodes to pick up expat update
  • 13:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1087.eqiad.wmnet with OS buster
  • 13:15 XioNoX: restart cr1-drmrs for software upgrade
  • 13:03 moritzm: restarting FPM/Apache on parsoid hosts to pick up expat update
  • 12:50 vgutierrez: pool cp3062 running HAProxy as TLS termination layer - T290005 T271421
  • 12:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3062.esams.wmnet with OS buster
  • 12:39 moritzm: installing expat security updates
  • 12:34 mmandere: restart purged on cp60[12-14]
  • 12:32 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@41d2498] (eqiad): Reduce pool size to 1 connection per node worker (duration: 01m 06s)
  • 12:31 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@41d2498] (eqiad): Reduce pool size to 1 connection per node worker
  • 12:30 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker (duration: 01m 30s)
  • 12:28 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker
  • 12:15 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration (duration: 01m 41s)
  • 12:13 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration
  • 12:11 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration (duration: 02m 01s)
  • 12:09 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration
  • 11:43 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:36 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 11:35 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 11:35 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2001.codfw.wmnet
  • 11:33 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 11:32 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 11:30 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 11:28 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 11:27 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 11:27 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED
  • 11:21 _joe_: restarted pybal, removed ipvsadm entry on lvs1019. Now all of MediaWiki has no http LVS endpoint available. T244843
  • 11:18 _joe_: also removed the ipvsadm entry for apaches:80 T244843
  • 11:17 jayme: rolled back linkrecommendation staging helm release to revision 12 - T302744
  • 11:17 _joe_: restarting pybal on lvs1020 T244843
  • 11:11 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
  • 11:11 _joe_: restarted pybal on lvs2009, T244843
  • 11:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
  • 11:07 _joe_: restarted pybal on lvs2010, T244843
  • 11:02 mmandere: restart purged on cp60[09,10,11]
  • 11:00 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:47 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3062.esams.wmnet with OS buster
  • 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 259 hosts
  • 10:40 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ema out of all services on: 259 hosts
  • 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 1353 hosts
  • 10:39 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ema out of all services on: 1353 hosts
  • 10:31 mmandere: restart purged on cp600[6-8]
  • 10:28 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:24 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 10:05 vgutierrez: pool cp2039 running HAProxy as TLS termination layer - T290005 T271421
  • 09:48 elukey: elukey@stat1004:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the host)
  • 09:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2039.codfw.wmnet with OS buster
  • 09:33 _joe_: restarted pybal on lvs1019, removed the mw api from ipvsadm, the mw api is internally fully encrypted
  • 09:31 _joe_: restart pybal on lvs1020
  • 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Amuigai out of all services on: 1881 hosts
  • 09:25 elukey: restart varnishkafka-webrequest on cp6009 as an attempt to clear a weird librdkafka state (delivery errors to kafka)
  • 09:25 _joe_: manually removed ipvs entries on lvs2*, so it is actually now that the http api is not available in codfw anymore
  • 09:24 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Amuigai out of all services on: 1881 hosts
  • 09:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging ZPapierski out of all services on: 1881 hosts
  • 09:22 jmm@cumin2002: START - Cookbook sre.idm.logout Logging ZPapierski out of all services on: 1881 hosts
  • 09:22 _joe_: restarted pybal on lvs2009, the mw api is now effectively https-only in codfw T287820
  • 09:20 _joe_: restarted pybal on lvs2010
  • 09:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
  • 09:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
  • 09:06 elukey: restart purged on cp6005
  • 08:57 elukey: restart purged on cp6004
  • 08:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2039.codfw.wmnet with OS buster
  • 08:27 urbanecm: UTC morning B&C window done
  • 08:25 elukey: restart purged on cp6003
  • 08:16 moritzm: drain instances off ganeti2008 for eventual decom
  • 08:08 urbanecm@deploy1002: Synchronized wmf-config/ProductionServices.php: d149208: Use service-proxy to connect to linkrecommendation (T302719) (duration: 00m 49s)
  • 07:59 elukey: restart purged on cp6002
  • 06:58 oblivian@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): T302464 test (duration: 00m 17s)
  • 06:57 oblivian@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): T302464 test
  • 06:56 elukey: restart purged on cp6001 to clear stale kafka TLS consumer state (or attempting to)
  • 06:46 _joe_: uploaded scap 4.4.1 to {stretch,buster,bullseye} T302464
  • 06:46 _joe_: uploaded scap 4.4.1 to {stretch,buster,bullseye}
  • 02:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21618 and previous config saved to /var/cache/conftool/dbconfig/20220301-025938-ladsgroup.json
  • 02:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21617 and previous config saved to /var/cache/conftool/dbconfig/20220301-024433-ladsgroup.json
  • 02:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21616 and previous config saved to /var/cache/conftool/dbconfig/20220301-022928-ladsgroup.json
  • 02:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21615 and previous config saved to /var/cache/conftool/dbconfig/20220301-021424-ladsgroup.json
  • 01:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21614 and previous config saved to /var/cache/conftool/dbconfig/20220301-011404-ladsgroup.json
  • 01:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 01:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 00:17 mutante: 15.wikipedia.org on k8s (staging): [deploy1002:~] $ curl -s --resolve "15.wikipedia.org:4111:staging.svc.eqiad.wmnet" 'https://15.wikipedia.org' | grep grandpa => "“Wikipedia is like an all-knowing grandpa.”" | T300171

