Server Admin Log/Archive 57

From Wikitech

2022-09-30

  • 23:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 23:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T314041)', diff saved to https://phabricator.wikimedia.org/P35243 and previous config saved to /var/cache/conftool/dbconfig/20220930-232546-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P35242 and previous config saved to /var/cache/conftool/dbconfig/20220930-231040-ladsgroup.json
  • 22:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P35241 and previous config saved to /var/cache/conftool/dbconfig/20220930-225534-ladsgroup.json
  • 22:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T314041)', diff saved to https://phabricator.wikimedia.org/P35240 and previous config saved to /var/cache/conftool/dbconfig/20220930-224027-ladsgroup.json
  • 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup2001.codfw.wmnet
  • 20:54 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup2001.codfw.wmnet
  • 18:30 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
  • 18:08 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
  • 18:01 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
  • 17:43 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
  • 17:24 bblack@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cp4045.ulsfo.wmnet with OS bullseye
  • 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T314041)', diff saved to https://phabricator.wikimedia.org/P35237 and previous config saved to /var/cache/conftool/dbconfig/20220930-170620-ladsgroup.json
  • 17:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T314041)', diff saved to https://phabricator.wikimedia.org/P35236 and previous config saved to /var/cache/conftool/dbconfig/20220930-170546-ladsgroup.json
  • 16:54 bblack@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P35235 and previous config saved to /var/cache/conftool/dbconfig/20220930-165040-ladsgroup.json
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P35234 and previous config saved to /var/cache/conftool/dbconfig/20220930-163533-ladsgroup.json
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T314041)', diff saved to https://phabricator.wikimedia.org/P35233 and previous config saved to /var/cache/conftool/dbconfig/20220930-162027-ladsgroup.json
  • 15:37 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 14:41 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 13:51 moritzm: installing puppetdb-test2001 T318931
  • 13:23 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:23 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:23 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:22 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:22 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:22 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35232 and previous config saved to /var/cache/conftool/dbconfig/20220930-131638-root.json
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35231 and previous config saved to /var/cache/conftool/dbconfig/20220930-130133-root.json
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35230 and previous config saved to /var/cache/conftool/dbconfig/20220930-124628-root.json
  • 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35229 and previous config saved to /var/cache/conftool/dbconfig/20220930-123123-root.json
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35228 and previous config saved to /var/cache/conftool/dbconfig/20220930-121618-root.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35227 and previous config saved to /var/cache/conftool/dbconfig/20220930-120113-root.json
  • 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetdb-test2001.codfw.wmnet
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35226 and previous config saved to /var/cache/conftool/dbconfig/20220930-114605-root.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35225 and previous config saved to /var/cache/conftool/dbconfig/20220930-113101-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P35224 and previous config saved to /var/cache/conftool/dbconfig/20220930-112307-root.json
  • 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetdb-test2001.codfw.wmnet on all recursors
  • 11:21 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache puppetdb-test2001.codfw.wmnet on all recursors
  • 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:16 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:16 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb-test2001.codfw.wmnet
  • 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T314041)', diff saved to https://phabricator.wikimedia.org/P35223 and previous config saved to /var/cache/conftool/dbconfig/20220930-104004-ladsgroup.json
  • 10:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 10:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T314041)', diff saved to https://phabricator.wikimedia.org/P35222 and previous config saved to /var/cache/conftool/dbconfig/20220930-103943-ladsgroup.json
  • 10:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P35221 and previous config saved to /var/cache/conftool/dbconfig/20220930-102436-ladsgroup.json
  • 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P35220 and previous config saved to /var/cache/conftool/dbconfig/20220930-100930-ladsgroup.json
  • 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T314041)', diff saved to https://phabricator.wikimedia.org/P35219 and previous config saved to /var/cache/conftool/dbconfig/20220930-095423-ladsgroup.json
  • 09:42 moritzm: installing Linux 5.10.140 updates on Bullseye hosts (released via 11.5 point release), just rollout of the package, no reboots involved
  • 07:37 XioNoX: add RPKI ROAs for 185.71.138.0/24 and 2001:67c:930::/48
  • 07:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36692
  • 07:27 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:26 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 07:25 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 07:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36692
  • 07:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 52320
  • 07:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 52320
  • 07:19 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:18 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32934
  • 07:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32934
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35218 and previous config saved to /var/cache/conftool/dbconfig/20220930-070454-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35217 and previous config saved to /var/cache/conftool/dbconfig/20220930-065844-root.json
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35216 and previous config saved to /var/cache/conftool/dbconfig/20220930-064949-root.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35215 and previous config saved to /var/cache/conftool/dbconfig/20220930-064339-root.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35214 and previous config saved to /var/cache/conftool/dbconfig/20220930-063444-root.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35213 and previous config saved to /var/cache/conftool/dbconfig/20220930-062834-root.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35212 and previous config saved to /var/cache/conftool/dbconfig/20220930-061939-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35211 and previous config saved to /var/cache/conftool/dbconfig/20220930-061329-root.json
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35210 and previous config saved to /var/cache/conftool/dbconfig/20220930-060434-root.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35209 and previous config saved to /var/cache/conftool/dbconfig/20220930-055824-root.json
  • 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35208 and previous config saved to /var/cache/conftool/dbconfig/20220930-054929-root.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35207 and previous config saved to /var/cache/conftool/dbconfig/20220930-054319-root.json
  • 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35206 and previous config saved to /var/cache/conftool/dbconfig/20220930-053424-root.json
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35204 and previous config saved to /var/cache/conftool/dbconfig/20220930-052814-root.json
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35203 and previous config saved to /var/cache/conftool/dbconfig/20220930-051919-root.json
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35202 and previous config saved to /var/cache/conftool/dbconfig/20220930-051309-root.json
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P35201 and previous config saved to /var/cache/conftool/dbconfig/20220930-051206-root.json
  • 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126', diff saved to https://phabricator.wikimedia.org/P35200 and previous config saved to /var/cache/conftool/dbconfig/20220930-050533-root.json
  • 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T314041)', diff saved to https://phabricator.wikimedia.org/P35199 and previous config saved to /var/cache/conftool/dbconfig/20220930-041937-ladsgroup.json
  • 04:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 04:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T314041)', diff saved to https://phabricator.wikimedia.org/P35198 and previous config saved to /var/cache/conftool/dbconfig/20220930-041916-ladsgroup.json
  • 04:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P35197 and previous config saved to /var/cache/conftool/dbconfig/20220930-040409-ladsgroup.json
  • 03:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P35196 and previous config saved to /var/cache/conftool/dbconfig/20220930-034903-ladsgroup.json
  • 03:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T314041)', diff saved to https://phabricator.wikimedia.org/P35195 and previous config saved to /var/cache/conftool/dbconfig/20220930-033356-ladsgroup.json
  • 00:31 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
  • 00:22 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye

2022-09-29

  • 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T314041)', diff saved to https://phabricator.wikimedia.org/P35193 and previous config saved to /var/cache/conftool/dbconfig/20220929-224649-ladsgroup.json
  • 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P35192 and previous config saved to /var/cache/conftool/dbconfig/20220929-223143-ladsgroup.json
  • 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P35191 and previous config saved to /var/cache/conftool/dbconfig/20220929-221637-ladsgroup.json
  • 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T314041)', diff saved to https://phabricator.wikimedia.org/P35190 and previous config saved to /var/cache/conftool/dbconfig/20220929-220130-ladsgroup.json
  • 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T314041)', diff saved to https://phabricator.wikimedia.org/P35189 and previous config saved to /var/cache/conftool/dbconfig/20220929-215333-ladsgroup.json
  • 21:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 21:43 sukhe: alert1001: restart icinga
  • 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:26 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp4045.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:21 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4045.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:18 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:18 ejegg: payments-wiki upgraded from 839d6dde to aeee9676
  • 21:14 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 21:14 brennen: end of utc late backport and config window
  • 21:14 brennen@deploy1002: Finished scap: Backport for cirrus: Don't configure cloud clusters for private wikis (duration: 08m 22s)
  • 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:06 brennen@deploy1002: brennen and ebernhardson: Backport for cirrus: Don't configure cloud clusters for private wikis synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:05 brennen@deploy1002: Started scap: Backport for cirrus: Don't configure cloud clusters for private wikis
  • 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:59 ryankemper: T313431 Repooled `elastic[2073-2074,2080-2081,2083,2086].codfw.wmnet`. Codfw's all on 5 masters now and cluster is back to green.
  • 20:58 brennen@deploy1002: Sync cancelled.
  • 20:58 brennen@deploy1002: brennen and trainbranchbot: Backport for Revert "cirrus: Don't configure cloud clusters for private wikis" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:58 ryankemper: T313431 Updated cross-cluster seed conf with new masters; should resolve the settings check alerts
  • 20:58 brennen@deploy1002: Started scap: Backport for Revert "cirrus: Don't configure cloud clusters for private wikis"
  • 20:57 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4027.ulsfo.wmnet
  • 20:57 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:52 brennen@deploy1002: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki=aawiki --force-version "1.40.0-wmf.3" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.gcoIZ0BTKW"' returned non-zero exit status 255. (duration: 00m 00s)
  • 20:52 brennen@deploy1002: Started scap: Backport for cirrus: Don't configure cloud clusters for private wikis
  • 20:49 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:46 brennen@deploy1002: Sync cancelled.
  • 20:45 brennen@deploy1002: brennen and trainbranchbot: Backport for Revert "Add Nepalese Wikipedia tagline" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:45 brennen@deploy1002: Started scap: Backport for Revert "Add Nepalese Wikipedia tagline"
  • 20:45 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 20:42 brennen@deploy1002: Sync cancelled.
  • 20:41 brennen@deploy1002: brennen and jdlrobson: Backport for Add Nepalese Wikipedia tagline (T318737) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:41 ryankemper: T313431 Restarting elasticsearch_7* services on `elastic2080` to pick up new master-eligible status
  • 20:41 brennen@deploy1002: Started scap: Backport for Add Nepalese Wikipedia tagline (T318737)
  • 20:38 brennen@deploy1002: Finished scap: Backport for Enable desktop improvements on nowikimedia (T318344) (duration: 08m 03s)
  • 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:35 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4027.ulsfo.wmnet
  • 20:35 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts cp4027.ulsfo.wmnet
  • 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:33 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4027.ulsfo.wmnet
  • 20:30 brennen@deploy1002: brennen and jdlrobson: Backport for Enable desktop improvements on nowikimedia (T318344) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:30 brennen@deploy1002: Started scap: Backport for Enable desktop improvements on nowikimedia (T318344)
  • 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:25 brennen@deploy1002: Finished scap: Backport for Web team config cleanup (T316568) (duration: 08m 05s)
  • 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:19 hoo: Ran foreachwikiindblist wikidataclient-test extensions/Wikibase/client/maintenance/PopulateUnexpectedUnconnectedPagePageProp.php
  • 20:17 ejegg: payments-wiki upgraded from 0456850e to 839d6dde (with cache prefix altered for moved classes)
  • 20:17 ryankemper: T313431 Restarting elasticsearch_7* services on `elastic2086` to pick up new master-eligible status
  • 20:17 brennen@deploy1002: brennen and jdlrobson: Backport for Web team config cleanup (T316568) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:17 brennen@deploy1002: Started scap: Backport for Web team config cleanup (T316568)
  • 20:04 ejegg: payments-wiki rolled back from 839d6dde to 0456850e
  • 19:56 ejegg: payments-wiki upgraded from 0456850e to 839d6dde
  • 19:55 ryankemper: T313431 Restarting elasticsearch_7* services on `elastic208[1,3]` to pick up new master-eligible status
  • 19:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 19:33 ryankemper: T313431 Restarting elasticsearch_7* services on `elastic207[3,4]` to pick up new master-eligible status
  • 19:29 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 6 hosts with reason: T313431
  • 19:29 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on 6 hosts with reason: T313431
  • 19:09 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4021.ulsfo.wmnet
  • 19:09 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:05 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1060.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:04 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 19:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1061.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1059.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1058.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1057.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1056.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1055.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:59 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4021.ulsfo.wmnet
  • 18:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1054.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-stretch1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1061.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1060.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1059.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1058.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1057.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1056.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1055.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1054.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-stretch1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:16 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.3 refs T314192
  • 18:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-stretch1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-stretch1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:10 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:09 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:09 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:08 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:07 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:06 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 16:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T314041)', diff saved to https://phabricator.wikimedia.org/P35188 and previous config saved to /var/cache/conftool/dbconfig/20220929-162812-ladsgroup.json
  • 16:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 16:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T314041)', diff saved to https://phabricator.wikimedia.org/P35187 and previous config saved to /var/cache/conftool/dbconfig/20220929-162750-ladsgroup.json
  • 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P35186 and previous config saved to /var/cache/conftool/dbconfig/20220929-161244-ladsgroup.json
  • 15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P35185 and previous config saved to /var/cache/conftool/dbconfig/20220929-155737-ladsgroup.json
  • 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:49 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Configure `mul` Wikibase language code on Beta wikis (beta-only, prod noop) (duration: 03m 41s)
  • 15:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T314041)', diff saved to https://phabricator.wikimedia.org/P35184 and previous config saved to /var/cache/conftool/dbconfig/20220929-154231-ladsgroup.json
  • 15:35 dancy@deploy1002: Installation of scap version "4.25.0" completed for 561 hosts
  • 15:35 dancy@deploy1002: Installing scap version "4.25.0" for 561 hosts
  • 14:30 moritzm: installing glib2.0 security updates
  • 14:29 moritzm: uploaded glib2.0 2.50.3-2+deb9u3+wmf1 to apt.wikimedia.org/stretch-wikimedia
  • 14:17 moritzm: rolling restart of apache2 in mw/eqiad to pick up Expat security updates
  • 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 11164
  • 14:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 11164
  • 13:54 claime: Enabled puppet for C:memcache hosts following merge C:memcached Fix memcached bootstrap
  • 13:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:50 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 32934
  • 13:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35179 and previous config saved to /var/cache/conftool/dbconfig/20220929-134844-root.json
  • 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:46 claime: Disabling puppet for C:memcache hosts to merge C:memcached Fix memcached bootstrap
  • 13:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32934
  • 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:41 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 13:41 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Wikibase: Set UnconnectedPage page prop format for test wikis (duration: 06m 13s)
  • 13:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8966
  • 13:39 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
  • 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8966
  • 13:35 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and hoo: Backport for Wikibase: Set UnconnectedPage page prop format for test wikis synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:34 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Wikibase: Set UnconnectedPage page prop format for test wikis
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35178 and previous config saved to /var/cache/conftool/dbconfig/20220929-133339-root.json
  • 13:33 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Stop mobile visual enhancements from rolling out to jawiki (T318871) (duration: 05m 36s)
  • 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:28 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and kemayo: Backport for Stop mobile visual enhancements from rolling out to jawiki (T318871) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:27 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Stop mobile visual enhancements from rolling out to jawiki (T318871)
  • 13:26 moritzm: restarting Apache on lists
  • 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:20 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Remove wmgEntityUsageModifierLimitsStatement on cebwiki (T296384) (duration: 05m 23s)
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35176 and previous config saved to /var/cache/conftool/dbconfig/20220929-131834-root.json
  • 13:15 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and lucaswerkmeister-wmde: Backport for Remove wmgEntityUsageModifierLimitsStatement on cebwiki (T296384) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:15 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Remove wmgEntityUsageModifierLimitsStatement on cebwiki (T296384)
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35175 and previous config saved to /var/cache/conftool/dbconfig/20220929-131507-root.json
  • 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:11 moritzm: rolling restart of apache2 in mw/codfw to pick up Expat security updates
  • 13:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: votewiki: Change wgLanguageCode to zh for Sep 2022 admins election (T318147) (duration: 03m 40s)
  • 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35174 and previous config saved to /var/cache/conftool/dbconfig/20220929-130329-root.json
  • 13:01 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.3 refs T314192 (duration: 04m 04s)
  • 13:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35173 and previous config saved to /var/cache/conftool/dbconfig/20220929-130003-root.json
  • 12:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:57 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.3 refs T314192
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35172 and previous config saved to /var/cache/conftool/dbconfig/20220929-124824-root.json
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35171 and previous config saved to /var/cache/conftool/dbconfig/20220929-124458-root.json
  • 12:44 ladsgroup@deploy1002: Finished scap: Backport for Revert "rdbms: improve LoadBalancer connection pool reuse" (T318904) (duration: 09m 05s)
  • 12:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:35 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Revert "rdbms: improve LoadBalancer connection pool reuse" (T318904) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 12:34 ladsgroup@deploy1002: Started scap: Backport for Revert "rdbms: improve LoadBalancer connection pool reuse" (T318904)
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35169 and previous config saved to /var/cache/conftool/dbconfig/20220929-123319-root.json
  • 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35168 and previous config saved to /var/cache/conftool/dbconfig/20220929-122953-root.json
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35167 and previous config saved to /var/cache/conftool/dbconfig/20220929-121814-root.json
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35166 and previous config saved to /var/cache/conftool/dbconfig/20220929-121448-root.json
  • 12:10 ladsgroup@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3292
  • 12:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 3292
  • 12:04 ladsgroup@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35165 and previous config saved to /var/cache/conftool/dbconfig/20220929-120309-root.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35164 and previous config saved to /var/cache/conftool/dbconfig/20220929-115943-root.json
  • 11:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 199524
  • 11:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 199524
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1178', diff saved to https://phabricator.wikimedia.org/P35163 and previous config saved to /var/cache/conftool/dbconfig/20220929-115612-root.json
  • 11:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 209453
  • 11:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 209453
  • 11:51 ladsgroup@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15695
  • 11:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15695
  • 11:45 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 42
  • 11:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 42
  • 11:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3856
  • 11:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35162 and previous config saved to /var/cache/conftool/dbconfig/20220929-114438-root.json
  • 11:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T314041)', diff saved to https://phabricator.wikimedia.org/P35161 and previous config saved to /var/cache/conftool/dbconfig/20220929-114431-ladsgroup.json
  • 11:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 3856
  • 11:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 42
  • 11:41 ladsgroup@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:40 ladsgroup@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 42
  • 11:39 ladsgroup@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 62955
  • 11:38 ladsgroup@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:38 ladsgroup@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 62955
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35160 and previous config saved to /var/cache/conftool/dbconfig/20220929-112933-root.json
  • 11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P35159 and previous config saved to /var/cache/conftool/dbconfig/20220929-112925-ladsgroup.json
  • 11:16 XioNoX: re-pool cr2-eqord - T295690
  • 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P35158 and previous config saved to /var/cache/conftool/dbconfig/20220929-111418-ladsgroup.json
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2161 T318892', diff saved to https://phabricator.wikimedia.org/P35157 and previous config saved to /var/cache/conftool/dbconfig/20220929-111217-root.json
  • 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2165 to s8 codfw primary T318892', diff saved to https://phabricator.wikimedia.org/P35156 and previous config saved to /var/cache/conftool/dbconfig/20220929-111127-root.json
  • 11:10 marostegui: Starting s8 codfw failover from db2161 to db2165 - T318892
  • 11:06 XioNoX: restart cr2-eqord for upgrade - T295690
  • 11:05 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad
  • 11:04 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad
  • 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw
  • 11:01 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T314041)', diff saved to https://phabricator.wikimedia.org/P35155 and previous config saved to /var/cache/conftool/dbconfig/20220929-105912-ladsgroup.json
  • 10:53 XioNoX: drain cr2-eqord - T295690
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2165 with weight 0 T318892', diff saved to https://phabricator.wikimedia.org/P35154 and previous config saved to /var/cache/conftool/dbconfig/20220929-105206-root.json
  • 10:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s8 T318892
  • 10:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s8 T318892
  • 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 T318892
  • 10:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr2-eqord,cr2-eqord IPv6 with reason: router upgrade
  • 10:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 T318892
  • 10:50 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cr2-eqord,cr2-eqord IPv6 with reason: router upgrade
  • 10:40 XioNoX: repool cr2-eqiad - T295690
  • 10:36 moritzm: installing poppler security updates
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T314041)', diff saved to https://phabricator.wikimedia.org/P35153 and previous config saved to /var/cache/conftool/dbconfig/20220929-100849-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 10:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T314041)', diff saved to https://phabricator.wikimedia.org/P35152 and previous config saved to /var/cache/conftool/dbconfig/20220929-100828-ladsgroup.json
  • 10:07 XioNoX: second (and longest) cr2-eqiad RE switchover - T295690
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P35150 and previous config saved to /var/cache/conftool/dbconfig/20220929-095321-ladsgroup.json
  • 09:45 moritzm: restarting superset to pick up expat security update
  • 09:43 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 09:42 XioNoX: first cr2-eqiad RE switchover - T295690
  • 09:41 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 09:38 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P35149 and previous config saved to /var/cache/conftool/dbconfig/20220929-093815-ladsgroup.json
  • 09:36 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 09:34 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 09:33 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 09:33 XioNoX: drain cr2-eqiad - T295690
  • 09:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr2-eqiad,cr2-eqiad IPv6,re0.cr2-eqiad.mgmt with reason: router upgrade
  • 09:28 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cr2-eqiad,cr2-eqiad IPv6,re0.cr2-eqiad.mgmt with reason: router upgrade
  • 09:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:26 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2098.codfw.wmnet with OS bullseye
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T314041)', diff saved to https://phabricator.wikimedia.org/P35148 and previous config saved to /var/cache/conftool/dbconfig/20220929-092308-ladsgroup.json
  • 09:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:16 XioNoX: repool cr1-eqiad - T295690
  • 09:11 jnuche@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.40.0-wmf.3"
  • 09:07 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2098.codfw.wmnet with reason: host reimage
  • 09:04 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2098.codfw.wmnet with reason: host reimage
  • 08:52 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2098.codfw.wmnet with OS bullseye
  • 08:43 XioNoX: second cr1-eqiad RE switchover - T295690
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35146 and previous config saved to /var/cache/conftool/dbconfig/20220929-082757-root.json
  • 08:26 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:26 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:26 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:26 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:22 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 08:21 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 08:15 XioNoX: first cr1-eqiad RE switchover (for NVM firmware) - T295690
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35145 and previous config saved to /var/cache/conftool/dbconfig/20220929-081252-root.json
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35144 and previous config saved to /var/cache/conftool/dbconfig/20220929-080340-root.json
  • 07:57 XioNoX: drain traffic away from cr1-eqiad - T295690
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35143 and previous config saved to /var/cache/conftool/dbconfig/20220929-075747-root.json
  • 07:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr1-eqiad,cr1-eqiad IPv6,re0.cr1-eqiad.mgmt with reason: router upgrade
  • 07:49 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cr1-eqiad,cr1-eqiad IPv6,re0.cr1-eqiad.mgmt with reason: router upgrade
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35142 and previous config saved to /var/cache/conftool/dbconfig/20220929-074835-root.json
  • 07:45 moritzm: installing expat security updates
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35141 and previous config saved to /var/cache/conftool/dbconfig/20220929-074242-root.json
  • 07:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 18106
  • 07:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 18106
  • 07:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 38040
  • 07:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 38040
  • 07:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 35280
  • 07:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 35280
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35140 and previous config saved to /var/cache/conftool/dbconfig/20220929-073330-root.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35139 and previous config saved to /var/cache/conftool/dbconfig/20220929-072745-root.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35138 and previous config saved to /var/cache/conftool/dbconfig/20220929-072737-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35137 and previous config saved to /var/cache/conftool/dbconfig/20220929-071825-root.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35136 and previous config saved to /var/cache/conftool/dbconfig/20220929-071240-root.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35135 and previous config saved to /var/cache/conftool/dbconfig/20220929-071232-root.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35134 and previous config saved to /var/cache/conftool/dbconfig/20220929-070320-root.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35133 and previous config saved to /var/cache/conftool/dbconfig/20220929-065736-root.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35132 and previous config saved to /var/cache/conftool/dbconfig/20220929-065727-root.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35131 and previous config saved to /var/cache/conftool/dbconfig/20220929-064815-root.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35130 and previous config saved to /var/cache/conftool/dbconfig/20220929-064231-root.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35129 and previous config saved to /var/cache/conftool/dbconfig/20220929-064222-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1177', diff saved to https://phabricator.wikimedia.org/P35128 and previous config saved to /var/cache/conftool/dbconfig/20220929-063508-root.json
  • 06:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 06:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35127 and previous config saved to /var/cache/conftool/dbconfig/20220929-063310-root.json
  • 06:27 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35126 and previous config saved to /var/cache/conftool/dbconfig/20220929-062726-root.json
  • 06:27 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35125 and previous config saved to /var/cache/conftool/dbconfig/20220929-061805-root.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35124 and previous config saved to /var/cache/conftool/dbconfig/20220929-061221-root.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2121 T318888', diff saved to https://phabricator.wikimedia.org/P35123 and previous config saved to /var/cache/conftool/dbconfig/20220929-060532-root.json
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2118 to s7 primary and set section read-write T318888', diff saved to https://phabricator.wikimedia.org/P35122 and previous config saved to /var/cache/conftool/dbconfig/20220929-060425-root.json
  • 06:03 marostegui: Starting s7 codfw failover from db2121 to db2118 - T318888
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35121 and previous config saved to /var/cache/conftool/dbconfig/20220929-055716-root.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2118 from API T318888', diff saved to https://phabricator.wikimedia.org/P35120 and previous config saved to /var/cache/conftool/dbconfig/20220929-054542-root.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2118 with weight 0 T318888', diff saved to https://phabricator.wikimedia.org/P35119 and previous config saved to /var/cache/conftool/dbconfig/20220929-054509-root.json
  • 05:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 T318888
  • 05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 T318888
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35118 and previous config saved to /var/cache/conftool/dbconfig/20220929-054211-root.json
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2140 from API T318886', diff saved to https://phabricator.wikimedia.org/P35117 and previous config saved to /var/cache/conftool/dbconfig/20220929-053951-root.json
  • 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110 T318886', diff saved to https://phabricator.wikimedia.org/P35116 and previous config saved to /var/cache/conftool/dbconfig/20220929-053407-root.json
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2140 to s4 primary and set section read-write T318886', diff saved to https://phabricator.wikimedia.org/P35115 and previous config saved to /var/cache/conftool/dbconfig/20220929-053302-root.json
  • 05:32 marostegui: Starting s4 codfw failover from db2110 to db2140 - T318886
  • 05:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T314041)', diff saved to https://phabricator.wikimedia.org/P35114 and previous config saved to /var/cache/conftool/dbconfig/20220929-052805-ladsgroup.json
  • 05:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 05:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 05:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T314041)', diff saved to https://phabricator.wikimedia.org/P35113 and previous config saved to /var/cache/conftool/dbconfig/20220929-052743-ladsgroup.json
  • 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P35112 and previous config saved to /var/cache/conftool/dbconfig/20220929-051237-ladsgroup.json
  • 05:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T318886
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2140 with weight 0 T318886', diff saved to https://phabricator.wikimedia.org/P35111 and previous config saved to /var/cache/conftool/dbconfig/20220929-051114-root.json
  • 05:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T318886
  • 04:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P35110 and previous config saved to /var/cache/conftool/dbconfig/20220929-045730-ladsgroup.json
  • 04:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T314041)', diff saved to https://phabricator.wikimedia.org/P35109 and previous config saved to /var/cache/conftool/dbconfig/20220929-044224-ladsgroup.json
  • 03:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T314041)', diff saved to https://phabricator.wikimedia.org/P35108 and previous config saved to /var/cache/conftool/dbconfig/20220929-035724-ladsgroup.json
  • 03:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 03:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 03:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 03:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P35107 and previous config saved to /var/cache/conftool/dbconfig/20220929-035647-ladsgroup.json
  • 03:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P35106 and previous config saved to /var/cache/conftool/dbconfig/20220929-034140-ladsgroup.json
  • 03:40 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 10s)
  • 03:40 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 03:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P35105 and previous config saved to /var/cache/conftool/dbconfig/20220929-032634-ladsgroup.json
  • 03:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P35104 and previous config saved to /var/cache/conftool/dbconfig/20220929-031127-ladsgroup.json
  • 02:29 ejegg: updated fundraising CiviCRM from f3461a44 to 5e1738a1
  • 02:20 ejegg: updated fundraising python tools from dd494413 to 14d60435
  • 01:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2037.codfw.wmnet with OS buster
  • 00:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2037.codfw.wmnet with reason: host reimage
  • 00:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2037.codfw.wmnet with reason: host reimage

2022-09-28

  • 23:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2037.codfw.wmnet with OS buster
  • 23:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2037']
  • 23:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2037']
  • 23:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T314041)', diff saved to https://phabricator.wikimedia.org/P35103 and previous config saved to /var/cache/conftool/dbconfig/20220928-231719-ladsgroup.json
  • 23:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 23:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 22:20 ejegg: updated fundraising CiviCRM from d31c19a0 to f3461a44
  • 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P35102 and previous config saved to /var/cache/conftool/dbconfig/20220928-213701-ladsgroup.json
  • 21:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 21:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P35101 and previous config saved to /var/cache/conftool/dbconfig/20220928-213640-ladsgroup.json
  • 21:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P35100 and previous config saved to /var/cache/conftool/dbconfig/20220928-212131-ladsgroup.json
  • 21:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P35099 and previous config saved to /var/cache/conftool/dbconfig/20220928-210624-ladsgroup.json
  • 21:06 volans: installed spicerack 4.0.0-1+deb11u1 on cumin1001
  • 20:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:57 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P35098 and previous config saved to /var/cache/conftool/dbconfig/20220928-205117-ladsgroup.json
  • 20:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 12200
  • 20:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 12200
  • 20:39 TheresNoTime: closing UTC late backport window
  • 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:24 samtar@deploy1002: Finished scap: Backport for [config]: Deploy GDI survey Wave 3 (T318156) (duration: 06m 19s)
  • 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:18 samtar@deploy1002: samtar and essexigyan: Backport for [config]: Deploy GDI survey Wave 3 (T318156) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:18 samtar@deploy1002: Started scap: Backport for [config]: Deploy GDI survey Wave 3 (T318156)
  • 20:11 samtar@deploy1002: Sync cancelled.
  • 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:08 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:04 samtar@deploy1002: samtar and dani: Backport for Deploy Research Incentive survey on arwiki (T318328) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:04 samtar@deploy1002: Started scap: Backport for Deploy Research Incentive survey on arwiki (T318328)
  • 19:24 ejegg: updated fundraising CiviCRM from 916a8b08 to d31c19a0
  • 19:08 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:25 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:22 volans: installed spicerack 4.0.0-1+deb11u1 on cumin2002
  • 18:22 mforns@deploy1002: Finished deploy [airflow-dags/analytics@3f23a1b]: (no justification provided) (duration: 00m 11s)
  • 18:22 mforns@deploy1002: Started deploy [airflow-dags/analytics@3f23a1b]: (no justification provided)
  • 18:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:10 brennen@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.3 refs T314192 (duration: 03m 38s)
  • 18:07 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:06 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.3 refs T314192
  • 18:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:36 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 19653
  • 17:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 19653
  • 17:34 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32098
  • 17:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32098
  • 17:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 4181
  • 17:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4181
  • 17:23 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 17:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 17:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T314041)', diff saved to https://phabricator.wikimedia.org/P35097 and previous config saved to /var/cache/conftool/dbconfig/20220928-171848-ladsgroup.json
  • 17:16 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kubernetes1024.eqiad.wmnet with OS bullseye
  • 17:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1024.eqiad.wmnet with OS bullseye
  • 17:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P35096 and previous config saved to /var/cache/conftool/dbconfig/20220928-170342-ladsgroup.json
  • 16:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 10310
  • 16:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1024.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 10310
  • 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P35095 and previous config saved to /var/cache/conftool/dbconfig/20220928-164835-ladsgroup.json
  • 16:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13335
  • 16:36 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@f89d689]: (no justification provided) (duration: 00m 12s)
  • 16:36 nokafor@deploy1002: Started deploy [airflow-dags/analytics@f89d689]: (no justification provided)
  • 16:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1024.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13335
  • 16:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T314041)', diff saved to https://phabricator.wikimedia.org/P35093 and previous config saved to /var/cache/conftool/dbconfig/20220928-163329-ladsgroup.json
  • 16:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:31 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 10310
  • 16:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 10310
  • 16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:26 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 4775
  • 16:25 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4775
  • 16:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2635
  • 16:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2635
  • 16:15 volans: uploaded spicerack_4.0.0 to apt.wikimedia.org bullseye-wikimedia
  • 15:57 dancy@deploy1002: Installation of scap version "4.24.0" completed for 561 hosts
  • 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons.
  • 15:57 dancy@deploy1002: Installing scap version "4.24.0" for 561 hosts
  • 15:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 40217
  • 15:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 40217
  • 15:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36351
  • 15:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36351
  • 15:51 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@0646be1]: (no justification provided) (duration: 00m 10s)
  • 15:51 nokafor@deploy1002: Started deploy [airflow-dags/analytics@0646be1]: (no justification provided)
  • 15:47 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons.
  • 15:47 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 15:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2036.codfw.wmnet with OS buster
  • 15:26 moritzm: installing libgoogle-gson-java security updates on bullseye
  • 15:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 4922
  • 15:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4922
  • 15:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 714
  • 15:13 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2036.codfw.wmnet with reason: host reimage
  • 15:12 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 714
  • 15:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 19108
  • 15:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 19108
  • 15:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2036.codfw.wmnet with reason: host reimage
  • 15:09 moritzm: installing twisted security updates
  • 15:09 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8674
  • 15:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:07 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8674
  • 15:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P35092 and previous config saved to /var/cache/conftool/dbconfig/20220928-150230-ladsgroup.json
  • 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T314041)', diff saved to https://phabricator.wikimedia.org/P35091 and previous config saved to /var/cache/conftool/dbconfig/20220928-150158-ladsgroup.json
  • 15:01 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 15:00 SandraEbele: deploying Airflow for hdfsarchiver operator fix
  • 15:00 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@aa7984f]: (no justification provided) (duration: 00m 14s)
  • 15:00 ebysans@deploy1002: Started deploy [airflow-dags/analytics@aa7984f]: (no justification provided)
  • 14:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite1005.eqiad.wmnet with OS bullseye
  • 14:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit1003.wikimedia.org
  • 14:53 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 14:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 394354
  • 14:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 394354
  • 14:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 393950
  • 14:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 393950
  • 14:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262589
  • 14:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 262589
  • 14:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 209453
  • 14:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 209453
  • 14:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 199524
  • 14:48 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1003.wikimedia.org
  • 14:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 199524
  • 14:48 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 65517
  • 14:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 65517
  • 14:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62955
  • 14:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62955
  • 14:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 57695
  • 14:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 57695
  • 14:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 53334
  • 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P35090 and previous config saved to /var/cache/conftool/dbconfig/20220928-144651-ladsgroup.json
  • 14:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 53334
  • 14:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52320
  • 14:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 52320
  • 14:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46450
  • 14:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1003.wikimedia.org with OS bullseye
  • 14:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on graphite1005.eqiad.wmnet with reason: host reimage
  • 14:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46450
  • 14:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40217
  • 14:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 40217
  • 14:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36692
  • 14:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2036.codfw.wmnet with OS buster
  • 14:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36692
  • 14:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36351
  • 14:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36351
  • 14:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35280
  • 14:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on graphite1005.eqiad.wmnet with reason: host reimage
  • 14:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35280
  • 14:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32934
  • 14:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32934
  • 14:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32787
  • 14:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32787
  • 14:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32098
  • 14:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32098
  • 14:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 29791
  • 14:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 29791
  • 14:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 26744
  • 14:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 26744
  • 14:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 25885
  • 14:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 25885
  • 14:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22987
  • 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P35089 and previous config saved to /var/cache/conftool/dbconfig/20220928-143145-ladsgroup.json
  • 14:31 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22987
  • 14:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22773
  • 14:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22773
  • 14:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22616
  • 14:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22616
  • 14:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21949
  • 14:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
  • 14:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host graphite1005.eqiad.wmnet with OS bullseye
  • 14:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21949
  • 14:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21928
  • 14:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21928
  • 14:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 20115
  • 14:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 20115
  • 14:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19653
  • 14:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19653
  • 14:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19151
  • 14:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19151
  • 14:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19108
  • 14:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
  • 14:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19108
  • 14:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 18106
  • 14:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 18106
  • 14:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16735
  • 14:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16735
  • 14:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16276
  • 14:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16276
  • 14:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15695
  • 14:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15695
  • 14:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15133
  • 14:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15133
  • 14:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14630
  • 14:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 14630
  • 14:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14361
  • 14:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 14361
  • 14:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13760
  • 14:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13760
  • 14:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13489
  • 14:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13489
  • 14:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13335
  • 14:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T314041)', diff saved to https://phabricator.wikimedia.org/P35088 and previous config saved to /var/cache/conftool/dbconfig/20220928-141638-ladsgroup.json
  • 14:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host graphite1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13335
  • 14:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12200
  • 14:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12200
  • 14:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12041
  • 14:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12041
  • 14:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11164
  • 14:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 11164
  • 14:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11039
  • 14:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 11039
  • 14:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 10310
  • 14:12 volans: added python3-gjson v0.0.5 to apt.w.o (bullseye only)
  • 14:12 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 10310
  • 14:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8966
  • 14:11 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 14:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8966
  • 14:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8781
  • 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35087 and previous config saved to /var/cache/conftool/dbconfig/20220928-141007-root.json
  • 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35086 and previous config saved to /var/cache/conftool/dbconfig/20220928-141001-root.json
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35085 and previous config saved to /var/cache/conftool/dbconfig/20220928-140956-root.json
  • 14:09 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8781
  • 14:09 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8674
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35084 and previous config saved to /var/cache/conftool/dbconfig/20220928-140950-root.json
  • 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-thanos-fe (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
  • 14:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bullseye
  • 14:08 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8674
  • 14:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8359
  • 14:08 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudrabbit1003.wikimedia.org
  • 14:08 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8359
  • 14:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8075
  • 14:08 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-thanos-fe rolling restart_daemons on A:thanos-fe-eqiad
  • 14:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8075
  • 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7843
  • 14:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7843
  • 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7795
  • 14:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7795
  • 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7784
  • 14:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7784
  • 14:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7713
  • 14:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7713
  • 14:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7195
  • 14:04 jmm@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-thanos-fe (exit_code=0) rolling restart_daemons on A:thanos-fe-codfw
  • 14:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7195
  • 14:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6762
  • 14:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host graphite1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:03 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6762
  • 14:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6614
  • 14:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6614
  • 14:02 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-thanos-fe rolling restart_daemons on A:thanos-fe-codfw
  • 14:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6128
  • 14:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6128
  • 14:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6079
  • 14:01 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
  • 14:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6079
  • 14:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5650
  • 14:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 5650
  • 14:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5400
  • 14:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 5400
  • 14:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4922
  • 13:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4922
  • 13:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4826
  • 13:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4826
  • 13:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4775
  • 13:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4775
  • 13:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4637
  • 13:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4637
  • 13:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4230
  • 13:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4230
  • 13:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4181
  • 13:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4181
  • 13:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3856
  • 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35083 and previous config saved to /var/cache/conftool/dbconfig/20220928-135502-root.json
  • 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35082 and previous config saved to /var/cache/conftool/dbconfig/20220928-135456-root.json
  • 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35081 and previous config saved to /var/cache/conftool/dbconfig/20220928-135451-root.json
  • 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35080 and previous config saved to /var/cache/conftool/dbconfig/20220928-135445-root.json
  • 13:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3856
  • 13:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3300
  • 13:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:52 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 13:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 13:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3300
  • 13:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3292
  • 13:50 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons.
  • 13:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3292
  • 13:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2906
  • 13:49 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1003.wikimedia.org
  • 13:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2906
  • 13:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2647
  • 13:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2647
  • 13:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2635
  • 13:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2635
  • 13:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2603
  • 13:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2603
  • 13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1273
  • 13:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 1273
  • 13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 812
  • 13:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 812
  • 13:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 714
  • 13:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 714
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35079 and previous config saved to /var/cache/conftool/dbconfig/20220928-133957-root.json
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35078 and previous config saved to /var/cache/conftool/dbconfig/20220928-133951-root.json
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35077 and previous config saved to /var/cache/conftool/dbconfig/20220928-133946-root.json
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35076 and previous config saved to /var/cache/conftool/dbconfig/20220928-133940-root.json
  • 13:34 jmm@cumin2002: END (FAIL) - Cookbook sre.o11y.roll-restart-reboot-thanos-fe (exit_code=1) rolling restart_daemons on A:thanos-fe-codfw
  • 13:33 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 577
  • 13:32 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-thanos-fe rolling restart_daemons on A:thanos-fe-codfw
  • 13:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 577
  • 13:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 42
  • 13:31 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons.
  • 13:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 42
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35075 and previous config saved to /var/cache/conftool/dbconfig/20220928-132452-root.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35074 and previous config saved to /var/cache/conftool/dbconfig/20220928-132446-root.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35073 and previous config saved to /var/cache/conftool/dbconfig/20220928-132442-root.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35072 and previous config saved to /var/cache/conftool/dbconfig/20220928-132435-root.json
  • 13:19 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:17 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:15 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35071 and previous config saved to /var/cache/conftool/dbconfig/20220928-130947-root.json
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35070 and previous config saved to /var/cache/conftool/dbconfig/20220928-130941-root.json
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35069 and previous config saved to /var/cache/conftool/dbconfig/20220928-130937-root.json
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35068 and previous config saved to /var/cache/conftool/dbconfig/20220928-130930-root.json
  • 13:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 13:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:02 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 13:01 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35067 and previous config saved to /var/cache/conftool/dbconfig/20220928-125442-root.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35066 and previous config saved to /var/cache/conftool/dbconfig/20220928-125436-root.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35065 and previous config saved to /var/cache/conftool/dbconfig/20220928-125432-root.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35064 and previous config saved to /var/cache/conftool/dbconfig/20220928-125425-root.json
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35063 and previous config saved to /var/cache/conftool/dbconfig/20220928-123937-root.json
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35062 and previous config saved to /var/cache/conftool/dbconfig/20220928-123932-root.json
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35061 and previous config saved to /var/cache/conftool/dbconfig/20220928-123927-root.json
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35060 and previous config saved to /var/cache/conftool/dbconfig/20220928-123920-root.json
  • 12:34 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35058 and previous config saved to /var/cache/conftool/dbconfig/20220928-122432-root.json
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35057 and previous config saved to /var/cache/conftool/dbconfig/20220928-122427-root.json
  • 12:24 gehel: copying wmf-elasticsearch-search-plugins from bullseye to buster (`reprepro -C thirdparty/elastic710 copy buster-wikimedia bullseye-wikimedia wmf-elasticsearch-search-plugins`)
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35056 and previous config saved to /var/cache/conftool/dbconfig/20220928-122422-root.json
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35055 and previous config saved to /var/cache/conftool/dbconfig/20220928-122421-root.json
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35054 and previous config saved to /var/cache/conftool/dbconfig/20220928-122415-root.json
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35053 and previous config saved to /var/cache/conftool/dbconfig/20220928-122414-root.json
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35052 and previous config saved to /var/cache/conftool/dbconfig/20220928-122411-root.json
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35051 and previous config saved to /var/cache/conftool/dbconfig/20220928-122403-root.json
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35050 and previous config saved to /var/cache/conftool/dbconfig/20220928-122356-root.json
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35049 and previous config saved to /var/cache/conftool/dbconfig/20220928-122350-root.json
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35048 and previous config saved to /var/cache/conftool/dbconfig/20220928-122346-root.json
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P35047 and previous config saved to /var/cache/conftool/dbconfig/20220928-122321-root.json
  • 12:22 gehel: above reprepro copy failed, elastic710 component does not exist yet
  • 12:21 XioNoX: re-enable Init7 in knams
  • 12:21 gehel: copying wmf-elasticsearch-search-plugins from bullseye to buster (`reprepro -C elastic710 buster-wikimedia bullseye-wikimedia wmf-elasticsearch-search-plugins`)
  • 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2180 db2146 db2122 es2022 for mariadb upgrade T318128', diff saved to https://phabricator.wikimedia.org/P35046 and previous config saved to /var/cache/conftool/dbconfig/20220928-121912-root.json
  • 12:11 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 12:09 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
  • 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35045 and previous config saved to /var/cache/conftool/dbconfig/20220928-120916-root.json
  • 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35044 and previous config saved to /var/cache/conftool/dbconfig/20220928-120909-root.json
  • 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35043 and previous config saved to /var/cache/conftool/dbconfig/20220928-120906-root.json
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35042 and previous config saved to /var/cache/conftool/dbconfig/20220928-120858-root.json
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35041 and previous config saved to /var/cache/conftool/dbconfig/20220928-120852-root.json
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35040 and previous config saved to /var/cache/conftool/dbconfig/20220928-120845-root.json
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35039 and previous config saved to /var/cache/conftool/dbconfig/20220928-120841-root.json
  • 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
  • 11:58 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35038 and previous config saved to /var/cache/conftool/dbconfig/20220928-115411-root.json
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35037 and previous config saved to /var/cache/conftool/dbconfig/20220928-115404-root.json
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35036 and previous config saved to /var/cache/conftool/dbconfig/20220928-115401-root.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35035 and previous config saved to /var/cache/conftool/dbconfig/20220928-115354-root.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35034 and previous config saved to /var/cache/conftool/dbconfig/20220928-115347-root.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35033 and previous config saved to /var/cache/conftool/dbconfig/20220928-115340-root.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35032 and previous config saved to /var/cache/conftool/dbconfig/20220928-115336-root.json
  • 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35031 and previous config saved to /var/cache/conftool/dbconfig/20220928-113906-root.json
  • 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35030 and previous config saved to /var/cache/conftool/dbconfig/20220928-113900-root.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35029 and previous config saved to /var/cache/conftool/dbconfig/20220928-113856-root.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35028 and previous config saved to /var/cache/conftool/dbconfig/20220928-113849-root.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35027 and previous config saved to /var/cache/conftool/dbconfig/20220928-113842-root.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35026 and previous config saved to /var/cache/conftool/dbconfig/20220928-113835-root.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35025 and previous config saved to /var/cache/conftool/dbconfig/20220928-113831-root.json
  • 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35024 and previous config saved to /var/cache/conftool/dbconfig/20220928-112401-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35023 and previous config saved to /var/cache/conftool/dbconfig/20220928-112355-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35022 and previous config saved to /var/cache/conftool/dbconfig/20220928-112351-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35021 and previous config saved to /var/cache/conftool/dbconfig/20220928-112344-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35020 and previous config saved to /var/cache/conftool/dbconfig/20220928-112337-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35019 and previous config saved to /var/cache/conftool/dbconfig/20220928-112330-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35018 and previous config saved to /var/cache/conftool/dbconfig/20220928-112326-root.json
  • 11:18 moritzm: installing expat security updates
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35017 and previous config saved to /var/cache/conftool/dbconfig/20220928-110856-root.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35016 and previous config saved to /var/cache/conftool/dbconfig/20220928-110850-root.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35015 and previous config saved to /var/cache/conftool/dbconfig/20220928-110846-root.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35014 and previous config saved to /var/cache/conftool/dbconfig/20220928-110839-root.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35013 and previous config saved to /var/cache/conftool/dbconfig/20220928-110832-root.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35012 and previous config saved to /var/cache/conftool/dbconfig/20220928-110825-root.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35011 and previous config saved to /var/cache/conftool/dbconfig/20220928-110821-root.json
  • 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T314041)', diff saved to https://phabricator.wikimedia.org/P35010 and previous config saved to /var/cache/conftool/dbconfig/20220928-105531-ladsgroup.json
  • 10:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 10:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T314041)', diff saved to https://phabricator.wikimedia.org/P35009 and previous config saved to /var/cache/conftool/dbconfig/20220928-105520-ladsgroup.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35008 and previous config saved to /var/cache/conftool/dbconfig/20220928-105351-root.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35007 and previous config saved to /var/cache/conftool/dbconfig/20220928-105345-root.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35006 and previous config saved to /var/cache/conftool/dbconfig/20220928-105340-root.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35005 and previous config saved to /var/cache/conftool/dbconfig/20220928-105332-root.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35004 and previous config saved to /var/cache/conftool/dbconfig/20220928-105327-root.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35003 and previous config saved to /var/cache/conftool/dbconfig/20220928-105320-root.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35002 and previous config saved to /var/cache/conftool/dbconfig/20220928-105315-root.json
  • 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P35001 and previous config saved to /var/cache/conftool/dbconfig/20220928-104014-ladsgroup.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35000 and previous config saved to /var/cache/conftool/dbconfig/20220928-103847-root.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34999 and previous config saved to /var/cache/conftool/dbconfig/20220928-103840-root.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34998 and previous config saved to /var/cache/conftool/dbconfig/20220928-103835-root.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34997 and previous config saved to /var/cache/conftool/dbconfig/20220928-103827-root.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34996 and previous config saved to /var/cache/conftool/dbconfig/20220928-103822-root.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34995 and previous config saved to /var/cache/conftool/dbconfig/20220928-103815-root.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34994 and previous config saved to /var/cache/conftool/dbconfig/20220928-103810-root.json
  • 10:30 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:28 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 db1137 db1168 db1143 db1132 db1127 es1022 for mariadb upgrade T318128', diff saved to https://phabricator.wikimedia.org/P34993 and previous config saved to /var/cache/conftool/dbconfig/20220928-102759-root.json
  • 10:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P34992 and previous config saved to /var/cache/conftool/dbconfig/20220928-102508-ladsgroup.json
  • 10:19 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:18 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 10:17 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:15 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 10:13 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:12 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:11 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T314041)', diff saved to https://phabricator.wikimedia.org/P34990 and previous config saved to /var/cache/conftool/dbconfig/20220928-101001-ladsgroup.json
  • 10:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:21 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 09:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 59689
  • 09:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 59689
  • 08:49 jbond: disable puppet on cache servers to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/832268
  • 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T314041)', diff saved to https://phabricator.wikimedia.org/P34989 and previous config saved to /var/cache/conftool/dbconfig/20220928-084557-ladsgroup.json
  • 08:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 08:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T314041)', diff saved to https://phabricator.wikimedia.org/P34988 and previous config saved to /var/cache/conftool/dbconfig/20220928-084535-ladsgroup.json
  • 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 08:38 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:37 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 08:36 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:35 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:34 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P34987 and previous config saved to /var/cache/conftool/dbconfig/20220928-083029-ladsgroup.json
  • 08:29 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 08:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P34985 and previous config saved to /var/cache/conftool/dbconfig/20220928-081522-ladsgroup.json
  • 08:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T314041)', diff saved to https://phabricator.wikimedia.org/P34984 and previous config saved to /var/cache/conftool/dbconfig/20220928-080015-ladsgroup.json
  • 07:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:58 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:45 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:44 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:30 XioNoX: disable BGP to init7 in knams
  • 07:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:08 kartik@deploy1002: Finished scap: Backport for testwiki: Enable Section Translation for Bambara and Goan Konkani Wikipedias (T314557) (duration: 05m 17s)
  • 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:03 kartik@deploy1002: kartik and kartik: Backport for testwiki: Enable Section Translation for Bambara and Goan Konkani Wikipedias (T314557) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 07:03 kartik@deploy1002: Started scap: Backport for testwiki: Enable Section Translation for Bambara and Goan Konkani Wikipedias (T314557)
  • 06:38 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 06:37 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T314041)', diff saved to https://phabricator.wikimedia.org/P34981 and previous config saved to /var/cache/conftool/dbconfig/20220928-043052-ladsgroup.json
  • 04:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 04:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T314041)', diff saved to https://phabricator.wikimedia.org/P34980 and previous config saved to /var/cache/conftool/dbconfig/20220928-043030-ladsgroup.json
  • 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P34979 and previous config saved to /var/cache/conftool/dbconfig/20220928-041524-ladsgroup.json
  • 04:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P34978 and previous config saved to /var/cache/conftool/dbconfig/20220928-040017-ladsgroup.json
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T314041)', diff saved to https://phabricator.wikimedia.org/P34977 and previous config saved to /var/cache/conftool/dbconfig/20220928-034511-ladsgroup.json
  • 02:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T314041)', diff saved to https://phabricator.wikimedia.org/P34976 and previous config saved to /var/cache/conftool/dbconfig/20220928-020746-ladsgroup.json
  • 02:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 02:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 02:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T314041)', diff saved to https://phabricator.wikimedia.org/P34975 and previous config saved to /var/cache/conftool/dbconfig/20220928-020724-ladsgroup.json
  • 01:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P34974 and previous config saved to /var/cache/conftool/dbconfig/20220928-015218-ladsgroup.json
  • 01:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P34973 and previous config saved to /var/cache/conftool/dbconfig/20220928-013711-ladsgroup.json
  • 01:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T314041)', diff saved to https://phabricator.wikimedia.org/P34972 and previous config saved to /var/cache/conftool/dbconfig/20220928-012205-ladsgroup.json
  • 01:18 ejegg: updated fundraising python tools from b65109af to dd494413
  • 00:34 eileen: civicrm upgraded from 118c1d0b to 916a8b08
  • 00:11 eileen: civicrm upgraded from e198fb4c to 118c1d0b

2022-09-27

  • 22:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1002.eqiad.wmnet with OS bullseye
  • 22:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1001.eqiad.wmnet with OS bullseye
  • 22:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
  • 21:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
  • 21:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
  • 21:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
  • 21:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1002.eqiad.wmnet with OS bullseye
  • 21:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1001.eqiad.wmnet with OS bullseye
  • 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T314041)', diff saved to https://phabricator.wikimedia.org/P34971 and previous config saved to /var/cache/conftool/dbconfig/20220927-213028-ladsgroup.json
  • 21:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 21:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T314041)', diff saved to https://phabricator.wikimedia.org/P34970 and previous config saved to /var/cache/conftool/dbconfig/20220927-213006-ladsgroup.json
  • 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34969 and previous config saved to /var/cache/conftool/dbconfig/20220927-211500-ladsgroup.json
  • 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:12 TheresNoTime: closing UTC late backport window
  • 21:10 samtar@deploy1002: Finished scap: Backport for Remove figures from text extracts (T318727) (duration: 04m 53s)
  • 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:06 samtar@deploy1002: samtar and ssastry: Backport for Remove figures from text extracts (T318727) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:06 samtar@deploy1002: Started scap: Backport for Remove figures from text extracts (T318727)
  • 21:06 samtar@deploy1002: Finished scap: Backport for Remove figures from text extracts (T318727) (duration: 06m 58s)
  • 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34968 and previous config saved to /var/cache/conftool/dbconfig/20220927-205953-ladsgroup.json
  • 20:59 TheresNoTime: extending UTC late backport window
  • 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:58 samtar@deploy1002: samtar and ssastry: Backport for Remove figures from text extracts (T318727) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:58 samtar@deploy1002: Started scap: Backport for Remove figures from text extracts (T318727)
  • 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:53 samtar@deploy1002: Finished scap: Backport for romdwikimedia: Enable subpages in NS0 (T318491) (duration: 05m 29s)
  • 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:48 samtar@deploy1002: samtar and stang: Backport for romdwikimedia: Enable subpages in NS0 (T318491) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:48 samtar@deploy1002: Started scap: Backport for romdwikimedia: Enable subpages in NS0 (T318491)
  • 20:46 samtar@deploy1002: Finished scap: Backport for elastic: rebalance enwiki_content shard counts (T318270) (duration: 05m 14s)
  • 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T314041)', diff saved to https://phabricator.wikimedia.org/P34967 and previous config saved to /var/cache/conftool/dbconfig/20220927-204446-ladsgroup.json
  • 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:41 samtar@deploy1002: samtar and ryankemper: Backport for elastic: rebalance enwiki_content shard counts (T318270) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:41 samtar@deploy1002: Started scap: Backport for elastic: rebalance enwiki_content shard counts (T318270)
  • 20:38 samtar@deploy1002: Finished scap: Backport for Add wmgMFDefaultEditor back in for future use (duration: 06m 02s)
  • 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:33 samtar@deploy1002: samtar and kemayo: Backport for Add wmgMFDefaultEditor back in for future use synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:32 samtar@deploy1002: Started scap: Backport for Add wmgMFDefaultEditor back in for future use
  • 20:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:24 samtar@deploy1002: Started scap: Backport for Disable MobileFrontend default editor a/b test (T302356)
  • 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:22 samtar@deploy1002: Started scap: Backport for Disable MobileFrontend default editor a/b test (T302356)
  • 20:20 samtar@deploy1002: Finished scap: Backport for Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626) (duration: 04m 58s)
  • 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:15 samtar@deploy1002: samtar and kemayo: Backport for Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host centrallog1002.eqiad.wmnet with OS bullseye
  • 20:15 samtar@deploy1002: Started scap: Backport for Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:10 samtar@deploy1002: Finished scap: Backport for MobileWebUIActions sample rate to 1 on testwiki (T302108) (duration: 05m 46s)
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:04 samtar@deploy1002: samtar and kemayo: Backport for MobileWebUIActions sample rate to 1 on testwiki (T302108) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:04 samtar@deploy1002: Started scap: Backport for MobileWebUIActions sample rate to 1 on testwiki (T302108)
  • 20:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
  • 19:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
  • 19:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T314041)', diff saved to https://phabricator.wikimedia.org/P34966 and previous config saved to /var/cache/conftool/dbconfig/20220927-194908-ladsgroup.json
  • 19:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 19:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 19:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
  • 18:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:09 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.3 refs T314192
  • 18:02 brennen: 1.40.0-wmf.3 (T314192) no current blockers, promoting to group0
  • 17:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1001.eqiad.wmnet
  • 17:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1002.eqiad.wmnet
  • 17:49 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
  • 17:48 dduvall@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
  • 17:48 dduvall@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
  • 17:48 dduvall@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
  • 17:47 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 17:47 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 17:39 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1001.eqiad.wmnet
  • 17:38 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1002.eqiad.wmnet
  • 17:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1003.eqiad.wmnet
  • 17:29 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest[1001-1002].eqiad.wmnet
  • 17:28 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest[1001-1002].eqiad.wmnet
  • 17:26 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
  • 17:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1003.eqiad.wmnet
  • 17:08 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
  • 14:56 mforns@deploy1002: Finished deploy [airflow-dags/analytics@25dda27]: (no justification provided) (duration: 00m 11s)
  • 14:56 mforns@deploy1002: Started deploy [airflow-dags/analytics@25dda27]: (no justification provided)
  • 14:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 14:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T314041)', diff saved to https://phabricator.wikimedia.org/P34958 and previous config saved to /var/cache/conftool/dbconfig/20220927-143831-ladsgroup.json
  • 14:35 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host logstash2036.codfw.wmnet with OS buster
  • 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T314041)', diff saved to https://phabricator.wikimedia.org/P34957 and previous config saved to /var/cache/conftool/dbconfig/20220927-143109-ladsgroup.json
  • 14:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 14:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T314041)', diff saved to https://phabricator.wikimedia.org/P34956 and previous config saved to /var/cache/conftool/dbconfig/20220927-143047-ladsgroup.json
  • 14:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2036.codfw.wmnet with OS buster
  • 14:25 Lucas_WMDE: END lucaswerkmeister-wmde@mwmaint1002:~$ PHP=php7.4 mwscript updateCollation.php incubatorwiki --force # T315552, 710183 rows done
  • 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34955 and previous config saved to /var/cache/conftool/dbconfig/20220927-142324-ladsgroup.json
  • 14:23 mforns@deploy1002: Finished deploy [airflow-dags/analytics@66dfa44]: (no justification provided) (duration: 00m 46s)
  • 14:22 mforns@deploy1002: Started deploy [airflow-dags/analytics@66dfa44]: (no justification provided)
  • 14:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P34954 and previous config saved to /var/cache/conftool/dbconfig/20220927-141541-ladsgroup.json
  • 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:11 Lucas_WMDE: BEGIN lucaswerkmeister-wmde@mwmaint1002:~$ PHP=php7.4 mwscript updateCollation.php incubatorwiki --force # T315552
  • 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34953 and previous config saved to /var/cache/conftool/dbconfig/20220927-140817-ladsgroup.json
  • 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:06 taavi@deploy1002: Finished scap: Backport for Track use of Searchbox footer on Wikidata (T306933), Track use of Searchbox footer on Wikidata (T306933) (duration: 06m 59s)
  • 14:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P34952 and previous config saved to /var/cache/conftool/dbconfig/20220927-140034-ladsgroup.json
  • 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:59 taavi@deploy1002: taavi and migr: Backport for Track use of Searchbox footer on Wikidata (T306933), Track use of Searchbox footer on Wikidata (T306933) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:59 taavi@deploy1002: Started scap: Backport for Track use of Searchbox footer on Wikidata (T306933), Track use of Searchbox footer on Wikidata (T306933)
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T314041)', diff saved to https://phabricator.wikimedia.org/P34951 and previous config saved to /var/cache/conftool/dbconfig/20220927-135310-ladsgroup.json
  • 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T314041)', diff saved to https://phabricator.wikimedia.org/P34950 and previous config saved to /var/cache/conftool/dbconfig/20220927-134528-ladsgroup.json
  • 12:42 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 12:36 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 12:31 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:28 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 12:26 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 12:23 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 12:20 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 12:18 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 12:15 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 11:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:57 jbond: upload new wmf-laptop_0.5.4 package
  • 11:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
  • 10:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
  • 10:58 mvernon@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:57 mvernon@cumin1001: START - Cookbook sre.dns.netbox
  • 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2028-2039].codfw.wmnet
  • 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:52 mvernon@cumin2002: START - Cookbook sre.dns.netbox
  • 10:38 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 10:38 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 10:16 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
  • 10:14 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2028-2039].codfw.wmnet
  • 10:11 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 10:11 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 10:10 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
  • 10:06 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
  • 10:03 moritzm: rebalance ganeti/codfw row D after completed Bullseye update T311686
  • 09:14 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
  • 09:13 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
  • 09:12 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
  • 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T314041)', diff saved to https://phabricator.wikimedia.org/P34942 and previous config saved to /var/cache/conftool/dbconfig/20220927-082023-ladsgroup.json
  • 08:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 08:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T314041)', diff saved to https://phabricator.wikimedia.org/P34941 and previous config saved to /var/cache/conftool/dbconfig/20220927-082001-ladsgroup.json
  • 08:15 moritzm: restarting apache/FPM on mw canaries to pick up Expat security updates
  • 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34938 and previous config saved to /var/cache/conftool/dbconfig/20220927-080454-ladsgroup.json
  • 08:00 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-eqiad
  • 07:58 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-eqiad
  • 07:57 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-codfw
  • 07:54 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-codfw
  • 07:52 XioNoX: upgrade python3-pynetbox to 6.6.0 on cumin1001 - T310745
  • 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34937 and previous config saved to /var/cache/conftool/dbconfig/20220927-074948-ladsgroup.json
  • 07:49 XioNoX: upgrade python3-pynetbox to 6.6.0 on cumin2002 - T310745
  • 07:48 moritzm: installing expat security updates on stretch/buster/bullseye
  • 07:39 moritzm: uploaded expat 2.2.0-2+deb9u5+wmf1 to apt.wikimedia.org/stretch-wikimedia
  • 07:36 jayme: published image docker-registry.discovery.wmnet/golang1.18:1.18-1
  • 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1107 (T314041)', diff saved to https://phabricator.wikimedia.org/P34936 and previous config saved to /var/cache/conftool/dbconfig/20220927-073523-ladsgroup.json
  • 07:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
  • 07:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
  • 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T314041)', diff saved to https://phabricator.wikimedia.org/P34935 and previous config saved to /var/cache/conftool/dbconfig/20220927-073451-ladsgroup.json
  • 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T314041)', diff saved to https://phabricator.wikimedia.org/P34934 and previous config saved to /var/cache/conftool/dbconfig/20220927-073441-ladsgroup.json
  • 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34933 and previous config saved to /var/cache/conftool/dbconfig/20220927-071938-ladsgroup.json
  • 07:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34932 and previous config saved to /var/cache/conftool/dbconfig/20220927-070431-ladsgroup.json
  • 06:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'show' for AS: 8220
  • 06:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'show' for AS: 8220
  • 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T314041)', diff saved to https://phabricator.wikimedia.org/P34930 and previous config saved to /var/cache/conftool/dbconfig/20220927-064925-ladsgroup.json
  • 05:28 marostegui: Install 10.6.10 on db1124, db1125, pc1014, pc2014 T318128
  • 03:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.1 (duration: 02m 03s)
  • 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.3 refs T314192 (duration: 36m 01s)
  • 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.3 refs T314192
  • 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T314041)', diff saved to https://phabricator.wikimedia.org/P34928 and previous config saved to /var/cache/conftool/dbconfig/20220927-020124-ladsgroup.json
  • 02:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 02:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T314041)', diff saved to https://phabricator.wikimedia.org/P34927 and previous config saved to /var/cache/conftool/dbconfig/20220927-020103-ladsgroup.json
  • 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34926 and previous config saved to /var/cache/conftool/dbconfig/20220927-014556-ladsgroup.json
  • 01:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34925 and previous config saved to /var/cache/conftool/dbconfig/20220927-013050-ladsgroup.json
  • 01:17 eileen: civicrm upgraded from dcef393d to e198fb4c
  • 01:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T314041)', diff saved to https://phabricator.wikimedia.org/P34924 and previous config saved to /var/cache/conftool/dbconfig/20220927-011543-ladsgroup.json
  • 00:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1007.wikimedia.org
  • 00:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1006.wikimedia.org
  • 00:40 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.wikimedia.org
  • 00:32 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1005.wikimedia.org
  • 00:31 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1006.wikimedia.org
  • 00:16 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
  • 00:15 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
  • 00:15 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
  • 00:13 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
  • 00:13 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
  • 00:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T314041)', diff saved to https://phabricator.wikimedia.org/P34923 and previous config saved to /var/cache/conftool/dbconfig/20220927-000525-ladsgroup.json
  • 00:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1005.wikimedia.org
  • 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P34922 and previous config saved to /var/cache/conftool/dbconfig/20220927-000434-ladsgroup.json
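The entries above follow a regular shape: a UTC time, an actor (either `user@host` or a bare nickname), and a free-form message. Cookbook runs additionally log a START line and an END line carrying a status and exit code. As a minimal sketch for anyone post-processing this archive, the following Python splits an entry into those parts. The regular expressions and the names `parse_entry`, `parse_cookbook`, `Entry`, and `CookbookEvent` are illustrative assumptions inferred from the lines on this page, not part of any Wikimedia tooling.

```python
import re
from typing import NamedTuple, Optional

# Assumed entry shape (inferred from this page): "HH:MM actor: message".
# The actor is either "user@host" or a bare nickname and contains no colon.
ENTRY_RE = re.compile(r"^\s*(?:•\s*)?(\d{2}:\d{2}) ([^\s:]+): (.*)$")

# Assumed cookbook shape (inferred from this page):
#   "START - Cookbook NAME ..."
#   "END (STATUS) - Cookbook NAME (exit_code=N) ..."
COOKBOOK_RE = re.compile(
    r"^(START|END) (?:\((PASS|FAIL|ERROR)\) )?- Cookbook (\S+)"
    r"(?: \(exit_code=(\d+)\))?"
)


class Entry(NamedTuple):
    time: str
    actor: str
    message: str


class CookbookEvent(NamedTuple):
    phase: str                # "START" or "END"
    status: Optional[str]     # "PASS", "FAIL", "ERROR", or None on START
    name: str                 # e.g. "sre.hosts.downtime"
    exit_code: Optional[int]  # None when the line carries no exit code


def parse_entry(line: str) -> Optional[Entry]:
    """Split one log line into time, actor and message; None if it does not match."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    return Entry(*match.groups())


def parse_cookbook(message: str) -> Optional[CookbookEvent]:
    """Extract phase, status, name and exit code from a cookbook message."""
    match = COOKBOOK_RE.match(message)
    if match is None:
        return None
    phase, status, name, code = match.groups()
    return CookbookEvent(phase, status, name, int(code) if code is not None else None)


if __name__ == "__main__":
    sample = "  • 10:58 mvernon@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)"
    entry = parse_entry(sample)
    print(entry)
    print(parse_cookbook(entry.message))
```

Date headings such as `2022-09-26` do not match the entry pattern, so `parse_entry` returns `None` for them and a caller can use that to track the current day. Messages that are not cookbook runs (for example dbctl commits or helmfile lines) likewise yield `None` from `parse_cookbook`.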

2022-09-26

  • 23:56 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1005.wikimedia.org
  • 23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34921 and previous config saved to /var/cache/conftool/dbconfig/20220926-234928-ladsgroup.json
  • 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34920 and previous config saved to /var/cache/conftool/dbconfig/20220926-233422-ladsgroup.json
  • 23:34 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices1004.wikimedia.org
  • 23:21 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
  • 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P34919 and previous config saved to /var/cache/conftool/dbconfig/20220926-231915-ladsgroup.json
  • 23:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2032.codfw.wmnet with OS bullseye
  • 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
  • 22:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
  • 22:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2032.codfw.wmnet with OS bullseye
  • 22:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2031.codfw.wmnet with OS bullseye
  • 22:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
  • 22:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
  • 21:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2031.codfw.wmnet with OS bullseye
  • 21:06 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:31 TheresNoTime: closing UTC late backport window
  • 20:18 samtar@deploy1002: Finished scap: Backport for Fix VisualEditor on wikis where RESTBase was never set up (T318325) (duration: 06m 52s)
  • 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:13 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1004.eqiad.wmnet with OS bullseye
  • 20:11 samtar@deploy1002: samtar and matmarex: Backport for Fix VisualEditor on wikis where RESTBase was never set up (T318325) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:11 samtar@deploy1002: Started scap: Backport for Fix VisualEditor on wikis where RESTBase was never set up (T318325)
  • 20:10 samtar@deploy1002: Finished scap: Backport for wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070) (duration: 06m 13s)
  • 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2036']
  • 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
  • 20:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash2036']
  • 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
  • 20:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2032']
  • 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
  • 20:05 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2032']
  • 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
  • 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2031']
  • 20:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
  • 20:04 samtar@deploy1002: samtar and matmarex: Backport for wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 20:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2031']
  • 20:03 samtar@deploy1002: Started scap: Backport for wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)
  • 20:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
  • 19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T314041)', diff saved to https://phabricator.wikimedia.org/P34918 and previous config saved to /var/cache/conftool/dbconfig/20220926-195019-ladsgroup.json
  • 19:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 19:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
  • 19:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-logging1004.eqiad.wmnet with OS bullseye
  • 19:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
  • 19:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS bullseye
  • 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
  • 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
  • 18:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS bullseye
  • 18:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2183.codfw.wmnet with OS bullseye
  • 18:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
  • 18:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
  • 17:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2183.codfw.wmnet with OS bullseye
  • 17:31 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:30 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:30 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:29 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:27 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:27 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:26 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2184']
  • 17:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2184']
  • 17:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2183']
  • 17:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2183']
  • 17:10 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2037
  • 17:09 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:08 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2037
  • 17:08 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2036
  • 17:07 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2036
  • 17:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P34914 and previous config saved to /var/cache/conftool/dbconfig/20220926-170213-ladsgroup.json
  • 17:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 17:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P34913 and previous config saved to /var/cache/conftool/dbconfig/20220926-170151-ladsgroup.json
  • 17:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:57 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:56 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2032
  • 16:56 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2032
  • 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2031
  • 16:55 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2031
  • 16:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34912 and previous config saved to /var/cache/conftool/dbconfig/20220926-164645-ladsgroup.json
  • 16:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34911 and previous config saved to /var/cache/conftool/dbconfig/20220926-163138-ladsgroup.json
  • 16:26 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:25 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34910 and previous config saved to /var/cache/conftool/dbconfig/20220926-162322-ladsgroup.json
  • 16:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:16 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P34909 and previous config saved to /var/cache/conftool/dbconfig/20220926-161632-ladsgroup.json
  • 16:15 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34908 and previous config saved to /var/cache/conftool/dbconfig/20220926-160817-ladsgroup.json
  • 16:07 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 16:04 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:03 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:55 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 15:53 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34907 and previous config saved to /var/cache/conftool/dbconfig/20220926-155312-ladsgroup.json
  • 15:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:47 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:43 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:40 ladsgroup@deploy1002: Synchronized portals: Migrate wikiversity.org to the modern portals (duration: 03m 36s)
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34906 and previous config saved to /var/cache/conftool/dbconfig/20220926-153807-ladsgroup.json
  • 15:37 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Migrate wikiversity.org to the modern portals (duration: 03m 49s)
  • 14:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 13:59 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on analytics_test [airflow-dags@a69b031] (duration: 00m 09s)
  • 13:59 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on analytics_test [airflow-dags@a69b031]
  • 13:56 moritzm: installing mako security updates
  • 13:47 aqu@deploy1002: Finished deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on analytics [airflow-dags@a69b031] (duration: 00m 10s)
  • 13:46 aqu@deploy1002: Started deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on analytics [airflow-dags@a69b031]
  • 13:45 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:41 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/extension.json: Backport: Set default sortkey for prefixed pages (T315551) (2/2) (duration: 03m 39s)
  • 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:37 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/includes/WikimediaIncubator.php: Backport: Set default sortkey for prefixed pages (T315551) (1/2) (duration: 03m 51s)
  • 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable wgCiteResponsiveReferences on etwiki (T318530) (duration: 03m 53s)
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:59 awight@deploy1002: Finished deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production (duration: 02m 40s)
  • 12:56 awight@deploy1002: Started deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production
  • 12:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:51 moritzm: installing bind9 security updates on Bullseye
  • 12:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:51 ladsgroup@deploy1002: Finished scap: Backport for Bump portals to HEAD (T273179) (duration: 06m 05s)
  • 12:45 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Bump portals to HEAD (T273179) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 12:44 ladsgroup@deploy1002: Started scap: Backport for Bump portals to HEAD (T273179)
  • 12:25 moritzm: installing unzip security updates
  • 10:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:25 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:24 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:04 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM matomo1002.eqiad.wmnet
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T314041)', diff saved to https://phabricator.wikimedia.org/P34904 and previous config saved to /var/cache/conftool/dbconfig/20220926-094812-ladsgroup.json
  • 09:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T314041)', diff saved to https://phabricator.wikimedia.org/P34903 and previous config saved to /var/cache/conftool/dbconfig/20220926-094502-ladsgroup.json
  • 09:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 09:39 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM matomo1002.eqiad.wmnet
  • 08:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 033ab75: arwiki: Properly grant enrollasmentor to editor (T310905) (duration: 03m 46s)
  • 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:56 btullis: adding 80GB of virtual disk to matomo1002
  • 08:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0a54867: arwiki: Grant enrollasmentor to editor (T310905) (duration: 03m 40s)
  • 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:38 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:07 godog: upgrade grafana to 8.5.13
  • 08:04 godog: add 20G to prometheus/analytics in codfw
  • 07:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:31 oblivian@deploy1002: Finished scap: Backport for Move 100% of cookie-accepting clients to php 7.4 (T271736) (duration: 05m 31s)
  • 07:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:26 oblivian@deploy1002: oblivian and oblivian: Backport for Move 100% of cookie-accepting clients to php 7.4 (T271736) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:26 oblivian@deploy1002: Started scap: Backport for Move 100% of cookie-accepting clients to php 7.4 (T271736)
  • 07:23 urbanecm@deploy1002: Synchronized wmf-config/InterwikiSortOrders.php: 620bb80: Add ami, bjn, blk, dag, guw, ig, kcg, lmo, pcm, pwn, and shi to InterwikiSortOrders (duration: 03m 40s)
  • 07:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 81f6662: Add wordmark and tagline for mnwiki (T318478) (duration: 03m 46s)
  • 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:07 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: 81f6662: Add wordmark and tagline for mnwiki (T318478; 1/2) (duration: 03m 40s)
  • 07:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 06:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 06:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:36 elukey: clean up my old home dir on matomo1002, ran `apt-get clean` + some other clean up steps on matomo1002 to free space on the root partition
  • 06:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d2d2c08: eswiki: Enable structured mentor list (T310905) (duration: 04m 30s)
  • 06:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 06:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 06:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-09-25

  • 17:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bullseye
  • 17:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
  • 17:05 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
  • 16:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bullseye
  • 16:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
  • 16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
  • 16:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
  • 16:06 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye
  • 15:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
  • 15:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
  • 15:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
  • 15:26 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 02m 44s)
  • 15:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
  • 15:22 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 11s)
  • 15:20 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
  • 15:15 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 10s)
  • 15:14 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
  • 15:13 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye

2022-09-23

  • 19:10 mforns@deploy1002: Finished deploy [airflow-dags/analytics@4c973d6]: (no justification provided) (duration: 00m 12s)
  • 19:10 mforns@deploy1002: Started deploy [airflow-dags/analytics@4c973d6]: (no justification provided)
  • 17:49 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@7620b25]: (no justification provided) (duration: 00m 10s)
  • 17:48 nokafor@deploy1002: Started deploy [airflow-dags/analytics@7620b25]: (no justification provided)
  • 13:39 hashar@deploy1002: Finished scap: Backport for Stop using Elastica::Type and set the target indices (T318356) (duration: 07m 10s)
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:32 hashar@deploy1002: hashar and hashar: Backport for Stop using Elastica::Type and set the target indices (T318356) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:31 hashar@deploy1002: Started scap: Backport for Stop using Elastica::Type and set the target indices (T318356)
  • 13:29 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling (duration: 03m 06s)
  • 13:26 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling
  • 13:24 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling (duration: 01m 11s)
  • 13:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling
  • 09:26 jynus: stopping db1117:s3 for maintenance T315713
  • 08:51 Emperor: rebalance ms-eqiad swift rings T294550
  • 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
  • 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
  • 06:10 marostegui: Shutdown db1189 T317662
  • 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance
  • 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance

2022-09-22

  • 22:20 joal@deploy1002: Finished deploy [airflow-dags/analytics@901f810]: (no justification provided) (duration: 00m 11s)
  • 22:19 joal@deploy1002: Started deploy [airflow-dags/analytics@901f810]: (no justification provided)
  • 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:23 dancy@deploy1002: backport aborted: (duration: 00m 05s)
  • 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:55 brennen: end of utc late backport & config window
  • 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:54 brennen@deploy1002: Finished scap: Backport for Restrict figure to the size of the media (T305357 T318300), Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300) (duration: 06m 33s)
  • 20:53 joal@deploy1002: Finished deploy [airflow-dags/analytics@6c81e6f]: (no justification provided) (duration: 00m 10s)
  • 20:53 joal@deploy1002: Started deploy [airflow-dags/analytics@6c81e6f]: (no justification provided)
  • 20:48 brennen@deploy1002: brennen and arlolra: Backport for Restrict figure to the size of the media (T305357 T318300), Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:47 brennen@deploy1002: Started scap: Backport for Restrict figure to the size of the media (T305357 T318300), Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)
  • 20:36 brennen@deploy1002: backport aborted: (duration: 02m 16s)
  • 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:25 brennen@deploy1002: Finished scap: Backport for Drops JS-side creation of "Source" link (T318266) (duration: 06m 09s)
  • 20:19 brennen@deploy1002: brennen and tpt: Backport for Drops JS-side creation of "Source" link (T318266) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:19 brennen@deploy1002: Started scap: Backport for Drops JS-side creation of "Source" link (T318266)
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:45 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 18:38 jhuneidi@deploy1002: Started scap: testing
  • 18:38 dancy@deploy1002: Started scap: testing
  • 18:37 jhuneidi@deploy1002: Started scap: testing
  • 18:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@265686e]: (no justification provided) (duration: 00m 13s)
  • 18:33 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@265686e]: (no justification provided)
  • 18:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.2 refs T314191
  • 18:23 dancy@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: testing (duration: 00m 02s)
  • 18:23 dancy@deploy1002: Locking from deployment [ALL REPOSITORIES]: testing (planned duration: 60m 00s)
  • 18:22 dancy@deploy1002: Installation of scap version "4.22.0" completed for 561 hosts
  • 18:22 dancy@deploy1002: Installing scap version "4.22.0" for 561 hosts
  • 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:39 dancy@deploy1002: Sync cancelled.
  • 16:39 dancy@deploy1002: dancy and dancy: Backport for InitialiseSettings-labs.php: Added test text (to be reverted) (T317242) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 16:38 dancy@deploy1002: Started scap: Backport for InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)
  • 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: dcf3710: Remove Research Incentive survey on idwiki (T316466) (duration: 03m 50s)
  • 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ff867a4: Add wgMetaNamespace for knwiktionary and knwikiquote (T318318) (duration: 03m 57s)
  • 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:38 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:37 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:24 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:24 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:22 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:22 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:21 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 07:35 apergos: UTC morning backport and config training deployment window closed a bit belatedly
  • 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Content and Section Translation in Bhojpuri Wikipedia (T313296) (duration: 04m 03s)
  • 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-09-21

  • 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:46 tgr_: UTC late deploys done
  • 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:44 tgr@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: Block metrics: Bump schema to un-require some fields (T317343) (duration: 03m 42s)
  • 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:36 tgr@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: Block metrics: Bump schema to un-require some fields (T317343) (duration: 03m 55s)
  • 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:25 samtar@deploy1002: Finished scap: Backport for cirrus: Limit shard count to 1 in deployment-prep (T316711) (duration: 04m 19s)
  • 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:21 samtar@deploy1002: samtar and ebernhardson: Backport for cirrus: Limit shard count to 1 in deployment-prep (T316711) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:20 samtar@deploy1002: Started scap: Backport for cirrus: Limit shard count to 1 in deployment-prep (T316711)
  • 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:17 samtar@deploy1002: Finished scap: Backport for Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625) (duration: 05m 31s)
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:12 samtar@deploy1002: samtar and kemayo: Backport for Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:11 samtar@deploy1002: Started scap: Backport for Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)
  • 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:09 samtar@deploy1002: Finished scap: Backport for Remove deployment-db08 (T318126) (duration: 05m 16s)
  • 20:04 samtar@deploy1002: samtar and zabe: Backport for Remove deployment-db08 (T318126) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:04 samtar@deploy1002: Started scap: Backport for Remove deployment-db08 (T318126)
  • 19:33 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@ce20ecd]: (no justification provided) (duration: 00m 10s)
  • 19:33 nokafor@deploy1002: Started deploy [airflow-dags/analytics@ce20ecd]: (no justification provided)
  • 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b8b2ebd: Growth: Switch pilot wikis to structured mentor list (T310905) (duration: 03m 59s)
  • 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:55 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8] (duration: 00m 08s)
  • 18:55 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8]
  • 18:44 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8] (duration: 05m 40s)
  • 18:38 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8]
  • 14:56 Emperor: set thanos ring replicas to 3.75 T311690
  • 14:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: Pool deployment-db09, depool deployment-db08 (T318126) (Beta-only, exchange one replica for another) [*actually* sync it this time since I forgot to git rebase before the last sync 🤦] (duration: 03m 41s)
  • 14:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:44 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: Pool deployment-db09, depool deployment-db08 (T318126) (Beta-only, exchange one replica for another) (duration: 03m 48s)
  • 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:59 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:57 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: Add back deployment-db08 (T318126) (Beta-only, restore old replica) (duration: 03m 48s)
  • 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:32 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: Replace deployment-db08 with deployment-db09 (T318126) (Beta-only, replace one replica with another) (duration: 03m 56s)
  • 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add editcontentmodel right for metawiki translation administrators (T311587) (duration: 03m 50s)
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable wgParserEnableLegacyMediaDOM on enwikivoyage (T314318) (turning on new-style media output) (duration: 04m 03s)
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:19 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.2 refs T314191 (duration: 04m 02s)
  • 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:15 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.2 refs T314191
  • 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:07 hashar: Restarting Gerrit to clear stalled sockets in Zuul

2022-09-20

  • 20:19 cjming: end of UTC late backport window
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:13 cjming@deploy1002: Finished scap: Backport for Enable Nearby everywhere (T246493) (duration: 09m 02s)
  • 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:05 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262] (duration: 00m 07s)
  • 20:05 mforns@deploy1002: Started deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262]
  • 20:05 cjming@deploy1002: cjming and jdlrobson: Backport for Enable Nearby everywhere (T246493) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:04 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262] (duration: 08m 00s)
  • 20:04 cjming@deploy1002: Started scap: Backport for Enable Nearby everywhere (T246493)
  • 20:02 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 20:02 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 20:01 eileen: civicrm upgraded from e82d9cd0 to dcef393d
  • 19:56 mforns@deploy1002: Started deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262]
  • 19:05 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 18:50 jynus: restart db2100:s7 to apply new config
  • 18:48 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 18:47 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 18:47 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 18:47 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 18:47 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 18:46 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 18:46 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 18:45 cstone: payments-wiki upgraded from de4b2bb9 to 0456850e
  • 18:45 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:36 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.2 refs T314191
  • 18:33 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:33 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:32 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:31 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:31 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:30 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:29 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:28 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 18:28 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:27 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 18:27 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:26 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 18:23 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:22 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 18:22 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:21 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 18:20 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:19 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 16:42 dancy@deploy1002: Sync cancelled.
  • 16:42 dancy@deploy1002: dancy: testing, disregard synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 16:41 dancy@deploy1002: Started scap: testing, disregard
  • 16:09 awight@deploy1002: backport aborted: (duration: 00m 33s)
  • 16:04 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable Tech Wishes survey on dewiki (T316676) (take 2) (duration: 03m 42s)
  • 15:55 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable Tech Wishes survey on dewiki (T316676) (duration: 03m 53s)
  • 14:16 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 14:10 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 14:00 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided) (duration: 00m 15s)
  • 14:00 nokafor@deploy1002: Started deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided)
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1189', diff saved to https://phabricator.wikimedia.org/P34884 and previous config saved to /var/cache/conftool/dbconfig/20220920-135006-ladsgroup.json
  • 13:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:43 urbanecm@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/GrowthExperiments/extension.json: 1ac09d4: Update HomepageModule schema version (T310320) (duration: 03m 39s)
  • 13:39 urbanecm@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/GrowthExperiments/extension.json: 1a27e05: Update HomepageModule schema version (T310320) (duration: 03m 52s)
  • 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:25 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@0e9fb6b]: (no justification provided) (duration: 00m 11s)
  • 13:25 nokafor@deploy1002: Started deploy [airflow-dags/analytics@0e9fb6b]: (no justification provided)
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0b55db6: Enable Tech Wishes survey on dewiki (T316676) (duration: 04m 12s)
  • 09:58 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 09:27 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 08:46 awight@deploy1002: Finished deploy [kartotherian/deploy@4759a78]: Merge "Update kartotherian to e3f3854" (duration: 02m 27s)
  • 08:43 awight@deploy1002: Started deploy [kartotherian/deploy@4759a78]: Merge "Update kartotherian to e3f3854"
  • 08:35 hashar: Restarted CI Jenkins for plugin update
  • 08:33 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 08:33 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 07:18 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: testwiki: Enable Section Translation on haw, la, ps, and xh Wikipedias (T317289) (duration: 03m 46s)
  • 07:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:10 kart_: Updated cxserver to 2022-09-15-113346-production (T317289, T315209)
  • 07:08 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:07 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 07:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:06 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 07:05 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 07:03 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 07:02 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 04:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 04:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 04:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.39.0-wmf.28 (duration: 02m 02s)
  • 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.2 refs T314191 (duration: 36m 08s)
  • 03:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.2 refs T314191
  • 02:42 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-09-19

  • 22:59 ebernhardson: T317200 start cirrussearch in-place reindex process for eqiad, codfw and cloudelastic
  • 21:21 maryum: Deployed security patch for T302479
  • 21:21 mstyles@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/Translate/src/: (no justification provided) (duration: 03m 40s)
  • 21:15 sbassett: Deployed security patch for T312820
  • 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:59 cjming: end of UTC late backport window
  • 20:59 ebernhardson@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/CirrusSearch/includes/Maintenance/MappingConfigBuilder.php: Backport: Add token_count subfield to outgoing_link (T317546) (duration: 03m 51s)
  • 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:21 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Wikifunctions: Drop two config items moved to docker (duration: 03m 38s)
  • 20:21 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:16 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: ExtensionDistributor: Add REL1_39 (T313925) (duration: 03m 38s)
  • 20:12 cjming@deploy1002: Finished scap: Backport for Disable wgParserEnableLegacyMediaDOM on cswiki (T314318) (duration: 06m 31s)
  • 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:06 cjming@deploy1002: cjming and arlolra: Backport for Disable wgParserEnableLegacyMediaDOM on cswiki (T314318) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:06 cjming@deploy1002: Started scap: Backport for Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)
  • 19:33 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 19:33 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 19:33 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 19:30 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 19:30 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 19:30 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 17:43 dancy@deploy1002: Installation of scap version "4.21.0" completed for 561 hosts
  • 17:42 dancy@deploy1002: Installing scap version "4.21.0" for 561 hosts
  • 17:36 dancy@deploy1002: Sync cancelled.
  • 17:36 dancy@deploy1002: dancy: testing, disregard synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 17:36 dancy@deploy1002: Started scap: testing, disregard
  • 14:03 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/ukwikivoyage{.png,-1.5x.png,-2x.png} (T317718)
  • 14:02 urbanecm@deploy1002: Synchronized static/images/project-logos/: 6c7151d: Regenerate ukwikivoyage logo (T317718) (duration: 03m 46s)
  • 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cbf161d: GrowthExperiments: Enable image recommendations for el/pl/zh/id/ro (T314518) (duration: 04m 01s)
  • 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4a6c1dd: Remove unnecessary wgNamespaceAliases from bnwiki (T318003) (duration: 04m 16s)
  • 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-09-17

  • 12:17 Emperor: set thanos ring replicas to 3.80 T311690
  • 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T314041)', diff saved to https://phabricator.wikimedia.org/P34879 and previous config saved to /var/cache/conftool/dbconfig/20220917-103903-ladsgroup.json
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P34878 and previous config saved to /var/cache/conftool/dbconfig/20220917-102356-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P34877 and previous config saved to /var/cache/conftool/dbconfig/20220917-100850-ladsgroup.json
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T314041)', diff saved to https://phabricator.wikimedia.org/P34876 and previous config saved to /var/cache/conftool/dbconfig/20220917-095344-ladsgroup.json
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T314041)', diff saved to https://phabricator.wikimedia.org/P34875 and previous config saved to /var/cache/conftool/dbconfig/20220917-094856-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P34874 and previous config saved to /var/cache/conftool/dbconfig/20220917-093349-ladsgroup.json
  • 09:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P34873 and previous config saved to /var/cache/conftool/dbconfig/20220917-091843-ladsgroup.json
  • 09:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T314041)', diff saved to https://phabricator.wikimedia.org/P34872 and previous config saved to /var/cache/conftool/dbconfig/20220917-090336-ladsgroup.json
  • 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T314041)', diff saved to https://phabricator.wikimedia.org/P34871 and previous config saved to /var/cache/conftool/dbconfig/20220917-074806-ladsgroup.json
  • 07:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P34870 and previous config saved to /var/cache/conftool/dbconfig/20220917-073300-ladsgroup.json
  • 07:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P34869 and previous config saved to /var/cache/conftool/dbconfig/20220917-071753-ladsgroup.json
  • 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T314041)', diff saved to https://phabricator.wikimedia.org/P34868 and previous config saved to /var/cache/conftool/dbconfig/20220917-070247-ladsgroup.json
  • 05:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 (T314041)', diff saved to https://phabricator.wikimedia.org/P34867 and previous config saved to /var/cache/conftool/dbconfig/20220917-051719-ladsgroup.json
  • 05:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 05:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 05:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T314041)', diff saved to https://phabricator.wikimedia.org/P34866 and previous config saved to /var/cache/conftool/dbconfig/20220917-051527-ladsgroup.json
  • 05:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 05:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T314041)', diff saved to https://phabricator.wikimedia.org/P34865 and previous config saved to /var/cache/conftool/dbconfig/20220917-051203-ladsgroup.json
  • 05:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 05:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance

2022-09-16

  • 21:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 21:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 21:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T314041)', diff saved to https://phabricator.wikimedia.org/P34864 and previous config saved to /var/cache/conftool/dbconfig/20220916-212905-ladsgroup.json
  • 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P34863 and previous config saved to /var/cache/conftool/dbconfig/20220916-211358-ladsgroup.json
  • 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P34862 and previous config saved to /var/cache/conftool/dbconfig/20220916-205852-ladsgroup.json
  • 20:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T314041)', diff saved to https://phabricator.wikimedia.org/P34861 and previous config saved to /var/cache/conftool/dbconfig/20220916-204345-ladsgroup.json
  • 19:16 mutante: cp1081 /usr/local/sbin/update-ocsp-all
  • 17:01 mutante: gitlab-runner*: deployed gerrit:832584 and systemctl restart buildkitd on 6 hosts for T317904
  • 16:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:53 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:43 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184
  • 16:42 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2184
  • 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2183
  • 16:41 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2183
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T314041)', diff saved to https://phabricator.wikimedia.org/P34860 and previous config saved to /var/cache/conftool/dbconfig/20220916-161409-ladsgroup.json
  • 16:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 16:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T314041)', diff saved to https://phabricator.wikimedia.org/P34859 and previous config saved to /var/cache/conftool/dbconfig/20220916-161346-ladsgroup.json
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P34858 and previous config saved to /var/cache/conftool/dbconfig/20220916-155840-ladsgroup.json
  • 15:52 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:52 dancy@deploy1002: Installation of scap version "4.20.0" completed for 561 hosts
  • 15:51 dancy@deploy1002: Installing scap version "4.20.0" for 561 hosts
  • 15:51 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:44 dancy@deploy1002: Finished scap: testing (duration: 04m 53s)
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P34857 and previous config saved to /var/cache/conftool/dbconfig/20220916-154333-ladsgroup.json
  • 15:39 dancy@deploy1002: Started scap: testing
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T314041)', diff saved to https://phabricator.wikimedia.org/P34856 and previous config saved to /var/cache/conftool/dbconfig/20220916-152827-ladsgroup.json
  • 15:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 15:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 15:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:01 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 15:01 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 15:01 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 14:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:58 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:48 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:47 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:45 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:45 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:42 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:39 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:23 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:22 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:22 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:17 godog: add 100G to prometheus/eqiad instance k8s-mlserve
  • 13:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 13:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:50 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:50 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34855 and previous config saved to /var/cache/conftool/dbconfig/20220916-131902-root.json
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34854 and previous config saved to /var/cache/conftool/dbconfig/20220916-130357-root.json
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34853 and previous config saved to /var/cache/conftool/dbconfig/20220916-125841-root.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34852 and previous config saved to /var/cache/conftool/dbconfig/20220916-124850-root.json
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34851 and previous config saved to /var/cache/conftool/dbconfig/20220916-124336-root.json
  • 12:43 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34850 and previous config saved to /var/cache/conftool/dbconfig/20220916-123346-root.json
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34849 and previous config saved to /var/cache/conftool/dbconfig/20220916-122831-root.json
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34848 and previous config saved to /var/cache/conftool/dbconfig/20220916-121841-root.json
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34847 and previous config saved to /var/cache/conftool/dbconfig/20220916-121326-root.json
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34846 and previous config saved to /var/cache/conftool/dbconfig/20220916-120336-root.json
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34845 and previous config saved to /var/cache/conftool/dbconfig/20220916-115821-root.json
  • 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34844 and previous config saved to /var/cache/conftool/dbconfig/20220916-114935-root.json
  • 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34843 and previous config saved to /var/cache/conftool/dbconfig/20220916-114831-root.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34842 and previous config saved to /var/cache/conftool/dbconfig/20220916-114316-root.json
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P34841 and previous config saved to /var/cache/conftool/dbconfig/20220916-113543-root.json
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34840 and previous config saved to /var/cache/conftool/dbconfig/20220916-113431-root.json
  • 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34839 and previous config saved to /var/cache/conftool/dbconfig/20220916-113325-root.json
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114', diff saved to https://phabricator.wikimedia.org/P34838 and previous config saved to /var/cache/conftool/dbconfig/20220916-112750-root.json
  • 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34837 and previous config saved to /var/cache/conftool/dbconfig/20220916-111925-root.json
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34836 and previous config saved to /var/cache/conftool/dbconfig/20220916-110420-root.json
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T314041)', diff saved to https://phabricator.wikimedia.org/P34835 and previous config saved to /var/cache/conftool/dbconfig/20220916-105819-ladsgroup.json
  • 10:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 10:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T314041)', diff saved to https://phabricator.wikimedia.org/P34834 and previous config saved to /var/cache/conftool/dbconfig/20220916-105809-ladsgroup.json
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34832 and previous config saved to /var/cache/conftool/dbconfig/20220916-104916-root.json
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34831 and previous config saved to /var/cache/conftool/dbconfig/20220916-104303-ladsgroup.json
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34830 and previous config saved to /var/cache/conftool/dbconfig/20220916-103411-root.json
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34829 and previous config saved to /var/cache/conftool/dbconfig/20220916-102756-ladsgroup.json
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34828 and previous config saved to /var/cache/conftool/dbconfig/20220916-101905-root.json
  • 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T314041)', diff saved to https://phabricator.wikimedia.org/P34827 and previous config saved to /var/cache/conftool/dbconfig/20220916-101250-ladsgroup.json
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34826 and previous config saved to /var/cache/conftool/dbconfig/20220916-100400-root.json
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34825 and previous config saved to /var/cache/conftool/dbconfig/20220916-093635-root.json
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34824 and previous config saved to /var/cache/conftool/dbconfig/20220916-093121-root.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34823 and previous config saved to /var/cache/conftool/dbconfig/20220916-092130-root.json
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34822 and previous config saved to /var/cache/conftool/dbconfig/20220916-091616-root.json
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T314041)', diff saved to https://phabricator.wikimedia.org/P34821 and previous config saved to /var/cache/conftool/dbconfig/20220916-091234-ladsgroup.json
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34820 and previous config saved to /var/cache/conftool/dbconfig/20220916-090625-root.json
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34819 and previous config saved to /var/cache/conftool/dbconfig/20220916-090111-root.json
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34818 and previous config saved to /var/cache/conftool/dbconfig/20220916-085120-root.json
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34817 and previous config saved to /var/cache/conftool/dbconfig/20220916-084607-root.json
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34816 and previous config saved to /var/cache/conftool/dbconfig/20220916-083615-root.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34815 and previous config saved to /var/cache/conftool/dbconfig/20220916-083102-root.json
  • 08:22 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:21 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34814 and previous config saved to /var/cache/conftool/dbconfig/20220916-082110-root.json
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34813 and previous config saved to /var/cache/conftool/dbconfig/20220916-081557-root.json
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 3%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34812 and previous config saved to /var/cache/conftool/dbconfig/20220916-080605-root.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 3%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34811 and previous config saved to /var/cache/conftool/dbconfig/20220916-080052-root.json
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34810 and previous config saved to /var/cache/conftool/dbconfig/20220916-075100-root.json
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 1%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34809 and previous config saved to /var/cache/conftool/dbconfig/20220916-074548-root.json
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34808 and previous config saved to /var/cache/conftool/dbconfig/20220916-074251-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2180', diff saved to https://phabricator.wikimedia.org/P34807 and previous config saved to /var/cache/conftool/dbconfig/20220916-072958-root.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34806 and previous config saved to /var/cache/conftool/dbconfig/20220916-072746-root.json
  • 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34805 and previous config saved to /var/cache/conftool/dbconfig/20220916-071241-root.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34804 and previous config saved to /var/cache/conftool/dbconfig/20220916-065737-root.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34803 and previous config saved to /var/cache/conftool/dbconfig/20220916-064232-root.json
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34802 and previous config saved to /var/cache/conftool/dbconfig/20220916-062727-root.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34801 and previous config saved to /var/cache/conftool/dbconfig/20220916-061222-root.json
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34800 and previous config saved to /var/cache/conftool/dbconfig/20220916-055717-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P34799 and previous config saved to /var/cache/conftool/dbconfig/20220916-055542-root.json
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34798 and previous config saved to /var/cache/conftool/dbconfig/20220916-055424-root.json
  • 05:51 marostegui: Install 10.6 on db1168 T301879
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P34797 and previous config saved to /var/cache/conftool/dbconfig/20220916-055031-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1198', diff saved to https://phabricator.wikimedia.org/P34795 and previous config saved to /var/cache/conftool/dbconfig/20220916-054438-root.json
  • 01:57 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
  • 01:57 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 01:54 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 10s)
  • 01:54 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 00:14 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 17s)
  • 00:14 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)

2022-09-15

  • 23:51 mutante: gerrit1001 - disabled puppet - gerrit:832411
  • 22:01 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wcqs2001.codfw.wmnet with reason: T316236
  • 22:01 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wcqs2001.codfw.wmnet with reason: T316236
  • 21:30 ebernhardson: depool wcqs2001 for T316236
  • 20:25 thcipriani@deploy1002: Finished scap: Backport for Increase coverage of Research Incentive Survey on idwiki (T316466) (duration: 07m 06s)
  • 20:18 thcipriani@deploy1002: thcipriani and dani: Backport for Increase coverage of Research Incentive Survey on idwiki (T316466) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:18 thcipriani@deploy1002: Started scap: Backport for Increase coverage of Research Incentive Survey on idwiki (T316466)
  • 20:15 thcipriani@deploy1002: Finished scap: Backport for Revert "cirrus: Handle transition to elasticsearch 7.10" (T308676) (duration: 07m 39s)
  • 20:08 thcipriani@deploy1002: thcipriani and dcausse: Backport for Revert "cirrus: Handle transition to elasticsearch 7.10" (T308676) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:07 thcipriani@deploy1002: Started scap: Backport for Revert "cirrus: Handle transition to elasticsearch 7.10" (T308676)
  • 19:26 ebernhardson: pool'd wdqs2001, some blockers before reload can start T316236
  • 18:45 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.1 refs T314190
  • 18:39 dancy@deploy1002: Finished scap: Backport for Use more permissive match for TOC_PLACEHOLDER in parser output (T317857) (duration: 09m 53s)
  • 18:38 cwhite: restart thanos-compact (thanos-fe2001) and swift_ring_manager (thanos-fe1001)
  • 18:29 dancy@deploy1002: dancy and cscott: Backport for Use more permissive match for TOC_PLACEHOLDER in parser output (T317857) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 18:29 dancy@deploy1002: Started scap: Backport for Use more permissive match for TOC_PLACEHOLDER in parser output (T317857)
  • 18:17 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2003.codfw.wmnet on all recursors
  • 18:17 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2003.codfw.wmnet on all recursors
  • 18:17 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2002.codfw.wmnet on all recursors
  • 18:17 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2002.codfw.wmnet on all recursors
  • 18:17 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2001.codfw.wmnet on all recursors
  • 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2001.codfw.wmnet on all recursors
  • 18:16 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe1003.eqiad.wmnet on all recursors
  • 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe1003.eqiad.wmnet on all recursors
  • 18:16 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe1002.eqiad.wmnet on all recursors
  • 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe1002.eqiad.wmnet on all recursors
  • 18:16 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe1001.eqiad.wmnet on all recursors
  • 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe1001.eqiad.wmnet on all recursors
  • 18:15 ebernhardson: depool wcqs2001 for T316236
  • 18:15 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:13 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 18:07 godog: restart envoyproxy on thanos-fe*
  • 18:06 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2002.codfw.wmnet on all recursors
  • 18:06 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2002.codfw.wmnet on all recursors
  • 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:17 andrew@cumin1001: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging BryanDavis out of all services on: 2047 hosts
  • 16:16 andrew@cumin1001: START - Cookbook sre.idm.logout Logging BryanDavis out of all services on: 2047 hosts
  • 15:39 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:37 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 15:28 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
  • 15:27 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
  • 15:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
  • 15:22 hnowlan: starting cassandra on sessionstore1001-a
  • 15:18 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=eqiad
  • 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T314041)', diff saved to https://phabricator.wikimedia.org/P34792 and previous config saved to /var/cache/conftool/dbconfig/20220915-151131-ladsgroup.json
  • 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P34791 and previous config saved to /var/cache/conftool/dbconfig/20220915-145625-ladsgroup.json
  • 14:41 moritzm: installing libtirpc security updates
  • 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P34790 and previous config saved to /var/cache/conftool/dbconfig/20220915-144118-ladsgroup.json
  • 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T314041)', diff saved to https://phabricator.wikimedia.org/P34789 and previous config saved to /var/cache/conftool/dbconfig/20220915-142612-ladsgroup.json
  • 14:01 sukhe: restarting bird.service on A:dns-auth for zlib update
  • 14:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6b9784a: Enable the Vue version of the mentee overview in pilot wikis (T300532) (duration: 03m 45s)
  • 13:58 aqu@deploy1002: Finished deploy [airflow-dags/analytics@b9be20d]: Regular analytics weekly train [airflow-dags@b9be20d] (duration: 00m 09s)
  • 13:58 aqu@deploy1002: Started deploy [airflow-dags/analytics@b9be20d]: Regular analytics weekly train [airflow-dags@b9be20d]
  • 13:57 sukhe: restarting haproxy.service on A:dns-auth for zlib update
  • 13:57 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@b9be20d]: Regular analytics weekly train TEST [airflow-dags@b9be20d] (duration: 00m 10s)
  • 13:56 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@b9be20d]: Regular analytics weekly train TEST [airflow-dags@b9be20d]
  • 13:51 jayme: updated rsyslog to 8.2208.0-1~bpo11+1 on all kubernetes masters and nodes - T289766
  • 13:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:47 aqu@deploy1002: Finished deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST (second try after freeing up some disk space) [analytics/refinery@278c383] (duration: 06m 01s)
  • 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:41 aqu@deploy1002: Started deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST (second try after freeing up some disk space) [analytics/refinery@278c383]
  • 13:38 sukhe: restarting bird.service on A:dns-rec for zlib update
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:33 sukhe: restarting pdns-recursor on A:dns-rec for zlib update
  • 13:33 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.28/extensions/GrowthExperiments/: f592e85: Mentee overview: avoid requiring the non-vue mentee overview script when loading the Vue one (T300532) (duration: 04m 05s)
  • 12:50 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:50 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:46 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: temporarily disabled due to sessionstore issues
  • 12:46 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: temporarily disabled due to sessionstore issues
  • 12:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1001.eqiad.wmnet with OS buster
  • 12:17 jayme: fleet wide update of prometheus-rsyslog-exporter to 0.0.0+git20201008-4 - T289766
  • 12:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1001.eqiad.wmnet with reason: host reimage
  • 12:06 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1001.eqiad.wmnet with reason: host reimage
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34787 and previous config saved to /var/cache/conftool/dbconfig/20220915-120013-root.json
  • 11:51 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS buster
  • 11:50 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore1001.eqiad.wmnet with OS buster
  • 11:45 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS buster
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34786 and previous config saved to /var/cache/conftool/dbconfig/20220915-114508-root.json
  • 11:44 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore1001.eqiad.wmnet with OS buster
  • 11:43 moritzm: restart exim on lists1001 to pick up zlib security updates
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34785 and previous config saved to /var/cache/conftool/dbconfig/20220915-113003-root.json
  • 11:22 jayme: importing prometheus-rsyslog-exporter 0.0.0+git20201008-4 to stretch-wikimedia, buster-wikimedia, bullseye-wikimedia - T289766
  • 11:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS buster
  • 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 11:15 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34784 and previous config saved to /var/cache/conftool/dbconfig/20220915-111458-root.json
  • 11:12 hnowlan: sessionstore1001: c-foreach-nt drain
  • 11:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: Testing reimage
  • 11:10 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: Testing reimage
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'pool db2129 into s6 API', diff saved to https://phabricator.wikimedia.org/P34783 and previous config saved to /var/cache/conftool/dbconfig/20220915-110453-root.json
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34782 and previous config saved to /var/cache/conftool/dbconfig/20220915-105953-root.json
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34781 and previous config saved to /var/cache/conftool/dbconfig/20220915-104448-root.json
  • 10:36 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34780 and previous config saved to /var/cache/conftool/dbconfig/20220915-102943-root.json
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34779 and previous config saved to /var/cache/conftool/dbconfig/20220915-101438-root.json
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34778 and previous config saved to /var/cache/conftool/dbconfig/20220915-101425-root.json
  • 10:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db2131.codfw.wmnet with reason: reboot
  • 10:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db2131.codfw.wmnet with reason: reboot
  • 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2131', diff saved to https://phabricator.wikimedia.org/P34777 and previous config saved to /var/cache/conftool/dbconfig/20220915-100212-root.json
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34775 and previous config saved to /var/cache/conftool/dbconfig/20220915-095920-root.json
  • 09:58 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:58 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:57 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:57 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34774 and previous config saved to /var/cache/conftool/dbconfig/20220915-094415-root.json
  • 09:38 aqu@deploy1002: Finished deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@278c383] (duration: 14m 21s)
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34773 and previous config saved to /var/cache/conftool/dbconfig/20220915-092910-root.json
  • 09:23 aqu@deploy1002: Started deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@278c383]
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34772 and previous config saved to /var/cache/conftool/dbconfig/20220915-091405-root.json
  • 09:13 aqu@deploy1002: Finished deploy [analytics/refinery@278c383] (thin): Regular analytics weekly train THIN [analytics/refinery@278c383] (duration: 00m 08s)
  • 09:13 aqu@deploy1002: Started deploy [analytics/refinery@278c383] (thin): Regular analytics weekly train THIN [analytics/refinery@278c383]
  • 09:12 aqu@deploy1002: Finished deploy [analytics/refinery@278c383]: Regular analytics weekly train [analytics/refinery@278c383] (duration: 27m 31s)
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34770 and previous config saved to /var/cache/conftool/dbconfig/20220915-085900-root.json
  • 08:49 apergos: UTC backport training window closed at last
  • 08:46 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 08:45 aqu@deploy1002: Started deploy [analytics/refinery@278c383]: Regular analytics weekly train [analytics/refinery@278c383]
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34768 and previous config saved to /var/cache/conftool/dbconfig/20220915-084355-root.json
  • 08:43 aqu: about to deploy analytics/refinery
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34767 and previous config saved to /var/cache/conftool/dbconfig/20220915-084046-root.json
  • 08:34 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34766 and previous config saved to /var/cache/conftool/dbconfig/20220915-082851-root.json
  • 08:26 tsepothoabala@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable action blocks on ptwiki (T317157) (duration: 04m 07s)
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34765 and previous config saved to /var/cache/conftool/dbconfig/20220915-082541-root.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34764 and previous config saved to /var/cache/conftool/dbconfig/20220915-082112-root.json
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2129 T317850', diff saved to https://phabricator.wikimedia.org/P34763 and previous config saved to /var/cache/conftool/dbconfig/20220915-081627-root.json
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2114 to s6 codfw master T317850', diff saved to https://phabricator.wikimedia.org/P34762 and previous config saved to /var/cache/conftool/dbconfig/20220915-081517-marostegui.json
  • 08:14 marostegui: Starting s6 codfw failover from db2129 to db2114 - T317850
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34761 and previous config saved to /var/cache/conftool/dbconfig/20220915-081036-root.json
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34760 and previous config saved to /var/cache/conftool/dbconfig/20220915-080607-root.json
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2114 from API T317850', diff saved to https://phabricator.wikimedia.org/P34759 and previous config saved to /var/cache/conftool/dbconfig/20220915-080157-root.json
  • 08:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary codfw s6 T317850
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2114 with weight 0 T317850', diff saved to https://phabricator.wikimedia.org/P34758 and previous config saved to /var/cache/conftool/dbconfig/20220915-080122-root.json
  • 08:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary codfw s6 T317850
  • 07:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34757 and previous config saved to /var/cache/conftool/dbconfig/20220915-075531-root.json
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34756 and previous config saved to /var/cache/conftool/dbconfig/20220915-075102-root.json
  • 07:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2132,2160].codfw.wmnet with reason: reboot
  • 07:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2132,2160].codfw.wmnet with reason: reboot
  • 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2133,2160].codfw.wmnet with reason: reboot
  • 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2133,2160].codfw.wmnet with reason: reboot
  • 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2134,2160].codfw.wmnet with reason: reboot
  • 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2134,2160].codfw.wmnet with reason: reboot
  • 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2135,2160].codfw.wmnet with reason: reboot
  • 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2135,2160].codfw.wmnet with reason: reboot
  • 07:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34755 and previous config saved to /var/cache/conftool/dbconfig/20220915-074026-root.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34754 and previous config saved to /var/cache/conftool/dbconfig/20220915-073557-root.json
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34753 and previous config saved to /var/cache/conftool/dbconfig/20220915-072520-root.json
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34752 and previous config saved to /var/cache/conftool/dbconfig/20220915-072053-root.json
  • 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2151.codfw.wmnet with reason: reboot
  • 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2151.codfw.wmnet with reason: reboot
  • 07:14 moritzm: installing zlib security updates
  • 07:13 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
  • 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:11 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34751 and previous config saved to /var/cache/conftool/dbconfig/20220915-071015-root.json
  • 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
  • 07:06 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34750 and previous config saved to /var/cache/conftool/dbconfig/20220915-070548-root.json
  • 07:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34749 and previous config saved to /var/cache/conftool/dbconfig/20220915-065510-root.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34748 and previous config saved to /var/cache/conftool/dbconfig/20220915-065043-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to db2096 T317842', diff saved to https://phabricator.wikimedia.org/P34747 and previous config saved to /var/cache/conftool/dbconfig/20220915-064750-marostegui.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2115 T317842', diff saved to https://phabricator.wikimedia.org/P34746 and previous config saved to /var/cache/conftool/dbconfig/20220915-064635-marostegui.json
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2096 to x1 primary and set section read-write T317842', diff saved to https://phabricator.wikimedia.org/P34745 and previous config saved to /var/cache/conftool/dbconfig/20220915-064525-root.json
  • 06:44 marostegui: Starting x1 codfw failover from db2115 to db2096 - T317842
  • 06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 T317842
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2096 with weight 0 T317842', diff saved to https://phabricator.wikimedia.org/P34744 and previous config saved to /var/cache/conftool/dbconfig/20220915-064014-root.json
  • 06:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 T317842
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34743 and previous config saved to /var/cache/conftool/dbconfig/20220915-063538-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2105 T317839', diff saved to https://phabricator.wikimedia.org/P34742 and previous config saved to /var/cache/conftool/dbconfig/20220915-061421-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2127 to s3 codfw T317839', diff saved to https://phabricator.wikimedia.org/P34741 and previous config saved to /var/cache/conftool/dbconfig/20220915-061317-marostegui.json
  • 06:12 marostegui: Starting s3 codfw failover from db2105 to db2127 - T317839
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2127 with weight 0 T317839', diff saved to https://phabricator.wikimedia.org/P34740 and previous config saved to /var/cache/conftool/dbconfig/20220915-060307-root.json
  • 06:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Codfw switchover s3 T317839
  • 06:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Codfw switchover s3 T317839
  • 05:32 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down T317662
  • 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down T317662
  • 05:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down T317662
  • 05:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down T317662

2022-09-14

  • 22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T314041)', diff saved to https://phabricator.wikimedia.org/P34739 and previous config saved to /var/cache/conftool/dbconfig/20220914-220822-ladsgroup.json
  • 22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 22:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 22:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T314041)', diff saved to https://phabricator.wikimedia.org/P34738 and previous config saved to /var/cache/conftool/dbconfig/20220914-220744-ladsgroup.json
  • 21:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P34737 and previous config saved to /var/cache/conftool/dbconfig/20220914-215238-ladsgroup.json
  • 21:38 dduvall@deploy1002: Finished deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002 (duration: 01m 48s)
  • 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P34736 and previous config saved to /var/cache/conftool/dbconfig/20220914-213732-ladsgroup.json
  • 21:37 dduvall@deploy1002: Started deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002
  • 21:36 dduvall: testing phabricator deployment to phab2002. should have no production impact (not serving traffic, no access to r/w db)
  • 21:35 dduvall@deploy1002: Installation of scap version "4.19.1" completed for 561 hosts
  • 21:35 dduvall@deploy1002: Installing scap version "4.19.1" for 561 hosts
  • 21:34 dduvall: Deploying scap 4.19.1 (https://gerrit.wikimedia.org/r/c/mediawiki/tools/scap/+/832297/1/changelog)
  • 21:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T314041)', diff saved to https://phabricator.wikimedia.org/P34735 and previous config saved to /var/cache/conftool/dbconfig/20220914-212225-ladsgroup.json
  • 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:44 dancy@deploy1002: Sync cancelled.
  • 20:44 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:44 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:44 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:40 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:40 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:39 dancy@deploy1002: Started scap: testing
  • 20:38 dancy@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.1 refs T314190 (duration: 05m 49s)
  • 20:34 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:34 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:34 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:33 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:32 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.1 refs T314190
  • 20:28 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:24 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:24 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:21 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:19 dancy@deploy1002: deploy-promote aborted: (duration: 08m 52s)
  • 20:19 dancy@deploy1002: sync-file aborted: group1 wikis to 1.40.0-wmf.1 refs T314190 (duration: 01m 24s)
  • 20:18 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:18 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.1 refs T314190
  • 20:14 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:13 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:13 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:12 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:09 dancy@deploy1002: Sync cancelled.
  • 20:09 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:09 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:06 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:02 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:02 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:02 dancy@deploy1002: Started scap: testing
  • 20:01 TheresNoTime: Nothing to deploy in this UTC late backport window
  • 19:57 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync
  • 19:57 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: sync
  • 19:55 dancy@deploy1002: scap failed: CalledProcessError Command '['helmfile', '-e', 'eqiad', 'apply']' returned non-zero exit status 1. (duration: 07m 12s)
  • 19:55 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:51 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:51 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:49 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:49 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:49 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:48 dancy@deploy1002: Started scap: testing
  • 19:46 dancy@deploy1002: scap failed: CalledProcessError Command '['helmfile', '-e', 'eqiad', 'apply']' returned non-zero exit status 1. (duration: 07m 23s)
  • 19:46 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:39 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:39 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:39 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:38 dancy@deploy1002: Started scap: testing
  • 19:38 dancy@deploy1002: sync-world aborted: testing (duration: 13m 25s)
  • 19:35 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:26 dancy: dancy@deploy1002 touch /var/lib/deploy-mwdebug/pause
  • 19:24 dancy@deploy1002: Started scap: testing
  • 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:17 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@48e506e]: drop-snapshots: Remove directory handling (duration: 02m 03s)
  • 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:15 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@48e506e]: drop-snapshots: Remove directory handling
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:59 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.1 refs T314190
  • 18:50 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@e358893]: drop-snapshots: tables are partitioned by wiki (duration: 02m 05s)
  • 18:48 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@e358893]: drop-snapshots: tables are partitioned by wiki
  • 18:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:36 dancy@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.1 refs T314190 (duration: 04m 41s)
  • 18:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:31 dancy@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.1 refs T314190
  • 16:47 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 16:18 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:17 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 16:10 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:08 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 16:05 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 - volans@cumin1001
  • 16:04 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 - volans@cumin1001
  • 15:58 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2024.codfw.wmnet on all recursors
  • 15:58 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2024.codfw.wmnet on all recursors
  • 15:58 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2002.codfw.wmnet on all recursors
  • 15:58 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2002.codfw.wmnet on all recursors
  • 15:58 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash1026.eqiad.wmnet on all recursors
  • 15:57 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash1026.eqiad.wmnet on all recursors
  • 15:57 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash1029.eqiad.wmnet on all recursors
  • 15:57 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash1029.eqiad.wmnet on all recursors
  • 15:57 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash1028.eqiad.wmnet on all recursors
  • 15:57 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash1028.eqiad.wmnet on all recursors
  • 15:57 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash1027.eqiad.wmnet on all recursors
  • 15:57 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash1027.eqiad.wmnet on all recursors
  • 15:55 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2001.codfw.wmnet on all recursors
  • 15:55 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2001.codfw.wmnet on all recursors
  • 15:50 dduvall@deploy1002: Finished deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002 (duration: 00m 39s)
  • 15:49 dduvall@deploy1002: Started deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002
  • 15:48 dduvall: testing phabricator deployment to phab2002. should have no production impact (not serving traffic, no access to r/w db)
  • 15:24 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:22 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 15:22 cwhite@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:17 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 15:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T314041)', diff saved to https://phabricator.wikimedia.org/P34732 and previous config saved to /var/cache/conftool/dbconfig/20220914-145956-ladsgroup.json
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P34731 and previous config saved to /var/cache/conftool/dbconfig/20220914-144449-ladsgroup.json
  • 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P34730 and previous config saved to /var/cache/conftool/dbconfig/20220914-142941-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T314041)', diff saved to https://phabricator.wikimedia.org/P34729 and previous config saved to /var/cache/conftool/dbconfig/20220914-141434-ladsgroup.json
  • 14:06 ladsgroup@cumin1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
  • 14:05 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
  • 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-codfw
  • 13:59 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-codfw
  • 13:48 moritzm: imported zlib 1:1.2.8.dfsg-5+deb9u1+wmf1 to apt.wikimedia.org
  • 13:40 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:37 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php bnwiktionary --fix # T317745 – dry run result: 6043 links to fix, 6043 were resolvable, 0 were deleted
  • 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Move namespace in the Bengali Wiktionary: উইকিসরাস → পরিশিষ্ট and set wgNamespaceAliases for newly created namespaces (T317745) (duration: 03m 41s)
  • 13:28 topranks: upgrading routinator on rpki2002
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:20 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Content/Section translation on WPs with new MT support from Google (T313296) (duration: 03m 39s)
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Section Translation in Odia Wikipedia (T313300) (duration: 03m 55s)
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:54 jayme: imported rsyslog 8.2208.0-1~bpo11+1 into bullseye-wikimedia component/rsyslog-k8s - T289766
  • 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T314041)', diff saved to https://phabricator.wikimedia.org/P34725 and previous config saved to /var/cache/conftool/dbconfig/20220914-115920-ladsgroup.json
  • 11:49 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cr2-eqdfw,cr2-eqdfw IPv6
  • 11:49 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for cr2-eqdfw,cr2-eqdfw IPv6
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P34723 and previous config saved to /var/cache/conftool/dbconfig/20220914-114413-ladsgroup.json
  • 11:29 topranks: rebooting cr2-eqdfw to complete upgrade
  • 11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P34721 and previous config saved to /var/cache/conftool/dbconfig/20220914-112907-ladsgroup.json
  • 11:14 topranks: Shutting down internet transit and peering on cr2-eqdfw in advance of upgrade reboot
  • 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T314041)', diff saved to https://phabricator.wikimedia.org/P34719 and previous config saved to /var/cache/conftool/dbconfig/20220914-111400-ladsgroup.json
  • 11:02 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: router upgrade
  • 11:02 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: router upgrade
  • 11:01 topranks: Prepping to upgrade JunOS on cr2-eqdfw. Adjusting OSPF costs to force traffic via alternate POPs.
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34717 and previous config saved to /var/cache/conftool/dbconfig/20220914-103810-root.json
  • 10:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:24 kharlan@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: BlockMetrics: Update to new event schema version (T306018) (duration: 03m 48s)
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34715 and previous config saved to /var/cache/conftool/dbconfig/20220914-102305-root.json
  • 10:18 moritzm: import routinator 0.11.3-1bullseye to thirdparty/routinator
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34714 and previous config saved to /var/cache/conftool/dbconfig/20220914-100800-root.json
  • 10:00 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
  • 09:59 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
  • 09:58 ladsgroup@cumin1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
  • 09:57 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
  • 09:57 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
  • 09:53 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
  • 09:53 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34713 and previous config saved to /var/cache/conftool/dbconfig/20220914-095255-root.json
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34712 and previous config saved to /var/cache/conftool/dbconfig/20220914-093750-root.json
  • 09:27 moritzm: installing zlib/libxslt security updates on buster
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T314041)', diff saved to https://phabricator.wikimedia.org/P34711 and previous config saved to /var/cache/conftool/dbconfig/20220914-092620-ladsgroup.json
  • 09:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 09:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T314041)', diff saved to https://phabricator.wikimedia.org/P34710 and previous config saved to /var/cache/conftool/dbconfig/20220914-092558-ladsgroup.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34709 and previous config saved to /var/cache/conftool/dbconfig/20220914-092245-root.json
  • 09:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 09:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 09:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P34708 and previous config saved to /var/cache/conftool/dbconfig/20220914-091052-ladsgroup.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34707 and previous config saved to /var/cache/conftool/dbconfig/20220914-090740-root.json
  • 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 09:05 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
  • 09:01 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
  • 08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P34706 and previous config saved to /var/cache/conftool/dbconfig/20220914-085545-ladsgroup.json
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34705 and previous config saved to /var/cache/conftool/dbconfig/20220914-085235-root.json
  • 08:50 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
  • 08:49 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-test
  • 08:49 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-test
  • 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T314041)', diff saved to https://phabricator.wikimedia.org/P34704 and previous config saved to /var/cache/conftool/dbconfig/20220914-084039-ladsgroup.json
  • 08:38 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart on A:wdqs-test
  • 08:38 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart on A:wdqs-test
  • 08:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maint needed
  • 08:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maint needed
  • 08:32 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to the old templatelinks columns of enwiki (T312865) (duration: 06m 51s)
  • 08:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:25 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Stop writing to the old templatelinks columns of enwiki (T312865) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:25 ladsgroup@deploy1002: Started scap: Backport for Stop writing to the old templatelinks columns of enwiki (T312865)
  • 08:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1024.eqiad.wmnet with reason: down
  • 08:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on es1024.eqiad.wmnet with reason: down
  • 08:02 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Enable writes on es5 T317739 (duration: 03m 38s)
  • 07:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1024 T317739', diff saved to https://phabricator.wikimedia.org/P34703 and previous config saved to /var/cache/conftool/dbconfig/20220914-075722-root.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1023 to es5 primary T317739', diff saved to https://phabricator.wikimedia.org/P34702 and previous config saved to /var/cache/conftool/dbconfig/20220914-075550-marostegui.json
  • 07:55 marostegui: Starting es5 eqiad failover from es1024 to es1023 T317739
  • 07:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:50 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Disable writes on es5 T317739 (duration: 04m 13s)
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1023 with weight 0 T317739', diff saved to https://phabricator.wikimedia.org/P34701 and previous config saved to /var/cache/conftool/dbconfig/20220914-074617-marostegui.json
  • 07:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 T317739
  • 07:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 T317739
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34700 and previous config saved to /var/cache/conftool/dbconfig/20220914-074248-root.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34699 and previous config saved to /var/cache/conftool/dbconfig/20220914-072743-root.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34698 and previous config saved to /var/cache/conftool/dbconfig/20220914-071238-root.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34697 and previous config saved to /var/cache/conftool/dbconfig/20220914-065733-root.json
  • 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T314041)', diff saved to https://phabricator.wikimedia.org/P34696 and previous config saved to /var/cache/conftool/dbconfig/20220914-064330-ladsgroup.json
  • 06:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 06:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T314041)', diff saved to https://phabricator.wikimedia.org/P34695 and previous config saved to /var/cache/conftool/dbconfig/20220914-064309-ladsgroup.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34694 and previous config saved to /var/cache/conftool/dbconfig/20220914-064228-root.json
  • 06:38 elukey: restart kafka on kafka-logging2003 to pick up the new PKI TLS settings
  • 06:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging2003.codfw.wmnet with reason: Kafka PKI upgrade
  • 06:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging2003.codfw.wmnet with reason: Kafka PKI upgrade
  • 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P34693 and previous config saved to /var/cache/conftool/dbconfig/20220914-062802-ladsgroup.json
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34692 and previous config saved to /var/cache/conftool/dbconfig/20220914-062723-root.json
  • 06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P34691 and previous config saved to /var/cache/conftool/dbconfig/20220914-061256-ladsgroup.json
  • 06:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2123.codfw.wmnet with reason: down
  • 06:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2123.codfw.wmnet with reason: down
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2123 T317735', diff saved to https://phabricator.wikimedia.org/P34690 and previous config saved to /var/cache/conftool/dbconfig/20220914-060913-root.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2113 to s5 codfw primary T317735', diff saved to https://phabricator.wikimedia.org/P34689 and previous config saved to /var/cache/conftool/dbconfig/20220914-060807-marostegui.json
  • 05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T314041)', diff saved to https://phabricator.wikimedia.org/P34688 and previous config saved to /var/cache/conftool/dbconfig/20220914-055749-ladsgroup.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2113 with weight 0 T317735', diff saved to https://phabricator.wikimedia.org/P34687 and previous config saved to /var/cache/conftool/dbconfig/20220914-055156-marostegui.json
  • 05:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T317735
  • 05:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T317735
  • 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T314041)', diff saved to https://phabricator.wikimedia.org/P34686 and previous config saved to /var/cache/conftool/dbconfig/20220914-052510-ladsgroup.json
  • 05:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 05:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 05:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T314041)', diff saved to https://phabricator.wikimedia.org/P34685 and previous config saved to /var/cache/conftool/dbconfig/20220914-052448-ladsgroup.json
  • 05:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P34684 and previous config saved to /var/cache/conftool/dbconfig/20220914-050942-ladsgroup.json
  • 04:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P34683 and previous config saved to /var/cache/conftool/dbconfig/20220914-045435-ladsgroup.json
  • 04:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T314041)', diff saved to https://phabricator.wikimedia.org/P34682 and previous config saved to /var/cache/conftool/dbconfig/20220914-043929-ladsgroup.json
  • 03:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T314041)', diff saved to https://phabricator.wikimedia.org/P34681 and previous config saved to /var/cache/conftool/dbconfig/20220914-035624-ladsgroup.json
  • 03:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 03:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 03:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 03:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T314041)', diff saved to https://phabricator.wikimedia.org/P34680 and previous config saved to /var/cache/conftool/dbconfig/20220914-035546-ladsgroup.json
  • 03:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P34679 and previous config saved to /var/cache/conftool/dbconfig/20220914-034040-ladsgroup.json
  • 03:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T314041)', diff saved to https://phabricator.wikimedia.org/P34678 and previous config saved to /var/cache/conftool/dbconfig/20220914-033921-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P34677 and previous config saved to /var/cache/conftool/dbconfig/20220914-032533-ladsgroup.json
  • 03:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P34676 and previous config saved to /var/cache/conftool/dbconfig/20220914-032415-ladsgroup.json
  • 03:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T314041)', diff saved to https://phabricator.wikimedia.org/P34675 and previous config saved to /var/cache/conftool/dbconfig/20220914-031027-ladsgroup.json
  • 03:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P34674 and previous config saved to /var/cache/conftool/dbconfig/20220914-030908-ladsgroup.json
  • 02:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T314041)', diff saved to https://phabricator.wikimedia.org/P34673 and previous config saved to /var/cache/conftool/dbconfig/20220914-025402-ladsgroup.json
  • 01:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T314041)', diff saved to https://phabricator.wikimedia.org/P34672 and previous config saved to /var/cache/conftool/dbconfig/20220914-013204-ladsgroup.json
  • 01:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 01:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T314041)', diff saved to https://phabricator.wikimedia.org/P34671 and previous config saved to /var/cache/conftool/dbconfig/20220914-013143-ladsgroup.json
  • 01:24 eileen: civicrm upgraded from d91b4a2c to e82d9cd0
  • 01:18 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - T317686
  • 01:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34670 and previous config saved to /var/cache/conftool/dbconfig/20220914-011637-ladsgroup.json
  • 01:14 ejegg: disabled delete_deleted_contacts job (will take effect when current job ends)
  • 01:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34669 and previous config saved to /var/cache/conftool/dbconfig/20220914-010130-ladsgroup.json
  • 00:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T314041)', diff saved to https://phabricator.wikimedia.org/P34668 and previous config saved to /var/cache/conftool/dbconfig/20220914-004624-ladsgroup.json

2022-09-13

  • 23:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T314041)', diff saved to https://phabricator.wikimedia.org/P34667 and previous config saved to /var/cache/conftool/dbconfig/20220913-234607-ladsgroup.json
  • 23:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 23:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T314041)', diff saved to https://phabricator.wikimedia.org/P34666 and previous config saved to /var/cache/conftool/dbconfig/20220913-234546-ladsgroup.json
  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P34665 and previous config saved to /var/cache/conftool/dbconfig/20220913-233039-ladsgroup.json
  • 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P34664 and previous config saved to /var/cache/conftool/dbconfig/20220913-231533-ladsgroup.json
  • 23:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 23:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T314041)', diff saved to https://phabricator.wikimedia.org/P34663 and previous config saved to /var/cache/conftool/dbconfig/20220913-231257-ladsgroup.json
  • 23:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T314041)', diff saved to https://phabricator.wikimedia.org/P34662 and previous config saved to /var/cache/conftool/dbconfig/20220913-230317-ladsgroup.json
  • 23:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 23:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 23:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34661 and previous config saved to /var/cache/conftool/dbconfig/20220913-230255-ladsgroup.json
  • 23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T314041)', diff saved to https://phabricator.wikimedia.org/P34660 and previous config saved to /var/cache/conftool/dbconfig/20220913-230026-ladsgroup.json
  • 22:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P34659 and previous config saved to /var/cache/conftool/dbconfig/20220913-225750-ladsgroup.json
  • 22:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P34658 and previous config saved to /var/cache/conftool/dbconfig/20220913-224749-ladsgroup.json
  • 22:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P34657 and previous config saved to /var/cache/conftool/dbconfig/20220913-224244-ladsgroup.json
  • 22:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P34656 and previous config saved to /var/cache/conftool/dbconfig/20220913-223241-ladsgroup.json
  • 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T314041)', diff saved to https://phabricator.wikimedia.org/P34655 and previous config saved to /var/cache/conftool/dbconfig/20220913-223025-ladsgroup.json
  • 22:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 22:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 22:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T314041)', diff saved to https://phabricator.wikimedia.org/P34654 and previous config saved to /var/cache/conftool/dbconfig/20220913-222738-ladsgroup.json
  • 22:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34653 and previous config saved to /var/cache/conftool/dbconfig/20220913-221734-ladsgroup.json
  • 22:16 dancy: dancy@deploy1002$ rm /var/lib/deploy-mwdebug/pause
  • 22:15 dancy@deploy1002: Sync cancelled.
  • 22:15 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 22:14 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:14 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:14 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:13 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:12 dancy@deploy1002: Started scap: testing
  • 22:12 dancy@deploy1002: Sync cancelled.
  • 22:11 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 22:11 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:11 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:10 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:08 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:07 dancy@deploy1002: Started scap: testing
  • 22:07 dancy@deploy1002: Sync cancelled.
  • 22:07 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:06 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:06 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:05 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:03 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:02 dancy@deploy1002: Started scap: testing
  • 22:01 dancy@deploy1002: Sync cancelled.
  • 22:01 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 22:01 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:58 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:58 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:55 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:55 dancy@deploy1002: Started scap: testing
  • 21:55 dancy@deploy1002: Sync cancelled.
  • 21:54 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 21:54 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:50 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:50 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:48 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:47 dancy@deploy1002: Started scap: testing
  • 21:37 dancy@deploy1002: scap failed: CalledProcessError Command 'sudo -u mwbuilder /usr/bin/make -C /srv/mwbuilder/release/make-container-image -f Makefile build-and-push-all-images http_proxy=http://webproxy.eqiad.wmnet:8080 https_proxy=http://webproxy.eqiad.wmnet:8080 GIT_BASE=https://gerrit.wikimedia.org/r/ MW_CONFIG_BRANCH=master workdir_volume=/srv/mediawiki-staging mv_image_name=docker-registry.discovery.wmnet/restric
  • 21:36 dancy@deploy1002: Started scap: testing
  • 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:16 dancy: dancy@deploy1002 touch /var/lib/deploy-mwdebug/pause
  • 21:16 dancy@deploy1002: Sync cancelled.
  • 21:15 dancy@deploy1002: dancy: testing T299648 synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:14 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:14 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:14 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:10 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:04 dancy@deploy1002: Started scap: testing T299648
  • 20:25 cjming: end of UTC late backport window
  • 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:14 cjming@deploy1002: Finished scap: Backport for add tagline and update wordmark in ptwikinews (T313174) (duration: 05m 50s)
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:09 cjming@deploy1002: cjming and aishik: Backport for add tagline and update wordmark in ptwikinews (T313174) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:09 cjming@deploy1002: Started scap: Backport for add tagline and update wordmark in ptwikinews (T313174)
  • 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1157 (T314041)', diff saved to https://phabricator.wikimedia.org/P34652 and previous config saved to /var/cache/conftool/dbconfig/20220913-200344-ladsgroup.json
  • 20:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 20:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 20:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T314041)', diff saved to https://phabricator.wikimedia.org/P34651 and previous config saved to /var/cache/conftool/dbconfig/20220913-200214-ladsgroup.json
  • 20:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 20:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 20:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T314041)', diff saved to https://phabricator.wikimedia.org/P34650 and previous config saved to /var/cache/conftool/dbconfig/20220913-200152-ladsgroup.json
  • 19:55 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - T317686
  • 19:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P34649 and previous config saved to /var/cache/conftool/dbconfig/20220913-194645-ladsgroup.json
  • 19:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P34648 and previous config saved to /var/cache/conftool/dbconfig/20220913-193139-ladsgroup.json
  • 19:19 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - T317686
  • 19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T314041)', diff saved to https://phabricator.wikimedia.org/P34647 and previous config saved to /var/cache/conftool/dbconfig/20220913-191632-ladsgroup.json
  • 19:01 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - T317686
  • 18:47 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - T317686
  • 18:46 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - T317686
  • 18:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34646 and previous config saved to /var/cache/conftool/dbconfig/20220913-183259-ladsgroup.json
  • 18:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 18:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34645 and previous config saved to /var/cache/conftool/dbconfig/20220913-183238-ladsgroup.json
  • 18:31 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: InitialiseSettings-labs.php: Set $wgPhonosPath (T317417) (duration: 03m 45s)
  • 18:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:28 TheresNoTime: deploying a beta cluster only config change, T317417
  • 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P34644 and previous config saved to /var/cache/conftool/dbconfig/20220913-181731-ladsgroup.json
  • 18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P34643 and previous config saved to /var/cache/conftool/dbconfig/20220913-180225-ladsgroup.json
  • 17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34642 and previous config saved to /var/cache/conftool/dbconfig/20220913-174718-ladsgroup.json
  • 17:43 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Upgrade wmf-netbox plugin - volans@cumin1001
  • 17:41 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Upgrade wmf-netbox plugin - volans@cumin1001
  • 17:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 17:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T314041)', diff saved to https://phabricator.wikimedia.org/P34640 and previous config saved to /var/cache/conftool/dbconfig/20220913-173721-ladsgroup.json
  • 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T312863)', diff saved to https://phabricator.wikimedia.org/P34639 and previous config saved to /var/cache/conftool/dbconfig/20220913-173254-ladsgroup.json
  • 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P34638 and previous config saved to /var/cache/conftool/dbconfig/20220913-172215-ladsgroup.json
  • 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34637 and previous config saved to /var/cache/conftool/dbconfig/20220913-171747-ladsgroup.json
  • 17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P34636 and previous config saved to /var/cache/conftool/dbconfig/20220913-170708-ladsgroup.json
  • 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34635 and previous config saved to /var/cache/conftool/dbconfig/20220913-170241-ladsgroup.json
  • 16:56 ejegg: updated fundraising CiviCRM from efbbcb57 to d91b4a2c
  • 16:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T314041)', diff saved to https://phabricator.wikimedia.org/P34634 and previous config saved to /var/cache/conftool/dbconfig/20220913-165202-ladsgroup.json
  • 16:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T314041)', diff saved to https://phabricator.wikimedia.org/P34633 and previous config saved to /var/cache/conftool/dbconfig/20220913-165117-ladsgroup.json
  • 16:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 16:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T314041)', diff saved to https://phabricator.wikimedia.org/P34632 and previous config saved to /var/cache/conftool/dbconfig/20220913-165056-ladsgroup.json
  • 16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T312863)', diff saved to https://phabricator.wikimedia.org/P34631 and previous config saved to /var/cache/conftool/dbconfig/20220913-164734-ladsgroup.json
  • 16:37 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:37 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:36 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 16:36 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 16:36 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:36 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P34630 and previous config saved to /var/cache/conftool/dbconfig/20220913-163549-ladsgroup.json
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P34629 and previous config saved to /var/cache/conftool/dbconfig/20220913-162043-ladsgroup.json
  • 16:13 godog: add 200G to prometheus/eqiad instance ops
  • 16:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 16:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 16:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet with reason: Maintenance
  • 16:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet with reason: Maintenance
  • 16:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1154.eqiad.wmnet with reason: Maintenance
  • 16:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1154.eqiad.wmnet with reason: Maintenance
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T314041)', diff saved to https://phabricator.wikimedia.org/P34628 and previous config saved to /var/cache/conftool/dbconfig/20220913-160536-ladsgroup.json
  • 15:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: down
  • 15:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: down
  • 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1189', diff saved to https://phabricator.wikimedia.org/P34626 and previous config saved to /var/cache/conftool/dbconfig/20220913-154810-root.json
  • 15:42 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@031604d]: Automatically drop historical partitions of subgraph analysis (duration: 02m 07s)
  • 15:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 15:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T314041)', diff saved to https://phabricator.wikimedia.org/P34625 and previous config saved to /var/cache/conftool/dbconfig/20220913-154151-ladsgroup.json
  • 15:40 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@031604d]: Automatically drop historical partitions of subgraph analysis
  • 15:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P34624 and previous config saved to /var/cache/conftool/dbconfig/20220913-152644-ladsgroup.json
  • 15:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:14 dancy@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.28 refs T314190 (duration: 04m 31s)
  • 15:13 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.0 - volans@cumin1001
  • 15:12 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.0 - volans@cumin1001
  • 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P34623 and previous config saved to /var/cache/conftool/dbconfig/20220913-151138-ladsgroup.json
  • 15:10 dancy@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.28 refs T314190
  • 15:08 dancy@deploy1002: deploy-promote aborted: (duration: 00m 02s)
  • 14:59 dancy@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.1 refs T314190 (duration: 04m 43s)
  • 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T314041)', diff saved to https://phabricator.wikimedia.org/P34622 and previous config saved to /var/cache/conftool/dbconfig/20220913-145631-ladsgroup.json
  • 14:54 dancy@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.1 refs T314190
  • 14:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:47 dancy@deploy1002: deploy-promote aborted: (duration: 01m 03s)
  • 14:47 dancy@deploy1002: prep aborted: (duration: 00m 12s)
  • 14:46 moritzm: restarting FPM/Apache on mediawiki canaries
  • 14:44 moritzm: installing libxslt security updates on buster
  • 14:18 topranks: Core router upgrade in codfw complete - maintenance closed.
  • 14:12 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt
  • 14:12 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt
  • 14:07 topranks: re-activating Transit and IX BGP on cr2-codfw
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34621 and previous config saved to /var/cache/conftool/dbconfig/20220913-135729-ladsgroup.json
  • 13:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 13:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T314041)', diff saved to https://phabricator.wikimedia.org/P34620 and previous config saved to /var/cache/conftool/dbconfig/20220913-135707-ladsgroup.json
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P34619 and previous config saved to /var/cache/conftool/dbconfig/20220913-134201-ladsgroup.json
  • 13:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T314041)', diff saved to https://phabricator.wikimedia.org/P34618 and previous config saved to /var/cache/conftool/dbconfig/20220913-133339-ladsgroup.json
  • 13:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 13:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 13:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T314041)', diff saved to https://phabricator.wikimedia.org/P34617 and previous config saved to /var/cache/conftool/dbconfig/20220913-133317-ladsgroup.json
  • 13:33 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P34616 and previous config saved to /var/cache/conftool/dbconfig/20220913-132654-ladsgroup.json
  • 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: testwiki: Add mediawiki.edit_attempt stream (T309013) (2/2) (duration: 03m 33s)
  • 13:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: testwiki: Add mediawiki.edit_attempt stream (T309013) (1/2) (duration: 03m 39s)
  • 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:19 Emperor: set thanos ring replicas to 3.85 T311690
  • 13:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P34615 and previous config saved to /var/cache/conftool/dbconfig/20220913-131811-ladsgroup.json
  • 13:14 topranks: Flipping back to RE0 on cr2-codfw (last disruptive switch)
  • 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:13 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wgWMESearchRelevancePages (unused) (duration: 03m 53s)
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T314041)', diff saved to https://phabricator.wikimedia.org/P34614 and previous config saved to /var/cache/conftool/dbconfig/20220913-131148-ladsgroup.json
  • 13:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet with reason: Maintenance
  • 13:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet with reason: Maintenance
  • 13:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
  • 13:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
  • 13:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet with reason: Maintenance
  • 13:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet with reason: Maintenance
  • 13:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
  • 13:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
  • 13:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P34613 and previous config saved to /var/cache/conftool/dbconfig/20220913-130304-ladsgroup.json
  • 12:59 topranks: Switching active RE back to RE1 on cr1-codfw as firmware hadn't been loaded while it was master
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T314041)', diff saved to https://phabricator.wikimedia.org/P34612 and previous config saved to /var/cache/conftool/dbconfig/20220913-125745-ladsgroup.json
  • 12:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 12:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T314041)', diff saved to https://phabricator.wikimedia.org/P34611 and previous config saved to /var/cache/conftool/dbconfig/20220913-125723-ladsgroup.json
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T314041)', diff saved to https://phabricator.wikimedia.org/P34610 and previous config saved to /var/cache/conftool/dbconfig/20220913-124758-ladsgroup.json
  • 12:46 topranks: forcing non-graceful RE switchover on cr2-codfw as part of upgrade
  • 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P34609 and previous config saved to /var/cache/conftool/dbconfig/20220913-124217-ladsgroup.json
  • 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P34608 and previous config saved to /var/cache/conftool/dbconfig/20220913-122710-ladsgroup.json
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34607 and previous config saved to /var/cache/conftool/dbconfig/20220913-122415-root.json
  • 12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T314041)', diff saved to https://phabricator.wikimedia.org/P34606 and previous config saved to /var/cache/conftool/dbconfig/20220913-121204-ladsgroup.json
  • 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34605 and previous config saved to /var/cache/conftool/dbconfig/20220913-120910-root.json
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T314041)', diff saved to https://phabricator.wikimedia.org/P34604 and previous config saved to /var/cache/conftool/dbconfig/20220913-120653-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T314041)', diff saved to https://phabricator.wikimedia.org/P34603 and previous config saved to /var/cache/conftool/dbconfig/20220913-120632-ladsgroup.json
  • 11:58 topranks: Disabling transit and ixp BGP on cr2-codfw in advance of software upgrade
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34602 and previous config saved to /var/cache/conftool/dbconfig/20220913-115405-root.json
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P34601 and previous config saved to /var/cache/conftool/dbconfig/20220913-115125-ladsgroup.json
  • 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34600 and previous config saved to /var/cache/conftool/dbconfig/20220913-113900-root.json
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P34599 and previous config saved to /var/cache/conftool/dbconfig/20220913-113619-ladsgroup.json
  • 11:34 hashar: Upgrading CI Jenkins T317418
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T314041)', diff saved to https://phabricator.wikimedia.org/P34598 and previous config saved to /var/cache/conftool/dbconfig/20220913-112818-ladsgroup.json
  • 11:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 11:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34597 and previous config saved to /var/cache/conftool/dbconfig/20220913-112355-root.json
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T314041)', diff saved to https://phabricator.wikimedia.org/P34596 and previous config saved to /var/cache/conftool/dbconfig/20220913-112112-ladsgroup.json
  • 11:21 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt with reason: router upgrade
  • 11:20 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt with reason: router upgrade
  • 11:15 topranks: completed cr1-codfw upgrade, will proceed to cr2-codfw shortly
  • 11:14 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cr1-codfw,cr1-codfw IPv6,re0.cr1-codfw.mgmt
  • 11:14 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for cr1-codfw,cr1-codfw IPv6,re0.cr1-codfw.mgmt
  • 11:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:09 ladsgroup@deploy1002: Synchronized php-1.40.0-wmf.1/includes/libs/rdbms/ChronologyProtector.php: Backport: rdbms: Bump ChronologyProtector cache key version (T317606) (duration: 03m 49s)
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34595 and previous config saved to /var/cache/conftool/dbconfig/20220913-110850-root.json
  • 11:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T312863)', diff saved to https://phabricator.wikimedia.org/P34594 and previous config saved to /var/cache/conftool/dbconfig/20220913-110755-ladsgroup.json
  • 11:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 11:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34593 and previous config saved to /var/cache/conftool/dbconfig/20220913-110715-root.json
  • 11:03 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 T317627', diff saved to https://phabricator.wikimedia.org/P34592 and previous config saved to /var/cache/conftool/dbconfig/20220913-105733-root.json
  • 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2107 to s2 codfw primary T317627', diff saved to https://phabricator.wikimedia.org/P34591 and previous config saved to /var/cache/conftool/dbconfig/20220913-105642-marostegui.json
  • 10:56 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 10:56 marostegui: Starting s2 codfw failover from db2104 to db2107 - T317627
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34590 and previous config saved to /var/cache/conftool/dbconfig/20220913-105210-root.json
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34589 and previous config saved to /var/cache/conftool/dbconfig/20220913-103705-root.json
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2107 from api T317627', diff saved to https://phabricator.wikimedia.org/P34588 and previous config saved to /var/cache/conftool/dbconfig/20220913-103658-marostegui.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2107 with weight 0 T317627', diff saved to https://phabricator.wikimedia.org/P34587 and previous config saved to /var/cache/conftool/dbconfig/20220913-103621-marostegui.json
  • 10:35 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 10:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T317627
  • 10:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 T317627
  • 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T314041)', diff saved to https://phabricator.wikimedia.org/P34586 and previous config saved to /var/cache/conftool/dbconfig/20220913-102232-ladsgroup.json
  • 10:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 10:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34585 and previous config saved to /var/cache/conftool/dbconfig/20220913-102147-root.json
  • 10:16 topranks: Flipping master RE on cr1-codfw to backup as part of upgrade
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34584 and previous config saved to /var/cache/conftool/dbconfig/20220913-100642-root.json
  • 10:04 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons.
  • 09:52 elukey: restart kafka on kafka-logging2002 to move it to PKI-based TLS certs
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34583 and previous config saved to /var/cache/conftool/dbconfig/20220913-095137-root.json
  • 09:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging2002.codfw.wmnet with reason: Kafka PKI upgrade
  • 09:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging2002.codfw.wmnet with reason: Kafka PKI upgrade
  • 09:45 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons.
  • 09:42 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1101.eqiad.wmnet
  • 09:41 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 09:37 hashar: Restarting CI Jenkins on contint2001 (with new systemd service)
  • 09:33 hashar: Enabling Puppet on contint2001 for Jenkins systemd change
  • 09:33 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1101.eqiad.wmnet
  • 09:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T314041)', diff saved to https://phabricator.wikimedia.org/P34582 and previous config saved to /var/cache/conftool/dbconfig/20220913-092904-ladsgroup.json
  • 09:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T314041)', diff saved to https://phabricator.wikimedia.org/P34581 and previous config saved to /var/cache/conftool/dbconfig/20220913-092826-ladsgroup.json
  • 09:25 hashar: Stopped Puppet on contint2001 for a Jenkins systemd change
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2103 T317614', diff saved to https://phabricator.wikimedia.org/P34580 and previous config saved to /var/cache/conftool/dbconfig/20220913-092200-root.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2112 to s1 primary T317614', diff saved to https://phabricator.wikimedia.org/P34579 and previous config saved to /var/cache/conftool/dbconfig/20220913-092032-root.json
  • 09:19 marostegui: Starting s1 codfw failover from db2103 to db2112 - T317614
  • 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P34578 and previous config saved to /var/cache/conftool/dbconfig/20220913-091320-ladsgroup.json
  • 09:11 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 09:11 volans@cumin1001: START - Cookbook sre.network.cf
  • 09:02 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr1-codfw,cr1-codfw IPv6,re0.cr1-codfw.mgmt with reason: router upgrade
  • 09:02 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cr1-codfw,cr1-codfw IPv6,re0.cr1-codfw.mgmt with reason: router upgrade
  • 08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P34577 and previous config saved to /var/cache/conftool/dbconfig/20220913-085814-ladsgroup.json
  • 08:56 topranks: Flipping primary routing engine to RE1 on cr1-codfw (disruptive) as part of upgrade.
  • 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2112 with weight 0 T317614', diff saved to https://phabricator.wikimedia.org/P34576 and previous config saved to /var/cache/conftool/dbconfig/20220913-085456-marostegui.json
  • 08:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 37 hosts with reason: Primary switchover s1 T317614
  • 08:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 37 hosts with reason: Primary switchover s1 T317614
  • 08:46 topranks: Disabled LVS/PyBal peerings on cr1-codfw in advance of upgrade to router.
  • 08:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1100.eqiad.wmnet
  • 08:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T314041)', diff saved to https://phabricator.wikimedia.org/P34575 and previous config saved to /var/cache/conftool/dbconfig/20220913-084307-ladsgroup.json
  • 08:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1100.eqiad.wmnet
  • 08:36 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1099.eqiad.wmnet
  • 08:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1099.eqiad.wmnet
  • 08:27 cmooney@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 08:27 cmooney@cumin1001: START - Cookbook sre.network.cf
  • 08:17 moritzm: roll-restarting apache/FPM on mw canaries to pick up zlib security updates
  • 08:15 topranks: de-pooling codfw ahead of core router upgrades at the site
  • 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:18 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.28 refs T314190 (duration: 04m 29s)
  • 07:14 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.28 refs T314190
  • 07:11 jhuneidi@deploy1002: deploy-promote aborted: (duration: 00m 09s)
  • 06:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34574 and previous config saved to /var/cache/conftool/dbconfig/20220913-065457-ladsgroup.json
  • 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P34573 and previous config saved to /var/cache/conftool/dbconfig/20220913-063951-ladsgroup.json
  • 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T314041)', diff saved to https://phabricator.wikimedia.org/P34572 and previous config saved to /var/cache/conftool/dbconfig/20220913-063908-ladsgroup.json
  • 06:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 06:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 06:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 06:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 06:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P34571 and previous config saved to /var/cache/conftool/dbconfig/20220913-062444-ladsgroup.json
  • 06:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34570 and previous config saved to /var/cache/conftool/dbconfig/20220913-060938-ladsgroup.json
  • 04:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T314041)', diff saved to https://phabricator.wikimedia.org/P34569 and previous config saved to /var/cache/conftool/dbconfig/20220913-045832-ladsgroup.json
  • 04:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 04:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 04:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T314041)', diff saved to https://phabricator.wikimedia.org/P34568 and previous config saved to /var/cache/conftool/dbconfig/20220913-045811-ladsgroup.json
  • 04:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P34567 and previous config saved to /var/cache/conftool/dbconfig/20220913-044304-ladsgroup.json
  • 04:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P34566 and previous config saved to /var/cache/conftool/dbconfig/20220913-042758-ladsgroup.json
  • 04:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T314041)', diff saved to https://phabricator.wikimedia.org/P34565 and previous config saved to /var/cache/conftool/dbconfig/20220913-041251-ladsgroup.json
  • 04:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 04:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 04:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.39.0-wmf.27 (duration: 01m 59s)
  • 03:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.1 refs T314190 (duration: 35m 37s)
  • 03:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.1 refs T314190
  • 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34564 and previous config saved to /var/cache/conftool/dbconfig/20220913-022136-ladsgroup.json
  • 02:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 02:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 02:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T314041)', diff saved to https://phabricator.wikimedia.org/P34563 and previous config saved to /var/cache/conftool/dbconfig/20220913-022114-ladsgroup.json
  • 02:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P34562 and previous config saved to /var/cache/conftool/dbconfig/20220913-020608-ladsgroup.json
  • 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P34561 and previous config saved to /var/cache/conftool/dbconfig/20220913-015102-ladsgroup.json
  • 01:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T314041)', diff saved to https://phabricator.wikimedia.org/P34560 and previous config saved to /var/cache/conftool/dbconfig/20220913-013555-ladsgroup.json
  • 00:49 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2001.codfw.wmnet with reason: syntax error in sudo
  • 00:49 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab2001.codfw.wmnet with reason: syntax error in sudo
  • 00:49 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2002.codfw.wmnet with reason: syntax error in sudo
  • 00:49 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab2002.codfw.wmnet with reason: syntax error in sudo
  • 00:48 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1004.eqiad.wmnet with reason: syntax error in sudo
  • 00:48 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1004.eqiad.wmnet with reason: syntax error in sudo
  • 00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T314041)', diff saved to https://phabricator.wikimedia.org/P34559 and previous config saved to /var/cache/conftool/dbconfig/20220913-001908-ladsgroup.json
  • 00:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 00:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 00:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T314041)', diff saved to https://phabricator.wikimedia.org/P34558 and previous config saved to /var/cache/conftool/dbconfig/20220913-001846-ladsgroup.json
  • 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P34557 and previous config saved to /var/cache/conftool/dbconfig/20220913-000340-ladsgroup.json

2022-09-12

  • 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P34556 and previous config saved to /var/cache/conftool/dbconfig/20220912-234833-ladsgroup.json
  • 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T314041)', diff saved to https://phabricator.wikimedia.org/P34555 and previous config saved to /var/cache/conftool/dbconfig/20220912-233327-ladsgroup.json
  • 22:53 mutante: phabricator - disabling MediaWiki extension repositories in Diffusion that have 0 commits - T296022 - T315706
  • 22:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T314041)', diff saved to https://phabricator.wikimedia.org/P34554 and previous config saved to /var/cache/conftool/dbconfig/20220912-224006-ladsgroup.json
  • 22:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 22:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T314041)', diff saved to https://phabricator.wikimedia.org/P34553 and previous config saved to /var/cache/conftool/dbconfig/20220912-223927-ladsgroup.json
  • 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P34552 and previous config saved to /var/cache/conftool/dbconfig/20220912-222420-ladsgroup.json
  • 22:23 mutante: phabricator - disabling repositories: tool-xh-bot, tool-editor-contribution-dashboard, tool-ranker, tool-editor-contribution, tool-mikasa-bot-1, tool-maintun, tool-add-text, tool-wikibookassamese-book.php (none of them had commits) T296022 - T315706
  • 22:20 mutante: phabricator - disabling repository "tool-ranker"
  • 22:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P34551 and previous config saved to /var/cache/conftool/dbconfig/20220912-220914-ladsgroup.json
  • 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T314041)', diff saved to https://phabricator.wikimedia.org/P34550 and previous config saved to /var/cache/conftool/dbconfig/20220912-215407-ladsgroup.json
  • 21:07 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 19s)
  • 21:07 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T312863)', diff saved to https://phabricator.wikimedia.org/P34549 and previous config saved to /var/cache/conftool/dbconfig/20220912-210123-ladsgroup.json
  • 20:57 TheresNoTime: closing UTC late backport window
  • 20:56 samtar@deploy1002: Finished scap: Backport for Set track_total_hits to true (duration: 05m 00s)
  • 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:51 samtar@deploy1002: samtar and ebernhardson: Backport for Set track_total_hits to true synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:51 samtar@deploy1002: Started scap: Backport for Set track_total_hits to true
  • 20:49 samtar@deploy1002: Finished scap: Backport for Enable Nearby on Hebrew and French Wikipedia (T246493) (duration: 07m 27s)
  • 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P34548 and previous config saved to /var/cache/conftool/dbconfig/20220912-204617-ladsgroup.json
  • 20:42 samtar@deploy1002: samtar and jdlrobson: Backport for Enable Nearby on Hebrew and French Wikipedia (T246493) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:42 samtar@deploy1002: Started scap: Backport for Enable Nearby on Hebrew and French Wikipedia (T246493)
  • 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:40 samtar@deploy1002: Finished scap: Backport for Deploy Research Incentive Survey to idwiki (T316466) (duration: 06m 25s)
  • 20:39 jhathaway: testing exim config change on mx1001.wikimedia.org
  • 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:38 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dispatch-be1001.eqiad.wmnet
  • 20:34 samtar@deploy1002: samtar and dani: Backport for Deploy Research Incentive Survey to idwiki (T316466) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:34 samtar@deploy1002: Started scap: Backport for Deploy Research Incentive Survey to idwiki (T316466)
  • 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:32 samtar@deploy1002: Finished scap: Backport for Re-enable track_total_hits for elastic7 (T317374) (duration: 06m 12s)
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P34547 and previous config saved to /var/cache/conftool/dbconfig/20220912-203110-ladsgroup.json
  • 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:26 samtar@deploy1002: samtar and ebernhardson: Backport for Re-enable track_total_hits for elastic7 (T317374) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:26 samtar@deploy1002: Started scap: Backport for Re-enable track_total_hits for elastic7 (T317374)
  • 20:24 samtar@deploy1002: Finished scap: Backport for Create six more namespaces (three content namespaces and their corresponding three discussion namespaces) on the bn.wiktionary (T317424) (duration: 08m 14s)
  • 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T314041)', diff saved to https://phabricator.wikimedia.org/P34546 and previous config saved to /var/cache/conftool/dbconfig/20220912-202359-ladsgroup.json
  • 20:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 20:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:16 samtar@deploy1002: samtar and aishik: Backport for Create six more namespaces (three content namespaces and their corresponding three discussion namespaces) on the bn.wiktionary (T317424) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:16 samtar@deploy1002: Started scap: Backport for Create six more namespaces (three content namespaces and their corresponding three discussion namespaces) on the bn.wiktionary (T317424)
  • 20:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T312863)', diff saved to https://phabricator.wikimedia.org/P34545 and previous config saved to /var/cache/conftool/dbconfig/20220912-201604-ladsgroup.json
  • 20:15 herron@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dispatch-be1001.eqiad.wmnet on all recursors
  • 20:14 herron@cumin1001: START - Cookbook sre.dns.wipe-cache dispatch-be1001.eqiad.wmnet on all recursors
  • 20:14 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:14 samtar@deploy1002: Finished scap: Backport for Mark spcomwiki and searchcomwiki as closed (T285685) (duration: 05m 40s)
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:13 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:12 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 20:12 herron@cumin1001: START - Cookbook sre.ganeti.makevm for new host dispatch-be1001.eqiad.wmnet
  • 20:11 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 20:09 samtar@deploy1002: samtar and zabe: Backport for Mark spcomwiki and searchcomwiki as closed (T285685) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:08 samtar@deploy1002: Started scap: Backport for Mark spcomwiki and searchcomwiki as closed (T285685)
  • 20:07 samtar@deploy1002: backport aborted: (duration: 03m 46s)
  • 20:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts theemin.codfw.wmnet
  • 20:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 19:59 maryum: deployed security patch for T314245
  • 19:59 pt1979@cumin2002: START - Cookbook sre.hosts.decommission for hosts theemin.codfw.wmnet
  • 19:58 mstyles@deploy1002: Synchronized php-1.39.0-wmf.28/extensions/PageTriage/includes/Api/ApiPageTriageAction.php: (no justification provided) (duration: 03m 42s)
  • 19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T314041)', diff saved to https://phabricator.wikimedia.org/P34544 and previous config saved to /var/cache/conftool/dbconfig/20220912-195540-ladsgroup.json
  • 19:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 19:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 (T314041)', diff saved to https://phabricator.wikimedia.org/P34543 and previous config saved to /var/cache/conftool/dbconfig/20220912-195519-ladsgroup.json
  • 19:53 sbassett: Deployed security patch for T311337
  • 19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P34542 and previous config saved to /var/cache/conftool/dbconfig/20220912-194013-ladsgroup.json
  • 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T314041)', diff saved to https://phabricator.wikimedia.org/P34541 and previous config saved to /var/cache/conftool/dbconfig/20220912-192858-ladsgroup.json
  • 19:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 19:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T314041)', diff saved to https://phabricator.wikimedia.org/P34540 and previous config saved to /var/cache/conftool/dbconfig/20220912-192837-ladsgroup.json
  • 19:28 bking@deploy1002: Finished deploy [wdqs/wdqs@e012d14]: 0.3.116 (duration: 02m 04s)
  • 19:26 bking@deploy1002: Started deploy [wdqs/wdqs@e012d14]: 0.3.116
  • 19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P34539 and previous config saved to /var/cache/conftool/dbconfig/20220912-192506-ladsgroup.json
  • 19:20 dancy@deploy1002: Installation of scap version "4.19.0" completed for 561 hosts
  • 19:20 dancy@deploy1002: Installing scap version "4.19.0" for 561 hosts
  • 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P34538 and previous config saved to /var/cache/conftool/dbconfig/20220912-191330-ladsgroup.json
  • 19:12 dancy@deploy1002: Installation of scap version "4.18.0" completed for 561 hosts
  • 19:12 dancy@deploy1002: Installing scap version "4.18.0" for 561 hosts
  • 19:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 (T314041)', diff saved to https://phabricator.wikimedia.org/P34537 and previous config saved to /var/cache/conftool/dbconfig/20220912-191000-ladsgroup.json
  • 19:08 inflatador: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
  • 19:04 ryankemper: [WCQS] Depooled `wcqs100[1,2]` while they catch up on ~1.5 days' worth of lag (https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wcqs&viewPanel=8&from=1662910789183&to=1663068616559)
  • 19:00 inflatador: [WCQS Deploy] Test query passed on commons-query.wikimedia.org; WCQS deploy complete
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P34536 and previous config saved to /var/cache/conftool/dbconfig/20220912-185823-ladsgroup.json
  • 18:56 bking@deploy1002: Finished deploy [wdqs/wdqs@e012d14] (wcqs): Deploy 0.3.116 to WCQS (duration: 08m 01s)
  • 18:48 bking@deploy1002: Started deploy [wdqs/wdqs@e012d14] (wcqs): Deploy 0.3.116 to WCQS
  • 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T314041)', diff saved to https://phabricator.wikimedia.org/P34535 and previous config saved to /var/cache/conftool/dbconfig/20220912-184317-ladsgroup.json
  • 18:37 inflatador: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 18:37 inflatador: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 18:37 inflatador: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 18:22 bking@deploy1002: Finished deploy [wdqs/wdqs@e012d14]: 0.3.116 (duration: 07m 31s)
  • 18:14 bking@deploy1002: Started deploy [wdqs/wdqs@e012d14]: 0.3.116
  • 18:14 dancy@deploy1002: Installation of scap version "4.16.0" completed for 561 hosts
  • 18:13 dancy@deploy1002: Installing scap version "4.16.0" for 561 hosts
  • 18:08 bking@deploy1002: Finished deploy [wdqs/wdqs@e012d14]: 0.3.116 (duration: 05m 37s)
  • 18:02 bking@deploy1002: Started deploy [wdqs/wdqs@e012d14]: 0.3.116
  • 18:01 inflatador: [WDQS Deploy] Tests passing following deploy on canary `wdqs1003`; proceeding to rest of fleet
  • 17:57 inflatador: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.116`. Pre-deploy tests passing on canary `wdqs1003`
  • 17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T314041)', diff saved to https://phabricator.wikimedia.org/P34532 and previous config saved to /var/cache/conftool/dbconfig/20220912-174301-ladsgroup.json
  • 17:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 17:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 17:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34531 and previous config saved to /var/cache/conftool/dbconfig/20220912-174239-ladsgroup.json
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P34529 and previous config saved to /var/cache/conftool/dbconfig/20220912-172733-ladsgroup.json
  • 17:21 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
  • 17:21 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P34528 and previous config saved to /var/cache/conftool/dbconfig/20220912-171227-ladsgroup.json
  • 17:08 cwhite: rebuilt raid on logstash2027 T316996
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34527 and previous config saved to /var/cache/conftool/dbconfig/20220912-165720-ladsgroup.json
  • 15:54 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
  • 15:54 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2118 (T314041)', diff saved to https://phabricator.wikimedia.org/P34524 and previous config saved to /var/cache/conftool/dbconfig/20220912-152920-ladsgroup.json
  • 15:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T314041)', diff saved to https://phabricator.wikimedia.org/P34523 and previous config saved to /var/cache/conftool/dbconfig/20220912-152858-ladsgroup.json
  • 15:18 dancy@deploy1002: Installation of scap version "4.18.0" completed for 561 hosts
  • 15:17 dancy@deploy1002: Installing scap version "4.18.0" for 561 hosts
  • 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P34522 and previous config saved to /var/cache/conftool/dbconfig/20220912-151352-ladsgroup.json
  • 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P34521 and previous config saved to /var/cache/conftool/dbconfig/20220912-145845-ladsgroup.json
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T314041)', diff saved to https://phabricator.wikimedia.org/P34520 and previous config saved to /var/cache/conftool/dbconfig/20220912-144339-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T312863)', diff saved to https://phabricator.wikimedia.org/P34519 and previous config saved to /var/cache/conftool/dbconfig/20220912-141427-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 14:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T312863)', diff saved to https://phabricator.wikimedia.org/P34518 and previous config saved to /var/cache/conftool/dbconfig/20220912-141405-ladsgroup.json
  • 14:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1098.eqiad.wmnet
  • 14:02 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wtp[1028-1030]
  • 14:02 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:01 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P34517 and previous config saved to /var/cache/conftool/dbconfig/20220912-135859-ladsgroup.json
  • 13:57 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1098.eqiad.wmnet
  • 13:53 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts wtp[1028-1030]
  • 13:50 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
  • 13:49 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P34516 and previous config saved to /var/cache/conftool/dbconfig/20220912-134353-ladsgroup.json
  • 13:43 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:40 Lucas_WMDE: scap pull on mwdebug1001 to restore good code (confirmed that T317520 affects production)
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34515 and previous config saved to /var/cache/conftool/dbconfig/20220912-133848-root.json
  • 13:35 Lucas_WMDE: manually applying gerrit:830691 on mwdebug1001 to test if T317520 affects production (expected to cause getExpensiveParserFunctionLimit-related logstash errors)
  • 13:33 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/%s\n' {mobile/copyright/wikipedia-ko-600k.svg,project-logos/kowiki-600k{,-1.5x,-2x}.png} | mwscript purgeList.php # T315127
  • 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized static/images/mobile/copyright/: Config: Revert "kowiki: Add logo (legacy vector and vector-2022) for 600k articles" (T315127) (2/2; deleted file requires syncing whole directory) (duration: 03m 44s)
  • 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T312863)', diff saved to https://phabricator.wikimedia.org/P34514 and previous config saved to /var/cache/conftool/dbconfig/20220912-132846-ladsgroup.json
  • 13:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:26 lucaswerkmeister-wmde@deploy1002: Synchronized static/images/project-logos/: Config: Revert "kowiki: Add logo (legacy vector and vector-2022) for 600k articles" (T315127) (1/2; deleted files require syncing whole directory) (duration: 03m 50s)
  • 13:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34513 and previous config saved to /var/cache/conftool/dbconfig/20220912-132343-root.json
  • 13:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "kowiki: Change logo for 600k articles" (T315127) (3/3) (duration: 03m 53s)
  • 13:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/logos.php: Config: Revert "kowiki: Change logo for 600k articles" (T315127) (2/3) (duration: 03m 39s)
  • 13:12 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
  • 13:12 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 13:10 lucaswerkmeister-wmde@deploy1002: Synchronized logos/config.yaml: Config: Revert "kowiki: Change logo for 600k articles" (T315127) (1/3) (duration: 03m 53s)
  • 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maint
  • 13:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maint
  • 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34512 and previous config saved to /var/cache/conftool/dbconfig/20220912-130838-root.json
  • 13:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34511 and previous config saved to /var/cache/conftool/dbconfig/20220912-125333-root.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34510 and previous config saved to /var/cache/conftool/dbconfig/20220912-123828-root.json
  • 12:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1097.eqiad.wmnet
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34509 and previous config saved to /var/cache/conftool/dbconfig/20220912-122654-root.json
  • 12:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1097.eqiad.wmnet
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34508 and previous config saved to /var/cache/conftool/dbconfig/20220912-122323-root.json
  • 12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34507 and previous config saved to /var/cache/conftool/dbconfig/20220912-122242-ladsgroup.json
  • 12:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 12:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34506 and previous config saved to /var/cache/conftool/dbconfig/20220912-122221-ladsgroup.json
  • 12:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1096.eqiad.wmnet
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34505 and previous config saved to /var/cache/conftool/dbconfig/20220912-121150-root.json
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34504 and previous config saved to /var/cache/conftool/dbconfig/20220912-120818-root.json
  • 12:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1096.eqiad.wmnet
  • 12:07 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-worker1146.eqiad.wmnet
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P34503 and previous config saved to /var/cache/conftool/dbconfig/20220912-120715-ladsgroup.json
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34502 and previous config saved to /var/cache/conftool/dbconfig/20220912-115645-root.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 2%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34501 and previous config saved to /var/cache/conftool/dbconfig/20220912-115313-root.json
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P34500 and previous config saved to /var/cache/conftool/dbconfig/20220912-115208-ladsgroup.json
  • 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34499 and previous config saved to /var/cache/conftool/dbconfig/20220912-114140-root.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34498 and previous config saved to /var/cache/conftool/dbconfig/20220912-113808-root.json
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34497 and previous config saved to /var/cache/conftool/dbconfig/20220912-113702-ladsgroup.json
  • 11:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:28 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Enable writes on es4 T317522 (duration: 03m 36s)
  • 11:27 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
  • 11:27 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34496 and previous config saved to /var/cache/conftool/dbconfig/20220912-112635-root.json
  • 11:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1021 T317522', diff saved to https://phabricator.wikimedia.org/P34495 and previous config saved to /var/cache/conftool/dbconfig/20220912-112343-root.json
  • 11:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1146.eqiad.wmnet
  • 11:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1020 to es4 primary T317522', diff saved to https://phabricator.wikimedia.org/P34494 and previous config saved to /var/cache/conftool/dbconfig/20220912-112039-root.json
  • 11:20 marostegui: Starting es4 eqiad failover from es1021 to es1020 - T317522
  • 11:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:18 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Disable writes on es4 T317522 (duration: 04m 10s)
  • 11:16 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1143-1148].eqiad.wmnet
  • 11:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-etcd1001.eqiad.wmnet
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34493 and previous config saved to /var/cache/conftool/dbconfig/20220912-111442-root.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1020 with weight 0 T317522', diff saved to https://phabricator.wikimedia.org/P34492 and previous config saved to /var/cache/conftool/dbconfig/20220912-111424-root.json
  • 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 T317522
  • 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 T317522
  • 11:12 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1143-1148].eqiad.wmnet
  • 11:11 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-etcd1001.eqiad.wmnet
  • 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34491 and previous config saved to /var/cache/conftool/dbconfig/20220912-111130-root.json
  • 11:10 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1142.eqiad.wmnet
  • 11:09 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1142.eqiad.wmnet
  • 11:08 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1101.eqiad.wmnet
  • 11:04 moritzm: updated bullseye install image for 11.5 release T317416
  • 10:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1101.eqiad.wmnet
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34490 and previous config saved to /var/cache/conftool/dbconfig/20220912-105937-root.json
  • 10:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1100.eqiad.wmnet
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T314041)', diff saved to https://phabricator.wikimedia.org/P34489 and previous config saved to /var/cache/conftool/dbconfig/20220912-105841-ladsgroup.json
  • 10:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 10:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34488 and previous config saved to /var/cache/conftool/dbconfig/20220912-105625-root.json
  • 10:55 topranks: re-pooling esams after successful upgrade of core router cr3-esams T295690
  • 10:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1100.eqiad.wmnet
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34487 and previous config saved to /var/cache/conftool/dbconfig/20220912-104432-root.json
  • 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34486 and previous config saved to /var/cache/conftool/dbconfig/20220912-104120-root.json
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1034', diff saved to https://phabricator.wikimedia.org/P34485 and previous config saved to /var/cache/conftool/dbconfig/20220912-103428-root.json
  • 10:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1099.eqiad.wmnet
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34484 and previous config saved to /var/cache/conftool/dbconfig/20220912-102928-root.json
  • 10:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1099.eqiad.wmnet
  • 10:23 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 37s)
  • 10:22 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34483 and previous config saved to /var/cache/conftool/dbconfig/20220912-101842-root.json
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34481 and previous config saved to /var/cache/conftool/dbconfig/20220912-101423-root.json
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34480 and previous config saved to /var/cache/conftool/dbconfig/20220912-100337-root.json
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34479 and previous config saved to /var/cache/conftool/dbconfig/20220912-095918-root.json
  • 09:55 Emperor: rebalance thanos rings T311690
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34478 and previous config saved to /var/cache/conftool/dbconfig/20220912-094832-root.json
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1033', diff saved to https://phabricator.wikimedia.org/P34477 and previous config saved to /var/cache/conftool/dbconfig/20220912-094818-root.json
  • 09:45 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr3-esams,cr3-esams IPv6,re0.cr3-esams.mgmt with reason: router upgrade
  • 09:45 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cr3-esams,cr3-esams IPv6,re0.cr3-esams.mgmt with reason: router upgrade
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34476 and previous config saved to /var/cache/conftool/dbconfig/20220912-093327-root.json
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34475 and previous config saved to /var/cache/conftool/dbconfig/20220912-093318-root.json
  • 09:31 moritzm: updated buster install image for 10.13 release T317413
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34474 and previous config saved to /var/cache/conftool/dbconfig/20220912-092244-root.json
  • 09:22 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:18 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34473 and previous config saved to /var/cache/conftool/dbconfig/20220912-091822-root.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34472 and previous config saved to /var/cache/conftool/dbconfig/20220912-091813-root.json
  • 09:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1098.eqiad.wmnet
  • 09:09 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1098.eqiad.wmnet
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34471 and previous config saved to /var/cache/conftool/dbconfig/20220912-090739-root.json
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34470 and previous config saved to /var/cache/conftool/dbconfig/20220912-090317-root.json
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34469 and previous config saved to /var/cache/conftool/dbconfig/20220912-090308-root.json
  • 08:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1097.eqiad.wmnet
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34468 and previous config saved to /var/cache/conftool/dbconfig/20220912-085234-root.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34467 and previous config saved to /var/cache/conftool/dbconfig/20220912-084812-root.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34466 and previous config saved to /var/cache/conftool/dbconfig/20220912-084803-root.json
  • 08:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1097.eqiad.wmnet
  • 08:45 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1096.eqiad.wmnet
  • 08:39 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr3-esams,cr3-esams IPv6,re0.cr3-esams.mgmt with reason: router upgrade
  • 08:39 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cr3-esams,cr3-esams IPv6,re0.cr3-esams.mgmt with reason: router upgrade
  • 08:38 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1096.eqiad.wmnet
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34465 and previous config saved to /var/cache/conftool/dbconfig/20220912-083729-root.json
  • 08:36 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr3-esams,cr3-esams IPv6,cr3-esams.mgmt with reason: router upgrade
  • 08:36 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cr3-esams,cr3-esams IPv6,cr3-esams.mgmt with reason: router upgrade
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34464 and previous config saved to /var/cache/conftool/dbconfig/20220912-083308-root.json
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34463 and previous config saved to /var/cache/conftool/dbconfig/20220912-083258-root.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34462 and previous config saved to /var/cache/conftool/dbconfig/20220912-082224-root.json
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1032', diff saved to https://phabricator.wikimedia.org/P34461 and previous config saved to /var/cache/conftool/dbconfig/20220912-081936-root.json
  • 08:17 moritzm: imported jenkins 2.361.1 to thirdparty/ci T317418
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34460 and previous config saved to /var/cache/conftool/dbconfig/20220912-081754-root.json
  • 08:09 cmooney@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 08:08 cmooney@cumin1001: START - Cookbook sre.network.cf
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34459 and previous config saved to /var/cache/conftool/dbconfig/20220912-080719-root.json
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2023 T317508', diff saved to https://phabricator.wikimedia.org/P34458 and previous config saved to /var/cache/conftool/dbconfig/20220912-080602-root.json
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2024 to es5 codfw primary T317508', diff saved to https://phabricator.wikimedia.org/P34457 and previous config saved to /var/cache/conftool/dbconfig/20220912-080400-root.json
  • 08:03 marostegui: Starting es5 codfw failover from es2023 to es2024 - T317508
  • 08:01 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr3-esams with reason: router upgrade
  • 08:01 elukey: restart kafka on kafka2001 to pick up new PKI settings
  • 08:01 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cr3-esams with reason: router upgrade
  • 08:00 hashar: Restarting CI Jenkins for upgrade T317418
  • 08:00 topranks: de-pooling esams in advance of upgrade to core router cr3-esams T295690
  • 07:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging2001.codfw.wmnet with reason: Kafka PKI upgrade
  • 07:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging2001.codfw.wmnet with reason: Kafka PKI upgrade
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2024 with weight 0 T317508', diff saved to https://phabricator.wikimedia.org/P34456 and previous config saved to /var/cache/conftool/dbconfig/20220912-075739-root.json
  • 07:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 T317508
  • 07:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 T317508
  • 07:47 hashar: Upgraded Jenkins instances from 2.346.1 to 2.346.3 # T317418
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2021 T317507', diff saved to https://phabricator.wikimedia.org/P34455 and previous config saved to /var/cache/conftool/dbconfig/20220912-074258-root.json
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2020 to es4 primary and set section read-write T317507', diff saved to https://phabricator.wikimedia.org/P34454 and previous config saved to /var/cache/conftool/dbconfig/20220912-074100-root.json
  • 07:39 marostegui: Starting es4 codfw failover from es2021 to es2020 - T317507
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2020 with weight 0 T317507', diff saved to https://phabricator.wikimedia.org/P34453 and previous config saved to /var/cache/conftool/dbconfig/20220912-073408-root.json
  • 07:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 T317507
  • 07:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 T317507
  • 07:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 07:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T312863)', diff saved to https://phabricator.wikimedia.org/P34452 and previous config saved to /var/cache/conftool/dbconfig/20220912-072931-ladsgroup.json
  • 07:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T312863)', diff saved to https://phabricator.wikimedia.org/P34450 and previous config saved to /var/cache/conftool/dbconfig/20220912-072909-ladsgroup.json
  • 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'es2024 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34449 and previous config saved to /var/cache/conftool/dbconfig/20220912-071829-root.json
  • 07:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T314041)', diff saved to https://phabricator.wikimedia.org/P34448 and previous config saved to /var/cache/conftool/dbconfig/20220912-071700-ladsgroup.json
  • 07:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 07:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 07:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P34447 and previous config saved to /var/cache/conftool/dbconfig/20220912-071403-ladsgroup.json
  • 07:11 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to the old templatelinks fields everywhere (T312865) (duration: 06m 57s)
  • 07:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:04 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Stop writing to the old templatelinks fields everywhere (T312865) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:04 ladsgroup@deploy1002: Started scap: Backport for Stop writing to the old templatelinks fields everywhere (T312865)
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'es2024 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34446 and previous config saved to /var/cache/conftool/dbconfig/20220912-070324-root.json
  • 06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P34445 and previous config saved to /var/cache/conftool/dbconfig/20220912-065856-ladsgroup.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'es2020 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34444 and previous config saved to /var/cache/conftool/dbconfig/20220912-065028-root.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'es2024 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34443 and previous config saved to /var/cache/conftool/dbconfig/20220912-064819-root.json
  • 06:47 moritzm: installing 5.10.136 updates on buster systems running 5.10
  • 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T312863)', diff saved to https://phabricator.wikimedia.org/P34442 and previous config saved to /var/cache/conftool/dbconfig/20220912-064350-ladsgroup.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'es2020 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34441 and previous config saved to /var/cache/conftool/dbconfig/20220912-063523-root.json
  • 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'es2024 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34440 and previous config saved to /var/cache/conftool/dbconfig/20220912-063314-root.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'es2020 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34439 and previous config saved to /var/cache/conftool/dbconfig/20220912-062018-root.json
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'es2024 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34438 and previous config saved to /var/cache/conftool/dbconfig/20220912-061810-root.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'es2020 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34437 and previous config saved to /var/cache/conftool/dbconfig/20220912-060513-root.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'es2024 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34436 and previous config saved to /var/cache/conftool/dbconfig/20220912-060305-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2024 for upgrade', diff saved to https://phabricator.wikimedia.org/P34435 and previous config saved to /var/cache/conftool/dbconfig/20220912-055101-root.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'es2020 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34434 and previous config saved to /var/cache/conftool/dbconfig/20220912-055008-root.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'es2020 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34433 and previous config saved to /var/cache/conftool/dbconfig/20220912-053504-root.json
  • 05:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 T317507
  • 05:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 T317507
  • 05:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 05:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 05:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T314041)', diff saved to https://phabricator.wikimedia.org/P34432 and previous config saved to /var/cache/conftool/dbconfig/20220912-052143-ladsgroup.json
  • 05:21 marostegui: dbmaint Reboot es2020 for kernel upgrade T317507
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2020 for upgrade T317507', diff saved to https://phabricator.wikimedia.org/P34431 and previous config saved to /var/cache/conftool/dbconfig/20220912-051906-root.json
  • 05:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P34430 and previous config saved to /var/cache/conftool/dbconfig/20220912-050636-ladsgroup.json
  • 04:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P34429 and previous config saved to /var/cache/conftool/dbconfig/20220912-045130-ladsgroup.json
  • 04:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T314041)', diff saved to https://phabricator.wikimedia.org/P34428 and previous config saved to /var/cache/conftool/dbconfig/20220912-043624-ladsgroup.json
  • 02:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T314041)', diff saved to https://phabricator.wikimedia.org/P34427 and previous config saved to /var/cache/conftool/dbconfig/20220912-020638-ladsgroup.json
  • 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P34426 and previous config saved to /var/cache/conftool/dbconfig/20220912-015131-ladsgroup.json
  • 01:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P34425 and previous config saved to /var/cache/conftool/dbconfig/20220912-013625-ladsgroup.json
  • 01:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T314041)', diff saved to https://phabricator.wikimedia.org/P34424 and previous config saved to /var/cache/conftool/dbconfig/20220912-012118-ladsgroup.json
  • 00:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T312863)', diff saved to https://phabricator.wikimedia.org/P34423 and previous config saved to /var/cache/conftool/dbconfig/20220912-004952-ladsgroup.json
  • 00:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 00:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 00:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 00:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 00:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T312863)', diff saved to https://phabricator.wikimedia.org/P34422 and previous config saved to /var/cache/conftool/dbconfig/20220912-004915-ladsgroup.json
  • 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P34421 and previous config saved to /var/cache/conftool/dbconfig/20220912-003409-ladsgroup.json
  • 00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P34420 and previous config saved to /var/cache/conftool/dbconfig/20220912-001902-ladsgroup.json
  • 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T312863)', diff saved to https://phabricator.wikimedia.org/P34419 and previous config saved to /var/cache/conftool/dbconfig/20220912-000356-ladsgroup.json

2022-09-11

  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T312863)', diff saved to https://phabricator.wikimedia.org/P34418 and previous config saved to /var/cache/conftool/dbconfig/20220911-175643-ladsgroup.json
  • 17:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 17:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T312863)', diff saved to https://phabricator.wikimedia.org/P34417 and previous config saved to /var/cache/conftool/dbconfig/20220911-175621-ladsgroup.json
  • 17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P34416 and previous config saved to /var/cache/conftool/dbconfig/20220911-174114-ladsgroup.json
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P34415 and previous config saved to /var/cache/conftool/dbconfig/20220911-172608-ladsgroup.json
  • 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T312863)', diff saved to https://phabricator.wikimedia.org/P34414 and previous config saved to /var/cache/conftool/dbconfig/20220911-171102-ladsgroup.json
  • 13:22 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
  • 13:22 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 12:47 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 08s)
  • 12:46 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 12:36 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
  • 12:36 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 12:09 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 08s)
  • 12:09 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 11:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T314041)', diff saved to https://phabricator.wikimedia.org/P34412 and previous config saved to /var/cache/conftool/dbconfig/20220911-114850-ladsgroup.json
  • 11:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 11:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 11:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T314041)', diff saved to https://phabricator.wikimedia.org/P34411 and previous config saved to /var/cache/conftool/dbconfig/20220911-114829-ladsgroup.json
  • 11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P34410 and previous config saved to /var/cache/conftool/dbconfig/20220911-113323-ladsgroup.json
  • 11:26 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
  • 11:26 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P34409 and previous config saved to /var/cache/conftool/dbconfig/20220911-111816-ladsgroup.json
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T314041)', diff saved to https://phabricator.wikimedia.org/P34408 and previous config saved to /var/cache/conftool/dbconfig/20220911-110310-ladsgroup.json
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T312863)', diff saved to https://phabricator.wikimedia.org/P34407 and previous config saved to /var/cache/conftool/dbconfig/20220911-110228-ladsgroup.json
  • 11:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 11:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T312863)', diff saved to https://phabricator.wikimedia.org/P34406 and previous config saved to /var/cache/conftool/dbconfig/20220911-110207-ladsgroup.json
  • 10:56 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
  • 10:56 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P34405 and previous config saved to /var/cache/conftool/dbconfig/20220911-104700-ladsgroup.json
  • 10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P34404 and previous config saved to /var/cache/conftool/dbconfig/20220911-103154-ladsgroup.json
  • 10:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T312863)', diff saved to https://phabricator.wikimedia.org/P34403 and previous config saved to /var/cache/conftool/dbconfig/20220911-101647-ladsgroup.json
  • 10:06 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
  • 10:06 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T314041)', diff saved to https://phabricator.wikimedia.org/P34402 and previous config saved to /var/cache/conftool/dbconfig/20220911-084529-ladsgroup.json
  • 08:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 08:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T312863)', diff saved to https://phabricator.wikimedia.org/P34401 and previous config saved to /var/cache/conftool/dbconfig/20220911-041936-ladsgroup.json
  • 04:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 04:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T312863)', diff saved to https://phabricator.wikimedia.org/P34400 and previous config saved to /var/cache/conftool/dbconfig/20220911-041914-ladsgroup.json
  • 04:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P34399 and previous config saved to /var/cache/conftool/dbconfig/20220911-040407-ladsgroup.json
  • 03:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P34398 and previous config saved to /var/cache/conftool/dbconfig/20220911-034901-ladsgroup.json
  • 03:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T312863)', diff saved to https://phabricator.wikimedia.org/P34397 and previous config saved to /var/cache/conftool/dbconfig/20220911-033355-ladsgroup.json

2022-09-10

  • 21:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T312863)', diff saved to https://phabricator.wikimedia.org/P34396 and previous config saved to /var/cache/conftool/dbconfig/20220910-213300-ladsgroup.json
  • 21:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 21:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T312863)', diff saved to https://phabricator.wikimedia.org/P34395 and previous config saved to /var/cache/conftool/dbconfig/20220910-213238-ladsgroup.json
  • 21:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P34394 and previous config saved to /var/cache/conftool/dbconfig/20220910-211732-ladsgroup.json
  • 21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P34393 and previous config saved to /var/cache/conftool/dbconfig/20220910-210225-ladsgroup.json
  • 20:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T312863)', diff saved to https://phabricator.wikimedia.org/P34392 and previous config saved to /var/cache/conftool/dbconfig/20220910-204719-ladsgroup.json
  • 19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T314041)', diff saved to https://phabricator.wikimedia.org/P34391 and previous config saved to /var/cache/conftool/dbconfig/20220910-191455-ladsgroup.json
  • 19:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 19:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T314041)', diff saved to https://phabricator.wikimedia.org/P34390 and previous config saved to /var/cache/conftool/dbconfig/20220910-191434-ladsgroup.json
  • 18:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P34389 and previous config saved to /var/cache/conftool/dbconfig/20220910-185927-ladsgroup.json
  • 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P34388 and previous config saved to /var/cache/conftool/dbconfig/20220910-184421-ladsgroup.json
  • 18:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T314041)', diff saved to https://phabricator.wikimedia.org/P34387 and previous config saved to /var/cache/conftool/dbconfig/20220910-182914-ladsgroup.json
  • 17:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 17:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T314041)', diff saved to https://phabricator.wikimedia.org/P34386 and previous config saved to /var/cache/conftool/dbconfig/20220910-174141-ladsgroup.json
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P34385 and previous config saved to /var/cache/conftool/dbconfig/20220910-172635-ladsgroup.json
  • 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P34384 and previous config saved to /var/cache/conftool/dbconfig/20220910-171127-ladsgroup.json
  • 16:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T314041)', diff saved to https://phabricator.wikimedia.org/P34383 and previous config saved to /var/cache/conftool/dbconfig/20220910-165621-ladsgroup.json
  • 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T312863)', diff saved to https://phabricator.wikimedia.org/P34382 and previous config saved to /var/cache/conftool/dbconfig/20220910-145558-ladsgroup.json
  • 14:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 14:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 12:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T312863)', diff saved to https://phabricator.wikimedia.org/P34381 and previous config saved to /var/cache/conftool/dbconfig/20220910-121124-ladsgroup.json
  • 11:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P34380 and previous config saved to /var/cache/conftool/dbconfig/20220910-115617-ladsgroup.json
  • 11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P34379 and previous config saved to /var/cache/conftool/dbconfig/20220910-114111-ladsgroup.json
  • 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T312863)', diff saved to https://phabricator.wikimedia.org/P34378 and previous config saved to /var/cache/conftool/dbconfig/20220910-112605-ladsgroup.json
  • 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T312863)', diff saved to https://phabricator.wikimedia.org/P34377 and previous config saved to /var/cache/conftool/dbconfig/20220910-093703-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34376 and previous config saved to /var/cache/conftool/dbconfig/20220910-092156-ladsgroup.json
  • 09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34375 and previous config saved to /var/cache/conftool/dbconfig/20220910-090650-ladsgroup.json
  • 08:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T312863)', diff saved to https://phabricator.wikimedia.org/P34374 and previous config saved to /var/cache/conftool/dbconfig/20220910-085143-ladsgroup.json
  • 05:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T312863)', diff saved to https://phabricator.wikimedia.org/P34373 and previous config saved to /var/cache/conftool/dbconfig/20220910-052410-ladsgroup.json
  • 05:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T312863)', diff saved to https://phabricator.wikimedia.org/P34372 and previous config saved to /var/cache/conftool/dbconfig/20220910-052349-ladsgroup.json
  • 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P34371 and previous config saved to /var/cache/conftool/dbconfig/20220910-050842-ladsgroup.json
  • 04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P34370 and previous config saved to /var/cache/conftool/dbconfig/20220910-045336-ladsgroup.json
  • 04:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T312863)', diff saved to https://phabricator.wikimedia.org/P34369 and previous config saved to /var/cache/conftool/dbconfig/20220910-043829-ladsgroup.json
  • 02:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T312863)', diff saved to https://phabricator.wikimedia.org/P34368 and previous config saved to /var/cache/conftool/dbconfig/20220910-025548-ladsgroup.json
  • 02:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 02:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 02:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T312863)', diff saved to https://phabricator.wikimedia.org/P34367 and previous config saved to /var/cache/conftool/dbconfig/20220910-025526-ladsgroup.json
  • 02:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T314041)', diff saved to https://phabricator.wikimedia.org/P34366 and previous config saved to /var/cache/conftool/dbconfig/20220910-024401-ladsgroup.json
  • 02:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 02:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T314041)', diff saved to https://phabricator.wikimedia.org/P34365 and previous config saved to /var/cache/conftool/dbconfig/20220910-024339-ladsgroup.json
  • 02:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34364 and previous config saved to /var/cache/conftool/dbconfig/20220910-024019-ladsgroup.json
  • 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P34363 and previous config saved to /var/cache/conftool/dbconfig/20220910-022833-ladsgroup.json
  • 02:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34362 and previous config saved to /var/cache/conftool/dbconfig/20220910-022513-ladsgroup.json
  • 02:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P34361 and previous config saved to /var/cache/conftool/dbconfig/20220910-021326-ladsgroup.json
  • 02:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T312863)', diff saved to https://phabricator.wikimedia.org/P34360 and previous config saved to /var/cache/conftool/dbconfig/20220910-021007-ladsgroup.json
  • 01:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T314041)', diff saved to https://phabricator.wikimedia.org/P34359 and previous config saved to /var/cache/conftool/dbconfig/20220910-015820-ladsgroup.json
  • 00:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T314041)', diff saved to https://phabricator.wikimedia.org/P34358 and previous config saved to /var/cache/conftool/dbconfig/20220910-005046-ladsgroup.json
  • 00:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 00:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 00:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T314041)', diff saved to https://phabricator.wikimedia.org/P34357 and previous config saved to /var/cache/conftool/dbconfig/20220910-005025-ladsgroup.json
  • 00:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P34356 and previous config saved to /var/cache/conftool/dbconfig/20220910-003518-ladsgroup.json
  • 00:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P34355 and previous config saved to /var/cache/conftool/dbconfig/20220910-002012-ladsgroup.json
  • 00:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T314041)', diff saved to https://phabricator.wikimedia.org/P34354 and previous config saved to /var/cache/conftool/dbconfig/20220910-000504-ladsgroup.json

2022-09-09

  • 22:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T312863)', diff saved to https://phabricator.wikimedia.org/P34353 and previous config saved to /var/cache/conftool/dbconfig/20220909-224245-ladsgroup.json
  • 22:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 22:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 22:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T312863)', diff saved to https://phabricator.wikimedia.org/P34352 and previous config saved to /var/cache/conftool/dbconfig/20220909-224223-ladsgroup.json
  • 22:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34351 and previous config saved to /var/cache/conftool/dbconfig/20220909-222717-ladsgroup.json
  • 22:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34350 and previous config saved to /var/cache/conftool/dbconfig/20220909-221210-ladsgroup.json
  • 21:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T312863)', diff saved to https://phabricator.wikimedia.org/P34349 and previous config saved to /var/cache/conftool/dbconfig/20220909-215704-ladsgroup.json
  • 20:27 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=93) for new host dispatch-be1001.eqiad.wmnet
  • 20:27 herron@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dispatch-be1001.eqiad.wmnet on all recursors
  • 20:27 herron@cumin1001: START - Cookbook sre.dns.wipe-cache dispatch-be1001.eqiad.wmnet on all recursors
  • 20:27 herron@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T312863)', diff saved to https://phabricator.wikimedia.org/P34348 and previous config saved to /var/cache/conftool/dbconfig/20220909-201857-ladsgroup.json
  • 20:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 20:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T312863)', diff saved to https://phabricator.wikimedia.org/P34347 and previous config saved to /var/cache/conftool/dbconfig/20220909-201835-ladsgroup.json
  • 20:03 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 20:03 herron@cumin1001: START - Cookbook sre.ganeti.makevm for new host dispatch-be1001.eqiad.wmnet
  • 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P34345 and previous config saved to /var/cache/conftool/dbconfig/20220909-200329-ladsgroup.json
  • 20:02 herron@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
  • 19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P34344 and previous config saved to /var/cache/conftool/dbconfig/20220909-194822-ladsgroup.json
  • 19:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T312863)', diff saved to https://phabricator.wikimedia.org/P34343 and previous config saved to /var/cache/conftool/dbconfig/20220909-193316-ladsgroup.json
  • 18:21 herron@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
  • 16:21 herron@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
  • 15:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T312863)', diff saved to https://phabricator.wikimedia.org/P34342 and previous config saved to /var/cache/conftool/dbconfig/20220909-155234-ladsgroup.json
  • 15:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 15:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 15:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T312863)', diff saved to https://phabricator.wikimedia.org/P34341 and previous config saved to /var/cache/conftool/dbconfig/20220909-155213-ladsgroup.json
  • 15:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34340 and previous config saved to /var/cache/conftool/dbconfig/20220909-153706-ladsgroup.json
  • 15:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34339 and previous config saved to /var/cache/conftool/dbconfig/20220909-152159-ladsgroup.json
  • 15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T312863)', diff saved to https://phabricator.wikimedia.org/P34338 and previous config saved to /var/cache/conftool/dbconfig/20220909-150651-ladsgroup.json
  • 14:44 moritzm: imported jenkins 2.346.3 to thirdparty/ci
  • 14:43 dcausse@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T317381: Revert "Disable CirrusSearch completion suggester" (duration: 03m 57s)
  • 14:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:40 herron@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
  • 14:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2112 (T312863)', diff saved to https://phabricator.wikimedia.org/P34336 and previous config saved to /var/cache/conftool/dbconfig/20220909-133846-ladsgroup.json
  • 13:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 13:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 13:33 dcausse: restarting blazegraph on wdqs2003 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 13:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 12:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 12:24 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 12:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T312863)', diff saved to https://phabricator.wikimedia.org/P34334 and previous config saved to /var/cache/conftool/dbconfig/20220909-120029-ladsgroup.json
  • 11:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P34333 and previous config saved to /var/cache/conftool/dbconfig/20220909-114522-ladsgroup.json
  • 11:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P34331 and previous config saved to /var/cache/conftool/dbconfig/20220909-113016-ladsgroup.json
  • 11:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T312863)', diff saved to https://phabricator.wikimedia.org/P34330 and previous config saved to /var/cache/conftool/dbconfig/20220909-111509-ladsgroup.json
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34329 and previous config saved to /var/cache/conftool/dbconfig/20220909-103334-root.json
  • 10:31 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
  • 10:31 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34328 and previous config saved to /var/cache/conftool/dbconfig/20220909-101830-root.json
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34327 and previous config saved to /var/cache/conftool/dbconfig/20220909-100324-root.json
  • 09:53 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
  • 09:53 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34326 and previous config saved to /var/cache/conftool/dbconfig/20220909-094819-root.json
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34325 and previous config saved to /var/cache/conftool/dbconfig/20220909-093314-root.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34323 and previous config saved to /var/cache/conftool/dbconfig/20220909-091809-root.json
  • 08:59 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wtp[1029-1033].eqiad.wmnet
  • 08:59 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:56 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 08:37 cgoubert@cumin1001: START - Cookbook sre.hosts.decommission for hosts wtp[1029-1033].eqiad.wmnet
  • 08:32 dcausse: rebuilding all completion indices in elastic@codfw
  • 08:16 dcausse: restarting blazegraph on wdqs2002 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 08:12 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T314041)', diff saved to https://phabricator.wikimedia.org/P34322 and previous config saved to /var/cache/conftool/dbconfig/20220909-081103-ladsgroup.json
  • 08:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 08:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T314041)', diff saved to https://phabricator.wikimedia.org/P34321 and previous config saved to /var/cache/conftool/dbconfig/20220909-081042-ladsgroup.json
  • 08:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 08:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 08:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 08:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 08:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T312863)', diff saved to https://phabricator.wikimedia.org/P34320 and previous config saved to /var/cache/conftool/dbconfig/20220909-080609-ladsgroup.json
  • 08:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 08:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 07:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1157 (T312863)', diff saved to https://phabricator.wikimedia.org/P34319 and previous config saved to /var/cache/conftool/dbconfig/20220909-074710-ladsgroup.json
  • 07:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 07:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 07:46 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1007.eqiad.wmnet
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 for upgrade', diff saved to https://phabricator.wikimedia.org/P34318 and previous config saved to /var/cache/conftool/dbconfig/20220909-071255-root.json
  • 07:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1007.eqiad.wmnet
  • 05:44 marostegui: dbmaint s8 wikidatawiki eqiad T317349
  • 05:43 marostegui: dbmaint s3 testwikidatawiki eqiad T317349
  • 05:42 marostegui: dbmaint s4 commonswiki eqiad T317349
  • 05:41 marostegui: dbmaint s4 testcommonswiki eqiad T317349
  • 05:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 05:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 05:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 05:11 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cirrus: Switch all wikis from completion suggester to prefix search, yesterday's completion index builds in codfw weren't all successful and users are getting incomplete results (duration: 04m 01s)
  • 05:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:09 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 22s)
  • 00:08 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)

2022-09-08

  • 23:56 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 27s)
  • 23:55 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
  • 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:08 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.28 refs T314189
  • 21:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:02 TheresNoTime: closing UTC late backport and config training
  • 21:01 samtar@deploy1002: Finished scap: Backport for Fix selser on html endpoints (T317215) (duration: 06m 48s)
  • 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:55 samtar@deploy1002: samtar and arlolra: Backport for Fix selser on html endpoints (T317215) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:55 samtar@deploy1002: Started scap: Backport for Fix selser on html endpoints (T317215)
  • 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:33 samtar@deploy1002: Finished scap: Backport for Fix selser on html endpoints (T317215) (duration: 12m 06s)
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:21 samtar@deploy1002: samtar and arlolra: Backport for Fix selser on html endpoints (T317215) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:21 samtar@deploy1002: Started scap: Backport for Fix selser on html endpoints (T317215)
  • 20:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:56 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.39.0-wmf.27 refs T314189
  • 19:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:36 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.28 refs T314189
  • 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:15 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.28 refs T314189 (duration: 03m 39s)
  • 19:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:11 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.28 refs T314189
  • 17:33 sukhe: stat1008: sudo ipmitool -I lanplus -H "stat1008.mgmt.eqiad.wmnet" -U root -E chassis power cycle
  • 17:22 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:22 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:22 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:21 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:21 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:20 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:22 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:04 dancy@deploy1002: Installing scap version "4.17.0" for 566 hosts
  • 15:57 pt1979@cumin1001: START - Cookbook sre.hosts.provision for host kafka-logging1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:55 pt1979@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:52 pt1979@cumin1001: START - Cookbook sre.dns.netbox
  • 15:51 pt1979@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:50 cgoubert@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid
  • 15:49 pt1979@cumin1001: START - Cookbook sre.dns.netbox
  • 15:45 cgoubert@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: wtp: Purge wtp servers following migration to parse (T317025) (duration: 04m 00s)
  • 15:40 pt1979@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:39 pt1979@cumin1001: START - Cookbook sre.dns.netbox
  • 15:36 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=appserver
  • 15:36 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=api_appserver
  • 15:35 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=api-https
  • 15:33 akosiaris: restart etcdmirror on conf2005
  • 15:28 cgoubert@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: wtp: Purge wtp servers following migration to parse (T317025) (duration: 12m 48s)
  • 15:25 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 15:25 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
  • 15:25 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 15:24 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: sync on main
  • 15:21 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:21 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
  • 15:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:11 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:02 moritzm: installing nginx security updates on bullseye
  • 14:58 papaul: maintenance on mr1-codfw complete
  • 14:57 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wtp[1025-1028,1048].eqiad.wmnet
  • 14:57 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:54 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 14:40 cgoubert@cumin1001: START - Cookbook sre.hosts.decommission for hosts wtp[1025-1028,1048].eqiad.wmnet
  • 14:38 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wtp[1043-1047].eqiad.wmnet
  • 14:38 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:38 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:36 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 14:35 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:25 cgoubert@cumin1001: START - Cookbook sre.hosts.decommission for hosts wtp[1043-1047].eqiad.wmnet
  • 14:23 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wtp[1038-1042].eqiad.wmnet
  • 14:23 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:20 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Will do maint later', diff saved to https://phabricator.wikimedia.org/P34312 and previous config saved to /var/cache/conftool/dbconfig/20220908-142029-ladsgroup.json
  • 14:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 100%: Maint over long time ago', diff saved to https://phabricator.wikimedia.org/P34311 and previous config saved to /var/cache/conftool/dbconfig/20220908-141600-ladsgroup.json
  • 14:07 cgoubert@cumin1001: START - Cookbook sre.hosts.decommission for hosts wtp[1038-1042].eqiad.wmnet
  • 14:06 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wtp1037.eqiad.wmnet
  • 14:06 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Will do maint later', diff saved to https://phabricator.wikimedia.org/P34310 and previous config saved to /var/cache/conftool/dbconfig/20220908-140524-ladsgroup.json
  • 14:04 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 75%: Maint over long time ago', diff saved to https://phabricator.wikimedia.org/P34309 and previous config saved to /var/cache/conftool/dbconfig/20220908-140055-ladsgroup.json
  • 14:00 papaul: ongoing maintenance on mr1-codfw
  • 13:57 cgoubert@cumin1001: START - Cookbook sre.hosts.decommission for hosts wtp1037.eqiad.wmnet
  • 13:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wtp1036.eqiad.wmnet
  • 13:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:53 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 13:51 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Will do maint later', diff saved to https://phabricator.wikimedia.org/P34307 and previous config saved to /var/cache/conftool/dbconfig/20220908-135019-ladsgroup.json
  • 13:47 cgoubert@cumin1001: START - Cookbook sre.hosts.decommission for hosts wtp1036.eqiad.wmnet
  • 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 25%: Maint over long time ago', diff saved to https://phabricator.wikimedia.org/P34305 and previous config saved to /var/cache/conftool/dbconfig/20220908-134550-ladsgroup.json
  • 13:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wtp1035.eqiad.wmnet
  • 13:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:43 vgutierrez: rolling upgrade to ats 9 in cp drmrs - T309651
  • 13:41 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 13:39 vgutierrez: disable puppet on A:cp-drmrs during the update to ATS 9.1.3 - T309651
  • 13:36 cgoubert@cumin1001: START - Cookbook sre.hosts.decommission for hosts wtp1035.eqiad.wmnet
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: Will do maint later', diff saved to https://phabricator.wikimedia.org/P34304 and previous config saved to /var/cache/conftool/dbconfig/20220908-133514-ladsgroup.json
  • 13:31 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wtp1034.eqiad.wmnet
  • 13:31 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 10%: Maint over long time ago', diff saved to https://phabricator.wikimedia.org/P34303 and previous config saved to /var/cache/conftool/dbconfig/20220908-133045-ladsgroup.json
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34302 and previous config saved to /var/cache/conftool/dbconfig/20220908-133036-root.json
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34301 and previous config saved to /var/cache/conftool/dbconfig/20220908-133031-root.json
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34300 and previous config saved to /var/cache/conftool/dbconfig/20220908-133024-root.json
  • 13:29 moritzm: installing apache2 security updates on Bullseye
  • 13:28 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 13:23 cgoubert@cumin1001: START - Cookbook sre.hosts.decommission for hosts wtp1034.eqiad.wmnet
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34299 and previous config saved to /var/cache/conftool/dbconfig/20220908-131531-root.json
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34298 and previous config saved to /var/cache/conftool/dbconfig/20220908-131526-root.json
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34297 and previous config saved to /var/cache/conftool/dbconfig/20220908-131519-root.json
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34296 and previous config saved to /var/cache/conftool/dbconfig/20220908-130026-root.json
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34295 and previous config saved to /var/cache/conftool/dbconfig/20220908-130021-root.json
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34294 and previous config saved to /var/cache/conftool/dbconfig/20220908-130014-root.json
  • 12:56 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@9e4ed94]: (no justification provided) (duration: 00m 09s)
  • 12:56 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@9e4ed94]: (no justification provided)
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34293 and previous config saved to /var/cache/conftool/dbconfig/20220908-124521-root.json
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34292 and previous config saved to /var/cache/conftool/dbconfig/20220908-124516-root.json
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34291 and previous config saved to /var/cache/conftool/dbconfig/20220908-124509-root.json
  • 12:42 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@9e4ed94]: (no justification provided) (duration: 00m 09s)
  • 12:42 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@9e4ed94]: (no justification provided)
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1027 to es1 eqiad master, promote es1026 to es2 eqiad master, promote es1028 to es3 eqiad master', diff saved to https://phabricator.wikimedia.org/P34290 and previous config saved to /var/cache/conftool/dbconfig/20220908-123955-marostegui.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34289 and previous config saved to /var/cache/conftool/dbconfig/20220908-123016-root.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34288 and previous config saved to /var/cache/conftool/dbconfig/20220908-123011-root.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34287 and previous config saved to /var/cache/conftool/dbconfig/20220908-123004-root.json
  • 12:26 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1033.eqiad.wmnet
  • 12:26 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1032.eqiad.wmnet
  • 12:26 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wtp[1032-1033].eqiad.wmnet with reason: Downtiming replaced wtp servers
  • 12:26 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wtp[1032-1033].eqiad.wmnet with reason: Downtiming replaced wtp servers
  • 12:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wtp[1032-1033].mgmt with reason: Downtiming replaced wtp servers
  • 12:25 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wtp[1032-1033].mgmt with reason: Downtiming replaced wtp servers
  • 12:17 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 12:15 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34286 and previous config saved to /var/cache/conftool/dbconfig/20220908-121511-root.json
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34285 and previous config saved to /var/cache/conftool/dbconfig/20220908-121506-root.json
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34284 and previous config saved to /var/cache/conftool/dbconfig/20220908-121459-root.json
  • 12:12 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 12:11 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 12:09 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 12:06 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1029 es1030 es1031 for upgrade', diff saved to https://phabricator.wikimedia.org/P34283 and previous config saved to /var/cache/conftool/dbconfig/20220908-120528-root.json
  • 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34282 and previous config saved to /var/cache/conftool/dbconfig/20220908-120439-root.json
  • 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34281 and previous config saved to /var/cache/conftool/dbconfig/20220908-120435-root.json
  • 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34280 and previous config saved to /var/cache/conftool/dbconfig/20220908-120427-root.json
  • 12:03 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34278 and previous config saved to /var/cache/conftool/dbconfig/20220908-115407-root.json
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34277 and previous config saved to /var/cache/conftool/dbconfig/20220908-115401-root.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34276 and previous config saved to /var/cache/conftool/dbconfig/20220908-115355-root.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34275 and previous config saved to /var/cache/conftool/dbconfig/20220908-115351-root.json
  • 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34274 and previous config saved to /var/cache/conftool/dbconfig/20220908-114934-root.json
  • 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34273 and previous config saved to /var/cache/conftool/dbconfig/20220908-114930-root.json
  • 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34272 and previous config saved to /var/cache/conftool/dbconfig/20220908-114922-root.json
  • 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34271 and previous config saved to /var/cache/conftool/dbconfig/20220908-113902-root.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34270 and previous config saved to /var/cache/conftool/dbconfig/20220908-113856-root.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34269 and previous config saved to /var/cache/conftool/dbconfig/20220908-113850-root.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34268 and previous config saved to /var/cache/conftool/dbconfig/20220908-113846-root.json
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34267 and previous config saved to /var/cache/conftool/dbconfig/20220908-113429-root.json
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34266 and previous config saved to /var/cache/conftool/dbconfig/20220908-113425-root.json
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34265 and previous config saved to /var/cache/conftool/dbconfig/20220908-113417-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34264 and previous config saved to /var/cache/conftool/dbconfig/20220908-112357-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34263 and previous config saved to /var/cache/conftool/dbconfig/20220908-112351-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34262 and previous config saved to /var/cache/conftool/dbconfig/20220908-112345-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34261 and previous config saved to /var/cache/conftool/dbconfig/20220908-112341-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34260 and previous config saved to /var/cache/conftool/dbconfig/20220908-112329-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34259 and previous config saved to /var/cache/conftool/dbconfig/20220908-112324-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34258 and previous config saved to /var/cache/conftool/dbconfig/20220908-112319-root.json
  • 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34257 and previous config saved to /var/cache/conftool/dbconfig/20220908-111924-root.json
  • 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34256 and previous config saved to /var/cache/conftool/dbconfig/20220908-111920-root.json
  • 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34255 and previous config saved to /var/cache/conftool/dbconfig/20220908-111912-root.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34254 and previous config saved to /var/cache/conftool/dbconfig/20220908-110852-root.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34253 and previous config saved to /var/cache/conftool/dbconfig/20220908-110846-root.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34252 and previous config saved to /var/cache/conftool/dbconfig/20220908-110840-root.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34251 and previous config saved to /var/cache/conftool/dbconfig/20220908-110836-root.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34250 and previous config saved to /var/cache/conftool/dbconfig/20220908-110825-root.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34249 and previous config saved to /var/cache/conftool/dbconfig/20220908-110819-root.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34248 and previous config saved to /var/cache/conftool/dbconfig/20220908-110814-root.json
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34247 and previous config saved to /var/cache/conftool/dbconfig/20220908-110419-root.json
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34246 and previous config saved to /var/cache/conftool/dbconfig/20220908-110415-root.json
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34245 and previous config saved to /var/cache/conftool/dbconfig/20220908-110407-root.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34244 and previous config saved to /var/cache/conftool/dbconfig/20220908-105347-root.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34243 and previous config saved to /var/cache/conftool/dbconfig/20220908-105341-root.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34242 and previous config saved to /var/cache/conftool/dbconfig/20220908-105335-root.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34241 and previous config saved to /var/cache/conftool/dbconfig/20220908-105331-root.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34240 and previous config saved to /var/cache/conftool/dbconfig/20220908-105320-root.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34239 and previous config saved to /var/cache/conftool/dbconfig/20220908-105314-root.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34238 and previous config saved to /var/cache/conftool/dbconfig/20220908-105309-root.json
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34237 and previous config saved to /var/cache/conftool/dbconfig/20220908-104914-root.json
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34236 and previous config saved to /var/cache/conftool/dbconfig/20220908-104910-root.json
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34235 and previous config saved to /var/cache/conftool/dbconfig/20220908-104902-root.json
  • 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1027 es1026 es1028 for upgrade', diff saved to https://phabricator.wikimedia.org/P34234 and previous config saved to /var/cache/conftool/dbconfig/20220908-104152-root.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34233 and previous config saved to /var/cache/conftool/dbconfig/20220908-103842-root.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34232 and previous config saved to /var/cache/conftool/dbconfig/20220908-103836-root.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34231 and previous config saved to /var/cache/conftool/dbconfig/20220908-103830-root.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34230 and previous config saved to /var/cache/conftool/dbconfig/20220908-103826-root.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34229 and previous config saved to /var/cache/conftool/dbconfig/20220908-103815-root.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34228 and previous config saved to /var/cache/conftool/dbconfig/20220908-103809-root.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34227 and previous config saved to /var/cache/conftool/dbconfig/20220908-103804-root.json
  • 10:31 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 10:30 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 10:29 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 10:29 marostegui@cumin2002: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34226 and previous config saved to /var/cache/conftool/dbconfig/20220908-102859-root.json
  • 10:28 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 10:27 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 10:26 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 10:23 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 10:23 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34225 and previous config saved to /var/cache/conftool/dbconfig/20220908-102310-root.json
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34224 and previous config saved to /var/cache/conftool/dbconfig/20220908-102304-root.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34223 and previous config saved to /var/cache/conftool/dbconfig/20220908-102259-root.json
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022, es1025, es2022, es2025 for upgrade', diff saved to https://phabricator.wikimedia.org/P34222 and previous config saved to /var/cache/conftool/dbconfig/20220908-102040-root.json
  • 10:18 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 15 hosts with reason: Downtiming replaced wtp servers
  • 10:18 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 15 hosts with reason: Downtiming replaced wtp servers
  • 10:18 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 7 hosts with reason: Downtiming replaced wtp servers
  • 10:18 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 7 hosts with reason: Downtiming replaced wtp servers
  • 10:13 marostegui@cumin2002: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34221 and previous config saved to /var/cache/conftool/dbconfig/20220908-101329-root.json
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34220 and previous config saved to /var/cache/conftool/dbconfig/20220908-100805-root.json
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34219 and previous config saved to /var/cache/conftool/dbconfig/20220908-100759-root.json
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34218 and previous config saved to /var/cache/conftool/dbconfig/20220908-100754-root.json
  • 10:07 XioNoX: re-pool esams after routers upgrade - T295690
  • 10:06 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1029.eqiad.wmnet
  • 10:06 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1031.eqiad.wmnet
  • 10:06 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1030.eqiad.wmnet
  • 10:06 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wtp[1029-1031].eqiad.wmnet with reason: Downtiming replaced wtp servers
  • 10:05 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wtp[1029-1031].eqiad.wmnet with reason: Downtiming replaced wtp servers
  • 10:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es[2032-2034].codfw.wmnet with reason: Upgrade
  • 10:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on es[2032-2034].codfw.wmnet with reason: Upgrade
  • 10:01 claime: Serving 100% of parsoid traffic with php 7.4 T307219
  • 10:00 claime: depooled wtp1033.eqiad.wmnet from parsoid cluster T307219
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34217 and previous config saved to /var/cache/conftool/dbconfig/20220908-100028-root.json
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34216 and previous config saved to /var/cache/conftool/dbconfig/20220908-100027-root.json
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34215 and previous config saved to /var/cache/conftool/dbconfig/20220908-100025-root.json
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2032 es2033 es2034 for upgrade', diff saved to https://phabricator.wikimedia.org/P34214 and previous config saved to /var/cache/conftool/dbconfig/20220908-100014-root.json
  • 09:58 marostegui@cumin2002: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34213 and previous config saved to /var/cache/conftool/dbconfig/20220908-095759-root.json
  • 09:50 claime: pooled parse1024.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 09:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1024.eqiad.wmnet
  • 09:50 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1024.eqiad.wmnet
  • 09:47 claime: depooled wtp1032.eqiad.wmnet from parsoid cluster T307219
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34212 and previous config saved to /var/cache/conftool/dbconfig/20220908-094523-root.json
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34211 and previous config saved to /var/cache/conftool/dbconfig/20220908-094522-root.json
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34210 and previous config saved to /var/cache/conftool/dbconfig/20220908-094520-root.json
  • 09:42 marostegui@cumin2002: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34209 and previous config saved to /var/cache/conftool/dbconfig/20220908-094229-root.json
  • 09:38 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1024.eqiad.wmnet
  • 09:37 claime: pooled parse1023.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 09:36 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1023.eqiad.wmnet
  • 09:36 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1023.eqiad.wmnet
  • 09:35 XioNoX: drain traffic from cr3-knams - T295690
  • 09:33 claime: depooled wtp1031.eqiad.wmnet from parsoid cluster T307219
  • 09:31 vgutierrez: rolling restart of purged - T317064
  • 09:31 ayounsi@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr3-knams,cr3-knams IPv6 with reason: router upgrade
  • 09:31 ayounsi@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr3-knams,cr3-knams IPv6 with reason: router upgrade
  • 09:31 vgutierrez: upload purged 0.18 to apt.wm.o (buster) - T317064
  • 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34208 and previous config saved to /var/cache/conftool/dbconfig/20220908-093018-root.json
  • 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34207 and previous config saved to /var/cache/conftool/dbconfig/20220908-093017-root.json
  • 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34206 and previous config saved to /var/cache/conftool/dbconfig/20220908-093015-root.json
  • 09:27 marostegui@cumin2002: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34205 and previous config saved to /var/cache/conftool/dbconfig/20220908-092700-root.json
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2027 to es3 codfw master', diff saved to https://phabricator.wikimedia.org/P34204 and previous config saved to /var/cache/conftool/dbconfig/20220908-092436-marostegui.json
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2026 to es2 codfw master', diff saved to https://phabricator.wikimedia.org/P34203 and previous config saved to /var/cache/conftool/dbconfig/20220908-092346-marostegui.json
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2028 to es1 codfw master', diff saved to https://phabricator.wikimedia.org/P34202 and previous config saved to /var/cache/conftool/dbconfig/20220908-092301-marostegui.json
  • 09:21 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1023.eqiad.wmnet
  • 09:21 claime: pooled parse1022.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 09:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1022.eqiad.wmnet
  • 09:19 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1022.eqiad.wmnet
  • 09:18 vgutierrez: testing purged 0.18 in cp4026 and cp4032
  • 09:18 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1025.eqiad.wmnet
  • 09:17 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1026.eqiad.wmnet
  • 09:17 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=wtp1029.eqiad.wmnet
  • 09:16 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1029.eqiad.wmnet
  • 09:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wtp[1025-1028].eqiad.wmnet with reason: Downtiming replaced wtp servers
  • 09:16 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1028.eqiad.wmnet
  • 09:16 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wtp[1025-1028].eqiad.wmnet with reason: Downtiming replaced wtp servers
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34201 and previous config saved to /var/cache/conftool/dbconfig/20220908-091513-root.json
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34200 and previous config saved to /var/cache/conftool/dbconfig/20220908-091512-root.json
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34199 and previous config saved to /var/cache/conftool/dbconfig/20220908-091510-root.json
  • 09:14 claime: depooled wtp1030.eqiad.wmnet from parsoid cluster T307219
  • 09:12 marostegui@cumin2002: dbctl commit (dc=all): 'es2026 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34198 and previous config saved to /var/cache/conftool/dbconfig/20220908-091200-root.json
  • 09:11 marostegui@cumin2002: dbctl commit (dc=all): 'es2028 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34197 and previous config saved to /var/cache/conftool/dbconfig/20220908-091157-root.json
  • 09:11 marostegui@cumin2002: dbctl commit (dc=all): 'es2027 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34196 and previous config saved to /var/cache/conftool/dbconfig/20220908-091151-root.json
  • 09:11 marostegui@cumin2002: dbctl commit (dc=all): 'db1127 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34195 and previous config saved to /var/cache/conftool/dbconfig/20220908-091129-root.json
  • 09:10 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:09 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:08 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:07 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:05 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1022.eqiad.wmnet
  • 09:04 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:04 claime: pooled parse1021.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 09:03 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:03 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1021.eqiad.wmnet
  • 09:03 cgoubert@cumin2002: START - Cookbook sre.hosts.remove-downtime for parse1021.eqiad.wmnet
  • 09:02 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34194 and previous config saved to /var/cache/conftool/dbconfig/20220908-090008-root.json
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34193 and previous config saved to /var/cache/conftool/dbconfig/20220908-090007-root.json
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34192 and previous config saved to /var/cache/conftool/dbconfig/20220908-090005-root.json
  • 08:56 marostegui@cumin2002: dbctl commit (dc=all): 'es2026 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34190 and previous config saved to /var/cache/conftool/dbconfig/20220908-085630-root.json
  • 08:56 marostegui@cumin2002: dbctl commit (dc=all): 'es2028 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34189 and previous config saved to /var/cache/conftool/dbconfig/20220908-085627-root.json
  • 08:56 marostegui@cumin2002: dbctl commit (dc=all): 'es2027 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34188 and previous config saved to /var/cache/conftool/dbconfig/20220908-085621-root.json
  • 08:56 marostegui@cumin2002: dbctl commit (dc=all): 'db1127 (re)pooling @ 4%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34187 and previous config saved to /var/cache/conftool/dbconfig/20220908-085559-root.json
  • 08:55 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1021.eqiad.wmnet
  • 08:53 claime: depooled wtp1029.eqiad.wmnet from parsoid cluster T307219
  • 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1011.eqiad.wmnet
  • 08:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1011.eqiad.wmnet
  • 08:45 ayounsi@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr2-esams,cr2-esams IPv6,re0.cr2-esams.mgmt with reason: router upgrade
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34186 and previous config saved to /var/cache/conftool/dbconfig/20220908-084503-root.json
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34185 and previous config saved to /var/cache/conftool/dbconfig/20220908-084502-root.json
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34184 and previous config saved to /var/cache/conftool/dbconfig/20220908-084500-root.json
  • 08:44 ayounsi@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr2-esams,cr2-esams IPv6,re0.cr2-esams.mgmt with reason: router upgrade
  • 08:44 XioNoX: reverting cr3-esams changes (JTAC will be needed for a firmware upgrade), and moving on to cr2-esams - T295690
  • 08:41 marostegui@cumin2002: dbctl commit (dc=all): 'db1203 (re)pooling @ 100%: Pooling for the first time in s8', diff saved to https://phabricator.wikimedia.org/P34183 and previous config saved to /var/cache/conftool/dbconfig/20220908-084133-root.json
  • 08:41 claime: pooled parse1020.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 08:41 marostegui@cumin2002: dbctl commit (dc=all): 'es2026 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34182 and previous config saved to /var/cache/conftool/dbconfig/20220908-084059-root.json
  • 08:40 marostegui@cumin2002: dbctl commit (dc=all): 'es2028 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34181 and previous config saved to /var/cache/conftool/dbconfig/20220908-084057-root.json
  • 08:40 marostegui@cumin2002: dbctl commit (dc=all): 'es2027 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34180 and previous config saved to /var/cache/conftool/dbconfig/20220908-084051-root.json
  • 08:40 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1020.eqiad.wmnet
  • 08:40 cgoubert@cumin2002: START - Cookbook sre.hosts.remove-downtime for parse1020.eqiad.wmnet
  • 08:40 marostegui@cumin2002: dbctl commit (dc=all): 'db1127 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34179 and previous config saved to /var/cache/conftool/dbconfig/20220908-084029-root.json
  • 08:40 claime: depooled wtp1028.eqiad.wmnet from parsoid cluster T307219
  • 08:39 marostegui@cumin2002: dbctl commit (dc=all): 'Depool es2029, es2030, es2031', diff saved to https://phabricator.wikimedia.org/P34178 and previous config saved to /var/cache/conftool/dbconfig/20220908-083941-marostegui.json
  • 08:31 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1020.eqiad.wmnet
  • 08:26 marostegui@cumin2002: dbctl commit (dc=all): 'db1203 (re)pooling @ 75%: Pooling for the first time in s8', diff saved to https://phabricator.wikimedia.org/P34176 and previous config saved to /var/cache/conftool/dbconfig/20220908-082604-root.json
  • 08:25 marostegui@cumin2002: dbctl commit (dc=all): 'es2026 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34175 and previous config saved to /var/cache/conftool/dbconfig/20220908-082530-root.json
  • 08:25 marostegui@cumin2002: dbctl commit (dc=all): 'es2028 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34174 and previous config saved to /var/cache/conftool/dbconfig/20220908-082528-root.json
  • 08:25 marostegui@cumin2002: dbctl commit (dc=all): 'es2027 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34173 and previous config saved to /var/cache/conftool/dbconfig/20220908-082521-root.json
  • 08:25 marostegui@cumin2002: dbctl commit (dc=all): 'db1127 (re)pooling @ 2%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34172 and previous config saved to /var/cache/conftool/dbconfig/20220908-082500-root.json
  • 08:24 claime: pooled parse1019.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 08:22 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1019.eqiad.wmnet
  • 08:22 cgoubert@cumin2002: START - Cookbook sre.hosts.remove-downtime for parse1019.eqiad.wmnet
  • 08:12 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin1001.eqiad.wmnet
  • 08:10 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 08:10 ayounsi@cumin2002: START - Cookbook sre.network.cf
  • 08:10 marostegui@cumin2002: dbctl commit (dc=all): 'db1203 (re)pooling @ 50%: Pooling for the first time in s8', diff saved to https://phabricator.wikimedia.org/P34171 and previous config saved to /var/cache/conftool/dbconfig/20220908-081034-root.json
  • 08:09 marostegui@cumin2002: dbctl commit (dc=all): 'es2028 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34170 and previous config saved to /var/cache/conftool/dbconfig/20220908-080958-root.json
  • 08:09 marostegui@cumin2002: dbctl commit (dc=all): 'es2027 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34169 and previous config saved to /var/cache/conftool/dbconfig/20220908-080951-root.json
  • 08:09 marostegui@cumin2002: dbctl commit (dc=all): 'es2026 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34168 and previous config saved to /var/cache/conftool/dbconfig/20220908-080946-root.json
  • 08:09 ayounsi@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr3-esams,cr3-esams IPv6,re0.cr3-esams.mgmt with reason: router upgrade
  • 08:09 ayounsi@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr3-esams,cr3-esams IPv6,re0.cr3-esams.mgmt with reason: router upgrade
  • 08:08 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1019.eqiad.wmnet
  • 08:08 marostegui@cumin2002: dbctl commit (dc=all): 'db1202 (re)pooling @ 100%: Pooling for the first time in s7', diff saved to https://phabricator.wikimedia.org/P34167 and previous config saved to /var/cache/conftool/dbconfig/20220908-080823-root.json
  • 08:07 XioNoX: drain traffic from cr3-esams - T295690
  • 08:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1001.eqiad.wmnet
  • 07:55 marostegui@cumin2002: dbctl commit (dc=all): 'db1203 (re)pooling @ 25%: Pooling for the first time in s8', diff saved to https://phabricator.wikimedia.org/P34166 and previous config saved to /var/cache/conftool/dbconfig/20220908-075504-root.json
  • 07:54 marostegui@cumin2002: dbctl commit (dc=all): 'es2028 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34165 and previous config saved to /var/cache/conftool/dbconfig/20220908-075429-root.json
  • 07:54 marostegui@cumin2002: dbctl commit (dc=all): 'es2027 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34164 and previous config saved to /var/cache/conftool/dbconfig/20220908-075421-root.json
  • 07:54 marostegui@cumin2002: dbctl commit (dc=all): 'es2026 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34163 and previous config saved to /var/cache/conftool/dbconfig/20220908-075416-root.json
  • 07:52 marostegui@cumin2002: dbctl commit (dc=all): 'db1202 (re)pooling @ 75%: Pooling for the first time in s7', diff saved to https://phabricator.wikimedia.org/P34162 and previous config saved to /var/cache/conftool/dbconfig/20220908-075253-root.json
  • 07:41 XioNoX: depool esams for routers upgrade - T295690
  • 07:39 marostegui@cumin2002: dbctl commit (dc=all): 'db1203 (re)pooling @ 10%: Pooling for the first time in s8', diff saved to https://phabricator.wikimedia.org/P34161 and previous config saved to /var/cache/conftool/dbconfig/20220908-073935-root.json
  • 07:39 marostegui@cumin2002: dbctl commit (dc=all): 'es2028 (re)pooling @ 4%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34160 and previous config saved to /var/cache/conftool/dbconfig/20220908-073900-root.json
  • 07:38 marostegui@cumin2002: dbctl commit (dc=all): 'es2027 (re)pooling @ 4%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34159 and previous config saved to /var/cache/conftool/dbconfig/20220908-073851-root.json
  • 07:38 marostegui@cumin2002: dbctl commit (dc=all): 'es2026 (re)pooling @ 4%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34158 and previous config saved to /var/cache/conftool/dbconfig/20220908-073846-root.json
  • 07:37 marostegui@cumin2002: dbctl commit (dc=all): 'db1202 (re)pooling @ 50%: Pooling for the first time in s7', diff saved to https://phabricator.wikimedia.org/P34157 and previous config saved to /var/cache/conftool/dbconfig/20220908-073724-root.json
  • 07:24 marostegui@cumin2002: dbctl commit (dc=all): 'db1203 (re)pooling @ 5%: Pooling for the first time in s8', diff saved to https://phabricator.wikimedia.org/P34156 and previous config saved to /var/cache/conftool/dbconfig/20220908-072405-root.json
  • 07:23 marostegui@cumin2002: dbctl commit (dc=all): 'es2028 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34155 and previous config saved to /var/cache/conftool/dbconfig/20220908-072330-root.json
  • 07:23 marostegui@cumin2002: dbctl commit (dc=all): 'es2027 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34154 and previous config saved to /var/cache/conftool/dbconfig/20220908-072321-root.json
  • 07:23 marostegui@cumin2002: dbctl commit (dc=all): 'es2026 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34153 and previous config saved to /var/cache/conftool/dbconfig/20220908-072316-root.json
  • 07:21 marostegui@cumin2002: dbctl commit (dc=all): 'db1202 (re)pooling @ 25%: Pooling for the first time in s7', diff saved to https://phabricator.wikimedia.org/P34152 and previous config saved to /var/cache/conftool/dbconfig/20220908-072154-root.json
  • 07:08 marostegui@cumin2002: dbctl commit (dc=all): 'db1203 (re)pooling @ 4%: Pooling for the first time in s8', diff saved to https://phabricator.wikimedia.org/P34151 and previous config saved to /var/cache/conftool/dbconfig/20220908-070836-root.json
  • 07:08 marostegui@cumin2002: dbctl commit (dc=all): 'es2028 (re)pooling @ 2%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34150 and previous config saved to /var/cache/conftool/dbconfig/20220908-070800-root.json
  • 07:07 marostegui@cumin2002: dbctl commit (dc=all): 'es2027 (re)pooling @ 2%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34149 and previous config saved to /var/cache/conftool/dbconfig/20220908-070752-root.json
  • 07:07 marostegui@cumin2002: dbctl commit (dc=all): 'es2026 (re)pooling @ 2%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34148 and previous config saved to /var/cache/conftool/dbconfig/20220908-070746-root.json
  • 07:06 marostegui@cumin2002: dbctl commit (dc=all): 'db1202 (re)pooling @ 10%: Pooling for the first time in s7', diff saved to https://phabricator.wikimedia.org/P34147 and previous config saved to /var/cache/conftool/dbconfig/20220908-070625-root.json
  • 07:01 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:01 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:01 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 07:00 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 07:00 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:00 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 06:53 marostegui@cumin2002: dbctl commit (dc=all): 'db1203 (re)pooling @ 3%: Pooling for the first time in s8', diff saved to https://phabricator.wikimedia.org/P34146 and previous config saved to /var/cache/conftool/dbconfig/20220908-065306-root.json
  • 06:52 marostegui@cumin2002: dbctl commit (dc=all): 'es2028 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34145 and previous config saved to /var/cache/conftool/dbconfig/20220908-065229-root.json
  • 06:52 marostegui@cumin2002: dbctl commit (dc=all): 'es2027 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34144 and previous config saved to /var/cache/conftool/dbconfig/20220908-065222-root.json
  • 06:52 marostegui@cumin2002: dbctl commit (dc=all): 'es2026 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34143 and previous config saved to /var/cache/conftool/dbconfig/20220908-065216-root.json
  • 06:50 marostegui@cumin2002: dbctl commit (dc=all): 'db1202 (re)pooling @ 5%: Pooling for the first time in s7', diff saved to https://phabricator.wikimedia.org/P34142 and previous config saved to /var/cache/conftool/dbconfig/20220908-065054-root.json
  • 06:44 marostegui@cumin2002: dbctl commit (dc=all): 'Depool es2026, es2027, es2028', diff saved to https://phabricator.wikimedia.org/P34141 and previous config saved to /var/cache/conftool/dbconfig/20220908-064450-marostegui.json
  • 06:37 marostegui@cumin2002: dbctl commit (dc=all): 'db1203 (re)pooling @ 2%: Pooling for the first time in s8', diff saved to https://phabricator.wikimedia.org/P34140 and previous config saved to /var/cache/conftool/dbconfig/20220908-063737-root.json
  • 06:35 marostegui@cumin2002: dbctl commit (dc=all): 'db1202 (re)pooling @ 4%: Pooling for the first time in s7', diff saved to https://phabricator.wikimedia.org/P34139 and previous config saved to /var/cache/conftool/dbconfig/20220908-063525-root.json
  • 06:19 marostegui@cumin2002: dbctl commit (dc=all): 'db1202 (re)pooling @ 3%: Pooling for the first time in s7', diff saved to https://phabricator.wikimedia.org/P34138 and previous config saved to /var/cache/conftool/dbconfig/20220908-061955-root.json
  • 06:14 marostegui@cumin2002: dbctl commit (dc=all): 'Add db1203 to s8, depooled, T316342', diff saved to https://phabricator.wikimedia.org/P34137 and previous config saved to /var/cache/conftool/dbconfig/20220908-061413-marostegui.json
  • 06:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 06:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1157 T316622', diff saved to https://phabricator.wikimedia.org/P34136 and previous config saved to /var/cache/conftool/dbconfig/20220908-060438-ladsgroup.json
  • 06:04 marostegui@cumin2002: dbctl commit (dc=all): 'db1202 (re)pooling @ 2%: Pooling for the first time in s7', diff saved to https://phabricator.wikimedia.org/P34135 and previous config saved to /var/cache/conftool/dbconfig/20220908-060426-root.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1123 to s3 primary and set section read-write T316622', diff saved to https://phabricator.wikimedia.org/P34134 and previous config saved to /var/cache/conftool/dbconfig/20220908-060138-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T316622', diff saved to https://phabricator.wikimedia.org/P34133 and previous config saved to /var/cache/conftool/dbconfig/20220908-060110-ladsgroup.json
  • 06:00 Amir1: Starting s3 eqiad failover from db1157 to db1123 - T316622
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1194', diff saved to https://phabricator.wikimedia.org/P34132 and previous config saved to /var/cache/conftool/dbconfig/20220908-055546-marostegui.json
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Pooling db1202 for the first time in s7 T316342', diff saved to https://phabricator.wikimedia.org/P34131 and previous config saved to /var/cache/conftool/dbconfig/20220908-055451-marostegui.json
  • 05:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Pooling back db2140', diff saved to https://phabricator.wikimedia.org/P34130 and previous config saved to /var/cache/conftool/dbconfig/20220908-054921-ladsgroup.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1202 to s7, depooled, T316342', diff saved to https://phabricator.wikimedia.org/P34129 and previous config saved to /var/cache/conftool/dbconfig/20220908-054429-marostegui.json
  • 05:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1123 with weight 0 T316622', diff saved to https://phabricator.wikimedia.org/P34128 and previous config saved to /var/cache/conftool/dbconfig/20220908-051043-ladsgroup.json
  • 05:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s3 T316622
  • 05:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Primary switchover s3 T316622
  • 02:04 pt1979@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1005:
  • 02:04 pt1979@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1005:
  • 02:02 pt1979@cumin1001: END (ERROR) - Cookbook sre.network.configure-switch-interfaces (exit_code=97) for host db2169
  • 02:02 pt1979@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db2169
  • 02:01 pt1979@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:59 pt1979@cumin1001: START - Cookbook sre.dns.netbox
  • 01:56 ejegg: re-enabled recurring charge job
  • 01:48 ejegg: updated fundraising civicrm from c1f0e041 to efbbcb57
  • 01:26 ejegg: disabled recurring charge job
  • 00:33 ejegg: updated standalone Smashpig from 11ba0a1b to 88e5e9bb

2022-09-07

  • 22:12 bd808: Attempting to migrate all remaining Striker managed git repos from Diffusion to GitLab (T315706)
  • 21:27 TheresNoTime: closing UTC late backport window, +27m
  • 21:26 samtar@deploy1002: Finished scap: Backport for Respect skin's TOC option (T316947) (duration: 08m 02s)
  • 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:19 samtar@deploy1002: samtar and jdlrobson: Backport for Respect skin's TOC option (T316947) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:18 samtar@deploy1002: Started scap: Backport for Respect skin's TOC option (T316947)
  • 21:14 samtar@deploy1002: Finished scap: Backport for Respect skin's TOC option (T316947) (duration: 07m 06s)
  • 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:07 samtar@deploy1002: samtar and jdlrobson: Backport for Respect skin's TOC option (T316947) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:07 samtar@deploy1002: Started scap: Backport for Respect skin's TOC option (T316947)
  • 21:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:02 samtar@deploy1002: Finished scap: Backport for beta: Remove deployment-parsoid11 from wgLinterSubmitterWhitelist (duration: 04m 36s)
  • 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:01 TheresNoTime: extending UTC late backport window
  • 20:58 samtar@deploy1002: samtar and zabe: Backport for beta: Remove deployment-parsoid11 from wgLinterSubmitterWhitelist synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:58 samtar@deploy1002: Started scap: Backport for beta: Remove deployment-parsoid11 from wgLinterSubmitterWhitelist
  • 20:56 samtar@deploy1002: Finished scap: Backport for Enable Extension:Nearby on wikidata (T246493) (duration: 05m 54s)
  • 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:51 samtar@deploy1002: samtar and jdlrobson: Backport for Enable Extension:Nearby on wikidata (T246493) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:50 samtar@deploy1002: Started scap: Backport for Enable Extension:Nearby on wikidata (T246493)
  • 20:49 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Wikidata has a wordmark (T315572) (duration: 03m 44s)
  • 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:44 samtar@deploy1002: Synchronized static/images/mobile/copyright/wikidata-en.svg: Config: Wikidata has a wordmark (T315572) (duration: 03m 45s)
  • 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:35 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "Enable wgDiscussionToolsEnablePermalinksBackend on all wikis" (duration: 03m 42s)
  • 20:21 samtar@deploy1002: Finished scap: Backport for Enable wgDiscussionToolsEnablePermalinksBackend on all wikis (T315353) (duration: 06m 57s)
  • 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:14 samtar@deploy1002: samtar and matmarex: Backport for Enable wgDiscussionToolsEnablePermalinksBackend on all wikis (T315353) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:14 samtar@deploy1002: Started scap: Backport for Enable wgDiscussionToolsEnablePermalinksBackend on all wikis (T315353)
  • 20:12 mutante: pcc-worker1003 - rm of /srv/jenkins/puppet-compiler/output/36713 and 37153 - /srv is back to 58% usage again
  • 20:10 mutante: integration.wikimedia.org - clicked to delete builds 36713 and 37153 because they were several GB in size
  • 20:08 TheresNoTime: running `extensions/WikimediaMaintenance/createExtensionTables.php discussiontools` on mwmaint1002
  • 20:08 mutante: puppet compiler out of disk space, (pcc-worker1003): identified build 37153 as huge compared to others in the filesystem, then clicked to delete it via integration.wm.org web UI
  • 19:26 herron@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
  • 18:34 dduvall@deploy1002: Finished deploy [phabricator/deployment@a7616e6]: testing deployment to phab2001 (inactive) (duration: 00m 35s)
  • 18:33 dduvall@deploy1002: Started deploy [phabricator/deployment@a7616e6]: testing deployment to phab2001 (inactive)
  • 18:26 herron@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
  • 18:25 herron@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
  • 17:24 herron@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
  • 16:56 ejegg: restarted fundraising scheduled jobs
  • 16:34 ejegg: fundraising civicrm upgraded from 2fcd3bb4 to c1f0e041
  • 16:32 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@9e4ed94]: Update platform_eng Airflow to latest (duration: 00m 10s)
  • 16:31 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@9e4ed94]: Update platform_eng Airflow to latest
  • 16:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 16:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 16:22 moritzm: installing twisted security updates on bullseye
  • 16:22 cparle@deploy1002: Finished deploy [airflow-dags/platform_eng@9e4ed94]: (no justification provided) (duration: 00m 17s)
  • 16:21 cparle@deploy1002: Started deploy [airflow-dags/platform_eng@9e4ed94]: (no justification provided)
  • 16:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 16:00 ejegg: fundraising civicrm upgraded from 5aa1309d to 2fcd3bb4
  • 15:55 ejegg: fundraising scheduled jobs disabled for deployment
  • 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Pooling after maintenance', diff saved to https://phabricator.wikimedia.org/P34125 and previous config saved to /var/cache/conftool/dbconfig/20220907-153827-root.json
  • 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Pooling after maintenance', diff saved to https://phabricator.wikimedia.org/P34124 and previous config saved to /var/cache/conftool/dbconfig/20220907-152322-root.json
  • 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Pooling after maintenance', diff saved to https://phabricator.wikimedia.org/P34122 and previous config saved to /var/cache/conftool/dbconfig/20220907-152028-root.json
  • 15:11 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1027.eqiad.wmnet
  • 15:10 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1048.eqiad.wmnet
  • 15:10 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1047.eqiad.wmnet
  • 15:10 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wtp[1027,1047-1048].eqiad.wmnet with reason: Downtiming replaced wtp servers
  • 15:10 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wtp[1027,1047-1048].eqiad.wmnet with reason: Downtiming replaced wtp servers
  • 15:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: Pooling after maintenance', diff saved to https://phabricator.wikimedia.org/P34121 and previous config saved to /var/cache/conftool/dbconfig/20220907-150817-root.json
  • 15:07 claime: depooled wtp1026.eqiad.wmnet from parsoid cluster T307219
  • 15:07 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 15:06 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 15:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Pooling after maintenance', diff saved to https://phabricator.wikimedia.org/P34120 and previous config saved to /var/cache/conftool/dbconfig/20220907-150523-root.json
  • 15:02 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 14:56 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Wikimania 2023 setup T316928 (duration: 04m 04s)
  • 14:54 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb1010.eqiad.wmnet with OS bullseye
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Pooling after maintenance', diff saved to https://phabricator.wikimedia.org/P34119 and previous config saved to /var/cache/conftool/dbconfig/20220907-145313-root.json
  • 14:52 claime: pooled parse1018.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 14:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1018.eqiad.wmnet
  • 14:50 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1018.eqiad.wmnet
  • 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: Pooling after maintenance', diff saved to https://phabricator.wikimedia.org/P34118 and previous config saved to /var/cache/conftool/dbconfig/20220907-145018-root.json
  • 14:48 claime: depooled wtp1025.eqiad.wmnet from parsoid cluster T307219
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P34117 and previous config saved to /var/cache/conftool/dbconfig/20220907-144434-ladsgroup.json
  • 14:41 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1018.eqiad.wmnet
  • 14:39 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1010.eqiad.wmnet with reason: host reimage
  • 14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P34116 and previous config saved to /var/cache/conftool/dbconfig/20220907-143828-ladsgroup.json
  • 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 10%: Pooling after maintenance', diff saved to https://phabricator.wikimedia.org/P34115 and previous config saved to /var/cache/conftool/dbconfig/20220907-143808-root.json
  • 14:37 claime: pooled parse1017.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 14:36 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync data - jbond@cumin1001"
  • 14:35 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1010.eqiad.wmnet with reason: host reimage
  • 14:35 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1017.eqiad.wmnet
  • 14:35 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1017.eqiad.wmnet
  • 14:35 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Pooling after maintenance', diff saved to https://phabricator.wikimedia.org/P34114 and previous config saved to /var/cache/conftool/dbconfig/20220907-143513-root.json
  • 14:32 claime: parsoid eqiad canaries switched to parse1001 and parse1002 T307219
  • 14:29 moritzm: installing runc security updates on k8s servers
  • 14:23 cgoubert@puppetmaster1001: conftool action : set/weight=1; selector: dc=eqiad,cluster=parsoid,service=canary
  • 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P34113 and previous config saved to /var/cache/conftool/dbconfig/20220907-142321-ladsgroup.json
  • 14:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:23 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host rdb1010.eqiad.wmnet with OS bullseye
  • 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 5%: Pooling after maintenance', diff saved to https://phabricator.wikimedia.org/P34112 and previous config saved to /var/cache/conftool/dbconfig/20220907-142303-root.json
  • 14:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:20 claime: Switching canaries for parsoid eqiad T307219
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 10%: Pooling after maintenance', diff saved to https://phabricator.wikimedia.org/P34111 and previous config saved to /var/cache/conftool/dbconfig/20220907-142008-root.json
  • 14:08 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb1009.eqiad.wmnet with OS bullseye
  • 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T312863)', diff saved to https://phabricator.wikimedia.org/P34110 and previous config saved to /var/cache/conftool/dbconfig/20220907-140813-ladsgroup.json
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 1%: Pooling after maintenance', diff saved to https://phabricator.wikimedia.org/P34109 and previous config saved to /var/cache/conftool/dbconfig/20220907-140758-root.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 5%: Pooling after maintenance', diff saved to https://phabricator.wikimedia.org/P34108 and previous config saved to /var/cache/conftool/dbconfig/20220907-140503-root.json
  • 14:02 claime: depooled wtp1027.eqiad.wmnet from parsoid cluster T307219
  • 14:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:57 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:57 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:56 TheresNoTime: UTC afternoon backport window closed
  • 13:55 samtar@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: CommonSettings-labs: Set config to production-esque values (T314294) (duration: 03m 47s)
  • 13:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maint
  • 13:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maint
  • 13:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:52 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1009.eqiad.wmnet with reason: host reimage
  • 13:49 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1009.eqiad.wmnet with reason: host reimage
  • 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:46 samtar@deploy1002: Finished scap: Backport for private/readme.php: Add $wgPhonosApiKeyGoogle (T315491) (duration: 04m 51s)
  • 13:42 samtar@deploy1002: samtar and samtar: Backport for private/readme.php: Add $wgPhonosApiKeyGoogle (T315491) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:42 samtar@deploy1002: Started scap: Backport for private/readme.php: Add $wgPhonosApiKeyGoogle (T315491)
  • 13:38 samtar@deploy1002: Synchronized php-1.39.0-wmf.27/extensions/GrowthExperiments/modules/ext.growthExperiments.MentorDashboard.Vue/components/MenteeOverview/MenteeFiltersForm.vue: Backport: Mentee overview(vue): prevent clicks on more recent edit buttons to submit the filters (T316926) (duration: 04m 07s)
  • 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:36 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host rdb1009.eqiad.wmnet with OS bullseye
  • 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: Pooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34107 and previous config saved to /var/cache/conftool/dbconfig/20220907-131223-root.json
  • 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: Pooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34106 and previous config saved to /var/cache/conftool/dbconfig/20220907-125718-root.json
  • 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2120 (re)pooling @ 100%: Pooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34105 and previous config saved to /var/cache/conftool/dbconfig/20220907-125706-root.json
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: Pooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34104 and previous config saved to /var/cache/conftool/dbconfig/20220907-124213-root.json
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2120 (re)pooling @ 75%: Pooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34103 and previous config saved to /var/cache/conftool/dbconfig/20220907-124201-root.json
  • 12:31 jbond: re-enable puppet
  • 12:27 moritzm: installing runc security updates on codfw staging hosts
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: Pooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34102 and previous config saved to /var/cache/conftool/dbconfig/20220907-122708-root.json
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db2120 (re)pooling @ 50%: Pooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34101 and previous config saved to /var/cache/conftool/dbconfig/20220907-122656-root.json
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: Pooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34100 and previous config saved to /var/cache/conftool/dbconfig/20220907-121204-root.json
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2120 (re)pooling @ 25%: Pooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34099 and previous config saved to /var/cache/conftool/dbconfig/20220907-121152-root.json
  • 12:08 jbond: disable puppet fleet wide to fix issues
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: Pooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34098 and previous config saved to /var/cache/conftool/dbconfig/20220907-115659-root.json
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2120 (re)pooling @ 10%: Pooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34097 and previous config saved to /var/cache/conftool/dbconfig/20220907-115647-root.json
  • 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 1%: Pooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34096 and previous config saved to /var/cache/conftool/dbconfig/20220907-114154-root.json
  • 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db2120 (re)pooling @ 5%: Pooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34095 and previous config saved to /var/cache/conftool/dbconfig/20220907-114142-root.json
  • 11:34 jbond: change default puppet file permissions to root:root
  • 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 100%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34094 and previous config saved to /var/cache/conftool/dbconfig/20220907-111821-root.json
  • 11:05 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 11:04 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 75%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34093 and previous config saved to /var/cache/conftool/dbconfig/20220907-110316-root.json
  • 11:01 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:01 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 11:01 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:00 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 11:00 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 7 hosts with reason: Downtime pending inclusion in production
  • 11:00 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 7 hosts with reason: Downtime pending inclusion in production
  • 11:00 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 10:59 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 10:59 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 10:59 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 10:53 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1046.eqiad.wmnet
  • 10:53 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1045.eqiad.wmnet
  • 10:53 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1044.eqiad.wmnet
  • 10:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wtp[1044-1046].eqiad.wmnet with reason: Downtiming replaced wtp servers
  • 10:52 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wtp[1044-1046].eqiad.wmnet with reason: Downtiming replaced wtp servers
  • 10:48 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1017.eqiad.wmnet
  • 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 50%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34092 and previous config saved to /var/cache/conftool/dbconfig/20220907-104811-root.json
  • 10:40 claime: pooled parse1016.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 10:39 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1016.eqiad.wmnet
  • 10:39 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1016.eqiad.wmnet
  • 10:36 claime: depooled wtp1048.eqiad.wmnet from parsoid cluster T307219
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 25%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34091 and previous config saved to /var/cache/conftool/dbconfig/20220907-103306-root.json
  • 10:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable sitelinks to redirects on testwikidatawiki (T316637) (duration: 03m 51s)
  • 10:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:27 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1016.eqiad.wmnet
  • 10:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:26 claime: pooled parse1015.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 10:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1015.eqiad.wmnet
  • 10:25 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1015.eqiad.wmnet
  • 10:21 claime: depooled wtp1047.eqiad.wmnet from parsoid cluster T307219
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 10%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34089 and previous config saved to /var/cache/conftool/dbconfig/20220907-101801-root.json
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 100%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34088 and previous config saved to /var/cache/conftool/dbconfig/20220907-101734-root.json
  • 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2120 to clone db2122', diff saved to https://phabricator.wikimedia.org/P34086 and previous config saved to /var/cache/conftool/dbconfig/20220907-101258-root.json
  • 10:12 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1015.eqiad.wmnet
  • 10:12 claime: repooled parse1014.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 10:10 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1014.eqiad.wmnet
  • 10:05 claime: pooled parse1014.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 5%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34085 and previous config saved to /var/cache/conftool/dbconfig/20220907-100257-root.json
  • 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 75%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34084 and previous config saved to /var/cache/conftool/dbconfig/20220907-100229-root.json
  • 09:57 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1014.eqiad.wmnet
  • 09:57 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1014.eqiad.wmnet
  • 09:53 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1014.eqiad.wmnet
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1199 (re)pooling @ 100%: Pooling for the first time in s4', diff saved to https://phabricator.wikimedia.org/P34083 and previous config saved to /var/cache/conftool/dbconfig/20220907-094825-root.json
  • 09:48 topranks: Re-pooling eqsin for user traffic after successful core router upgrades - T295690
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 4%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34082 and previous config saved to /var/cache/conftool/dbconfig/20220907-094752-root.json
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 50%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34081 and previous config saved to /var/cache/conftool/dbconfig/20220907-094724-root.json
  • 09:44 claime: depooled wtp1046.eqiad.wmnet from parsoid cluster T307219
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34080 and previous config saved to /var/cache/conftool/dbconfig/20220907-093736-root.json
  • 09:35 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1013.eqiad.wmnet
  • 09:35 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1013.eqiad.wmnet
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1199 (re)pooling @ 75%: Pooling for the first time in s4', diff saved to https://phabricator.wikimedia.org/P34079 and previous config saved to /var/cache/conftool/dbconfig/20220907-093320-root.json
  • 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 3%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34078 and previous config saved to /var/cache/conftool/dbconfig/20220907-093247-root.json
  • 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 25%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34077 and previous config saved to /var/cache/conftool/dbconfig/20220907-093219-root.json
  • 09:31 claime: pooled parse1013.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 09:26 godog: restart swift-proxy and repool ms-fe1012
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 75%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34076 and previous config saved to /var/cache/conftool/dbconfig/20220907-092230-root.json
  • 09:20 topranks: rebooting cr3-eqsin to complete JunOS upgrade
  • 09:20 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr3-eqsin with reason: router upgrade
  • 09:19 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cr3-eqsin with reason: router upgrade
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1199 (re)pooling @ 50%: Pooling for the first time in s4', diff saved to https://phabricator.wikimedia.org/P34075 and previous config saved to /var/cache/conftool/dbconfig/20220907-091815-root.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 2%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34074 and previous config saved to /var/cache/conftool/dbconfig/20220907-091740-root.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 10%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34073 and previous config saved to /var/cache/conftool/dbconfig/20220907-091715-root.json
  • 09:10 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1013.eqiad.wmnet
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: Pooling for the first time in s3', diff saved to https://phabricator.wikimedia.org/P34072 and previous config saved to /var/cache/conftool/dbconfig/20220907-090830-root.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34071 and previous config saved to /var/cache/conftool/dbconfig/20220907-090725-root.json
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1199 (re)pooling @ 25%: Pooling for the first time in s4', diff saved to https://phabricator.wikimedia.org/P34070 and previous config saved to /var/cache/conftool/dbconfig/20220907-090310-root.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 5%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34069 and previous config saved to /var/cache/conftool/dbconfig/20220907-090210-root.json
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34068 and previous config saved to /var/cache/conftool/dbconfig/20220907-085610-root.json
  • 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: Pooling for the first time in s3', diff saved to https://phabricator.wikimedia.org/P34067 and previous config saved to /var/cache/conftool/dbconfig/20220907-085325-root.json
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 25%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34066 and previous config saved to /var/cache/conftool/dbconfig/20220907-085220-root.json
  • 08:51 topranks: rebooting cr2-eqsin to complete JunOS upgrade
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1199 (re)pooling @ 10%: Pooling for the first time in s4', diff saved to https://phabricator.wikimedia.org/P34065 and previous config saved to /var/cache/conftool/dbconfig/20220907-084805-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 4%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34064 and previous config saved to /var/cache/conftool/dbconfig/20220907-084705-root.json
  • 08:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Pooling db1201 for the first time in s6 T316342', diff saved to https://phabricator.wikimedia.org/P34063 and previous config saved to /var/cache/conftool/dbconfig/20220907-084454-marostegui.json
  • 08:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1122 (s2 master) from API', diff saved to https://phabricator.wikimedia.org/P34062 and previous config saved to /var/cache/conftool/dbconfig/20220907-084232-root.json
  • 08:42 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Move 50% of traffic to php 7.4 (T271736) (duration: 04m 00s)
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from x1 master', diff saved to https://phabricator.wikimedia.org/P34061 and previous config saved to /var/cache/conftool/dbconfig/20220907-084133-marostegui.json
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34060 and previous config saved to /var/cache/conftool/dbconfig/20220907-084105-root.json
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1201 to s6, depooled, T316342', diff saved to https://phabricator.wikimedia.org/P34059 and previous config saved to /var/cache/conftool/dbconfig/20220907-084057-marostegui.json
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 (s6 master) from API', diff saved to https://phabricator.wikimedia.org/P34058 and previous config saved to /var/cache/conftool/dbconfig/20220907-083958-root.json
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: Pooling for the first time in s3', diff saved to https://phabricator.wikimedia.org/P34057 and previous config saved to /var/cache/conftool/dbconfig/20220907-083820-root.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34056 and previous config saved to /var/cache/conftool/dbconfig/20220907-083752-root.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 10%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34055 and previous config saved to /var/cache/conftool/dbconfig/20220907-083715-root.json
  • 08:37 cmooney@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 08:37 cmooney@cumin1001: START - Cookbook sre.network.cf
  • 08:35 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr2-eqsin with reason: router upgrade
  • 08:35 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cr2-eqsin with reason: router upgrade
  • 08:35 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr2-eqsin.wikimedia.org with reason: router upgrade
  • 08:35 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cr2-eqsin.wikimedia.org with reason: router upgrade
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1199 (re)pooling @ 5%: Pooling for the first time in s4', diff saved to https://phabricator.wikimedia.org/P34054 and previous config saved to /var/cache/conftool/dbconfig/20220907-083300-root.json
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 3%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34053 and previous config saved to /var/cache/conftool/dbconfig/20220907-083200-root.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34052 and previous config saved to /var/cache/conftool/dbconfig/20220907-082554-root.json
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: Pooling for the first time in s3', diff saved to https://phabricator.wikimedia.org/P34051 and previous config saved to /var/cache/conftool/dbconfig/20220907-082315-root.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34050 and previous config saved to /var/cache/conftool/dbconfig/20220907-082247-root.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 5%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34049 and previous config saved to /var/cache/conftool/dbconfig/20220907-082210-root.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1197 (re)pooling @ 100%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34048 and previous config saved to /var/cache/conftool/dbconfig/20220907-081826-root.json
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1199 (re)pooling @ 4%: Pooling for the first time in s4', diff saved to https://phabricator.wikimedia.org/P34047 and previous config saved to /var/cache/conftool/dbconfig/20220907-081756-root.json
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 2%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34046 and previous config saved to /var/cache/conftool/dbconfig/20220907-081655-root.json
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34045 and previous config saved to /var/cache/conftool/dbconfig/20220907-081049-root.json
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Pooling db1200 for the first time in s5 T316342', diff saved to https://phabricator.wikimedia.org/P34044 and previous config saved to /var/cache/conftool/dbconfig/20220907-080825-marostegui.json
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: Pooling for the first time in s3', diff saved to https://phabricator.wikimedia.org/P34043 and previous config saved to /var/cache/conftool/dbconfig/20220907-080810-root.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34042 and previous config saved to /var/cache/conftool/dbconfig/20220907-080742-root.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 4%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34041 and previous config saved to /var/cache/conftool/dbconfig/20220907-080705-root.json
  • 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T312863)', diff saved to https://phabricator.wikimedia.org/P34040 and previous config saved to /var/cache/conftool/dbconfig/20220907-080449-ladsgroup.json
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34039 and previous config saved to /var/cache/conftool/dbconfig/20220907-080439-root.json
  • 08:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 08:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1197 (re)pooling @ 75%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34038 and previous config saved to /var/cache/conftool/dbconfig/20220907-080321-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1199 (re)pooling @ 3%: Pooling for the first time in s4', diff saved to https://phabricator.wikimedia.org/P34037 and previous config saved to /var/cache/conftool/dbconfig/20220907-080251-root.json
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1200 to s5, depooled, T316342', diff saved to https://phabricator.wikimedia.org/P34036 and previous config saved to /var/cache/conftool/dbconfig/20220907-075919-marostegui.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34035 and previous config saved to /var/cache/conftool/dbconfig/20220907-075544-root.json
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: Pooling for the first time in s3', diff saved to https://phabricator.wikimedia.org/P34034 and previous config saved to /var/cache/conftool/dbconfig/20220907-075305-root.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34033 and previous config saved to /var/cache/conftool/dbconfig/20220907-075237-root.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 3%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34032 and previous config saved to /var/cache/conftool/dbconfig/20220907-075200-root.json
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34031 and previous config saved to /var/cache/conftool/dbconfig/20220907-074935-root.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1197 (re)pooling @ 50%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34030 and previous config saved to /var/cache/conftool/dbconfig/20220907-074816-root.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1199 (re)pooling @ 2%: Pooling for the first time in s4', diff saved to https://phabricator.wikimedia.org/P34029 and previous config saved to /var/cache/conftool/dbconfig/20220907-074746-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34028 and previous config saved to /var/cache/conftool/dbconfig/20220907-074636-root.json
  • 07:46 topranks: Depool eqsin from user traffic in advance of core router upgrades - T295690
  • 07:44 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34027 and previous config saved to /var/cache/conftool/dbconfig/20220907-074039-root.json
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 4%: Pooling for the first time in s3', diff saved to https://phabricator.wikimedia.org/P34026 and previous config saved to /var/cache/conftool/dbconfig/20220907-073800-root.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Pooling db1199 for the first time in s4 T316342', diff saved to https://phabricator.wikimedia.org/P34025 and previous config saved to /var/cache/conftool/dbconfig/20220907-073745-marostegui.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34024 and previous config saved to /var/cache/conftool/dbconfig/20220907-073732-root.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1199 to s4, depooled, T316342', diff saved to https://phabricator.wikimedia.org/P34023 and previous config saved to /var/cache/conftool/dbconfig/20220907-073727-marostegui.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 2%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34022 and previous config saved to /var/cache/conftool/dbconfig/20220907-073655-root.json
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34021 and previous config saved to /var/cache/conftool/dbconfig/20220907-073430-root.json
  • 07:33 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1197 (re)pooling @ 25%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34020 and previous config saved to /var/cache/conftool/dbconfig/20220907-073311-root.json
  • 07:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 07:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34019 and previous config saved to /var/cache/conftool/dbconfig/20220907-073131-root.json
  • 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 3%: Pooling for the first time in s3', diff saved to https://phabricator.wikimedia.org/P34018 and previous config saved to /var/cache/conftool/dbconfig/20220907-072255-root.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34017 and previous config saved to /var/cache/conftool/dbconfig/20220907-072214-root.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 1%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P34016 and previous config saved to /var/cache/conftool/dbconfig/20220907-072151-root.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34015 and previous config saved to /var/cache/conftool/dbconfig/20220907-071925-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1197 (re)pooling @ 10%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34014 and previous config saved to /var/cache/conftool/dbconfig/20220907-071806-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2146 and db2122', diff saved to https://phabricator.wikimedia.org/P34013 and previous config saved to /var/cache/conftool/dbconfig/20220907-071744-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34012 and previous config saved to /var/cache/conftool/dbconfig/20220907-071627-root.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 2%: Pooling for the first time in s3', diff saved to https://phabricator.wikimedia.org/P34011 and previous config saved to /var/cache/conftool/dbconfig/20220907-070750-root.json
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34010 and previous config saved to /var/cache/conftool/dbconfig/20220907-070420-root.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1197 (re)pooling @ 5%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34009 and previous config saved to /var/cache/conftool/dbconfig/20220907-070301-root.json
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34008 and previous config saved to /var/cache/conftool/dbconfig/20220907-070122-root.json
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34007 and previous config saved to /var/cache/conftool/dbconfig/20220907-064915-root.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Pooling db1198 for the first time in s3 T316342', diff saved to https://phabricator.wikimedia.org/P34006 and previous config saved to /var/cache/conftool/dbconfig/20220907-064831-marostegui.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1197 (re)pooling @ 4%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34005 and previous config saved to /var/cache/conftool/dbconfig/20220907-064757-root.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34004 and previous config saved to /var/cache/conftool/dbconfig/20220907-064617-root.json
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1198 to s3, depooled, T316342', diff saved to https://phabricator.wikimedia.org/P34003 and previous config saved to /var/cache/conftool/dbconfig/20220907-064552-marostegui.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 4%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34002 and previous config saved to /var/cache/conftool/dbconfig/20220907-063410-root.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1197 (re)pooling @ 3%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P34001 and previous config saved to /var/cache/conftool/dbconfig/20220907-063252-root.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34000 and previous config saved to /var/cache/conftool/dbconfig/20220907-063112-root.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P33999 and previous config saved to /var/cache/conftool/dbconfig/20220907-061906-root.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1197 (re)pooling @ 2%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P33998 and previous config saved to /var/cache/conftool/dbconfig/20220907-061747-root.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 4%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33997 and previous config saved to /var/cache/conftool/dbconfig/20220907-061607-root.json
  • 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'Pooling db1197 for the first time in s2 T316342', diff saved to https://phabricator.wikimedia.org/P33996 and previous config saved to /var/cache/conftool/dbconfig/20220907-061147-marostegui.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1197 to s2, depooled, T316342', diff saved to https://phabricator.wikimedia.org/P33995 and previous config saved to /var/cache/conftool/dbconfig/20220907-060828-marostegui.json
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 2%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P33994 and previous config saved to /var/cache/conftool/dbconfig/20220907-060401-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33993 and previous config saved to /var/cache/conftool/dbconfig/20220907-060102-root.json
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Pooling db1196 for the first time in s1 T316342', diff saved to https://phabricator.wikimedia.org/P33992 and previous config saved to /var/cache/conftool/dbconfig/20220907-055201-marostegui.json
  • 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1196 to s1, depooled, T316342', diff saved to https://phabricator.wikimedia.org/P33991 and previous config saved to /var/cache/conftool/dbconfig/20220907-054910-marostegui.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 2%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33990 and previous config saved to /var/cache/conftool/dbconfig/20220907-054557-root.json
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1172 T316342', diff saved to https://phabricator.wikimedia.org/P33988 and previous config saved to /var/cache/conftool/dbconfig/20220907-053350-root.json
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174 T316342', diff saved to https://phabricator.wikimedia.org/P33986 and previous config saved to /var/cache/conftool/dbconfig/20220907-053154-root.json
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33985 and previous config saved to /var/cache/conftool/dbconfig/20220907-053053-root.json
  • 02:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 02:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 02:26 ejegg: rolled back civicrm from 2fcd3bb4 to 5aa1309d
  • 02:16 ejegg: civicrm upgraded from 5aa1309d to 2fcd3bb4
  • 01:57 ejegg: updated payments-wiki from 648842f9 to de4b2bb9
  • 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2140 (T314041)', diff saved to https://phabricator.wikimedia.org/P33983 and previous config saved to /var/cache/conftool/dbconfig/20220907-015138-ladsgroup.json
  • 01:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 01:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T314041)', diff saved to https://phabricator.wikimedia.org/P33982 and previous config saved to /var/cache/conftool/dbconfig/20220907-015116-ladsgroup.json

2022-09-06

  • 23:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T314041)', diff saved to https://phabricator.wikimedia.org/P33981 and previous config saved to /var/cache/conftool/dbconfig/20220906-233809-ladsgroup.json
  • 23:07 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on phab1004.eqiad.wmnet with reason: new install
  • 23:06 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on phab1004.eqiad.wmnet with reason: new install
  • 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2114 (T314041)', diff saved to https://phabricator.wikimedia.org/P33980 and previous config saved to /var/cache/conftool/dbconfig/20220906-222439-ladsgroup.json
  • 22:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 22:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T314041)', diff saved to https://phabricator.wikimedia.org/P33979 and previous config saved to /var/cache/conftool/dbconfig/20220906-222418-ladsgroup.json
  • 21:56 milimetric@deploy1002: Finished deploy [analytics/refinery@b14c9f4] (thin): Hotfix for requestctl field (duration: 00m 08s)
  • 21:56 milimetric@deploy1002: Started deploy [analytics/refinery@b14c9f4] (thin): Hotfix for requestctl field
  • 21:56 milimetric@deploy1002: Finished deploy [analytics/refinery@b14c9f4]: Hotfix for requestctl field (duration: 02m 28s)
  • 21:53 milimetric@deploy1002: Started deploy [analytics/refinery@b14c9f4]: Hotfix for requestctl field
  • 21:53 milimetric@deploy1002: Finished deploy [analytics/refinery@b14c9f4]: Hotfix for requestctl field (duration: 03m 28s)
  • 21:49 milimetric@deploy1002: Started deploy [analytics/refinery@b14c9f4]: Hotfix for requestctl field
  • 21:49 milimetric@deploy1002: Finished deploy [analytics/refinery@b14c9f4]: Hotfix for requestctl field (duration: 03m 55s)
  • 21:45 milimetric@deploy1002: Started deploy [analytics/refinery@b14c9f4]: Hotfix for requestctl field
  • 21:45 milimetric@deploy1002: deploy aborted: Hotfix for requestctl field (duration: 32m 09s)
  • 21:41 root@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 21:39 mutante: phabricator - passive hosts in codfw switched to readonly DB access (m3-slave, not m3-master) T315713
  • 21:30 root@cumin1001: END (ERROR) - Cookbook sre.network.prepare-upgrade (exit_code=97)
  • 21:13 milimetric@deploy1002: Started deploy [analytics/refinery@b14c9f4]: Hotfix for requestctl field
  • 20:57 milimetric@deploy1002: Finished deploy [analytics/refinery@8a5ce13] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@8a5ce13] (duration: 08m 54s)
  • 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:48 milimetric@deploy1002: Started deploy [analytics/refinery@8a5ce13] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@8a5ce13]
  • 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:48 cjming: end of UTC late backport window
  • 20:47 cjming@deploy1002: Finished scap: Backport for Add localized wordmark for Bengali Wiktionary (T316953) (duration: 05m 24s)
  • 20:45 milimetric@deploy1002: Finished deploy [analytics/refinery@8a5ce13]: Regular analytics weekly train [analytics/refinery@8a5ce13] (duration: 00m 16s)
  • 20:44 milimetric@deploy1002: Started deploy [analytics/refinery@8a5ce13]: Regular analytics weekly train [analytics/refinery@8a5ce13]
  • 20:44 milimetric@deploy1002: deploy aborted: Regular analytics weekly train [analytics/refinery@8a5ce13] (duration: 00m 00s)
  • 20:44 milimetric@deploy1002: Started deploy [analytics/refinery@8a5ce13]: Regular analytics weekly train [analytics/refinery@8a5ce13]
  • 20:44 milimetric@deploy1002: Finished deploy [analytics/refinery@8a5ce13] (thin): Regular analytics weekly train THIN [analytics/refinery@8a5ce13] (duration: 00m 08s)
  • 20:44 milimetric@deploy1002: Started deploy [analytics/refinery@8a5ce13] (thin): Regular analytics weekly train THIN [analytics/refinery@8a5ce13]
  • 20:42 cjming@deploy1002: cjming and mdsshakil: Backport for Add localized wordmark for Bengali Wiktionary (T316953) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:41 cjming@deploy1002: Started scap: Backport for Add localized wordmark for Bengali Wiktionary (T316953)
  • 20:38 milimetric@deploy1002: Finished deploy [analytics/refinery@8a5ce13]: Regular analytics weekly train [analytics/refinery@8a5ce13] (duration: 03m 15s)
  • 20:35 milimetric@deploy1002: Started deploy [analytics/refinery@8a5ce13]: Regular analytics weekly train [analytics/refinery@8a5ce13]
  • 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T314041)', diff saved to https://phabricator.wikimedia.org/P33978 and previous config saved to /var/cache/conftool/dbconfig/20220906-203258-ladsgroup.json
  • 20:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T314041)', diff saved to https://phabricator.wikimedia.org/P33977 and previous config saved to /var/cache/conftool/dbconfig/20220906-203236-ladsgroup.json
  • 20:29 cjming@deploy1002: Finished scap: Backport for Ensure namespace filters is passed as a list (duration: 06m 35s)
  • 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:28 milimetric@deploy1002: Finished deploy [analytics/refinery@8a5ce13]: Regular analytics weekly train [analytics/refinery@8a5ce13] (duration: 63m 48s)
  • 20:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 20:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 20:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T312863)', diff saved to https://phabricator.wikimedia.org/P33976 and previous config saved to /var/cache/conftool/dbconfig/20220906-202654-ladsgroup.json
  • 20:23 cjming@deploy1002: cjming and ebernhardson: Backport for Ensure namespace filters is passed as a list synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:23 cjming@deploy1002: Started scap: Backport for Ensure namespace filters is passed as a list
  • 20:16 bd808: Forcing puppet runs on cloudweb100[34] to deploy new version of Striker (T296893)
  • 20:13 bd808: Running database migrations for Striker (T296893)
  • 20:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P33975 and previous config saved to /var/cache/conftool/dbconfig/20220906-201148-ladsgroup.json
  • 20:03 inflatador: 'bking@cumin1001 disabling puppet on elastic codfw hosts T313431'
  • 19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P33974 and previous config saved to /var/cache/conftool/dbconfig/20220906-195642-ladsgroup.json
  • 19:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T312863)', diff saved to https://phabricator.wikimedia.org/P33973 and previous config saved to /var/cache/conftool/dbconfig/20220906-194135-ladsgroup.json
  • 19:24 milimetric@deploy1002: Started deploy [analytics/refinery@8a5ce13]: Regular analytics weekly train [analytics/refinery@8a5ce13]
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T314041)', diff saved to https://phabricator.wikimedia.org/P33972 and previous config saved to /var/cache/conftool/dbconfig/20220906-184515-ladsgroup.json
  • 18:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 18:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 18:25 cwhite: reduce codfw replicas 2 to 1 for logstash-(webrequest|k8s) partitions. Make space for failed logstash2027 - T316996
  • 17:50 root@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 17:48 root@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 17:23 moritzm: installing dpkg bugfix updates from bullseye point release
  • 17:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-logging1004']
  • 17:16 krinkle@deploy1002: Synchronized php-1.39.0-wmf.27/resources/src/: I0516527d5cc0 (duration: 03m 50s)
  • 17:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:11 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging1004']
  • 17:06 krinkle@deploy1002: Synchronized wmf-config/: (no justification provided) (duration: 03m 50s)
  • 17:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-logging1004']
  • 17:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 17:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T314041)', diff saved to https://phabricator.wikimedia.org/P33969 and previous config saved to /var/cache/conftool/dbconfig/20220906-165958-ladsgroup.json
  • 16:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:55 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging1004']
  • 16:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:47 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
  • 16:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kafka-logging1004']
  • 16:44 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging1004']
  • 16:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kafka-logging1004']
  • 16:42 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging1004']
  • 16:36 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['kafka-logging1004']
  • 16:25 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 16:24 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 16:23 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 16:22 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 16:22 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 16:20 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 16:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging1004']
  • 16:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 16:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:01 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
  • 15:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P33968 and previous config saved to /var/cache/conftool/dbconfig/20220906-154959-root.json
  • 15:48 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 15:44 root@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
  • 15:43 root@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 15:43 root@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
  • 15:43 root@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P33967 and previous config saved to /var/cache/conftool/dbconfig/20220906-153454-root.json
  • 15:21 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.reboot-runner (exit_code=1) rolling reboot on A:gitlab-runner
  • 15:20 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
  • 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P33966 and previous config saved to /var/cache/conftool/dbconfig/20220906-151950-root.json
  • 15:15 claime: Set wtp10[41-43].eqiad.wmnet inactive pending decommission T317025
  • 15:14 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1043.eqiad.wmnet
  • 15:14 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1042.eqiad.wmnet
  • 15:14 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1041.eqiad.wmnet
  • 15:12 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wtp[1041-1043].eqiad.wmnet with reason: Downtiming replaced wtp servers
  • 15:12 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wtp[1041-1043].eqiad.wmnet with reason: Downtiming replaced wtp servers
  • 15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T314041)', diff saved to https://phabricator.wikimedia.org/P33965 and previous config saved to /var/cache/conftool/dbconfig/20220906-150953-ladsgroup.json
  • 15:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 15:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 15:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 15:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T314041)', diff saved to https://phabricator.wikimedia.org/P33964 and previous config saved to /var/cache/conftool/dbconfig/20220906-150928-ladsgroup.json
  • 15:08 claime: depooled wtp1045.eqiad.wmnet from parsoid cluster T307219
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P33963 and previous config saved to /var/cache/conftool/dbconfig/20220906-150445-root.json
  • 14:58 claime: pooled parse1012.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 14:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1012.eqiad.wmnet
  • 14:55 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1012.eqiad.wmnet
  • 14:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kafka-logging1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P33962 and previous config saved to /var/cache/conftool/dbconfig/20220906-144940-root.json
  • 14:46 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1012.eqiad.wmnet
  • 14:39 claime: depooled wtp1044.eqiad.wmnet from parsoid cluster T307219
  • 14:39 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:37 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:36 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1004
  • 14:36 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1004
  • 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 5%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P33961 and previous config saved to /var/cache/conftool/dbconfig/20220906-143435-root.json
  • 14:30 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 14:30 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 14:29 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 14:29 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 14:29 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 14:29 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 14:28 claime: pooled parse1011.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 14:27 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1011.eqiad.wmnet
  • 14:27 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1011.eqiad.wmnet
  • 14:15 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:15 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:08 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1011.eqiad.wmnet
  • 13:56 claime: depooled wtp1043.eqiad.wmnet from parsoid cluster T307219
  • 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T312863)', diff saved to https://phabricator.wikimedia.org/P33960 and previous config saved to /var/cache/conftool/dbconfig/20220906-134545-ladsgroup.json
  • 13:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 13:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T312863)', diff saved to https://phabricator.wikimedia.org/P33959 and previous config saved to /var/cache/conftool/dbconfig/20220906-134523-ladsgroup.json
  • 13:35 claime: pooled parse1010.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 13:33 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1010.eqiad.wmnet
  • 13:33 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1010.eqiad.wmnet
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P33958 and previous config saved to /var/cache/conftool/dbconfig/20220906-133017-ladsgroup.json
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180 T316342', diff saved to https://phabricator.wikimedia.org/P33956 and previous config saved to /var/cache/conftool/dbconfig/20220906-132627-root.json
  • 13:21 TheresNoTime: closing UTC afternoon backport window
  • 13:19 samtar@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: CommonSettings-labs: Load Phonos extension (T314294) (duration: 04m 05s)
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T314041)', diff saved to https://phabricator.wikimedia.org/P33954 and previous config saved to /var/cache/conftool/dbconfig/20220906-131715-ladsgroup.json
  • 13:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 13:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 13:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T314041)', diff saved to https://phabricator.wikimedia.org/P33953 and previous config saved to /var/cache/conftool/dbconfig/20220906-131654-ladsgroup.json
  • 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P33952 and previous config saved to /var/cache/conftool/dbconfig/20220906-131510-ladsgroup.json
  • 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T312863)', diff saved to https://phabricator.wikimedia.org/P33951 and previous config saved to /var/cache/conftool/dbconfig/20220906-130004-ladsgroup.json
  • 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P33950 and previous config saved to /var/cache/conftool/dbconfig/20220906-123145-root.json
  • 12:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on puppetdb2002.codfw.wmnet with reason: Temporarily stop puppetdb/postgres
  • 12:29 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on puppetdb2002.codfw.wmnet with reason: Temporarily stop puppetdb/postgres
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P33949 and previous config saved to /var/cache/conftool/dbconfig/20220906-121640-root.json
  • 12:15 XioNoX: repool ulsfo - T295690
  • 12:14 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1010.eqiad.wmnet
  • 12:05 claime: Set wtp10[38-40].eqiad.wmnet inactive pending decommission T317025
  • 12:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T314041)', diff saved to https://phabricator.wikimedia.org/P33948 and previous config saved to /var/cache/conftool/dbconfig/20220906-120433-ladsgroup.json
  • 12:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 12:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 12:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T314041)', diff saved to https://phabricator.wikimedia.org/P33947 and previous config saved to /var/cache/conftool/dbconfig/20220906-120412-ladsgroup.json
  • 12:03 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1040.eqiad.wmnet
  • 12:03 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1039.eqiad.wmnet
  • 12:03 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wtp[1039-1040].eqiad.wmnet with reason: Downtiming replaced wtp servers
  • 12:02 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wtp[1039-1040].eqiad.wmnet with reason: Downtiming replaced wtp servers
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P33946 and previous config saved to /var/cache/conftool/dbconfig/20220906-120135-root.json
  • 12:01 claime: depooled wtp1042.eqiad.wmnet from parsoid cluster T307219
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P33945 and previous config saved to /var/cache/conftool/dbconfig/20220906-114631-root.json
  • 11:35 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:34 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 10%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P33944 and previous config saved to /var/cache/conftool/dbconfig/20220906-113126-root.json
  • 11:27 claime: pooled parse1009.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 11:26 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync data - jbond@cumin2002"
  • 11:26 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 12 hosts with reason: Downtime pending inclusion in production
  • 11:26 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 12 hosts with reason: Downtime pending inclusion in production
  • 11:25 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin2002"
  • 11:17 XioNoX: put cr4-ulsfo back in service - T295690
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 5%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P33943 and previous config saved to /var/cache/conftool/dbconfig/20220906-111621-root.json
  • 11:12 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:12 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:12 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:11 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:11 moritzm: installing ghostscript updates on stretch
  • 11:06 XioNoX: restart cr4-ulsfo for software upgrade - T295690
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 4%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P33942 and previous config saved to /var/cache/conftool/dbconfig/20220906-110116-root.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33941 and previous config saved to /var/cache/conftool/dbconfig/20220906-105841-root.json
  • 10:58 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1009.eqiad.wmnet
  • 10:57 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1009.eqiad.wmnet
  • 10:52 moritzm: uploaded ghostscript 9.26a~dfsg-0+deb9u9+wmf1 to apt.wikimedia.org
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 3%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P33940 and previous config saved to /var/cache/conftool/dbconfig/20220906-104611-root.json
  • 10:44 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:44 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33939 and previous config saved to /var/cache/conftool/dbconfig/20220906-104336-root.json
  • 10:42 XioNoX: drain traffic from cr4-ulsfo - T295690
  • 10:40 jayme: switched primary kube-controller-manager from kubemaster1001 to kubemaster1002
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33938 and previous config saved to /var/cache/conftool/dbconfig/20220906-103402-root.json
  • 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 2%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P33937 and previous config saved to /var/cache/conftool/dbconfig/20220906-103104-root.json
  • 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33936 and previous config saved to /var/cache/conftool/dbconfig/20220906-103017-root.json
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33935 and previous config saved to /var/cache/conftool/dbconfig/20220906-102919-root.json
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33934 and previous config saved to /var/cache/conftool/dbconfig/20220906-102831-root.json
  • 10:27 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:27 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:26 XioNoX: put cr3-ulsfo back in service - T295690
  • 10:25 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:25 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1103 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33932 and previous config saved to /var/cache/conftool/dbconfig/20220906-102152-root.json
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33931 and previous config saved to /var/cache/conftool/dbconfig/20220906-101858-root.json
  • 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 1%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P33930 and previous config saved to /var/cache/conftool/dbconfig/20220906-101559-root.json
  • 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33929 and previous config saved to /var/cache/conftool/dbconfig/20220906-101513-root.json
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33928 and previous config saved to /var/cache/conftool/dbconfig/20220906-101414-root.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33927 and previous config saved to /var/cache/conftool/dbconfig/20220906-101326-root.json
  • 10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T314041)', diff saved to https://phabricator.wikimedia.org/P33926 and previous config saved to /var/cache/conftool/dbconfig/20220906-101129-ladsgroup.json
  • 10:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 10:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 100%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P33925 and previous config saved to /var/cache/conftool/dbconfig/20220906-100656-root.json
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1103 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33924 and previous config saved to /var/cache/conftool/dbconfig/20220906-100647-root.json
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33923 and previous config saved to /var/cache/conftool/dbconfig/20220906-100353-root.json
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33921 and previous config saved to /var/cache/conftool/dbconfig/20220906-100008-root.json
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33920 and previous config saved to /var/cache/conftool/dbconfig/20220906-095909-root.json
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33919 and previous config saved to /var/cache/conftool/dbconfig/20220906-095821-root.json
  • 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33918 and previous config saved to /var/cache/conftool/dbconfig/20220906-095722-root.json
  • 09:57 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1009.eqiad.wmnet
  • 09:55 claime: depooled wtp1041.eqiad.wmnet from parsoid cluster T307219
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 75%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P33917 and previous config saved to /var/cache/conftool/dbconfig/20220906-095151-root.json
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1103 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33916 and previous config saved to /var/cache/conftool/dbconfig/20220906-095143-root.json
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33915 and previous config saved to /var/cache/conftool/dbconfig/20220906-094848-root.json
  • 09:48 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 09:47 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 09:46 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 09:45 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33914 and previous config saved to /var/cache/conftool/dbconfig/20220906-094503-root.json
  • 09:44 claime: pooled parse1008.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33913 and previous config saved to /var/cache/conftool/dbconfig/20220906-094404-root.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33912 and previous config saved to /var/cache/conftool/dbconfig/20220906-094316-root.json
  • 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33911 and previous config saved to /var/cache/conftool/dbconfig/20220906-094217-root.json
  • 09:40 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1008.eqiad.wmnet
  • 09:40 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1008.eqiad.wmnet
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 50%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P33910 and previous config saved to /var/cache/conftool/dbconfig/20220906-093646-root.json
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1103 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33909 and previous config saved to /var/cache/conftool/dbconfig/20220906-093638-root.json
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33908 and previous config saved to /var/cache/conftool/dbconfig/20220906-093343-root.json
  • 09:31 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1008.eqiad.wmnet
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33907 and previous config saved to /var/cache/conftool/dbconfig/20220906-092958-root.json
  • 09:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33906 and previous config saved to /var/cache/conftool/dbconfig/20220906-092900-root.json
  • 09:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 4%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33905 and previous config saved to /var/cache/conftool/dbconfig/20220906-092812-root.json
  • 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33904 and previous config saved to /var/cache/conftool/dbconfig/20220906-092712-root.json
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T314041)', diff saved to https://phabricator.wikimedia.org/P33903 and previous config saved to /var/cache/conftool/dbconfig/20220906-092626-ladsgroup.json
  • 09:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 09:26 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:26 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 09:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T314041)', diff saved to https://phabricator.wikimedia.org/P33902 and previous config saved to /var/cache/conftool/dbconfig/20220906-092604-ladsgroup.json
  • 09:25 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:25 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 09:25 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 09:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 09:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:22 btullis: installing istio configs to dse-k8s cluster
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 25%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P33901 and previous config saved to /var/cache/conftool/dbconfig/20220906-092141-root.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1103 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33900 and previous config saved to /var/cache/conftool/dbconfig/20220906-092133-root.json
  • 09:19 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 09:19 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33899 and previous config saved to /var/cache/conftool/dbconfig/20220906-091838-root.json
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33898 and previous config saved to /var/cache/conftool/dbconfig/20220906-091453-root.json
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33897 and previous config saved to /var/cache/conftool/dbconfig/20220906-091355-root.json
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33896 and previous config saved to /var/cache/conftool/dbconfig/20220906-091307-root.json
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33895 and previous config saved to /var/cache/conftool/dbconfig/20220906-091207-root.json
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 10%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P33894 and previous config saved to /var/cache/conftool/dbconfig/20220906-090637-root.json
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1103 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33893 and previous config saved to /var/cache/conftool/dbconfig/20220906-090628-root.json
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 4%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33892 and previous config saved to /var/cache/conftool/dbconfig/20220906-090333-root.json
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 4%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33891 and previous config saved to /var/cache/conftool/dbconfig/20220906-085948-root.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 4%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33890 and previous config saved to /var/cache/conftool/dbconfig/20220906-085850-root.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 2%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33889 and previous config saved to /var/cache/conftool/dbconfig/20220906-085802-root.json
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33888 and previous config saved to /var/cache/conftool/dbconfig/20220906-085703-root.json
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 5%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P33887 and previous config saved to /var/cache/conftool/dbconfig/20220906-085132-root.json
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1103 (re)pooling @ 4%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33886 and previous config saved to /var/cache/conftool/dbconfig/20220906-085123-root.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33885 and previous config saved to /var/cache/conftool/dbconfig/20220906-084829-root.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33884 and previous config saved to /var/cache/conftool/dbconfig/20220906-084443-root.json
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33883 and previous config saved to /var/cache/conftool/dbconfig/20220906-084345-root.json
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33882 and previous config saved to /var/cache/conftool/dbconfig/20220906-084257-root.json
  • 08:42 XioNoX: restart cr3-ulsfo for software upgrade - T295690
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33881 and previous config saved to /var/cache/conftool/dbconfig/20220906-084158-root.json
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 4%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P33880 and previous config saved to /var/cache/conftool/dbconfig/20220906-083627-root.json
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1103 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33879 and previous config saved to /var/cache/conftool/dbconfig/20220906-083619-root.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 2%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33878 and previous config saved to /var/cache/conftool/dbconfig/20220906-083324-root.json
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Repooling again', diff saved to https://phabricator.wikimedia.org/P33876 and previous config saved to /var/cache/conftool/dbconfig/20220906-083019-root.json
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: Repooling again', diff saved to https://phabricator.wikimedia.org/P33875 and previous config saved to /var/cache/conftool/dbconfig/20220906-083002-root.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138 T316342', diff saved to https://phabricator.wikimedia.org/P33874 and previous config saved to /var/cache/conftool/dbconfig/20220906-082954-root.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 2%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33873 and previous config saved to /var/cache/conftool/dbconfig/20220906-082939-root.json
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 2%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33872 and previous config saved to /var/cache/conftool/dbconfig/20220906-082841-root.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33871 and previous config saved to /var/cache/conftool/dbconfig/20220906-082653-root.json
  • 08:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 08:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T314041)', diff saved to https://phabricator.wikimedia.org/P33870 and previous config saved to /var/cache/conftool/dbconfig/20220906-082507-ladsgroup.json
  • 08:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 08:23 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 3%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P33869 and previous config saved to /var/cache/conftool/dbconfig/20220906-082122-root.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1103 (re)pooling @ 2%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33868 and previous config saved to /var/cache/conftool/dbconfig/20220906-082114-root.json
  • 08:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33867 and previous config saved to /var/cache/conftool/dbconfig/20220906-081819-root.json
  • 08:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Repooling again', diff saved to https://phabricator.wikimedia.org/P33866 and previous config saved to /var/cache/conftool/dbconfig/20220906-081514-root.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: Repooling again', diff saved to https://phabricator.wikimedia.org/P33865 and previous config saved to /var/cache/conftool/dbconfig/20220906-081458-root.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33864 and previous config saved to /var/cache/conftool/dbconfig/20220906-081434-root.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33863 and previous config saved to /var/cache/conftool/dbconfig/20220906-081336-root.json
  • 08:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P33862 and previous config saved to /var/cache/conftool/dbconfig/20220906-081001-ladsgroup.json
  • 08:09 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.28 refs T314189
  • 08:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 2%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P33861 and previous config saved to /var/cache/conftool/dbconfig/20220906-080618-root.json
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1103 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33860 and previous config saved to /var/cache/conftool/dbconfig/20220906-080609-root.json
  • 08:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 08:02 marostegui: Set x1 back to binlog_format=ROW
  • 08:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Repooling again', diff saved to https://phabricator.wikimedia.org/P33859 and previous config saved to /var/cache/conftool/dbconfig/20220906-080009-root.json
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: Repooling again', diff saved to https://phabricator.wikimedia.org/P33858 and previous config saved to /var/cache/conftool/dbconfig/20220906-075953-root.json
  • 07:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr3-ulsfo.wikimedia.org with reason: router upgrade
  • 07:58 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cr3-ulsfo.wikimedia.org with reason: router upgrade
  • 07:58 jnuche@deploy1002: Pruned MediaWiki: 1.39.0-wmf.24, 1.39.0-wmf.26 (duration: 02m 48s)
  • 07:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P33857 and previous config saved to /var/cache/conftool/dbconfig/20220906-075455-ladsgroup.json
  • 07:52 XioNoX: depool ulsfo for routers upgrade - T295690
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 1%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P33856 and previous config saved to /var/cache/conftool/dbconfig/20220906-075113-root.json
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Repooling again', diff saved to https://phabricator.wikimedia.org/P33855 and previous config saved to /var/cache/conftool/dbconfig/20220906-074504-root.json
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: Repooling again', diff saved to https://phabricator.wikimedia.org/P33854 and previous config saved to /var/cache/conftool/dbconfig/20220906-074448-root.json
  • 07:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T314041)', diff saved to https://phabricator.wikimedia.org/P33853 and previous config saved to /var/cache/conftool/dbconfig/20220906-073948-ladsgroup.json
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 T316342', diff saved to https://phabricator.wikimedia.org/P33851 and previous config saved to /var/cache/conftool/dbconfig/20220906-073434-root.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: Repooling again', diff saved to https://phabricator.wikimedia.org/P33850 and previous config saved to /var/cache/conftool/dbconfig/20220906-072959-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: Repooling again', diff saved to https://phabricator.wikimedia.org/P33849 and previous config saved to /var/cache/conftool/dbconfig/20220906-072943-root.json
  • 07:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 07:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 5%: Repooling again', diff saved to https://phabricator.wikimedia.org/P33848 and previous config saved to /var/cache/conftool/dbconfig/20220906-071455-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: Repooling again', diff saved to https://phabricator.wikimedia.org/P33847 and previous config saved to /var/cache/conftool/dbconfig/20220906-071438-root.json
  • 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:11 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Move 1 of 6 users to php 7.4 (T271736) (duration: 04m 06s)
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 4%: Repooling again', diff saved to https://phabricator.wikimedia.org/P33846 and previous config saved to /var/cache/conftool/dbconfig/20220906-065950-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 4%: Repooling again', diff saved to https://phabricator.wikimedia.org/P33845 and previous config saved to /var/cache/conftool/dbconfig/20220906-065934-root.json
  • 06:53 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 3%: Repooling again', diff saved to https://phabricator.wikimedia.org/P33844 and previous config saved to /var/cache/conftool/dbconfig/20220906-064445-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 3%: Repooling again', diff saved to https://phabricator.wikimedia.org/P33843 and previous config saved to /var/cache/conftool/dbconfig/20220906-064429-root.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1189 T316342', diff saved to https://phabricator.wikimedia.org/P33841 and previous config saved to /var/cache/conftool/dbconfig/20220906-064021-root.json
  • 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1188 T316342', diff saved to https://phabricator.wikimedia.org/P33839 and previous config saved to /var/cache/conftool/dbconfig/20220906-063322-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 2%: Repooling again', diff saved to https://phabricator.wikimedia.org/P33838 and previous config saved to /var/cache/conftool/dbconfig/20220906-062940-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: Repooling again', diff saved to https://phabricator.wikimedia.org/P33837 and previous config saved to /var/cache/conftool/dbconfig/20220906-062924-root.json
  • 06:15 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 1%: Repooling again', diff saved to https://phabricator.wikimedia.org/P33836 and previous config saved to /var/cache/conftool/dbconfig/20220906-061434-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: Repooling again', diff saved to https://phabricator.wikimedia.org/P33835 and previous config saved to /var/cache/conftool/dbconfig/20220906-061419-root.json
  • 06:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T312863)', diff saved to https://phabricator.wikimedia.org/P33833 and previous config saved to /var/cache/conftool/dbconfig/20220906-061150-ladsgroup.json
  • 06:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 06:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 06:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 06:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to current x1 eqiad master', diff saved to https://phabricator.wikimedia.org/P33832 and previous config saved to /var/cache/conftool/dbconfig/20220906-060833-root.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103 T316745', diff saved to https://phabricator.wikimedia.org/P33831 and previous config saved to /var/cache/conftool/dbconfig/20220906-060815-root.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1120 to x1 primary T316745', diff saved to https://phabricator.wikimedia.org/P33830 and previous config saved to /var/cache/conftool/dbconfig/20220906-060602-root.json
  • 06:05 marostegui: Starting x1 eqiad failover from db1103 to db1120 - T316745
  • 06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1118 T316623', diff saved to https://phabricator.wikimedia.org/P33829 and previous config saved to /var/cache/conftool/dbconfig/20220906-060418-ladsgroup.json
  • 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1163 to s1 primary and set section read-write T316623', diff saved to https://phabricator.wikimedia.org/P33828 and previous config saved to /var/cache/conftool/dbconfig/20220906-060055-ladsgroup.json
  • 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - T316623', diff saved to https://phabricator.wikimedia.org/P33827 and previous config saved to /var/cache/conftool/dbconfig/20220906-060032-ladsgroup.json
  • 06:00 Amir1: Starting s1 eqiad failover from db1118 to db1163 - T316623
  • 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1107 to dbctl depooled T316870', diff saved to https://phabricator.wikimedia.org/P33826 and previous config saved to /var/cache/conftool/dbconfig/20220906-053238-marostegui.json
  • 05:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T314041)', diff saved to https://phabricator.wikimedia.org/P33825 and previous config saved to /var/cache/conftool/dbconfig/20220906-052609-ladsgroup.json
  • 05:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 05:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T314041)', diff saved to https://phabricator.wikimedia.org/P33824 and previous config saved to /var/cache/conftool/dbconfig/20220906-052547-ladsgroup.json
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1120 with weight 0 T316745', diff saved to https://phabricator.wikimedia.org/P33823 and previous config saved to /var/cache/conftool/dbconfig/20220906-051304-root.json
  • 05:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 T316745
  • 05:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 T316745
  • 05:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P33822 and previous config saved to /var/cache/conftool/dbconfig/20220906-051041-ladsgroup.json
  • 05:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1163 with weight 0 T316623', diff saved to https://phabricator.wikimedia.org/P33821 and previous config saved to /var/cache/conftool/dbconfig/20220906-050610-ladsgroup.json
  • 05:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 36 hosts with reason: Primary switchover s1 T316623
  • 05:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 36 hosts with reason: Primary switchover s1 T316623
  • 04:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P33820 and previous config saved to /var/cache/conftool/dbconfig/20220906-045535-ladsgroup.json
  • 04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T314041)', diff saved to https://phabricator.wikimedia.org/P33819 and previous config saved to /var/cache/conftool/dbconfig/20220906-044029-ladsgroup.json
  • 03:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.28 refs T314189 (duration: 36m 17s)
  • 03:26 TimStarling: multi-DC stage 4: all traffic to appservers-ro, rolling out via puppet 03:24-03:54
  • 03:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.28 refs T314189
  • 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T314041)', diff saved to https://phabricator.wikimedia.org/P33816 and previous config saved to /var/cache/conftool/dbconfig/20220906-024351-ladsgroup.json
  • 02:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 02:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T314041)', diff saved to https://phabricator.wikimedia.org/P33815 and previous config saved to /var/cache/conftool/dbconfig/20220906-024330-ladsgroup.json
  • 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P33814 and previous config saved to /var/cache/conftool/dbconfig/20220906-022824-ladsgroup.json
  • 02:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P33813 and previous config saved to /var/cache/conftool/dbconfig/20220906-021318-ladsgroup.json
  • 02:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T314041)', diff saved to https://phabricator.wikimedia.org/P33812 and previous config saved to /var/cache/conftool/dbconfig/20220906-015812-ladsgroup.json
  • 01:03 TimStarling: multi-DC stage 3: 2% of codfw/ulsfo/eqsin traffic going to codfw appservers, rolling out via puppet 00:54-01:24
  • 00:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance

2022-09-05

  • 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T314041)', diff saved to https://phabricator.wikimedia.org/P33811 and previous config saved to /var/cache/conftool/dbconfig/20220905-232237-ladsgroup.json
  • 23:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 23:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T314041)', diff saved to https://phabricator.wikimedia.org/P33810 and previous config saved to /var/cache/conftool/dbconfig/20220905-232216-ladsgroup.json
  • 23:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P33809 and previous config saved to /var/cache/conftool/dbconfig/20220905-230709-ladsgroup.json
  • 22:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P33808 and previous config saved to /var/cache/conftool/dbconfig/20220905-225203-ladsgroup.json
  • 22:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T314041)', diff saved to https://phabricator.wikimedia.org/P33807 and previous config saved to /var/cache/conftool/dbconfig/20220905-223657-ladsgroup.json
  • 21:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T314041)', diff saved to https://phabricator.wikimedia.org/P33806 and previous config saved to /var/cache/conftool/dbconfig/20220905-212415-ladsgroup.json
  • 21:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 21:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T314041)', diff saved to https://phabricator.wikimedia.org/P33805 and previous config saved to /var/cache/conftool/dbconfig/20220905-212343-ladsgroup.json
  • 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P33804 and previous config saved to /var/cache/conftool/dbconfig/20220905-210837-ladsgroup.json
  • 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P33803 and previous config saved to /var/cache/conftool/dbconfig/20220905-205330-ladsgroup.json
  • 20:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T314041)', diff saved to https://phabricator.wikimedia.org/P33802 and previous config saved to /var/cache/conftool/dbconfig/20220905-203824-ladsgroup.json
  • 19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T314041)', diff saved to https://phabricator.wikimedia.org/P33801 and previous config saved to /var/cache/conftool/dbconfig/20220905-192554-ladsgroup.json
  • 19:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 19:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 19:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 19:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Maint needs to be redone', diff saved to https://phabricator.wikimedia.org/P33800 and previous config saved to /var/cache/conftool/dbconfig/20220905-191532-ladsgroup.json
  • 19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Maint needs to be redone', diff saved to https://phabricator.wikimedia.org/P33799 and previous config saved to /var/cache/conftool/dbconfig/20220905-190027-ladsgroup.json
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Maint needs to be redone', diff saved to https://phabricator.wikimedia.org/P33798 and previous config saved to /var/cache/conftool/dbconfig/20220905-184522-ladsgroup.json
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 10%: Maint needs to be redone', diff saved to https://phabricator.wikimedia.org/P33797 and previous config saved to /var/cache/conftool/dbconfig/20220905-183017-ladsgroup.json
  • 18:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 18:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 18:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T312863)', diff saved to https://phabricator.wikimedia.org/P33796 and previous config saved to /var/cache/conftool/dbconfig/20220905-182510-ladsgroup.json
  • 18:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P33795 and previous config saved to /var/cache/conftool/dbconfig/20220905-181003-ladsgroup.json
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P33794 and previous config saved to /var/cache/conftool/dbconfig/20220905-175457-ladsgroup.json
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T314041)', diff saved to https://phabricator.wikimedia.org/P33793 and previous config saved to /var/cache/conftool/dbconfig/20220905-175423-ladsgroup.json
  • 17:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 17:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T312863)', diff saved to https://phabricator.wikimedia.org/P33792 and previous config saved to /var/cache/conftool/dbconfig/20220905-173951-ladsgroup.json
  • 16:27 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 16:26 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
  • 15:30 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1038.eqiad.wmnet
  • 15:30 moritzm: installing apache2 security updates
  • 15:28 claime: depooled wtp1040.eqiad.wmnet from parsoid cluster T307219
  • 15:19 claime: pooled parse1007.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 15:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1007,parse1007.mgmt
  • 15:16 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1007,parse1007.mgmt
  • 15:09 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1007.eqiad.wmnet
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T314041)', diff saved to https://phabricator.wikimedia.org/P33791 and previous config saved to /var/cache/conftool/dbconfig/20220905-150837-ladsgroup.json
  • 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T314041)', diff saved to https://phabricator.wikimedia.org/P33790 and previous config saved to /var/cache/conftool/dbconfig/20220905-150758-ladsgroup.json
  • 15:04 moritzm: updating docker.io on gitlab-runners
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P33789 and previous config saved to /var/cache/conftool/dbconfig/20220905-145252-ladsgroup.json
  • 14:48 claime: Set wtp103[6-7].eqiad.wmnet inactive pending decommission T317025
  • 14:47 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1037.eqiad.wmnet
  • 14:46 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1036.eqiad.wmnet
  • 14:40 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wtp[1036-1038].eqiad.wmnet with reason: Downtiming replace wtp servers
  • 14:40 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wtp[1036-1038].eqiad.wmnet with reason: Downtiming replace wtp servers
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P33788 and previous config saved to /var/cache/conftool/dbconfig/20220905-143746-ladsgroup.json
  • 14:33 claime: depooled wtp1039.eqiad.wmnet from parsoid cluster T307219
  • 14:30 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:30 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 14:29 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:29 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 14:28 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:28 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 14:26 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:26 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 14:23 claime: pooled parse1006.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T314041)', diff saved to https://phabricator.wikimedia.org/P33786 and previous config saved to /var/cache/conftool/dbconfig/20220905-142240-ladsgroup.json
  • 14:21 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1006,parse1006.mgmt
  • 14:21 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1006,parse1006.mgmt
  • 14:11 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1006.eqiad.wmnet
  • 14:02 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:02 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 14:01 claime: depooled wtp1038.eqiad.wmnet from parsoid cluster T307219
  • 13:51 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:48 claime: pooled parse1005.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 13:41 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 13:31 addshore: wdqs1009 sudo systemctl stop wdqs-blazegraph.service
  • 13:13 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1011.eqiad.wmnet with OS bullseye
  • 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on puppetdb2002.codfw.wmnet with reason: Temporarily stop puppetdb
  • 13:10 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on puppetdb2002.codfw.wmnet with reason: Temporarily stop puppetdb
  • 13:10 urbanecm: UTC afternoon B&C window done
  • 13:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T314041)', diff saved to https://phabricator.wikimedia.org/P33785 and previous config saved to /var/cache/conftool/dbconfig/20220905-130944-ladsgroup.json
  • 13:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: edbcee4: Enable partial action blocks on fawiki (T315525) (duration: 03m 34s)
  • 13:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:07 moritzm: disabling puppet in codfw and the edges temporarily
  • 13:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1011.eqiad.wmnet with reason: host reimage
  • 13:01 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1011.eqiad.wmnet with reason: host reimage
  • 12:48 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1011.eqiad.wmnet with OS bullseye
  • 12:47 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1007.eqiad.wmnet with OS bullseye
  • 12:33 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host datahubsearch1003.eqiad.wmnet
  • 12:31 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1005,parse1005.mgmt
  • 12:31 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1005,parse1005.mgmt
  • 12:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1003.eqiad.wmnet
  • 12:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1002.eqiad.wmnet
  • 12:20 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 18 hosts with reason: Downtime pending inclusion in production
  • 12:20 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 18 hosts with reason: Downtime pending inclusion in production
  • 12:18 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1002.eqiad.wmnet
  • 12:16 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS bullseye
  • 12:16 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1005.eqiad.wmnet
  • 12:14 claime: depooled wtp1037.eqiad.wmnet from parsoid cluster T312638
  • 12:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1001.eqiad.wmnet
  • 12:10 tstarling@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db[2142-2144].codfw.wmnet
  • 12:10 tstarling@cumin1001: START - Cookbook sre.hosts.remove-downtime for db[2142-2144].codfw.wmnet
  • 12:10 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1004.mgmt
  • 12:10 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1004.mgmt
  • 12:10 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1007.eqiad.wmnet with OS bullseye
  • 12:09 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1001.eqiad.wmnet
  • 11:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[1001-1004].eqiad.wmnet
  • 11:56 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse[1001-1004].eqiad.wmnet
  • 11:55 TimStarling: on db2142: rejecting inbound mysql traffic T316847
  • 11:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host karapace1001.eqiad.wmnet
  • 11:53 claime: pooled parse1004.eqiad.wmnet (php 7.4 only) in parsoid cluster T312638
  • 11:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1004.eqiad.wmnet
  • 11:52 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1004.eqiad.wmnet
  • 11:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T312863)', diff saved to https://phabricator.wikimedia.org/P33784 and previous config saved to /var/cache/conftool/dbconfig/20220905-114352-ladsgroup.json
  • 11:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 11:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr2-eqiad:xe-4/1/3
  • 11:41 jnuche@deploy1002: Installation of scap version "4.16.0" completed for 584 hosts
  • 11:41 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr2-eqiad:xe-4/1/3
  • 11:40 jnuche@deploy1002: Installing scap version "4.16.0" for 584 hosts
  • 11:37 TimStarling: on db2142: dropping inbound mysql traffic T316847
  • 11:36 claime: Set wtp103[4-5].eqiad.wmnet inactive pending decommission https://phabricator.wikimedia.org/T317025
  • 11:34 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1035.eqiad.wmnet
  • 11:34 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1034.eqiad.wmnet
  • 11:32 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wtp[1034-1036].eqiad.wmnet with reason: Downtiming replaced wtp servers
  • 11:32 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wtp[1034-1036].eqiad.wmnet with reason: Downtiming replaced wtp servers
  • 11:30 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1004.eqiad.wmnet
  • 11:29 TimStarling: on db2142: set master_delay=30 and restarted replication T316847
  • 11:27 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1003.eqiad.wmnet
  • 11:27 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1003.eqiad.wmnet
  • 11:24 claime: depooled wtp1036.eqiad.wmnet from parsoid cluster https://phabricator.wikimedia.org/T312638
  • 11:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T314041)', diff saved to https://phabricator.wikimedia.org/P33783 and previous config saved to /var/cache/conftool/dbconfig/20220905-112308-ladsgroup.json
  • 11:18 TimStarling: on db2142: stopped mariadb replication
  • 11:16 claime: pooled parse1003.eqiad.wmnet (php 7.4 only) in parsoid cluster https://phabricator.wikimedia.org/T312638
  • 11:16 tstarling@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2142-2144].codfw.wmnet with reason: T316847 x2 failure test
  • 11:15 tstarling@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2142-2144].codfw.wmnet with reason: T316847 x2 failure test
  • 11:15 cgoubert@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=parsoid,name=parse1003.eqiad.wmnet
  • 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P33782 and previous config saved to /var/cache/conftool/dbconfig/20220905-110801-ladsgroup.json
  • 11:04 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1003.eqiad.wmnet
  • 10:55 Emperor: set thanos ring replicas to 3.90 T311690
  • 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P33781 and previous config saved to /var/cache/conftool/dbconfig/20220905-105255-ladsgroup.json
  • 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T314041)', diff saved to https://phabricator.wikimedia.org/P33780 and previous config saved to /var/cache/conftool/dbconfig/20220905-103749-ladsgroup.json
  • 10:36 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1015.eqiad.wmnet
  • 10:35 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1015.eqiad.wmnet
  • 10:25 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1014.eqiad.wmnet
  • 10:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1014.eqiad.wmnet
  • 10:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1013.eqiad.wmnet
  • 10:13 XioNoX: upgrade python-pynetbox to 6.6 on netbox frontends - T310745
  • 10:11 hnowlan@deploy1002: Finished deploy [restbase/deploy@79b3cd2]: Add guwwiktionary and bjnwiktionary T309058 T312216 (duration: 15m 05s)
  • 10:05 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1013.eqiad.wmnet
  • 09:56 hnowlan@deploy1002: Started deploy [restbase/deploy@79b3cd2]: Add guwwiktionary and bjnwiktionary T309058 T312216
  • 09:47 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1007.eqiad.wmnet with reason: host reimage
  • 09:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1012.eqiad.wmnet
  • 09:37 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 09:35 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1007.eqiad.wmnet with reason: host reimage
  • 09:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:29 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1012.eqiad.wmnet
  • 09:25 btullis: deployed calico to dse-k8s cluster T310174
  • 09:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 09:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T314041)', diff saved to https://phabricator.wikimedia.org/P33779 and previous config saved to /var/cache/conftool/dbconfig/20220905-092338-ladsgroup.json
  • 09:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 09:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 09:23 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS bullseye
  • 09:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1010.eqiad.wmnet
  • 09:17 XioNoX: Squid: permit production networks instead of aggregate_networks - T265864
  • 09:17 moritzm: installing flac security updates
  • 09:14 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1010.eqiad.wmnet
  • 09:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1008.eqiad.wmnet
  • 09:05 hnowlan@deploy1002: Finished deploy [restbase/deploy@a571f9a]: Add pcmwiki T310880 (duration: 01m 06s)
  • 09:04 hnowlan@deploy1002: Started deploy [restbase/deploy@a571f9a]: Add pcmwiki T310880
  • 09:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1008.eqiad.wmnet
  • 09:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1006.eqiad.wmnet
  • 08:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1006.eqiad.wmnet
  • 08:48 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1004.eqiad.wmnet
  • 08:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
  • 08:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 08:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 08:14 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 08:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 08:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 08:14 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Stop writing to old templatelinks fields in s7 (T312865) (duration: 03m 51s)
  • 08:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 08:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 08:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:01 XioNoX: rename Telia to Arelion in Netbox
  • 07:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:32 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Make English Wikipedia read new on templatelinks migration (T306673) (duration: 03m 31s)
  • 07:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:25 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 739920c: Fix missing logo for mniwiktionary and frwikiquote (T317004) (duration: 03m 36s)
  • 07:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:22 urbanecm@deploy1002: Synchronized static/images/project-logos/: ff2e108: Upload missing logo for mniwiktionary and frwikiquote (T317004) (duration: 03m 50s)
  • 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:19 moritzm: installing ghostscript security updates
  • 07:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:07 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Move 10% of traffic to php 7.4 (T271736) (duration: 03m 50s)
  • 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr2-eqiad:xe-4/1/3
  • 06:28 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr2-eqiad:xe-4/1/3
  • 06:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 06:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 02:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 02:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 02:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T314041)', diff saved to https://phabricator.wikimedia.org/P33778 and previous config saved to /var/cache/conftool/dbconfig/20220905-024602-ladsgroup.json
  • 00:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
  • 00:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
  • 00:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T312863)', diff saved to https://phabricator.wikimedia.org/P33777 and previous config saved to /var/cache/conftool/dbconfig/20220905-003619-ladsgroup.json
  • 00:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P33776 and previous config saved to /var/cache/conftool/dbconfig/20220905-002112-ladsgroup.json
  • 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P33775 and previous config saved to /var/cache/conftool/dbconfig/20220905-000606-ladsgroup.json

2022-09-04

  • 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T312863)', diff saved to https://phabricator.wikimedia.org/P33774 and previous config saved to /var/cache/conftool/dbconfig/20220904-235100-ladsgroup.json
  • 22:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T312863)', diff saved to https://phabricator.wikimedia.org/P33773 and previous config saved to /var/cache/conftool/dbconfig/20220904-225044-ladsgroup.json
  • 22:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 22:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 22:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 22:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 22:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T312863)', diff saved to https://phabricator.wikimedia.org/P33772 and previous config saved to /var/cache/conftool/dbconfig/20220904-225016-ladsgroup.json
  • 22:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P33771 and previous config saved to /var/cache/conftool/dbconfig/20220904-223510-ladsgroup.json
  • 22:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P33770 and previous config saved to /var/cache/conftool/dbconfig/20220904-222004-ladsgroup.json
  • 22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T312863)', diff saved to https://phabricator.wikimedia.org/P33769 and previous config saved to /var/cache/conftool/dbconfig/20220904-220457-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T312863)', diff saved to https://phabricator.wikimedia.org/P33767 and previous config saved to /var/cache/conftool/dbconfig/20220904-155059-ladsgroup.json
  • 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T312863)', diff saved to https://phabricator.wikimedia.org/P33766 and previous config saved to /var/cache/conftool/dbconfig/20220904-155027-ladsgroup.json
  • 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P33765 and previous config saved to /var/cache/conftool/dbconfig/20220904-153521-ladsgroup.json
  • 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P33764 and previous config saved to /var/cache/conftool/dbconfig/20220904-152015-ladsgroup.json
  • 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T312863)', diff saved to https://phabricator.wikimedia.org/P33763 and previous config saved to /var/cache/conftool/dbconfig/20220904-150508-ladsgroup.json
  • 12:51 elukey: reset-fail ifup@ens13.service on idp2002
  • 12:50 elukey: reset-fail ifup@ens13.service on netflow4002
  • 12:49 elukey: pkill remaining processes of user effeietsanders on stat1008 to unblock puppet - T314846
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T314041)', diff saved to https://phabricator.wikimedia.org/P33762 and previous config saved to /var/cache/conftool/dbconfig/20220904-103427-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T314041)', diff saved to https://phabricator.wikimedia.org/P33761 and previous config saved to /var/cache/conftool/dbconfig/20220904-103405-ladsgroup.json
  • 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T312863)', diff saved to https://phabricator.wikimedia.org/P33760 and previous config saved to /var/cache/conftool/dbconfig/20220904-083341-ladsgroup.json
  • 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 08:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
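The four 'Repooling after maintenance db1099:3311' commits logged above (15:05, 15:20, 15:35, 15:50) fall 15 minutes apart. A minimal Python sketch for pulling that spacing out of entries in this format follows. The regular expressions and the names parse_entry, repool_times and step_gaps are assumptions made for illustration only; they are not part of dbctl, conftool, or any tooling that writes this log.

```python
import re
from typing import Dict, Iterable, List, NamedTuple, Optional

# Assumed entry shape: optional bullet, "HH:MM", an actor, a colon, then the message.
ENTRY_RE = re.compile(r"^\s*(?:•\s*)?(\d{2}):(\d{2})\s+([^:\s]+):\s+(.*)$")
# Assumed wording of a repool commit summary, copied from the entries above.
REPOOL_RE = re.compile(r"'Repooling after maintenance ([\w:.-]+)")


class Entry(NamedTuple):
    minutes: int   # minutes since midnight, as written in the log
    actor: str     # e.g. "ladsgroup@cumin1001" or "elukey"
    message: str


def parse_entry(line: str) -> Optional[Entry]:
    """Split one log line into time, actor and message; None if it does not match."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    hh, mm, actor, message = m.groups()
    return Entry(int(hh) * 60 + int(mm), actor, message)


def repool_times(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Collect the repool commit times (ascending) for each instance on one day."""
    times: Dict[str, List[int]] = {}
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue
        m = REPOOL_RE.search(entry.message)
        if m:
            times.setdefault(m.group(1), []).append(entry.minutes)
    return {host: sorted(ts) for host, ts in times.items()}


def step_gaps(times: List[int]) -> List[int]:
    """Minutes between consecutive repool commits."""
    return [b - a for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    # Abbreviated copies of four entries above (text after the summary is cut).
    sample = [
        "  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T312863)'",
        "  • 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311'",
        "  • 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311'",
        "  • 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T312863)'",
    ]
    for host, ts in repool_times(sample).items():
        print(host, step_gaps(ts))  # prints: db1099:3311 [15, 15, 15]
```

The sketch treats each date heading's list on its own, so a repool that crosses midnight (such as db1106 between 2022-09-04 and 2022-09-05) would need the date carried along with the time.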

2022-09-03

  • 23:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T312863)', diff saved to https://phabricator.wikimedia.org/P33759 and previous config saved to /var/cache/conftool/dbconfig/20220903-235001-ladsgroup.json
  • 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P33758 and previous config saved to /var/cache/conftool/dbconfig/20220903-233455-ladsgroup.json
  • 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P33757 and previous config saved to /var/cache/conftool/dbconfig/20220903-231949-ladsgroup.json
  • 23:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T312863)', diff saved to https://phabricator.wikimedia.org/P33756 and previous config saved to /var/cache/conftool/dbconfig/20220903-230443-ladsgroup.json
  • 22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T312863)', diff saved to https://phabricator.wikimedia.org/P33755 and previous config saved to /var/cache/conftool/dbconfig/20220903-220427-ladsgroup.json
  • 22:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 22:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 22:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 22:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T312863)', diff saved to https://phabricator.wikimedia.org/P33754 and previous config saved to /var/cache/conftool/dbconfig/20220903-220326-ladsgroup.json
  • 21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P33753 and previous config saved to /var/cache/conftool/dbconfig/20220903-214820-ladsgroup.json
  • 21:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P33752 and previous config saved to /var/cache/conftool/dbconfig/20220903-213314-ladsgroup.json
  • 21:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T312863)', diff saved to https://phabricator.wikimedia.org/P33751 and previous config saved to /var/cache/conftool/dbconfig/20220903-211808-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T314041)', diff saved to https://phabricator.wikimedia.org/P33750 and previous config saved to /var/cache/conftool/dbconfig/20220903-180104-ladsgroup.json
  • 18:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 18:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T314041)', diff saved to https://phabricator.wikimedia.org/P33749 and previous config saved to /var/cache/conftool/dbconfig/20220903-180042-ladsgroup.json
  • 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T312863)', diff saved to https://phabricator.wikimedia.org/P33748 and previous config saved to /var/cache/conftool/dbconfig/20220903-151224-ladsgroup.json
  • 15:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 15:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 09:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 09:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T314041)', diff saved to https://phabricator.wikimedia.org/P33747 and previous config saved to /var/cache/conftool/dbconfig/20220903-015524-ladsgroup.json
  • 01:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T314041)', diff saved to https://phabricator.wikimedia.org/P33746 and previous config saved to /var/cache/conftool/dbconfig/20220903-015502-ladsgroup.json

2022-09-02

  • 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:58 dancy@deploy1002: Sync cancelled.
  • 18:56 dancy@deploy1002: dancy: testing T299648 synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 18:55 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:51 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:51 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:47 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:40 dancy@deploy1002: Started scap: testing T299648
  • 17:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1203.eqiad.wmnet with OS bullseye
  • 17:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1203.eqiad.wmnet with reason: host reimage
  • 17:41 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1203.eqiad.wmnet with reason: host reimage
  • 17:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1203.eqiad.wmnet with OS bullseye
  • 17:11 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS bullseye
  • 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1201.eqiad.wmnet with OS bullseye
  • 16:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage
  • 16:52 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage
  • 16:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1201.eqiad.wmnet with reason: host reimage
  • 16:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1201.eqiad.wmnet with reason: host reimage
  • 16:40 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS bullseye
  • 16:39 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1200.eqiad.wmnet with OS bullseye
  • 16:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1203']
  • 16:31 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1201.eqiad.wmnet with OS bullseye
  • 16:30 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1203']
  • 16:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1199.eqiad.wmnet with OS bullseye
  • 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1202']
  • 16:23 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1200.eqiad.wmnet with reason: host reimage
  • 16:19 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1200.eqiad.wmnet with reason: host reimage
  • 16:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1203']
  • 16:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1202']
  • 16:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1199.eqiad.wmnet with reason: host reimage
  • 16:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:10 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1203']
  • 16:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1202']
  • 16:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1199.eqiad.wmnet with reason: host reimage
  • 16:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:07 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1200.eqiad.wmnet with OS bullseye
  • 16:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1201']
  • 16:03 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1198.eqiad.wmnet with OS bullseye
  • 16:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:01 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1202']
  • 15:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:57 jayme: repool kubemaster2002
  • 15:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1201']
  • 15:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1199.eqiad.wmnet with OS bullseye
  • 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1200']
  • 15:49 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1198.eqiad.wmnet with reason: host reimage
  • 15:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1197.eqiad.wmnet with OS bullseye
  • 15:45 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1198.eqiad.wmnet with reason: host reimage
  • 15:42 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1200']
  • 15:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1200']
  • 15:39 jayme: depool kubemaster2002
  • 15:37 jayme: repooled kubemaster1001
  • 15:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1201']
  • 15:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1201']
  • 15:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1197.eqiad.wmnet with reason: host reimage
  • 15:34 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1198.eqiad.wmnet with OS bullseye
  • 15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1201']
  • 15:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1200']
  • 15:31 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1196.eqiad.wmnet with OS bullseye
  • 15:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1197.eqiad.wmnet with reason: host reimage
  • 15:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1198']
  • 15:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1201']
  • 15:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1199']
  • 15:19 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1198']
  • 15:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1197.eqiad.wmnet with OS bullseye
  • 15:16 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1196.eqiad.wmnet with reason: host reimage
  • 15:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1199']
  • 15:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1199']
  • 15:13 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1196.eqiad.wmnet with reason: host reimage
  • 15:09 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1198']
  • 15:09 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1199']
  • 15:09 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1198']
  • 15:04 jayme: depooled kubemaster1001
  • 15:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1198']
  • 15:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1198']
  • 15:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1198']
  • 15:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1197']
  • 15:00 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1196.eqiad.wmnet with OS bullseye
  • 14:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudservices1003.wikimedia.org
  • 14:58 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1198']
  • 14:55 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 14:53 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1197']
  • 14:51 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudservices1003.wikimedia.org
  • 14:49 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudservices1003.wikimedia.org
  • 14:49 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1197']
  • 14:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1196']
  • 14:46 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 14:42 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudservices1003.wikimedia.org
  • 14:41 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1197']
  • 14:39 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1196']
  • 14:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1196']
  • 14:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1196']
  • 14:21 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['db1196']
  • 14:18 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudservices1003
  • 14:18 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:16 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['db1197']
  • 14:15 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 14:11 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudservices1003
  • 14:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1197']
  • 14:01 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1196']
  • 13:57 pt1979@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1196']
  • 13:57 pt1979@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1196']
  • 13:31 pt1979@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1196']
  • 13:31 pt1979@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1196']
  • 13:15 jayme: repooled kubemaster1002
  • 13:10 jayme: redepooled kubemaster1002
  • 13:00 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 12:56 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging MMandere out of all services on: 1235 hosts
  • 10:35 jmm@cumin2002: START - Cookbook sre.idm.logout Logging MMandere out of all services on: 1235 hosts
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging MMandere out of all services on: 779 hosts
  • 10:34 jmm@cumin2002: START - Cookbook sre.idm.logout Logging MMandere out of all services on: 779 hosts
  • 10:08 jayme: depooled kubemaster1002 for tests
  • 09:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T314041)', diff saved to https://phabricator.wikimedia.org/P33743 and previous config saved to /var/cache/conftool/dbconfig/20220902-092704-ladsgroup.json
  • 09:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 09:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 09:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 09:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 08:44 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-eqiad
  • 08:41 fnegri@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:37 fnegri@cumin1001: START - Cookbook sre.dns.netbox
  • 08:36 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-eqiad
  • 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-codfw
  • 08:26 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-codfw
  • 08:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2002.codfw.wmnet
  • 08:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2002.codfw.wmnet
  • 08:13 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host idp2002.wikimedia.org
  • 08:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp2002.wikimedia.org
  • 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1002.wikimedia.org
  • 07:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1002.wikimedia.org
  • 07:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2002.wikimedia.org
  • 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2002.wikimedia.org
  • 07:17 dcausse: restarting blazegraph on wdqs1016 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 to clone db1107 T316870', diff saved to https://phabricator.wikimedia.org/P33739 and previous config saved to /var/cache/conftool/dbconfig/20220902-054405-root.json
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2149 T316494 ', diff saved to https://phabricator.wikimedia.org/P33738 and previous config saved to /var/cache/conftool/dbconfig/20220902-052841-marostegui.json

2022-09-01

  • 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:50 thcipriani@deploy1002: Finished scap: Backport for Remove Vector grid config (T313559), Disable sticky header edit experiment for idwiki, viwki (T315264) (duration: 05m 44s)
  • 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:44 thcipriani@deploy1002: thcipriani and cjming and bwang: Backport for Remove Vector grid config (T313559), Disable sticky header edit experiment for idwiki, viwki (T315264) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:44 thcipriani@deploy1002: Started scap: Backport for Remove Vector grid config (T313559), Disable sticky header edit experiment for idwiki, viwki (T315264)
  • 20:41 thcipriani@deploy1002: Finished scap: Backport for cirrus: Handle transition to elasticsearch 7.10 (duration: 16m 56s)
  • 20:40 ryankemper: T300943 New hosts are in service and were pooled like so: `sudo confctl select name=elastic20[73-86].* set/weight=10:pooled=yes` (in retrospect that syntax seems to have selected too many hosts, but the final state of pybal is correct per https://config-master.wikimedia.org/pybal/codfw/search)
  • 20:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1203.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:39 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1202.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:37 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=yes; selector: name=elastic20[73-86].*
  • 20:35 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: T300943
  • 20:35 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: T300943
  • 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:24 thcipriani@deploy1002: thcipriani and ebernhardson: Backport for cirrus: Handle transition to elasticsearch 7.10 synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:24 thcipriani@deploy1002: Started scap: Backport for cirrus: Handle transition to elasticsearch 7.10
  • 20:20 thcipriani@deploy1002: backport aborted: (duration: 03m 09s)
  • 20:20 thcipriani@deploy1002: backport aborted: (duration: 02m 57s)
  • 20:20 thcipriani@deploy1002: sync-world aborted: Backport for Revert "Deploy Research Incentive Survey to idwiki" (duration: 01m 23s)
  • 20:20 thcipriani@deploy1002: thcipriani and trainbranchbot: Backport for Revert "Deploy Research Incentive Survey to idwiki" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:19 thcipriani@deploy1002: Started scap: Backport for Revert "Deploy Research Incentive Survey to idwiki"
  • 20:14 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1203.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:14 pt1979@cumin1001: START - Cookbook sre.hosts.provision for host db1202.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1201.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:13 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1200.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:13 thcipriani@deploy1002: thcipriani and dani: Backport for Deploy Research Incentive Survey to idwiki (T316466) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:06 thcipriani@deploy1002: Started scap: Backport for Deploy Research Incentive Survey to idwiki (T316466)
  • 19:58 mutante: otrs1001 - sudo systemctl reset-failed - T316903
  • 19:48 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1201.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:46 pt1979@cumin1001: START - Cookbook sre.hosts.provision for host db1200.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1199.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:41 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1198.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1199.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:17 pt1979@cumin1001: START - Cookbook sre.hosts.provision for host db1198.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1197.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:16 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1196.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1197.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:53 pt1979@cumin1001: START - Cookbook sre.hosts.provision for host db1196.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:52 pt1979@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1203
  • 18:51 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1203
  • 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1202
  • 18:51 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1202
  • 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1201
  • 18:50 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1201
  • 18:50 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1200
  • 18:50 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1200
  • 18:50 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1199
  • 18:50 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1199
  • 18:50 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1198
  • 18:50 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1198
  • 18:50 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1197
  • 18:49 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1197
  • 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1196
  • 18:49 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1196
  • 18:48 pt1979@cumin1001: START - Cookbook sre.dns.netbox
  • 18:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:42 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.27 refs T314188
  • 17:34 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:33 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:33 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:32 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:32 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:31 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:26 herron: restarted rsyslog on centrallog2002
  • 16:29 topranks: Bringing Lumen Transport CCT 442550294 (cr1-codfw to cr4-ulsfo) back into service following successful hot-cut to lower-latency path with carrier
  • 16:17 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=restbase103[1-3].eqiad.wmnet
  • 15:55 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase103[1-3].eqiad.wmnet
  • 15:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:21 moritzm: installing usb.ids update from Bullseye 11.4 point release
  • 15:19 moritzm: updating docker.io on ml-serve* to bugfix release from Bullseye 11.4 point release
  • 14:54 topranks: Draining traffic from Lumen Transport CCT 442550294 (cr1-codfw to cr4-ulsfo) ahead of hot-cut to lower-latency path with carrier
  • 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet
  • 14:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet
  • 14:07 moritzm: installing net-snmp security updates on Buster
  • 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1002.eqiad.wmnet
  • 14:01 marostegui: test T316744
  • 14:01 marostegui: test T316744
  • 14:00 marostegui: Failover m5 from db1107 to db1183 - T316744
  • 13:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb1002.eqiad.wmnet
  • 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2002.codfw.wmnet
  • 13:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb2002.codfw.wmnet
  • 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host netbox1002.eqiad.wmnet
  • 13:43 moritzm: rebooting netbox1002 (running netbox.wikimedia.org)
  • 13:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox1002.eqiad.wmnet
  • 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox2002.codfw.wmnet
  • 13:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox2002.codfw.wmnet
  • 13:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2135,2160].codfw.wmnet,db[1107,1117,1183].eqiad.wmnet with reason: switchover m5 T316744
  • 13:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2135,2160].codfw.wmnet,db[1107,1117,1183].eqiad.wmnet with reason: switchover m5 T316744
  • 13:19 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:19 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:19 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:19 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:18 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:18 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:09 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Move 5% of traffic to php 7.4 (T271736) (duration: 03m 45s)
  • 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:00 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:00 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:00 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:59 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:56 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:56 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:29 herron: restarted thanos-query on thanos-fe1001
  • 12:20 cdanis@cumin2002: dbctl commit (dc=all): 'T316482 remove replicas from x2', diff saved to https://phabricator.wikimedia.org/P33736 and previous config saved to /var/cache/conftool/dbconfig/20220901-122026-cdanis.json
  • 12:13 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-serve-ctrl1001.eqiad.wmnet
  • 12:13 klausman@cumin1001: START - Cookbook sre.hosts.remove-downtime for ml-serve-ctrl1001.eqiad.wmnet
  • 12:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T314041)', diff saved to https://phabricator.wikimedia.org/P33735 and previous config saved to /var/cache/conftool/dbconfig/20220901-121252-ladsgroup.json
  • 12:05 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Reboot to pick up kernel 5.10.136 (T316185)
  • 12:05 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Reboot to pick up kernel 5.10.136 (T316185)
  • 12:03 klausman@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
  • 11:59 moritzm: rebalance row B after completed Bullseye updates T311686
  • 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P33734 and previous config saved to /var/cache/conftool/dbconfig/20220901-115746-ladsgroup.json
  • 11:48 cdanis: root@apt1001:/home/cdanis/build-area# reprepro --ignore=wrongdistribution -C main include bullseye-wikimedia conftool_2.2.2-1_amd64.changes
  • 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P33733 and previous config saved to /var/cache/conftool/dbconfig/20220901-114239-ladsgroup.json
  • 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T314041)', diff saved to https://phabricator.wikimedia.org/P33732 and previous config saved to /var/cache/conftool/dbconfig/20220901-112733-ladsgroup.json
  • 11:04 claime: depooled wtp1035.eqiad.wmnet from parsoid cluster https://phabricator.wikimedia.org/T312638
  • 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki2002.codfw.wmnet
  • 10:58 claime: pooled parse1002.eqiad.wmnet (php 7.4 only) in parsoid cluster https://phabricator.wikimedia.org/T312638
  • 10:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1002.eqiad.wmnet
  • 10:56 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1002.eqiad.wmnet
  • 10:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki2002.codfw.wmnet
  • 10:43 claime: depooled wtp1034.eqiad.wmnet from parsoid cluster https://phabricator.wikimedia.org/T312638
  • 10:43 claime: pooled parse1001.eqiad.wmnet (php 7.4 only) in parsoid cluster https://phabricator.wikimedia.org/T312638
  • 10:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
  • 10:40 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1001.eqiad.wmnet
  • 10:40 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1001.eqiad.wmnet
  • 10:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2002.codfw.wmnet
  • 10:36 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1002.eqiad.wmnet
  • 10:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
  • 10:29 klausman@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
  • 10:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
  • 10:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:13 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1013 back to pc3 master (duration: 03m 43s)
  • 10:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:58 cgoubert@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Update wgLinterSubmitterWhitelist (T312638) (duration: 03m 37s)
  • 09:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:32 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1014 to pc3 master (duration: 03m 34s)
  • 09:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2015.codfw.wmnet to cluster codfw and group D
  • 08:17 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: Readding downtime removed by reimage
  • 08:17 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: Readding downtime removed by reimage
  • 08:17 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2015.codfw.wmnet to cluster codfw and group D
  • 08:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2015.codfw.wmnet
  • 07:56 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Moving 1% of traffic to php 7.4 (duration: 03m 42s)
  • 07:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2015.codfw.wmnet
  • 07:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2015.codfw.wmnet with OS bullseye
  • 07:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2015.codfw.wmnet with reason: host reimage
  • 07:10 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2015.codfw.wmnet with reason: host reimage
  • 06:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2015.codfw.wmnet with OS bullseye
  • 06:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 06:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 06:25 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Reverting to no php 7.4 traffic (duration: 03m 44s)
  • 06:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 06:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 06:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 06:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 06:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 06:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 06:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:10 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Moving 1% of users to php 7.4 (duration: 03m 55s)
  • 06:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1136 T316111', diff saved to https://phabricator.wikimedia.org/P33729 and previous config saved to /var/cache/conftool/dbconfig/20220901-060923-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1181 to s7 primary and set section read-write T316111', diff saved to https://phabricator.wikimedia.org/P33728 and previous config saved to /var/cache/conftool/dbconfig/20220901-060128-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s7 eqiad as read-only for maintenance - T316111', diff saved to https://phabricator.wikimedia.org/P33727 and previous config saved to /var/cache/conftool/dbconfig/20220901-060100-ladsgroup.json
  • 06:00 Amir1: Starting s7 eqiad failover from db1136 to db1181 - T316111
  • 05:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1181 with weight 0 T316111', diff saved to https://phabricator.wikimedia.org/P33726 and previous config saved to /var/cache/conftool/dbconfig/20220901-051701-ladsgroup.json
  • 05:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 T316111
  • 05:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 28 hosts with reason: Primary switchover s7 T316111
  • 01:20 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase201[3-8].codfw.wmnet: Restart to apply new certificates (T316697) - eevans@cumin1001
  • 00:21 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase201[3-8].codfw.wmnet: Restart to apply new certificates (T316697) - eevans@cumin1001
