Server Admin Log/Archive 58

2022-10-31

  • 22:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T318605)', diff saved to https://phabricator.wikimedia.org/P37282 and previous config saved to /var/cache/conftool/dbconfig/20221031-222151-ladsgroup.json
  • 22:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P37281 and previous config saved to /var/cache/conftool/dbconfig/20221031-220645-ladsgroup.json
  • 21:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P37280 and previous config saved to /var/cache/conftool/dbconfig/20221031-215138-ladsgroup.json
  • 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T318605)', diff saved to https://phabricator.wikimedia.org/P37279 and previous config saved to /var/cache/conftool/dbconfig/20221031-213632-ladsgroup.json
  • 21:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T318605)', diff saved to https://phabricator.wikimedia.org/P37278 and previous config saved to /var/cache/conftool/dbconfig/20221031-212749-ladsgroup.json
  • 21:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 21:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 21:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T318605)', diff saved to https://phabricator.wikimedia.org/P37277 and previous config saved to /var/cache/conftool/dbconfig/20221031-212717-ladsgroup.json
  • 21:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P37276 and previous config saved to /var/cache/conftool/dbconfig/20221031-211210-ladsgroup.json
  • 20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P37275 and previous config saved to /var/cache/conftool/dbconfig/20221031-205703-ladsgroup.json
  • 20:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T318605)', diff saved to https://phabricator.wikimedia.org/P37274 and previous config saved to /var/cache/conftool/dbconfig/20221031-204157-ladsgroup.json
  • 20:33 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: No-op sync of InitialiseSettings.php to declare stream rc0.mediawiki.page_change. This stream is disabled everywhere by default, and only enabled in beta for now. - T311129 (duration: 03m 42s)
  • 20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T318605)', diff saved to https://phabricator.wikimedia.org/P37273 and previous config saved to /var/cache/conftool/dbconfig/20221031-203319-ladsgroup.json
  • 20:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 20:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T318605)', diff saved to https://phabricator.wikimedia.org/P37272 and previous config saved to /var/cache/conftool/dbconfig/20221031-203258-ladsgroup.json
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 20:32 cjming: end of UTC late backport window
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 20:26 cjming@deploy1002: Finished scap: Backport for cirrus: Correct comments in ProductionServices.php (T262630) (duration: 05m 57s)
  • 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 20:20 cjming@deploy1002: cjming and ebernhardson: Backport for cirrus: Correct comments in ProductionServices.php (T262630) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:20 cjming@deploy1002: Started scap: Backport for cirrus: Correct comments in ProductionServices.php (T262630)
  • 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 20:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P37271 and previous config saved to /var/cache/conftool/dbconfig/20221031-201751-ladsgroup.json
  • 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 20:17 cjming@deploy1002: Finished scap: Backport for Update sample rate for edit attempt stream to 1 for group 0. (T312016) (duration: 04m 17s)
  • 20:13 cjming@deploy1002: cjming and cjming: Backport for Update sample rate for edit attempt stream to 1 for group 0. (T312016) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:12 cjming@deploy1002: Started scap: Backport for Update sample rate for edit attempt stream to 1 for group 0. (T312016)
  • 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 20:09 cjming@deploy1002: Finished scap: Backport for Disable wgParserEnableLegacyMediaDOM on itwiki (T314318) (duration: 05m 44s)
  • 20:03 cjming@deploy1002: cjming and arlolra: Backport for Disable wgParserEnableLegacyMediaDOM on itwiki (T314318) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:03 cjming@deploy1002: Started scap: Backport for Disable wgParserEnableLegacyMediaDOM on itwiki (T314318)
  • 20:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P37270 and previous config saved to /var/cache/conftool/dbconfig/20221031-200245-ladsgroup.json
  • 19:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 19:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 19:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 19:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T318605)', diff saved to https://phabricator.wikimedia.org/P37269 and previous config saved to /var/cache/conftool/dbconfig/20221031-194738-ladsgroup.json
  • 19:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T318605)', diff saved to https://phabricator.wikimedia.org/P37268 and previous config saved to /var/cache/conftool/dbconfig/20221031-194614-ladsgroup.json
  • 19:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P37267 and previous config saved to /var/cache/conftool/dbconfig/20221031-193108-ladsgroup.json
  • 19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P37265 and previous config saved to /var/cache/conftool/dbconfig/20221031-191601-ladsgroup.json
  • 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 19:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T318605)', diff saved to https://phabricator.wikimedia.org/P37264 and previous config saved to /var/cache/conftool/dbconfig/20221031-190054-ladsgroup.json
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T318605)', diff saved to https://phabricator.wikimedia.org/P37263 and previous config saved to /var/cache/conftool/dbconfig/20221031-184729-ladsgroup.json
  • 18:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 18:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37262 and previous config saved to /var/cache/conftool/dbconfig/20221031-184707-ladsgroup.json
  • 18:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P37261 and previous config saved to /var/cache/conftool/dbconfig/20221031-183201-ladsgroup.json
  • 18:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 18:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T318955)', diff saved to https://phabricator.wikimedia.org/P37260 and previous config saved to /var/cache/conftool/dbconfig/20221031-183052-ladsgroup.json
  • 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P37259 and previous config saved to /var/cache/conftool/dbconfig/20221031-181654-ladsgroup.json
  • 18:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P37258 and previous config saved to /var/cache/conftool/dbconfig/20221031-181546-ladsgroup.json
  • 18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37257 and previous config saved to /var/cache/conftool/dbconfig/20221031-180148-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T318605)', diff saved to https://phabricator.wikimedia.org/P37256 and previous config saved to /var/cache/conftool/dbconfig/20221031-180049-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P37255 and previous config saved to /var/cache/conftool/dbconfig/20221031-180039-ladsgroup.json
  • 18:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 18:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37254 and previous config saved to /var/cache/conftool/dbconfig/20221031-180021-ladsgroup.json
  • 17:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 17:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T318955)', diff saved to https://phabricator.wikimedia.org/P37250 and previous config saved to /var/cache/conftool/dbconfig/20221031-174301-ladsgroup.json
  • 17:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T318955)', diff saved to https://phabricator.wikimedia.org/P37249 and previous config saved to /var/cache/conftool/dbconfig/20221031-174052-ladsgroup.json
  • 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P37248 and previous config saved to /var/cache/conftool/dbconfig/20221031-173008-ladsgroup.json
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P37247 and previous config saved to /var/cache/conftool/dbconfig/20221031-172755-ladsgroup.json
  • 17:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P37246 and previous config saved to /var/cache/conftool/dbconfig/20221031-172545-ladsgroup.json
  • 17:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37245 and previous config saved to /var/cache/conftool/dbconfig/20221031-171501-ladsgroup.json
  • 17:14 mutante: contint1001 - racadm serveraction powercycle - crashed
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P37244 and previous config saved to /var/cache/conftool/dbconfig/20221031-171248-ladsgroup.json
  • 17:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P37243 and previous config saved to /var/cache/conftool/dbconfig/20221031-171039-ladsgroup.json
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37242 and previous config saved to /var/cache/conftool/dbconfig/20221031-170935-ladsgroup.json
  • 17:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T318605)', diff saved to https://phabricator.wikimedia.org/P37241 and previous config saved to /var/cache/conftool/dbconfig/20221031-170925-ladsgroup.json
  • 17:02 mutante: contint1001 - just went fully down without maintenance work; fortunately 2001 is the prod CI server currently
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T318955)', diff saved to https://phabricator.wikimedia.org/P37240 and previous config saved to /var/cache/conftool/dbconfig/20221031-165742-ladsgroup.json
  • 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T318955)', diff saved to https://phabricator.wikimedia.org/P37239 and previous config saved to /var/cache/conftool/dbconfig/20221031-165532-ladsgroup.json
  • 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T318955)', diff saved to https://phabricator.wikimedia.org/P37238 and previous config saved to /var/cache/conftool/dbconfig/20221031-165532-ladsgroup.json
  • 16:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 16:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T318955)', diff saved to https://phabricator.wikimedia.org/P37237 and previous config saved to /var/cache/conftool/dbconfig/20221031-165511-ladsgroup.json
  • 16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P37236 and previous config saved to /var/cache/conftool/dbconfig/20221031-165418-ladsgroup.json
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T318955)', diff saved to https://phabricator.wikimedia.org/P37235 and previous config saved to /var/cache/conftool/dbconfig/20221031-164431-ladsgroup.json
  • 16:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37234 and previous config saved to /var/cache/conftool/dbconfig/20221031-164409-ladsgroup.json
  • 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P37233 and previous config saved to /var/cache/conftool/dbconfig/20221031-164004-ladsgroup.json
  • 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P37232 and previous config saved to /var/cache/conftool/dbconfig/20221031-163912-ladsgroup.json
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P37231 and previous config saved to /var/cache/conftool/dbconfig/20221031-162903-ladsgroup.json
  • 16:25 hashar@deploy1002: Finished deploy [integration/docroot@0ff8642]: build: Use disableProcessTimeout() for serve commands only (duration: 00m 25s)
  • 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P37230 and previous config saved to /var/cache/conftool/dbconfig/20221031-162458-ladsgroup.json
  • 16:24 hashar@deploy1002: Started deploy [integration/docroot@0ff8642]: build: Use disableProcessTimeout() for serve commands only
  • 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T318605)', diff saved to https://phabricator.wikimedia.org/P37229 and previous config saved to /var/cache/conftool/dbconfig/20221031-162405-ladsgroup.json
  • 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37228 and previous config saved to /var/cache/conftool/dbconfig/20221031-162311-ladsgroup.json
  • 16:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T318605)', diff saved to https://phabricator.wikimedia.org/P37227 and previous config saved to /var/cache/conftool/dbconfig/20221031-162249-ladsgroup.json
  • 16:17 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet,service=varnish-fe
  • 16:17 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet,service=ats-be
  • 16:17 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet,service=ats-tls
  • 16:15 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T318605)', diff saved to https://phabricator.wikimedia.org/P37226 and previous config saved to /var/cache/conftool/dbconfig/20221031-161448-ladsgroup.json
  • 16:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 16:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T318605)', diff saved to https://phabricator.wikimedia.org/P37225 and previous config saved to /var/cache/conftool/dbconfig/20221031-161426-ladsgroup.json
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P37224 and previous config saved to /var/cache/conftool/dbconfig/20221031-161356-ladsgroup.json
  • 16:13 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T318955)', diff saved to https://phabricator.wikimedia.org/P37223 and previous config saved to /var/cache/conftool/dbconfig/20221031-160951-ladsgroup.json
  • 16:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P37222 and previous config saved to /var/cache/conftool/dbconfig/20221031-160743-ladsgroup.json
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T318955)', diff saved to https://phabricator.wikimedia.org/P37221 and previous config saved to /var/cache/conftool/dbconfig/20221031-160641-ladsgroup.json
  • 16:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 16:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T318955)', diff saved to https://phabricator.wikimedia.org/P37220 and previous config saved to /var/cache/conftool/dbconfig/20221031-160620-ladsgroup.json
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P37219 and previous config saved to /var/cache/conftool/dbconfig/20221031-155919-ladsgroup.json
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37218 and previous config saved to /var/cache/conftool/dbconfig/20221031-155850-ladsgroup.json
  • 15:56 ryankemper: [Elastic] `ryankemper@elastic2052:~$ sudo reboot` to grab latest kernel
  • 15:54 ryankemper: [Elastic] `ryankemper@elastic2043:~$ sudo pool` (cluster back to green and DIMM A2 has been switched out by dc-ops); marked as `Active` in netbox
  • 15:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.drmrs.wmnet,service=varnish-fe
  • 15:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.drmrs.wmnet,service=ats-be
  • 15:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.drmrs.wmnet,service=ats-tls
  • 15:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P37217 and previous config saved to /var/cache/conftool/dbconfig/20221031-155236-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P37216 and previous config saved to /var/cache/conftool/dbconfig/20221031-155113-ladsgroup.json
  • 15:51 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.drmrs.wmnet,service=varnish-fe
  • 15:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.drmrs.wmnet,service=ats-be
  • 15:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.drmrs.wmnet,service=ats-tls
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37215 and previous config saved to /var/cache/conftool/dbconfig/20221031-154638-ladsgroup.json
  • 15:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 15:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37214 and previous config saved to /var/cache/conftool/dbconfig/20221031-154627-ladsgroup.json
  • 15:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P37213 and previous config saved to /var/cache/conftool/dbconfig/20221031-154413-ladsgroup.json
  • 15:43 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 03m 34s)
  • 15:39 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 03m 43s)
  • 15:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 15:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 15:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 15:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T318605)', diff saved to https://phabricator.wikimedia.org/P37212 and previous config saved to /var/cache/conftool/dbconfig/20221031-153730-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P37211 and previous config saved to /var/cache/conftool/dbconfig/20221031-153607-ladsgroup.json
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P37210 and previous config saved to /var/cache/conftool/dbconfig/20221031-153121-ladsgroup.json
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T318605)', diff saved to https://phabricator.wikimedia.org/P37209 and previous config saved to /var/cache/conftool/dbconfig/20221031-152906-ladsgroup.json
  • 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T318955)', diff saved to https://phabricator.wikimedia.org/P37206 and previous config saved to /var/cache/conftool/dbconfig/20221031-151851-ladsgroup.json
  • 15:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 15:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P37205 and previous config saved to /var/cache/conftool/dbconfig/20221031-151612-ladsgroup.json
  • 15:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37204 and previous config saved to /var/cache/conftool/dbconfig/20221031-150919-ladsgroup.json
  • 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P37203 and previous config saved to /var/cache/conftool/dbconfig/20221031-150517-ladsgroup.json
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37202 and previous config saved to /var/cache/conftool/dbconfig/20221031-150105-ladsgroup.json
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P37201 and previous config saved to /var/cache/conftool/dbconfig/20221031-145413-ladsgroup.json
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 50%: Maint done', diff saved to https://phabricator.wikimedia.org/P37200 and previous config saved to /var/cache/conftool/dbconfig/20221031-145012-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37199 and previous config saved to /var/cache/conftool/dbconfig/20221031-144840-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T318955)', diff saved to https://phabricator.wikimedia.org/P37198 and previous config saved to /var/cache/conftool/dbconfig/20221031-144819-ladsgroup.json
  • 14:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T318605)', diff saved to https://phabricator.wikimedia.org/P37197 and previous config saved to /var/cache/conftool/dbconfig/20221031-144511-ladsgroup.json
  • 14:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 14:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37196 and previous config saved to /var/cache/conftool/dbconfig/20221031-144449-ladsgroup.json
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P37195 and previous config saved to /var/cache/conftool/dbconfig/20221031-143906-ladsgroup.json
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P37194 and previous config saved to /var/cache/conftool/dbconfig/20221031-143507-ladsgroup.json
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T318605)', diff saved to https://phabricator.wikimedia.org/P37193 and previous config saved to /var/cache/conftool/dbconfig/20221031-143458-ladsgroup.json
  • 14:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 14:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37192 and previous config saved to /var/cache/conftool/dbconfig/20221031-143430-ladsgroup.json
  • 14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P37191 and previous config saved to /var/cache/conftool/dbconfig/20221031-143312-ladsgroup.json
  • 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P37190 and previous config saved to /var/cache/conftool/dbconfig/20221031-142942-ladsgroup.json
  • 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37189 and previous config saved to /var/cache/conftool/dbconfig/20221031-142400-ladsgroup.json
  • 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P37188 and previous config saved to /var/cache/conftool/dbconfig/20221031-141924-ladsgroup.json
  • 14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P37187 and previous config saved to /var/cache/conftool/dbconfig/20221031-141806-ladsgroup.json
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T318950)', diff saved to https://phabricator.wikimedia.org/P37186 and previous config saved to /var/cache/conftool/dbconfig/20221031-141701-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P37185 and previous config saved to /var/cache/conftool/dbconfig/20221031-141436-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37184 and previous config saved to /var/cache/conftool/dbconfig/20221031-141404-ladsgroup.json
  • 14:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T318955)', diff saved to https://phabricator.wikimedia.org/P37183 and previous config saved to /var/cache/conftool/dbconfig/20221031-141342-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P37182 and previous config saved to /var/cache/conftool/dbconfig/20221031-140417-ladsgroup.json
  • 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T318955)', diff saved to https://phabricator.wikimedia.org/P37181 and previous config saved to /var/cache/conftool/dbconfig/20221031-140259-ladsgroup.json
  • 14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P37180 and previous config saved to /var/cache/conftool/dbconfig/20221031-140153-ladsgroup.json
  • 13:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37179 and previous config saved to /var/cache/conftool/dbconfig/20221031-135929-ladsgroup.json
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P37178 and previous config saved to /var/cache/conftool/dbconfig/20221031-135836-ladsgroup.json
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T318955)', diff saved to https://phabricator.wikimedia.org/P37177 and previous config saved to /var/cache/conftool/dbconfig/20221031-135039-ladsgroup.json
  • 13:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T318955)', diff saved to https://phabricator.wikimedia.org/P37176 and previous config saved to /var/cache/conftool/dbconfig/20221031-135013-ladsgroup.json
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37175 and previous config saved to /var/cache/conftool/dbconfig/20221031-134911-ladsgroup.json
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T318955)', diff saved to https://phabricator.wikimedia.org/P37170 and previous config saved to /var/cache/conftool/dbconfig/20221031-132823-ladsgroup.json
  • 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 13:10 urbanecm@deploy1002: urbanecm and mlitn: Backport for Update i18n for ca, nb, fi & hu (T300064) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P37165 and previous config saved to /var/cache/conftool/dbconfig/20221031-131028-ladsgroup.json
  • 13:10 urbanecm@deploy1002: Started scap: Backport for Update i18n for ca, nb, fi & hu (T300064)
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P37164 and previous config saved to /var/cache/conftool/dbconfig/20221031-130834-marostegui.json
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37163 and previous config saved to /var/cache/conftool/dbconfig/20221031-130651-ladsgroup.json
  • 13:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 13:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T318605)', diff saved to https://phabricator.wikimedia.org/P37162 and previous config saved to /var/cache/conftool/dbconfig/20221031-130629-ladsgroup.json
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T318955)', diff saved to https://phabricator.wikimedia.org/P37161 and previous config saved to /var/cache/conftool/dbconfig/20221031-130454-ladsgroup.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P37160 and previous config saved to /var/cache/conftool/dbconfig/20221031-130244-marostegui.json
  • 13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T318950)', diff saved to https://phabricator.wikimedia.org/P37159 and previous config saved to /var/cache/conftool/dbconfig/20221031-130217-ladsgroup.json
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P37158 and previous config saved to /var/cache/conftool/dbconfig/20221031-130016-marostegui.json
  • 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P37157 and previous config saved to /var/cache/conftool/dbconfig/20221031-125521-ladsgroup.json
  • 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37156 and previous config saved to /var/cache/conftool/dbconfig/20221031-125509-ladsgroup.json
  • 12:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T318955)', diff saved to https://phabricator.wikimedia.org/P37155 and previous config saved to /var/cache/conftool/dbconfig/20221031-125350-ladsgroup.json
  • 12:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 12:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 12:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T318955)', diff saved to https://phabricator.wikimedia.org/P37154 and previous config saved to /var/cache/conftool/dbconfig/20221031-125329-ladsgroup.json
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P37153 and previous config saved to /var/cache/conftool/dbconfig/20221031-125123-ladsgroup.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P37152 and previous config saved to /var/cache/conftool/dbconfig/20221031-124836-marostegui.json
  • 12:47 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P37151 and previous config saved to /var/cache/conftool/dbconfig/20221031-124711-ladsgroup.json
  • 12:46 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 12:42 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 12:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T318955)', diff saved to https://phabricator.wikimedia.org/P37150 and previous config saved to /var/cache/conftool/dbconfig/20221031-124015-ladsgroup.json
  • 12:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P37149 and previous config saved to /var/cache/conftool/dbconfig/20221031-123822-ladsgroup.json
  • 12:36 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:36 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P37148 and previous config saved to /var/cache/conftool/dbconfig/20221031-123616-ladsgroup.json
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37147 and previous config saved to /var/cache/conftool/dbconfig/20221031-123330-marostegui.json
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37146 and previous config saved to /var/cache/conftool/dbconfig/20221031-123222-marostegui.json
  • 12:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 12:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37145 and previous config saved to /var/cache/conftool/dbconfig/20221031-123211-marostegui.json
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P37144 and previous config saved to /var/cache/conftool/dbconfig/20221031-123204-ladsgroup.json
  • 12:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P37143 and previous config saved to /var/cache/conftool/dbconfig/20221031-122314-ladsgroup.json
  • 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T318605)', diff saved to https://phabricator.wikimedia.org/P37142 and previous config saved to /var/cache/conftool/dbconfig/20221031-122109-ladsgroup.json
  • 12:18 gehel: repooling wdqs1007 - caught up on lag - T322010
  • 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P37141 and previous config saved to /var/cache/conftool/dbconfig/20221031-121705-marostegui.json
  • 12:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T318950)', diff saved to https://phabricator.wikimedia.org/P37140 and previous config saved to /var/cache/conftool/dbconfig/20221031-121658-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T318605)', diff saved to https://phabricator.wikimedia.org/P37139 and previous config saved to /var/cache/conftool/dbconfig/20221031-121108-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 12:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T318605)', diff saved to https://phabricator.wikimedia.org/P37138 and previous config saved to /var/cache/conftool/dbconfig/20221031-121043-ladsgroup.json
  • 12:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T318955)', diff saved to https://phabricator.wikimedia.org/P37137 and previous config saved to /var/cache/conftool/dbconfig/20221031-120807-ladsgroup.json
  • 12:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T318605)', diff saved to https://phabricator.wikimedia.org/P37136 and previous config saved to /var/cache/conftool/dbconfig/20221031-120644-ladsgroup.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P37135 and previous config saved to /var/cache/conftool/dbconfig/20221031-120158-marostegui.json
  • 11:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T318955)', diff saved to https://phabricator.wikimedia.org/P37134 and previous config saved to /var/cache/conftool/dbconfig/20221031-115639-ladsgroup.json
  • 11:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 11:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 11:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T318955)', diff saved to https://phabricator.wikimedia.org/P37133 and previous config saved to /var/cache/conftool/dbconfig/20221031-115618-ladsgroup.json
  • 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P37132 and previous config saved to /var/cache/conftool/dbconfig/20221031-115536-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P37131 and previous config saved to /var/cache/conftool/dbconfig/20221031-115138-ladsgroup.json
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37130 and previous config saved to /var/cache/conftool/dbconfig/20221031-114652-marostegui.json
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37129 and previous config saved to /var/cache/conftool/dbconfig/20221031-114443-marostegui.json
  • 11:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 11:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T321123)', diff saved to https://phabricator.wikimedia.org/P37128 and previous config saved to /var/cache/conftool/dbconfig/20221031-114337-marostegui.json
  • 11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P37127 and previous config saved to /var/cache/conftool/dbconfig/20221031-114111-ladsgroup.json
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P37126 and previous config saved to /var/cache/conftool/dbconfig/20221031-114030-ladsgroup.json
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T318955)', diff saved to https://phabricator.wikimedia.org/P37125 and previous config saved to /var/cache/conftool/dbconfig/20221031-113959-ladsgroup.json
  • 11:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T318955)', diff saved to https://phabricator.wikimedia.org/P37124 and previous config saved to /var/cache/conftool/dbconfig/20221031-113938-ladsgroup.json
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P37123 and previous config saved to /var/cache/conftool/dbconfig/20221031-113631-ladsgroup.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P37122 and previous config saved to /var/cache/conftool/dbconfig/20221031-112831-marostegui.json
  • 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P37121 and previous config saved to /var/cache/conftool/dbconfig/20221031-112605-ladsgroup.json
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T318605)', diff saved to https://phabricator.wikimedia.org/P37120 and previous config saved to /var/cache/conftool/dbconfig/20221031-112523-ladsgroup.json
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P37119 and previous config saved to /var/cache/conftool/dbconfig/20221031-112431-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T318605)', diff saved to https://phabricator.wikimedia.org/P37118 and previous config saved to /var/cache/conftool/dbconfig/20221031-112125-ladsgroup.json
  • 11:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T318950)', diff saved to https://phabricator.wikimedia.org/P37117 and previous config saved to /var/cache/conftool/dbconfig/20221031-111641-ladsgroup.json
  • 11:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 11:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P37116 and previous config saved to /var/cache/conftool/dbconfig/20221031-111324-marostegui.json
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T318605)', diff saved to https://phabricator.wikimedia.org/P37115 and previous config saved to /var/cache/conftool/dbconfig/20221031-111153-ladsgroup.json
  • 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37114 and previous config saved to /var/cache/conftool/dbconfig/20221031-111132-ladsgroup.json
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T318955)', diff saved to https://phabricator.wikimedia.org/P37113 and previous config saved to /var/cache/conftool/dbconfig/20221031-111058-ladsgroup.json
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P37112 and previous config saved to /var/cache/conftool/dbconfig/20221031-110925-ladsgroup.json
  • 11:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T318955)', diff saved to https://phabricator.wikimedia.org/P37111 and previous config saved to /var/cache/conftool/dbconfig/20221031-110003-ladsgroup.json
  • 10:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 10:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T318955)', diff saved to https://phabricator.wikimedia.org/P37110 and previous config saved to /var/cache/conftool/dbconfig/20221031-105941-ladsgroup.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T321123)', diff saved to https://phabricator.wikimedia.org/P37109 and previous config saved to /var/cache/conftool/dbconfig/20221031-105818-marostegui.json
  • 10:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P37108 and previous config saved to /var/cache/conftool/dbconfig/20221031-105625-ladsgroup.json
  • 10:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 10:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37107 and previous config saved to /var/cache/conftool/dbconfig/20221031-105605-marostegui.json
  • 10:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T318955)', diff saved to https://phabricator.wikimedia.org/P37106 and previous config saved to /var/cache/conftool/dbconfig/20221031-105418-ladsgroup.json
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P37105 and previous config saved to /var/cache/conftool/dbconfig/20221031-104435-ladsgroup.json
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T318950)', diff saved to https://phabricator.wikimedia.org/P37104 and previous config saved to /var/cache/conftool/dbconfig/20221031-104415-ladsgroup.json
  • 10:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T318955)', diff saved to https://phabricator.wikimedia.org/P37103 and previous config saved to /var/cache/conftool/dbconfig/20221031-104238-ladsgroup.json
  • 10:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 10:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 10:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37102 and previous config saved to /var/cache/conftool/dbconfig/20221031-104217-ladsgroup.json
  • 10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P37101 and previous config saved to /var/cache/conftool/dbconfig/20221031-104119-ladsgroup.json
  • 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P37100 and previous config saved to /var/cache/conftool/dbconfig/20221031-104059-marostegui.json
  • 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P37099 and previous config saved to /var/cache/conftool/dbconfig/20221031-102928-ladsgroup.json
  • 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P37098 and previous config saved to /var/cache/conftool/dbconfig/20221031-102908-ladsgroup.json
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P37097 and previous config saved to /var/cache/conftool/dbconfig/20221031-102710-ladsgroup.json
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T318605)', diff saved to https://phabricator.wikimedia.org/P37096 and previous config saved to /var/cache/conftool/dbconfig/20221031-102627-ladsgroup.json
  • 10:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37095 and previous config saved to /var/cache/conftool/dbconfig/20221031-102612-ladsgroup.json
  • 10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T318605)', diff saved to https://phabricator.wikimedia.org/P37094 and previous config saved to /var/cache/conftool/dbconfig/20221031-102606-ladsgroup.json
  • 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P37093 and previous config saved to /var/cache/conftool/dbconfig/20221031-102552-marostegui.json
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T318955)', diff saved to https://phabricator.wikimedia.org/P37092 and previous config saved to /var/cache/conftool/dbconfig/20221031-101422-ladsgroup.json
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P37091 and previous config saved to /var/cache/conftool/dbconfig/20221031-101402-ladsgroup.json
  • 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P37090 and previous config saved to /var/cache/conftool/dbconfig/20221031-101203-ladsgroup.json
  • 10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P37089 and previous config saved to /var/cache/conftool/dbconfig/20221031-101059-ladsgroup.json
  • 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37088 and previous config saved to /var/cache/conftool/dbconfig/20221031-101046-marostegui.json
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37087 and previous config saved to /var/cache/conftool/dbconfig/20221031-100935-marostegui.json
  • 10:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 10:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37086 and previous config saved to /var/cache/conftool/dbconfig/20221031-100913-marostegui.json
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T318955)', diff saved to https://phabricator.wikimedia.org/P37085 and previous config saved to /var/cache/conftool/dbconfig/20221031-100316-ladsgroup.json
  • 10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T318955)', diff saved to https://phabricator.wikimedia.org/P37084 and previous config saved to /var/cache/conftool/dbconfig/20221031-100255-ladsgroup.json
  • 09:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T318950)', diff saved to https://phabricator.wikimedia.org/P37083 and previous config saved to /var/cache/conftool/dbconfig/20221031-095855-ladsgroup.json
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37082 and previous config saved to /var/cache/conftool/dbconfig/20221031-095657-ladsgroup.json
  • 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P37081 and previous config saved to /var/cache/conftool/dbconfig/20221031-095551-ladsgroup.json
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P37080 and previous config saved to /var/cache/conftool/dbconfig/20221031-095407-marostegui.json
  • 09:52 gehel: depooling wdqs1007 while it catches up on lag - T322010
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P37079 and previous config saved to /var/cache/conftool/dbconfig/20221031-094748-ladsgroup.json
  • 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37078 and previous config saved to /var/cache/conftool/dbconfig/20221031-094501-ladsgroup.json
  • 09:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 09:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37077 and previous config saved to /var/cache/conftool/dbconfig/20221031-094439-ladsgroup.json
  • 09:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T318605)', diff saved to https://phabricator.wikimedia.org/P37076 and previous config saved to /var/cache/conftool/dbconfig/20221031-094045-ladsgroup.json
  • 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P37075 and previous config saved to /var/cache/conftool/dbconfig/20221031-093900-marostegui.json
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P37074 and previous config saved to /var/cache/conftool/dbconfig/20221031-093242-ladsgroup.json
  • 09:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T318605)', diff saved to https://phabricator.wikimedia.org/P37073 and previous config saved to /var/cache/conftool/dbconfig/20221031-093102-ladsgroup.json
  • 09:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 09:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37072 and previous config saved to /var/cache/conftool/dbconfig/20221031-093012-ladsgroup.json
  • 09:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 09:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 09:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P37071 and previous config saved to /var/cache/conftool/dbconfig/20221031-092933-ladsgroup.json
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37070 and previous config saved to /var/cache/conftool/dbconfig/20221031-092354-marostegui.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37069 and previous config saved to /var/cache/conftool/dbconfig/20221031-092242-marostegui.json
  • 09:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 09:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T321123)', diff saved to https://phabricator.wikimedia.org/P37068 and previous config saved to /var/cache/conftool/dbconfig/20221031-092221-marostegui.json
  • 09:17 Emperor: set thanos ring replicas to 3.40 T311690
  • 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T318955)', diff saved to https://phabricator.wikimedia.org/P37067 and previous config saved to /var/cache/conftool/dbconfig/20221031-091735-ladsgroup.json
  • 09:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P37066 and previous config saved to /var/cache/conftool/dbconfig/20221031-091426-ladsgroup.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P37065 and previous config saved to /var/cache/conftool/dbconfig/20221031-090714-marostegui.json
  • 09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T318955)', diff saved to https://phabricator.wikimedia.org/P37064 and previous config saved to /var/cache/conftool/dbconfig/20221031-090640-ladsgroup.json
  • 09:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 09:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37063 and previous config saved to /var/cache/conftool/dbconfig/20221031-085920-ladsgroup.json
  • 08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T318950)', diff saved to https://phabricator.wikimedia.org/P37062 and previous config saved to /var/cache/conftool/dbconfig/20221031-085839-ladsgroup.json
  • 08:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 08:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P37061 and previous config saved to /var/cache/conftool/dbconfig/20221031-085208-marostegui.json
  • 08:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 08:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 08:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 08:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37060 and previous config saved to /var/cache/conftool/dbconfig/20221031-084751-ladsgroup.json
  • 08:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 08:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 08:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 08:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 08:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T321123)', diff saved to https://phabricator.wikimedia.org/P37059 and previous config saved to /var/cache/conftool/dbconfig/20221031-083701-marostegui.json
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T321123)', diff saved to https://phabricator.wikimedia.org/P37058 and previous config saved to /var/cache/conftool/dbconfig/20221031-083449-marostegui.json
  • 08:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 08:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 08:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 08:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 08:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T321123)', diff saved to https://phabricator.wikimedia.org/P37057 and previous config saved to /var/cache/conftool/dbconfig/20221031-083342-marostegui.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P37056 and previous config saved to /var/cache/conftool/dbconfig/20221031-081836-marostegui.json
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P37055 and previous config saved to /var/cache/conftool/dbconfig/20221031-080329-marostegui.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T321123)', diff saved to https://phabricator.wikimedia.org/P37054 and previous config saved to /var/cache/conftool/dbconfig/20221031-074823-marostegui.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T321123)', diff saved to https://phabricator.wikimedia.org/P37053 and previous config saved to /var/cache/conftool/dbconfig/20221031-074611-marostegui.json
  • 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 07:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T321123)', diff saved to https://phabricator.wikimedia.org/P37052 and previous config saved to /var/cache/conftool/dbconfig/20221031-074549-marostegui.json
  • 07:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 07:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 07:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P37051 and previous config saved to /var/cache/conftool/dbconfig/20221031-073042-marostegui.json
  • 07:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P37050 and previous config saved to /var/cache/conftool/dbconfig/20221031-071536-marostegui.json
  • 07:14 kartik@deploy1002: Finished scap: Backport for Enable Section Translation in Hawaiian, Pashto and Xhosa WPs (T317289) (duration: 06m 48s)
  • 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 07:08 kartik@deploy1002: kartik and kartik: Backport for Enable Section Translation in Hawaiian, Pashto and Xhosa WPs (T317289) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 07:07 kartik@deploy1002: Started scap: Backport for Enable Section Translation in Hawaiian, Pashto and Xhosa WPs (T317289)
  • 07:02 ryankemper: [WDQS] `ryankemper@wdqs1007:~$ sudo systemctl restart wdqs-blazegraph.service`
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T321123)', diff saved to https://phabricator.wikimedia.org/P37049 and previous config saved to /var/cache/conftool/dbconfig/20221031-070029-marostegui.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T321123)', diff saved to https://phabricator.wikimedia.org/P37048 and previous config saved to /var/cache/conftool/dbconfig/20221031-065817-marostegui.json
  • 06:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 06:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T321123)', diff saved to https://phabricator.wikimedia.org/P37047 and previous config saved to /var/cache/conftool/dbconfig/20221031-065756-marostegui.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P37046 and previous config saved to /var/cache/conftool/dbconfig/20221031-064249-marostegui.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T321123)', diff saved to https://phabricator.wikimedia.org/P37044 and previous config saved to /var/cache/conftool/dbconfig/20221031-061236-marostegui.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T321123)', diff saved to https://phabricator.wikimedia.org/P37043 and previous config saved to /var/cache/conftool/dbconfig/20221031-061026-marostegui.json
  • 06:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 06:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 06:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 05:42 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:41 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
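
The dbctl commit entries above share one fixed layout (time, user@host, quoted commit message, diff paste URL, saved config path), so they can be pulled apart mechanically when reviewing a day's depool/repool activity. The following is a minimal sketch, not part of dbctl or any Wikimedia tooling: the regular expression is inferred only from the entries in this log, and the names DBCTL_RE and parse_dbctl_entry are hypothetical.

```python
import re

# Layout inferred from the log entries on this page (an assumption,
# not a documented format): "HH:MM user@host: dbctl commit (dc=X): 'message',
# diff saved to URL and previous config saved to PATH".
DBCTL_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<user>[\w-]+)@(?P<host>[\w-]+): "
    r"dbctl commit \(dc=(?P<dc>\w+)\): '(?P<message>[^']*)', "
    r"diff saved to (?P<diff>\S+) "
    r"and previous config saved to (?P<backup>\S+)"
)


def parse_dbctl_entry(line):
    """Return the fields of a dbctl commit entry as a dict.

    Returns None for any other kind of log entry (cookbook runs,
    helmfile deploys, free-form notes).
    """
    match = DBCTL_RE.search(line)
    return match.groupdict() if match else None


# Entry copied from the 2022-10-31 section of this log.
sample = (
    "• 07:46 marostegui@cumin1001: dbctl commit (dc=all): "
    "'Depooling db2129 (T321123)', diff saved to "
    "https://phabricator.wikimedia.org/P37053 and previous config saved to "
    "/var/cache/conftool/dbconfig/20221031-074611-marostegui.json"
)

entry = parse_dbctl_entry(sample)
print(entry["time"], entry["user"], entry["message"])
```

Entries that are not dbctl commits, such as the free-form "depooling wdqs1007" note, do not match and return None, so the function can be run over every line of a day's log to keep only the database pooling changes.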

2022-10-29

  • 11:25 taavi: deploy patch for T321971
  • 11:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply

2022-10-28

  • 20:42 mutante: clouddumps* - deployed gerrit:848444 - as kind of expected it fails - most likely the project dirs are not automatically created before rsync runs the first time - T57503
  • 20:37 mutante: clouddumps1001 - puppet run after merging gerrit:848441 for kiwix, changed ferm status from "stopped" to "running". manually ran 'sudo systemctl start kiwix-mirror-update' T57503
  • 19:17 mutante: contint* - changing source for scap repo to gitlab - gerrit:850246 T321847
  • 18:54 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2326f9c]: Import cirrus indexes to hdfs (duration: 02m 07s)
  • 18:52 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2326f9c]: Import cirrus indexes to hdfs
  • 18:44 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 18:11 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 18:08 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 18:08 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 17:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
  • 17:31 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp4052.ulsfo.wmnet,service=varnish-fe
  • 17:31 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp4052.ulsfo.wmnet,service=ats-tls
  • 17:31 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4052.ulsfo.wmnet,service=ats-be
  • 17:28 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS buster
  • 17:09 cparle@deploy1002: Finished deploy [airflow-dags/platform_eng@c849762]: (no justification provided) (duration: 00m 05s)
  • 17:09 cparle@deploy1002: Started deploy [airflow-dags/platform_eng@c849762]: (no justification provided)
  • 17:07 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@c849762]: (no justification provided) (duration: 00m 11s)
  • 17:07 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@c849762]: (no justification provided)
  • 17:02 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 16:57 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 16:38 mforns@deploy1002: Finished deploy [airflow-dags/analytics@62b4181]: testing scap since we are having problems with other instances (duration: 00m 04s)
  • 16:38 mforns@deploy1002: Started deploy [airflow-dags/analytics@62b4181]: testing scap since we are having problems with other instances
  • 16:31 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 16:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T321123)', diff saved to https://phabricator.wikimedia.org/P37038 and previous config saved to /var/cache/conftool/dbconfig/20221028-163102-marostegui.json
  • 16:29 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4052.ulsfo.wmnet with OS buster
  • 16:29 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 16:27 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4052.ulsfo.wmnet with OS buster
  • 16:27 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 16:24 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 16:22 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 16:21 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS buster
  • 16:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 16:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 16:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 16:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 16:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P37037 and previous config saved to /var/cache/conftool/dbconfig/20221028-161555-marostegui.json
  • 16:14 cjming: deployed ReadingLists on beta cluster for authenticated users - https://gerrit.wikimedia.org/r/850516 (https://phabricator.wikimedia.org/T317935)
  • 16:13 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 16:07 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4010.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 16:06 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs4010.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 16:05 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:04 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 16:04 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4009.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 16:03 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs4009.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 16:02 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS buster
  • 16:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P37036 and previous config saved to /var/cache/conftool/dbconfig/20221028-160047-marostegui.json
  • 15:50 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 15:50 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS buster
  • 15:49 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T321123)', diff saved to https://phabricator.wikimedia.org/P37035 and previous config saved to /var/cache/conftool/dbconfig/20221028-154541-marostegui.json
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T321123)', diff saved to https://phabricator.wikimedia.org/P37034 and previous config saved to /var/cache/conftool/dbconfig/20221028-154328-marostegui.json
  • 15:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 15:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T321123)', diff saved to https://phabricator.wikimedia.org/P37033 and previous config saved to /var/cache/conftool/dbconfig/20221028-154307-marostegui.json
  • 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P37031 and previous config saved to /var/cache/conftool/dbconfig/20221028-152800-marostegui.json
  • 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P37030 and previous config saved to /var/cache/conftool/dbconfig/20221028-151252-marostegui.json
  • 15:12 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS buster
  • 15:05 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T321123)', diff saved to https://phabricator.wikimedia.org/P37029 and previous config saved to /var/cache/conftool/dbconfig/20221028-145746-marostegui.json
  • 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T321123)', diff saved to https://phabricator.wikimedia.org/P37028 and previous config saved to /var/cache/conftool/dbconfig/20221028-145533-marostegui.json
  • 14:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 14:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T321123)', diff saved to https://phabricator.wikimedia.org/P37027 and previous config saved to /var/cache/conftool/dbconfig/20221028-145512-marostegui.json
  • 14:53 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS buster
  • 14:42 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 14:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P37026 and previous config saved to /var/cache/conftool/dbconfig/20221028-144005-marostegui.json
  • 14:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain
  • 14:37 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain
  • 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P37025 and previous config saved to /var/cache/conftool/dbconfig/20221028-142459-marostegui.json
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T321123)', diff saved to https://phabricator.wikimedia.org/P37024 and previous config saved to /var/cache/conftool/dbconfig/20221028-140952-marostegui.json
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T321123)', diff saved to https://phabricator.wikimedia.org/P37023 and previous config saved to /var/cache/conftool/dbconfig/20221028-140613-marostegui.json
  • 14:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 14:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P37022 and previous config saved to /var/cache/conftool/dbconfig/20221028-140552-marostegui.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P37021 and previous config saved to /var/cache/conftool/dbconfig/20221028-135045-marostegui.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P37020 and previous config saved to /var/cache/conftool/dbconfig/20221028-133538-marostegui.json
  • 13:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P37019 and previous config saved to /var/cache/conftool/dbconfig/20221028-132032-marostegui.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P37018 and previous config saved to /var/cache/conftool/dbconfig/20221028-131920-marostegui.json
  • 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P37017 and previous config saved to /var/cache/conftool/dbconfig/20221028-131905-root.json
  • 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T321123)', diff saved to https://phabricator.wikimedia.org/P37016 and previous config saved to /var/cache/conftool/dbconfig/20221028-131858-marostegui.json
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P37015 and previous config saved to /var/cache/conftool/dbconfig/20221028-131353-root.json
  • 13:12 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P37014 and previous config saved to /var/cache/conftool/dbconfig/20221028-130851-root.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P37009 and previous config saved to /var/cache/conftool/dbconfig/20221028-124849-root.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P37008 and previous config saved to /var/cache/conftool/dbconfig/20221028-124845-marostegui.json
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P37007 and previous config saved to /var/cache/conftool/dbconfig/20221028-124343-root.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P37006 and previous config saved to /var/cache/conftool/dbconfig/20221028-123842-root.json
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P36998 and previous config saved to /var/cache/conftool/dbconfig/20221028-121557-marostegui.json
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36997 and previous config saved to /var/cache/conftool/dbconfig/20221028-121333-root.json
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36996 and previous config saved to /var/cache/conftool/dbconfig/20221028-120832-root.json
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36995 and previous config saved to /var/cache/conftool/dbconfig/20221028-120334-root.json
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P36994 and previous config saved to /var/cache/conftool/dbconfig/20221028-120050-marostegui.json
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36993 and previous config saved to /var/cache/conftool/dbconfig/20221028-115828-root.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36992 and previous config saved to /var/cache/conftool/dbconfig/20221028-115327-root.json
  • 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36991 and previous config saved to /var/cache/conftool/dbconfig/20221028-114829-root.json
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T321123)', diff saved to https://phabricator.wikimedia.org/P36990 and previous config saved to /var/cache/conftool/dbconfig/20221028-114544-marostegui.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T321123)', diff saved to https://phabricator.wikimedia.org/P36989 and previous config saved to /var/cache/conftool/dbconfig/20221028-114332-marostegui.json
  • 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36988 and previous config saved to /var/cache/conftool/dbconfig/20221028-114323-root.json
  • 11:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36987 and previous config saved to /var/cache/conftool/dbconfig/20221028-114253-marostegui.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36986 and previous config saved to /var/cache/conftool/dbconfig/20221028-113822-root.json
  • 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36985 and previous config saved to /var/cache/conftool/dbconfig/20221028-113324-root.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36984 and previous config saved to /var/cache/conftool/dbconfig/20221028-112818-root.json
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P36983 and previous config saved to /var/cache/conftool/dbconfig/20221028-112746-marostegui.json
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti4003.ulsfo.wmnet
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36982 and previous config saved to /var/cache/conftool/dbconfig/20221028-112317-root.json
  • 11:20 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1026 es1027 es1028 for upgrade', diff saved to https://phabricator.wikimedia.org/P36981 and previous config saved to /var/cache/conftool/dbconfig/20221028-111805-root.json
  • 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1029 as es1 master, es1030 as es2 master, es1031 as es3 master', diff saved to https://phabricator.wikimedia.org/P36980 and previous config saved to /var/cache/conftool/dbconfig/20221028-111707-marostegui.json
  • 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P36979 and previous config saved to /var/cache/conftool/dbconfig/20221028-111240-marostegui.json
  • 11:11 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4003.ulsfo.wmnet
  • 11:05 cparle@deploy1002: Finished deploy [airflow-dags/platform_eng@c849762]: (no justification provided) (duration: 00m 15s)
  • 11:05 cparle@deploy1002: Started deploy [airflow-dags/platform_eng@c849762]: (no justification provided)
  • 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti4003.ulsfo.wmnet with reason: Remove from cluster for eventual decom
  • 10:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti4003.ulsfo.wmnet with reason: Remove from cluster for eventual decom
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36978 and previous config saved to /var/cache/conftool/dbconfig/20221028-105733-marostegui.json
  • 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36977 and previous config saved to /var/cache/conftool/dbconfig/20221028-105520-marostegui.json
  • 10:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 10:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 10:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T321123)', diff saved to https://phabricator.wikimedia.org/P36976 and previous config saved to /var/cache/conftool/dbconfig/20221028-105438-marostegui.json
  • 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P36975 and previous config saved to /var/cache/conftool/dbconfig/20221028-103932-marostegui.json
  • 10:14 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T321123)', diff saved to https://phabricator.wikimedia.org/P36973 and previous config saved to /var/cache/conftool/dbconfig/20221028-100918-marostegui.json
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T321123)', diff saved to https://phabricator.wikimedia.org/P36972 and previous config saved to /var/cache/conftool/dbconfig/20221028-100706-marostegui.json
  • 10:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 10:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36971 and previous config saved to /var/cache/conftool/dbconfig/20221028-100644-marostegui.json
  • 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow1002.eqiad.wmnet
  • 09:53 moritzm: drain ganeti4003 for eventual decom T317247
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P36970 and previous config saved to /var/cache/conftool/dbconfig/20221028-095138-marostegui.json
  • 09:46 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow1002.eqiad.wmnet
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P36969 and previous config saved to /var/cache/conftool/dbconfig/20221028-093631-marostegui.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36968 and previous config saved to /var/cache/conftool/dbconfig/20221028-092125-marostegui.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36967 and previous config saved to /var/cache/conftool/dbconfig/20221028-091912-marostegui.json
  • 09:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 09:18 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4006.ulsfo.wmnet to cluster ulsfo and group 1
  • 09:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 09:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 09:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 09:17 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo and group 1
  • 09:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 09:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 09:05 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4006.ulsfo.wmnet to cluster ulsfo and group 1
  • 09:05 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo and group 1
  • 09:05 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4006.ulsfo.wmnet to cluster eqiad and group A
  • 09:05 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster eqiad and group A
  • 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
  • 08:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
  • 08:20 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:17 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 07:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 07:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 07:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 07:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 05:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 05:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 05:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T318950)', diff saved to https://phabricator.wikimedia.org/P36965 and previous config saved to /var/cache/conftool/dbconfig/20221028-051110-ladsgroup.json
  • 04:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P36964 and previous config saved to /var/cache/conftool/dbconfig/20221028-045603-ladsgroup.json
  • 04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P36963 and previous config saved to /var/cache/conftool/dbconfig/20221028-044057-ladsgroup.json
  • 04:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T318950)', diff saved to https://phabricator.wikimedia.org/P36962 and previous config saved to /var/cache/conftool/dbconfig/20221028-042550-ladsgroup.json
  • 04:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T318950)', diff saved to https://phabricator.wikimedia.org/P36961 and previous config saved to /var/cache/conftool/dbconfig/20221028-042443-ladsgroup.json
  • 04:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 04:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 04:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T318950)', diff saved to https://phabricator.wikimedia.org/P36960 and previous config saved to /var/cache/conftool/dbconfig/20221028-042421-ladsgroup.json
  • 04:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P36959 and previous config saved to /var/cache/conftool/dbconfig/20221028-040915-ladsgroup.json
  • 03:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P36958 and previous config saved to /var/cache/conftool/dbconfig/20221028-035409-ladsgroup.json
  • 03:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T318950)', diff saved to https://phabricator.wikimedia.org/P36957 and previous config saved to /var/cache/conftool/dbconfig/20221028-033902-ladsgroup.json
  • 03:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P36954 and previous config saved to /var/cache/conftool/dbconfig/20221028-032127-ladsgroup.json
  • 03:17 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 03:12 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 03:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P36953 and previous config saved to /var/cache/conftool/dbconfig/20221028-030620-ladsgroup.json
  • 03:05 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 02:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T318950)', diff saved to https://phabricator.wikimedia.org/P36952 and previous config saved to /var/cache/conftool/dbconfig/20221028-025113-ladsgroup.json
  • 02:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T318950)', diff saved to https://phabricator.wikimedia.org/P36951 and previous config saved to /var/cache/conftool/dbconfig/20221028-025006-ladsgroup.json
  • 02:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 02:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 02:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T318950)', diff saved to https://phabricator.wikimedia.org/P36950 and previous config saved to /var/cache/conftool/dbconfig/20221028-024944-ladsgroup.json
  • 02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P36949 and previous config saved to /var/cache/conftool/dbconfig/20221028-023438-ladsgroup.json
  • 02:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P36948 and previous config saved to /var/cache/conftool/dbconfig/20221028-021932-ladsgroup.json
  • 02:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T318950)', diff saved to https://phabricator.wikimedia.org/P36947 and previous config saved to /var/cache/conftool/dbconfig/20221028-020425-ladsgroup.json
  • 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T318950)', diff saved to https://phabricator.wikimedia.org/P36946 and previous config saved to /var/cache/conftool/dbconfig/20221028-020117-ladsgroup.json
  • 02:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 02:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 02:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 02:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 02:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 02:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 02:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T318950)', diff saved to https://phabricator.wikimedia.org/P36945 and previous config saved to /var/cache/conftool/dbconfig/20221028-020024-ladsgroup.json
  • 01:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS buster
  • 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P36944 and previous config saved to /var/cache/conftool/dbconfig/20221028-014517-ladsgroup.json
  • 01:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T318950)', diff saved to https://phabricator.wikimedia.org/P36942 and previous config saved to /var/cache/conftool/dbconfig/20221028-011505-ladsgroup.json
  • 01:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T318950)', diff saved to https://phabricator.wikimedia.org/P36941 and previous config saved to /var/cache/conftool/dbconfig/20221028-011357-ladsgroup.json
  • 01:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 01:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 01:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T318950)', diff saved to https://phabricator.wikimedia.org/P36940 and previous config saved to /var/cache/conftool/dbconfig/20221028-011335-ladsgroup.json
  • 01:13 ejegg: civicrm upgraded from 4cb2d91e to 6f511710
  • 01:00 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS buster
  • 00:59 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4048.ulsfo.wmnet with OS buster
  • 00:58 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS buster
  • 00:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P36939 and previous config saved to /var/cache/conftool/dbconfig/20221028-005829-ladsgroup.json
  • 00:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS buster
  • 00:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P36938 and previous config saved to /var/cache/conftool/dbconfig/20221028-004322-ladsgroup.json
  • 00:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
  • 00:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T318950)', diff saved to https://phabricator.wikimedia.org/P36937 and previous config saved to /var/cache/conftool/dbconfig/20221028-002816-ladsgroup.json
  • 00:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T318950)', diff saved to https://phabricator.wikimedia.org/P36936 and previous config saved to /var/cache/conftool/dbconfig/20221028-002708-ladsgroup.json
  • 00:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 00:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 00:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 00:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T318950)', diff saved to https://phabricator.wikimedia.org/P36935 and previous config saved to /var/cache/conftool/dbconfig/20221028-002631-ladsgroup.json
  • 00:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
  • 00:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P36934 and previous config saved to /var/cache/conftool/dbconfig/20221028-001124-ladsgroup.json
  • 00:00 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS buster

2022-10-27

  • 23:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P36933 and previous config saved to /var/cache/conftool/dbconfig/20221027-235618-ladsgroup.json
  • 23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T318950)', diff saved to https://phabricator.wikimedia.org/P36932 and previous config saved to /var/cache/conftool/dbconfig/20221027-234111-ladsgroup.json
  • 23:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T318950)', diff saved to https://phabricator.wikimedia.org/P36931 and previous config saved to /var/cache/conftool/dbconfig/20221027-233903-ladsgroup.json
  • 23:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 23:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 23:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T318950)', diff saved to https://phabricator.wikimedia.org/P36930 and previous config saved to /var/cache/conftool/dbconfig/20221027-233842-ladsgroup.json
  • 23:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P36929 and previous config saved to /var/cache/conftool/dbconfig/20221027-232335-ladsgroup.json
  • 23:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P36928 and previous config saved to /var/cache/conftool/dbconfig/20221027-230828-ladsgroup.json
  • 23:00 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1006
  • 23:00 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:57 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 22:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T318950)', diff saved to https://phabricator.wikimedia.org/P36927 and previous config saved to /var/cache/conftool/dbconfig/20221027-225322-ladsgroup.json
  • 22:53 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts labstore1006
  • 22:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1007
  • 22:51 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:49 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 22:44 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts labstore1007
  • 22:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T318950)', diff saved to https://phabricator.wikimedia.org/P36926 and previous config saved to /var/cache/conftool/dbconfig/20221027-224413-ladsgroup.json
  • 22:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 22:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 22:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T318950)', diff saved to https://phabricator.wikimedia.org/P36925 and previous config saved to /var/cache/conftool/dbconfig/20221027-224350-ladsgroup.json
  • 22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P36924 and previous config saved to /var/cache/conftool/dbconfig/20221027-222844-ladsgroup.json
  • 22:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P36923 and previous config saved to /var/cache/conftool/dbconfig/20221027-221337-ladsgroup.json
  • 21:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T318950)', diff saved to https://phabricator.wikimedia.org/P36922 and previous config saved to /var/cache/conftool/dbconfig/20221027-215831-ladsgroup.json
  • 21:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T318950)', diff saved to https://phabricator.wikimedia.org/P36921 and previous config saved to /var/cache/conftool/dbconfig/20221027-215723-ladsgroup.json
  • 21:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 21:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 21:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T318950)', diff saved to https://phabricator.wikimedia.org/P36920 and previous config saved to /var/cache/conftool/dbconfig/20221027-215701-ladsgroup.json
  • 21:56 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS buster
  • 21:46 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 21:46 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4052
  • 21:46 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4052
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P36919 and previous config saved to /var/cache/conftool/dbconfig/20221027-214154-ladsgroup.json
  • 21:41 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS buster
  • 21:34 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P36918 and previous config saved to /var/cache/conftool/dbconfig/20221027-212648-ladsgroup.json
  • 21:20 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp4052.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:13 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bullseye
  • 21:12 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4052.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T318950)', diff saved to https://phabricator.wikimedia.org/P36917 and previous config saved to /var/cache/conftool/dbconfig/20221027-211142-ladsgroup.json
  • 21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T318950)', diff saved to https://phabricator.wikimedia.org/P36916 and previous config saved to /var/cache/conftool/dbconfig/20221027-211034-ladsgroup.json
  • 21:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 21:10 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4052.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T318950)', diff saved to https://phabricator.wikimedia.org/P36915 and previous config saved to /var/cache/conftool/dbconfig/20221027-211012-ladsgroup.json
  • 21:02 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4052.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 20:56 sukhe: sudo ipmitool -I lanplus -H "cp4052.mgmt.ulsfo.wmnet" -U root -E chassis power cycle
  • 20:56 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage
  • 20:56 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS buster
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P36914 and previous config saved to /var/cache/conftool/dbconfig/20221027-205505-ladsgroup.json
  • 20:53 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage
  • 20:50 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 20:47 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for enwiki, enwiktionary (T300770)
  • 20:47 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for commonswiki (T300770)
  • 20:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T318950)', diff saved to https://phabricator.wikimedia.org/P36912 and previous config saved to /var/cache/conftool/dbconfig/20221027-202452-ladsgroup.json
  • 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1107 (T318950)', diff saved to https://phabricator.wikimedia.org/P36911 and previous config saved to /var/cache/conftool/dbconfig/20221027-202345-ladsgroup.json
  • 20:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1107.eqiad.wmnet with reason: Maintenance
  • 20:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1107.eqiad.wmnet with reason: Maintenance
  • 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T318950)', diff saved to https://phabricator.wikimedia.org/P36910 and previous config saved to /var/cache/conftool/dbconfig/20221027-202323-ladsgroup.json
  • 20:16 kindrobot: End of UTC late backport deployment window
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 20:15 kindrobot@deploy1002: Finished scap: Backport for Deploy Research Incentive survey on enwiki (T318333) (duration: 06m 32s)
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 20:09 kindrobot@deploy1002: kindrobot and dani: Backport for Deploy Research Incentive survey on enwiki (T318333) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:08 kindrobot@deploy1002: Started scap: Backport for Deploy Research Incentive survey on enwiki (T318333)
  • 20:08 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS buster
  • 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P36909 and previous config saved to /var/cache/conftool/dbconfig/20221027-200817-ladsgroup.json
  • 20:08 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 19:59 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T318950)', diff saved to https://phabricator.wikimedia.org/P36908 and previous config saved to /var/cache/conftool/dbconfig/20221027-195634-ladsgroup.json
  • 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P36907 and previous config saved to /var/cache/conftool/dbconfig/20221027-195310-ladsgroup.json
  • 19:51 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 19:50 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 19:50 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 19:49 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 19:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P36906 and previous config saved to /var/cache/conftool/dbconfig/20221027-194127-ladsgroup.json
  • 19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T318950)', diff saved to https://phabricator.wikimedia.org/P36905 and previous config saved to /var/cache/conftool/dbconfig/20221027-193803-ladsgroup.json
  • 19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T318950)', diff saved to https://phabricator.wikimedia.org/P36904 and previous config saved to /var/cache/conftool/dbconfig/20221027-193656-ladsgroup.json
  • 19:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 19:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36903 and previous config saved to /var/cache/conftool/dbconfig/20221027-193617-ladsgroup.json
  • 19:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P36902 and previous config saved to /var/cache/conftool/dbconfig/20221027-192621-ladsgroup.json
  • 19:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P36901 and previous config saved to /var/cache/conftool/dbconfig/20221027-192110-ladsgroup.json
  • 19:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T318950)', diff saved to https://phabricator.wikimedia.org/P36900 and previous config saved to /var/cache/conftool/dbconfig/20221027-191114-ladsgroup.json
  • 19:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T318950)', diff saved to https://phabricator.wikimedia.org/P36899 and previous config saved to /var/cache/conftool/dbconfig/20221027-190904-ladsgroup.json
  • 19:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 19:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 19:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T318950)', diff saved to https://phabricator.wikimedia.org/P36898 and previous config saved to /var/cache/conftool/dbconfig/20221027-190843-ladsgroup.json
  • 19:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P36897 and previous config saved to /var/cache/conftool/dbconfig/20221027-190604-ladsgroup.json
  • 18:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P36896 and previous config saved to /var/cache/conftool/dbconfig/20221027-185336-ladsgroup.json
  • 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36895 and previous config saved to /var/cache/conftool/dbconfig/20221027-185057-ladsgroup.json
  • 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36894 and previous config saved to /var/cache/conftool/dbconfig/20221027-184949-ladsgroup.json
  • 18:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36893 and previous config saved to /var/cache/conftool/dbconfig/20221027-184928-ladsgroup.json
  • 18:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P36892 and previous config saved to /var/cache/conftool/dbconfig/20221027-183830-ladsgroup.json
  • 18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P36891 and previous config saved to /var/cache/conftool/dbconfig/20221027-183421-ladsgroup.json
  • 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T318950)', diff saved to https://phabricator.wikimedia.org/P36890 and previous config saved to /var/cache/conftool/dbconfig/20221027-182323-ladsgroup.json
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T318950)', diff saved to https://phabricator.wikimedia.org/P36889 and previous config saved to /var/cache/conftool/dbconfig/20221027-182113-ladsgroup.json
  • 18:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T318950)', diff saved to https://phabricator.wikimedia.org/P36888 and previous config saved to /var/cache/conftool/dbconfig/20221027-182051-ladsgroup.json
  • 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P36887 and previous config saved to /var/cache/conftool/dbconfig/20221027-181915-ladsgroup.json
  • 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P36886 and previous config saved to /var/cache/conftool/dbconfig/20221027-180545-ladsgroup.json
  • 18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36885 and previous config saved to /var/cache/conftool/dbconfig/20221027-180408-ladsgroup.json
  • 18:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36884 and previous config saved to /var/cache/conftool/dbconfig/20221027-180301-ladsgroup.json
  • 18:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 18:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 17:52 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P36883 and previous config saved to /var/cache/conftool/dbconfig/20221027-175038-ladsgroup.json
  • 17:45 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:45 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:42 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 17:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T318605)', diff saved to https://phabricator.wikimedia.org/P36882 and previous config saved to /var/cache/conftool/dbconfig/20221027-174219-ladsgroup.json
  • 17:42 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4051.ulsfo.wmnet with OS buster
  • 17:39 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:38 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti4006
  • 17:37 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti4006
  • 17:37 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti4002
  • 17:37 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti4002
  • 17:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T318950)', diff saved to https://phabricator.wikimedia.org/P36881 and previous config saved to /var/cache/conftool/dbconfig/20221027-173532-ladsgroup.json
  • 17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T318950)', diff saved to https://phabricator.wikimedia.org/P36880 and previous config saved to /var/cache/conftool/dbconfig/20221027-173322-ladsgroup.json
  • 17:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 17:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 17:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 17:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36879 and previous config saved to /var/cache/conftool/dbconfig/20221027-173255-ladsgroup.json
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P36878 and previous config saved to /var/cache/conftool/dbconfig/20221027-172712-ladsgroup.json
  • 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P36877 and previous config saved to /var/cache/conftool/dbconfig/20221027-171749-ladsgroup.json
  • 17:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4051.ulsfo.wmnet with reason: host reimage
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P36876 and previous config saved to /var/cache/conftool/dbconfig/20221027-171205-ladsgroup.json
  • 17:11 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4051.ulsfo.wmnet with reason: host reimage
  • 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P36875 and previous config saved to /var/cache/conftool/dbconfig/20221027-170242-ladsgroup.json
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T318605)', diff saved to https://phabricator.wikimedia.org/P36874 and previous config saved to /var/cache/conftool/dbconfig/20221027-165659-ladsgroup.json
  • 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 16:53 dancy@deploy1002: Sync cancelled.
  • 16:52 dancy@deploy1002: dancy: testing mw-debug synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 16:52 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 16:52 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T318605)', diff saved to https://phabricator.wikimedia.org/P36873 and previous config saved to /var/cache/conftool/dbconfig/20221027-165052-ladsgroup.json
  • 16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P36872 and previous config saved to /var/cache/conftool/dbconfig/20221027-165031-ladsgroup.json
  • 16:48 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 16:48 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36871 and previous config saved to /var/cache/conftool/dbconfig/20221027-164735-ladsgroup.json
  • 16:47 dancy@deploy1002: Started scap: testing mw-debug
  • 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36870 and previous config saved to /var/cache/conftool/dbconfig/20221027-164626-ladsgroup.json
  • 16:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36869 and previous config saved to /var/cache/conftool/dbconfig/20221027-164615-ladsgroup.json
  • 16:45 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4051.ulsfo.wmnet with OS buster
  • 16:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS buster
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P36868 and previous config saved to /var/cache/conftool/dbconfig/20221027-163524-ladsgroup.json
  • 16:33 dancy@deploy1002: scap failed: CalledProcessError Command 'sudo -u mwbuilder /usr/bin/make -C /srv/mwbuilder/release/make-container-image -f Makefile build-and-push-all-images http_proxy=http://webproxy.eqiad.wmnet:8080 https_proxy=http://webproxy.eqiad.wmnet:8080 GIT_BASE=https://gerrit.wikimedia.org/r/ MW_CONFIG_BRANCH=master workdir_volume=/srv/mediawiki-staging mv_image_name=docker-registry.discovery.wmnet/restric
  • 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P36867 and previous config saved to /var/cache/conftool/dbconfig/20221027-163109-ladsgroup.json
  • 16:27 dancy@deploy1002: Started scap: testing mw-debug
  • 16:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
  • 16:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P36866 and previous config saved to /var/cache/conftool/dbconfig/20221027-162018-ladsgroup.json
  • 16:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2004.codfw.wmnet
  • 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P36865 and previous config saved to /var/cache/conftool/dbconfig/20221027-161602-ladsgroup.json
  • 16:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
  • 16:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2004.codfw.wmnet
  • 16:11 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P36864 and previous config saved to /var/cache/conftool/dbconfig/20221027-160511-ladsgroup.json
  • 16:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36863 and previous config saved to /var/cache/conftool/dbconfig/20221027-160056-ladsgroup.json
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36862 and previous config saved to /var/cache/conftool/dbconfig/20221027-155946-ladsgroup.json
  • 15:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 15:59 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 15:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T318950)', diff saved to https://phabricator.wikimedia.org/P36861 and previous config saved to /var/cache/conftool/dbconfig/20221027-155902-ladsgroup.json
  • 15:55 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw2003-dev.codfw.wmnet with OS bullseye
  • 15:47 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:46 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:45 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS buster
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P36860 and previous config saved to /var/cache/conftool/dbconfig/20221027-154356-ladsgroup.json
  • 15:42 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4041.ulsfo.wmnet with OS buster
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P36859 and previous config saved to /var/cache/conftool/dbconfig/20221027-153143-ladsgroup.json
  • 15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P36858 and previous config saved to /var/cache/conftool/dbconfig/20221027-153121-ladsgroup.json
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P36857 and previous config saved to /var/cache/conftool/dbconfig/20221027-152849-ladsgroup.json
  • 15:26 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wcqs2002.codfw.wmnet
  • 15:26 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for wcqs2002.codfw.wmnet
  • 15:26 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on wcqs2003.codfw.wmnet with reason: data reload
  • 15:26 bking@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on wcqs2003.codfw.wmnet with reason: data reload
  • 15:26 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on wcqs2002.codfw.wmnet with reason: data reload
  • 15:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on wcqs2002.codfw.wmnet with reason: data reload
  • 15:23 claime: Removed silence ProbeDown instance="mwdebug:4444"
  • 15:23 claime: k8s-experimental mwdebug service switched to new deployment mw-debug
  • 15:22 claime: Unpausing mwdebug k8s deployments
  • 15:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
  • 15:19 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 15:18 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 15:18 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 15:18 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P36856 and previous config saved to /var/cache/conftool/dbconfig/20221027-151615-ladsgroup.json
  • 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T321123)', diff saved to https://phabricator.wikimedia.org/P36855 and previous config saved to /var/cache/conftool/dbconfig/20221027-151604-marostegui.json
  • 15:15 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
  • 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T318950)', diff saved to https://phabricator.wikimedia.org/P36854 and previous config saved to /var/cache/conftool/dbconfig/20221027-151343-ladsgroup.json
  • 15:12 claime: Silence ProbeDown instance="mwdebug:4444" for 1h
  • 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T318950)', diff saved to https://phabricator.wikimedia.org/P36853 and previous config saved to /var/cache/conftool/dbconfig/20221027-151133-ladsgroup.json
  • 15:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 15:11 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw2001-dev.codfw.wmnet with OS bullseye
  • 15:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T318950)', diff saved to https://phabricator.wikimedia.org/P36852 and previous config saved to /var/cache/conftool/dbconfig/20221027-151111-ladsgroup.json
  • 15:07 claime: Pausing mwdebug k8s deployments
  • 15:07 moritzm: installing node-moment security updates
  • 15:07 claime: Switching k8s-experimental mwdebug service
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P36851 and previous config saved to /var/cache/conftool/dbconfig/20221027-150108-ladsgroup.json
  • 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P36850 and previous config saved to /var/cache/conftool/dbconfig/20221027-150058-marostegui.json
  • 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P36849 and previous config saved to /var/cache/conftool/dbconfig/20221027-145604-ladsgroup.json
  • 14:51 moritzm: installing krb5 bugfix updates from Bullseye point release
  • 14:50 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: host reimage
  • 14:50 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: host reimage
  • 14:49 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS buster
  • 14:48 moritzm: installing twitter-bootstrap4 security updates
  • 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P36848 and previous config saved to /var/cache/conftool/dbconfig/20221027-144602-ladsgroup.json
  • 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P36847 and previous config saved to /var/cache/conftool/dbconfig/20221027-144551-marostegui.json
  • 14:45 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2003-dev.codfw.wmnet with reason: host reimage
  • 14:41 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2003-dev.codfw.wmnet with reason: host reimage
  • 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P36846 and previous config saved to /var/cache/conftool/dbconfig/20221027-144058-ladsgroup.json
  • 14:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS buster
  • 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T321123)', diff saved to https://phabricator.wikimedia.org/P36845 and previous config saved to /var/cache/conftool/dbconfig/20221027-143045-marostegui.json
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T321123)', diff saved to https://phabricator.wikimedia.org/P36844 and previous config saved to /var/cache/conftool/dbconfig/20221027-142656-marostegui.json
  • 14:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 14:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36843 and previous config saved to /var/cache/conftool/dbconfig/20221027-142634-marostegui.json
  • 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T318950)', diff saved to https://phabricator.wikimedia.org/P36842 and previous config saved to /var/cache/conftool/dbconfig/20221027-142552-ladsgroup.json
  • 14:25 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudgw2003-dev.codfw.wmnet with OS bullseye
  • 14:24 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudgw2001-dev.codfw.wmnet with OS bullseye
  • 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T318950)', diff saved to https://phabricator.wikimedia.org/P36841 and previous config saved to /var/cache/conftool/dbconfig/20221027-142342-ladsgroup.json
  • 14:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 14:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T318950)', diff saved to https://phabricator.wikimedia.org/P36840 and previous config saved to /var/cache/conftool/dbconfig/20221027-142320-ladsgroup.json
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P36839 and previous config saved to /var/cache/conftool/dbconfig/20221027-141326-ladsgroup.json
  • 14:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T318605)', diff saved to https://phabricator.wikimedia.org/P36838 and previous config saved to /var/cache/conftool/dbconfig/20221027-141304-ladsgroup.json
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P36837 and previous config saved to /var/cache/conftool/dbconfig/20221027-141128-marostegui.json
  • 14:08 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
  • 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P36836 and previous config saved to /var/cache/conftool/dbconfig/20221027-140814-ladsgroup.json
  • 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2022 (T321312)', diff saved to https://phabricator.wikimedia.org/P36835 and previous config saved to /var/cache/conftool/dbconfig/20221027-140708-ladsgroup.json
  • 14:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
  • 14:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T318950)', diff saved to https://phabricator.wikimedia.org/P36834 and previous config saved to /var/cache/conftool/dbconfig/20221027-140043-ladsgroup.json
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P36833 and previous config saved to /var/cache/conftool/dbconfig/20221027-135757-ladsgroup.json
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P36832 and previous config saved to /var/cache/conftool/dbconfig/20221027-135621-marostegui.json
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P36831 and previous config saved to /var/cache/conftool/dbconfig/20221027-135307-ladsgroup.json
  • 13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2022', diff saved to https://phabricator.wikimedia.org/P36830 and previous config saved to /var/cache/conftool/dbconfig/20221027-135201-ladsgroup.json
  • 13:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P36829 and previous config saved to /var/cache/conftool/dbconfig/20221027-134537-ladsgroup.json
  • 13:45 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 13:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P36828 and previous config saved to /var/cache/conftool/dbconfig/20221027-134251-ladsgroup.json
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36827 and previous config saved to /var/cache/conftool/dbconfig/20221027-134115-marostegui.json
  • 13:40 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 13:39 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS buster
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36826 and previous config saved to /var/cache/conftool/dbconfig/20221027-133848-marostegui.json
  • 13:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 13:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T321123)', diff saved to https://phabricator.wikimedia.org/P36825 and previous config saved to /var/cache/conftool/dbconfig/20221027-133827-marostegui.json
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T318950)', diff saved to https://phabricator.wikimedia.org/P36824 and previous config saved to /var/cache/conftool/dbconfig/20221027-133801-ladsgroup.json
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2022', diff saved to https://phabricator.wikimedia.org/P36823 and previous config saved to /var/cache/conftool/dbconfig/20221027-133654-ladsgroup.json
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T318950)', diff saved to https://phabricator.wikimedia.org/P36822 and previous config saved to /var/cache/conftool/dbconfig/20221027-133551-ladsgroup.json
  • 13:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 13:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 13:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 13:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T318950)', diff saved to https://phabricator.wikimedia.org/P36821 and previous config saved to /var/cache/conftool/dbconfig/20221027-133522-ladsgroup.json
  • 13:32 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P36820 and previous config saved to /var/cache/conftool/dbconfig/20221027-133031-ladsgroup.json
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:29 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Define a default value for wgPageTriageMaxAge (T310974) (duration: 05m 33s)
  • 13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1022 (T321312)', diff saved to https://phabricator.wikimedia.org/P36819 and previous config saved to /var/cache/conftool/dbconfig/20221027-132814-ladsgroup.json
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T318605)', diff saved to https://phabricator.wikimedia.org/P36818 and previous config saved to /var/cache/conftool/dbconfig/20221027-132743-ladsgroup.json
  • 13:24 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for Define a default value for wgPageTriageMaxAge (T310974) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:23 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Define a default value for wgPageTriageMaxAge (T310974)
  • 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P36817 and previous config saved to /var/cache/conftool/dbconfig/20221027-132320-marostegui.json
  • 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:22 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable show nearby feature on de.wikivoyage (T320692) (duration: 05m 35s)
  • 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2022 (T321312)', diff saved to https://phabricator.wikimedia.org/P36816 and previous config saved to /var/cache/conftool/dbconfig/20221027-132148-ladsgroup.json
  • 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P36815 and previous config saved to /var/cache/conftool/dbconfig/20221027-132016-ladsgroup.json
  • 13:16 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and wmde-fisch: Backport for Enable show nearby feature on de.wikivoyage (T320692) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:16 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable show nearby feature on de.wikivoyage (T320692)
  • 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T318950)', diff saved to https://phabricator.wikimedia.org/P36814 and previous config saved to /var/cache/conftool/dbconfig/20221027-131524-ladsgroup.json
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1022', diff saved to https://phabricator.wikimedia.org/P36813 and previous config saved to /var/cache/conftool/dbconfig/20221027-131308-ladsgroup.json
  • 13:12 vgutierrez: pool cp5007
  • 13:12 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable source links on Translation ns on bnwikisource (T53980) (duration: 05m 40s)
  • 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T318950)', diff saved to https://phabricator.wikimedia.org/P36812 and previous config saved to /var/cache/conftool/dbconfig/20221027-131127-ladsgroup.json
  • 13:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 13:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T318950)', diff saved to https://phabricator.wikimedia.org/P36811 and previous config saved to /var/cache/conftool/dbconfig/20221027-131117-ladsgroup.json
  • 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P36810 and previous config saved to /var/cache/conftool/dbconfig/20221027-130814-marostegui.json
  • 13:06 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and bodhisattwa: Backport for Enable source links on Translation ns on bnwikisource (T53980) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:06 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable source links on Translation ns on bnwikisource (T53980)
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P36809 and previous config saved to /var/cache/conftool/dbconfig/20221027-130509-ladsgroup.json
  • 13:03 vgutierrez: depool cp5007
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2022 (T321312)', diff saved to https://phabricator.wikimedia.org/P36808 and previous config saved to /var/cache/conftool/dbconfig/20221027-130135-ladsgroup.json
  • 13:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2022.codfw.wmnet with reason: Maintenance
  • 13:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2022.codfw.wmnet with reason: Maintenance
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2021 (T321312)', diff saved to https://phabricator.wikimedia.org/P36807 and previous config saved to /var/cache/conftool/dbconfig/20221027-130110-ladsgroup.json
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1022', diff saved to https://phabricator.wikimedia.org/P36806 and previous config saved to /var/cache/conftool/dbconfig/20221027-125801-ladsgroup.json
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P36805 and previous config saved to /var/cache/conftool/dbconfig/20221027-125610-ladsgroup.json
  • 12:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T318605)', diff saved to https://phabricator.wikimedia.org/P36804 and previous config saved to /var/cache/conftool/dbconfig/20221027-125456-ladsgroup.json
  • 12:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 12:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 12:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 12:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T321123)', diff saved to https://phabricator.wikimedia.org/P36803 and previous config saved to /var/cache/conftool/dbconfig/20221027-125307-marostegui.json
  • 12:52 ladsgroup@deploy1002: Finished scap: Backport for maintenance: Use $this->waitForReplication() (T298485) (duration: 04m 40s)
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T321123)', diff saved to https://phabricator.wikimedia.org/P36802 and previous config saved to /var/cache/conftool/dbconfig/20221027-125042-marostegui.json
  • 12:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 12:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36801 and previous config saved to /var/cache/conftool/dbconfig/20221027-125020-marostegui.json
  • 12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T318950)', diff saved to https://phabricator.wikimedia.org/P36800 and previous config saved to /var/cache/conftool/dbconfig/20221027-125002-ladsgroup.json
  • 12:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:48 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for maintenance: Use $this->waitForReplication() (T298485) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T318950)', diff saved to https://phabricator.wikimedia.org/P36799 and previous config saved to /var/cache/conftool/dbconfig/20221027-124752-ladsgroup.json
  • 12:47 ladsgroup@deploy1002: Started scap: Backport for maintenance: Use $this->waitForReplication() (T298485)
  • 12:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T318950)', diff saved to https://phabricator.wikimedia.org/P36798 and previous config saved to /var/cache/conftool/dbconfig/20221027-124731-ladsgroup.json
  • 12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2021', diff saved to https://phabricator.wikimedia.org/P36797 and previous config saved to /var/cache/conftool/dbconfig/20221027-124603-ladsgroup.json
  • 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1022 (T321312)', diff saved to https://phabricator.wikimedia.org/P36796 and previous config saved to /var/cache/conftool/dbconfig/20221027-124255-ladsgroup.json
  • 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P36795 and previous config saved to /var/cache/conftool/dbconfig/20221027-124104-ladsgroup.json
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P36794 and previous config saved to /var/cache/conftool/dbconfig/20221027-123513-marostegui.json
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P36793 and previous config saved to /var/cache/conftool/dbconfig/20221027-123224-ladsgroup.json
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2021', diff saved to https://phabricator.wikimedia.org/P36792 and previous config saved to /var/cache/conftool/dbconfig/20221027-123057-ladsgroup.json
  • 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T318950)', diff saved to https://phabricator.wikimedia.org/P36791 and previous config saved to /var/cache/conftool/dbconfig/20221027-122557-ladsgroup.json
  • 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P36790 and previous config saved to /var/cache/conftool/dbconfig/20221027-122007-marostegui.json
  • 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P36789 and previous config saved to /var/cache/conftool/dbconfig/20221027-121717-ladsgroup.json
  • 12:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2021 (T321312)', diff saved to https://phabricator.wikimedia.org/P36788 and previous config saved to /var/cache/conftool/dbconfig/20221027-121550-ladsgroup.json
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36787 and previous config saved to /var/cache/conftool/dbconfig/20221027-121441-root.json
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36786 and previous config saved to /var/cache/conftool/dbconfig/20221027-121432-root.json
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36785 and previous config saved to /var/cache/conftool/dbconfig/20221027-121425-root.json
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1022 (T321312)', diff saved to https://phabricator.wikimedia.org/P36784 and previous config saved to /var/cache/conftool/dbconfig/20221027-121323-ladsgroup.json
  • 12:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1022.eqiad.wmnet with reason: Maintenance
  • 12:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1022.eqiad.wmnet with reason: Maintenance
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1021 (T321312)', diff saved to https://phabricator.wikimedia.org/P36783 and previous config saved to /var/cache/conftool/dbconfig/20221027-121259-ladsgroup.json
  • 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36782 and previous config saved to /var/cache/conftool/dbconfig/20221027-120500-marostegui.json
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36781 and previous config saved to /var/cache/conftool/dbconfig/20221027-120234-marostegui.json
  • 12:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 12:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T318950)', diff saved to https://phabricator.wikimedia.org/P36780 and previous config saved to /var/cache/conftool/dbconfig/20221027-120211-ladsgroup.json
  • 12:01 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on D{lvs200[7-8].codfw.wmnet} and A:lvs
  • 12:00 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on D{lvs200[7-8].codfw.wmnet} and A:lvs
  • 12:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T318950)', diff saved to https://phabricator.wikimedia.org/P36778 and previous config saved to /var/cache/conftool/dbconfig/20221027-120001-ladsgroup.json
  • 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T318950)', diff saved to https://phabricator.wikimedia.org/P36777 and previous config saved to /var/cache/conftool/dbconfig/20221027-115939-ladsgroup.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36776 and previous config saved to /var/cache/conftool/dbconfig/20221027-115936-root.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36775 and previous config saved to /var/cache/conftool/dbconfig/20221027-115927-root.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36774 and previous config saved to /var/cache/conftool/dbconfig/20221027-115920-root.json
  • 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1021', diff saved to https://phabricator.wikimedia.org/P36773 and previous config saved to /var/cache/conftool/dbconfig/20221027-115753-ladsgroup.json
  • 11:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 11:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T318605)', diff saved to https://phabricator.wikimedia.org/P36772 and previous config saved to /var/cache/conftool/dbconfig/20221027-115157-ladsgroup.json
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P36771 and previous config saved to /var/cache/conftool/dbconfig/20221027-114706-marostegui.json
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36770 and previous config saved to /var/cache/conftool/dbconfig/20221027-114432-root.json
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36769 and previous config saved to /var/cache/conftool/dbconfig/20221027-114422-root.json
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36768 and previous config saved to /var/cache/conftool/dbconfig/20221027-114416-root.json
  • 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1021', diff saved to https://phabricator.wikimedia.org/P36767 and previous config saved to /var/cache/conftool/dbconfig/20221027-114246-ladsgroup.json
  • 11:39 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs4005.ulsfo.wmnet} and A:lvs
  • 11:38 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs4005.ulsfo.wmnet} and A:lvs
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P36765 and previous config saved to /var/cache/conftool/dbconfig/20221027-113651-ladsgroup.json
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T318950)', diff saved to https://phabricator.wikimedia.org/P36764 and previous config saved to /var/cache/conftool/dbconfig/20221027-113554-ladsgroup.json
  • 11:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T318950)', diff saved to https://phabricator.wikimedia.org/P36763 and previous config saved to /var/cache/conftool/dbconfig/20221027-113544-ladsgroup.json
  • 11:35 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs4005.ulsfo.wmnet} and A:lvs
  • 11:35 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs4005.ulsfo.wmnet} and A:lvs
  • 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow1002.eqiad.wmnet to plain
  • 11:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow1002.eqiad.wmnet to plain
  • 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P36762 and previous config saved to /var/cache/conftool/dbconfig/20221027-113159-marostegui.json
  • 11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P36761 and previous config saved to /var/cache/conftool/dbconfig/20221027-112927-ladsgroup.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36760 and previous config saved to /var/cache/conftool/dbconfig/20221027-112927-root.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36759 and previous config saved to /var/cache/conftool/dbconfig/20221027-112917-root.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36758 and previous config saved to /var/cache/conftool/dbconfig/20221027-112911-root.json
  • 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1021 (T321312)', diff saved to https://phabricator.wikimedia.org/P36757 and previous config saved to /var/cache/conftool/dbconfig/20221027-112740-ladsgroup.json
  • 11:24 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs4005.ulsfo.wmnet} and A:lvs
  • 11:24 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs4005.ulsfo.wmnet} and A:lvs
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P36756 and previous config saved to /var/cache/conftool/dbconfig/20221027-112144-ladsgroup.json
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P36755 and previous config saved to /var/cache/conftool/dbconfig/20221027-112037-ladsgroup.json
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T321123)', diff saved to https://phabricator.wikimedia.org/P36754 and previous config saved to /var/cache/conftool/dbconfig/20221027-111653-marostegui.json
  • 11:15 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs4005.ulsfo.wmnet} and A:lvs
  • 11:15 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs4005.ulsfo.wmnet} and A:lvs
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T321123)', diff saved to https://phabricator.wikimedia.org/P36753 and previous config saved to /var/cache/conftool/dbconfig/20221027-111427-marostegui.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36752 and previous config saved to /var/cache/conftool/dbconfig/20221027-111422-root.json
  • 11:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 11:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 11:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T318950)', diff saved to https://phabricator.wikimedia.org/P36751 and previous config saved to /var/cache/conftool/dbconfig/20221027-111414-ladsgroup.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36750 and previous config saved to /var/cache/conftool/dbconfig/20221027-111412-root.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36749 and previous config saved to /var/cache/conftool/dbconfig/20221027-111406-root.json
  • 11:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T321123)', diff saved to https://phabricator.wikimedia.org/P36748 and previous config saved to /var/cache/conftool/dbconfig/20221027-111401-marostegui.json
  • 11:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T318950)', diff saved to https://phabricator.wikimedia.org/P36747 and previous config saved to /var/cache/conftool/dbconfig/20221027-111204-ladsgroup.json
  • 11:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 11:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2021 (T321312)', diff saved to https://phabricator.wikimedia.org/P36746 and previous config saved to /var/cache/conftool/dbconfig/20221027-111009-ladsgroup.json
  • 11:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2021.codfw.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2021.codfw.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1021 (T321312)', diff saved to https://phabricator.wikimedia.org/P36745 and previous config saved to /var/cache/conftool/dbconfig/20221027-110920-ladsgroup.json
  • 11:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance
  • 11:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T318605)', diff saved to https://phabricator.wikimedia.org/P36744 and previous config saved to /var/cache/conftool/dbconfig/20221027-110638-ladsgroup.json
  • 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P36743 and previous config saved to /var/cache/conftool/dbconfig/20221027-110531-ladsgroup.json
  • 11:05 moritzm: installing nodejs security updates on buster
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T318950)', diff saved to https://phabricator.wikimedia.org/P36742 and previous config saved to /var/cache/conftool/dbconfig/20221027-110301-ladsgroup.json
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36740 and previous config saved to /var/cache/conftool/dbconfig/20221027-105910-root.json
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36739 and previous config saved to /var/cache/conftool/dbconfig/20221027-105907-root.json
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36738 and previous config saved to /var/cache/conftool/dbconfig/20221027-105901-root.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P36737 and previous config saved to /var/cache/conftool/dbconfig/20221027-105855-marostegui.json
  • 10:35 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be1059.eqiad.wmnet
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P36728 and previous config saved to /var/cache/conftool/dbconfig/20221027-103248-ladsgroup.json
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T318605)', diff saved to https://phabricator.wikimedia.org/P36727 and previous config saved to /var/cache/conftool/dbconfig/20221027-103236-ladsgroup.json
  • 10:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 10:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T318605)', diff saved to https://phabricator.wikimedia.org/P36726 and previous config saved to /var/cache/conftool/dbconfig/20221027-103214-ladsgroup.json
  • 10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P36725 and previous config saved to /var/cache/conftool/dbconfig/20221027-103110-ladsgroup.json
  • 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 1%: After upgrade', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20221027-102852-root.json
  • 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 1%: After upgrade', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20221027-102847-root.json
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 1%: After upgrade', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20221027-102843-root.json
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T321123)', diff saved to https://phabricator.wikimedia.org/P36724 and previous config saved to /var/cache/conftool/dbconfig/20221027-102611-marostegui.json
  • 10:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 10:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T321123)', diff saved to https://phabricator.wikimedia.org/P36723 and previous config saved to /var/cache/conftool/dbconfig/20221027-102550-marostegui.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2026 es2027 es2028 for upgrade', diff saved to https://phabricator.wikimedia.org/P36722 and previous config saved to /var/cache/conftool/dbconfig/20221027-102209-root.json
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1009.eqiad.wmnet
  • 10:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025 (T321312)', diff saved to https://phabricator.wikimedia.org/P36721 and previous config saved to /var/cache/conftool/dbconfig/20221027-101848-ladsgroup.json
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2030 as es1 master, es2031 as es2 master, es2029 as es3 master', diff saved to https://phabricator.wikimedia.org/P36720 and previous config saved to /var/cache/conftool/dbconfig/20221027-101842-marostegui.json
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T318950)', diff saved to https://phabricator.wikimedia.org/P36719 and previous config saved to /var/cache/conftool/dbconfig/20221027-101742-ladsgroup.json
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P36718 and previous config saved to /var/cache/conftool/dbconfig/20221027-101708-ladsgroup.json
  • 10:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P36717 and previous config saved to /var/cache/conftool/dbconfig/20221027-101604-ladsgroup.json
  • 10:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host stat1009.eqiad.wmnet
  • 10:14 marostegui@deploy1002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 codfw master" (duration: 04m 29s)
  • 10:12 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
  • 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P36716 and previous config saved to /var/cache/conftool/dbconfig/20221027-101043-marostegui.json
  • 10:09 marostegui@deploy1002: marostegui and marostegui: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 codfw master" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T318950)', diff saved to https://phabricator.wikimedia.org/P36715 and previous config saved to /var/cache/conftool/dbconfig/20221027-100948-ladsgroup.json
  • 10:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 10:09 marostegui@deploy1002: Started scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 codfw master"
  • 10:09 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
  • 10:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T318950)', diff saved to https://phabricator.wikimedia.org/P36714 and previous config saved to /var/cache/conftool/dbconfig/20221027-100915-ladsgroup.json
  • 10:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2025 (T321312)', diff saved to https://phabricator.wikimedia.org/P36713 and previous config saved to /var/cache/conftool/dbconfig/20221027-100547-ladsgroup.json
  • 10:03 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1059.eqiad.wmnet
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025', diff saved to https://phabricator.wikimedia.org/P36712 and previous config saved to /var/cache/conftool/dbconfig/20221027-100303-ladsgroup.json
  • 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-eqiad
  • 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P36711 and previous config saved to /var/cache/conftool/dbconfig/20221027-100201-ladsgroup.json
  • 10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T318950)', diff saved to https://phabricator.wikimedia.org/P36710 and previous config saved to /var/cache/conftool/dbconfig/20221027-100057-ladsgroup.json
  • 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2002.codfw.wmnet
  • 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T318950)', diff saved to https://phabricator.wikimedia.org/P36709 and previous config saved to /var/cache/conftool/dbconfig/20221027-095700-ladsgroup.json
  • 09:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 09:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T318950)', diff saved to https://phabricator.wikimedia.org/P36708 and previous config saved to /var/cache/conftool/dbconfig/20221027-095649-ladsgroup.json
  • 09:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2002.codfw.wmnet
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P36707 and previous config saved to /var/cache/conftool/dbconfig/20221027-095537-marostegui.json
  • 09:54 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-eqiad
  • 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P36706 and previous config saved to /var/cache/conftool/dbconfig/20221027-095408-ladsgroup.json
  • 09:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2025', diff saved to https://phabricator.wikimedia.org/P36705 and previous config saved to /var/cache/conftool/dbconfig/20221027-095041-ladsgroup.json
  • 09:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025', diff saved to https://phabricator.wikimedia.org/P36704 and previous config saved to /var/cache/conftool/dbconfig/20221027-094756-ladsgroup.json
  • 09:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T318605)', diff saved to https://phabricator.wikimedia.org/P36703 and previous config saved to /var/cache/conftool/dbconfig/20221027-094655-ladsgroup.json
  • 09:46 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-codfw
  • 09:46 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Promote pc2014 to pc1 codfw master (duration: 04m 26s)
  • 09:42 marostegui@deploy1002: marostegui and marostegui: Backport for ProductionServices.php: Promote pc2014 to pc1 codfw master synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P36702 and previous config saved to /var/cache/conftool/dbconfig/20221027-094143-ladsgroup.json
  • 09:41 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Promote pc2014 to pc1 codfw master
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T321123)', diff saved to https://phabricator.wikimedia.org/P36701 and previous config saved to /var/cache/conftool/dbconfig/20221027-094030-marostegui.json
  • 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P36700 and previous config saved to /var/cache/conftool/dbconfig/20221027-093902-ladsgroup.json
  • 09:38 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-codfw
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T321123)', diff saved to https://phabricator.wikimedia.org/P36699 and previous config saved to /var/cache/conftool/dbconfig/20221027-093804-marostegui.json
  • 09:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 09:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 09:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 09:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2025', diff saved to https://phabricator.wikimedia.org/P36698 and previous config saved to /var/cache/conftool/dbconfig/20221027-093534-ladsgroup.json
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2117', diff saved to https://phabricator.wikimedia.org/P36697 and previous config saved to /var/cache/conftool/dbconfig/20221027-093519-marostegui.json
  • 09:34 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host puppetdb-test2001.codfw.wmnet
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025 (T321312)', diff saved to https://phabricator.wikimedia.org/P36696 and previous config saved to /var/cache/conftool/dbconfig/20221027-093250-ladsgroup.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P36694 and previous config saved to /var/cache/conftool/dbconfig/20221027-092842-marostegui.json
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P36693 and previous config saved to /var/cache/conftool/dbconfig/20221027-092636-ladsgroup.json
  • 09:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetdb-test2001.codfw.wmnet
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T318950)', diff saved to https://phabricator.wikimedia.org/P36692 and previous config saved to /var/cache/conftool/dbconfig/20221027-092355-ladsgroup.json
  • 09:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2025 (T321312)', diff saved to https://phabricator.wikimedia.org/P36691 and previous config saved to /var/cache/conftool/dbconfig/20221027-092028-ladsgroup.json
  • 09:17 moritzm: failover ganeti master in ulsfo to ganeti4008, unblocking future decom of ganeti4003 T317247
  • 09:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T318950)', diff saved to https://phabricator.wikimedia.org/P36689 and previous config saved to /var/cache/conftool/dbconfig/20221027-091603-ladsgroup.json
  • 09:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 09:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 09:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 09:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T318950)', diff saved to https://phabricator.wikimedia.org/P36688 and previous config saved to /var/cache/conftool/dbconfig/20221027-091536-ladsgroup.json
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P36687 and previous config saved to /var/cache/conftool/dbconfig/20221027-091336-marostegui.json
  • 09:13 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:13 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T318605)', diff saved to https://phabricator.wikimedia.org/P36686 and previous config saved to /var/cache/conftool/dbconfig/20221027-091249-ladsgroup.json
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T318605)', diff saved to https://phabricator.wikimedia.org/P36685 and previous config saved to /var/cache/conftool/dbconfig/20221027-091227-ladsgroup.json
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T318950)', diff saved to https://phabricator.wikimedia.org/P36684 and previous config saved to /var/cache/conftool/dbconfig/20221027-091130-ladsgroup.json
  • 09:10 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 09:10 elukey@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 09:10 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 09:10 elukey@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 09:09 elukey@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:09 elukey@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P36683 and previous config saved to /var/cache/conftool/dbconfig/20221027-090030-ladsgroup.json
  • 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1025 (T321312)', diff saved to https://phabricator.wikimedia.org/P36682 and previous config saved to /var/cache/conftool/dbconfig/20221027-085934-ladsgroup.json
  • 08:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1025.eqiad.wmnet with reason: Maintenance
  • 08:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1025.eqiad.wmnet with reason: Maintenance
  • 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024 (T321312)', diff saved to https://phabricator.wikimedia.org/P36681 and previous config saved to /var/cache/conftool/dbconfig/20221027-085859-ladsgroup.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T321123)', diff saved to https://phabricator.wikimedia.org/P36680 and previous config saved to /var/cache/conftool/dbconfig/20221027-085829-marostegui.json
  • 08:57 elukey@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:57 elukey@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P36679 and previous config saved to /var/cache/conftool/dbconfig/20221027-085720-ladsgroup.json
  • 08:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T321123)', diff saved to https://phabricator.wikimedia.org/P36678 and previous config saved to /var/cache/conftool/dbconfig/20221027-085617-marostegui.json
  • 08:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 08:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 08:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 08:55 marostegui@deploy1002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master" (duration: 04m 52s)
  • 08:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 08:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:50 marostegui@deploy1002: marostegui and marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:50 marostegui@deploy1002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master"
  • 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P36676 and previous config saved to /var/cache/conftool/dbconfig/20221027-084523-ladsgroup.json
  • 08:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024', diff saved to https://phabricator.wikimedia.org/P36675 and previous config saved to /var/cache/conftool/dbconfig/20221027-084352-ladsgroup.json
  • 08:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P36674 and previous config saved to /var/cache/conftool/dbconfig/20221027-084214-ladsgroup.json
  • 08:37 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:37 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:36 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 08:36 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 08:32 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:32 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T318950)', diff saved to https://phabricator.wikimedia.org/P36673 and previous config saved to /var/cache/conftool/dbconfig/20221027-083017-ladsgroup.json
  • 08:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024', diff saved to https://phabricator.wikimedia.org/P36672 and previous config saved to /var/cache/conftool/dbconfig/20221027-082846-ladsgroup.json
  • 08:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T318605)', diff saved to https://phabricator.wikimedia.org/P36671 and previous config saved to /var/cache/conftool/dbconfig/20221027-082707-ladsgroup.json
  • 08:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T318950)', diff saved to https://phabricator.wikimedia.org/P36670 and previous config saved to /var/cache/conftool/dbconfig/20221027-082211-ladsgroup.json
  • 08:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 08:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2025 (T321312)', diff saved to https://phabricator.wikimedia.org/P36669 and previous config saved to /var/cache/conftool/dbconfig/20221027-082157-ladsgroup.json
  • 08:21 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc3 master (duration: 04m 22s)
  • 08:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 08:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2025.codfw.wmnet with reason: Maintenance
  • 08:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2025.codfw.wmnet with reason: Maintenance
  • 08:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2023 (T321312)', diff saved to https://phabricator.wikimedia.org/P36668 and previous config saved to /var/cache/conftool/dbconfig/20221027-082131-ladsgroup.json
  • 08:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:20 elukey: powercycle elastic2043 - no mgmt console tty available, not responsive to ssh, memory/dimm errors in `racadm getsel`
  • 08:19 jbond: upload vim python3-stdlib-extensions to buster component/python39
  • 08:17 marostegui@deploy1002: marostegui and marostegui: Backport for ProductionServices.php: Promote pc1014 to pc3 master synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 08:17 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1009.eqiad.wmnet to cluster eqiad and group C
  • 08:17 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc3 master
  • 08:16 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1009.eqiad.wmnet to cluster eqiad and group C
  • 08:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 08:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 08:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T318950)', diff saved to https://phabricator.wikimedia.org/P36667 and previous config saved to /var/cache/conftool/dbconfig/20221027-081534-ladsgroup.json
  • 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024 (T321312)', diff saved to https://phabricator.wikimedia.org/P36666 and previous config saved to /var/cache/conftool/dbconfig/20221027-081339-ladsgroup.json
  • 08:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:11 jnuche@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.7 refs T320512
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T318950)', diff saved to https://phabricator.wikimedia.org/P36665 and previous config saved to /var/cache/conftool/dbconfig/20221027-081113-ladsgroup.json
  • 08:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T318950)', diff saved to https://phabricator.wikimedia.org/P36664 and previous config saved to /var/cache/conftool/dbconfig/20221027-081103-ladsgroup.json
  • 08:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2023', diff saved to https://phabricator.wikimedia.org/P36663 and previous config saved to /var/cache/conftool/dbconfig/20221027-080625-ladsgroup.json
  • 08:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P36662 and previous config saved to /var/cache/conftool/dbconfig/20221027-080027-ladsgroup.json
  • 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P36661 and previous config saved to /var/cache/conftool/dbconfig/20221027-075556-ladsgroup.json
  • 07:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagetcd1005.eqiad.wmnet to plain
  • 07:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1024 (T321312)', diff saved to https://phabricator.wikimedia.org/P36660 and previous config saved to /var/cache/conftool/dbconfig/20221027-075433-ladsgroup.json
  • 07:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1024.eqiad.wmnet with reason: Maintenance
  • 07:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1024.eqiad.wmnet with reason: Maintenance
  • 07:53 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagetcd1005.eqiad.wmnet to plain
  • 07:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T318605)', diff saved to https://phabricator.wikimedia.org/P36659 and previous config saved to /var/cache/conftool/dbconfig/20221027-075327-ladsgroup.json
  • 07:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 07:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 07:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2023', diff saved to https://phabricator.wikimedia.org/P36658 and previous config saved to /var/cache/conftool/dbconfig/20221027-075118-ladsgroup.json
  • 07:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagetcd1005.eqiad.wmnet to drbd
  • 07:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P36657 and previous config saved to /var/cache/conftool/dbconfig/20221027-074521-ladsgroup.json
  • 07:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P36656 and previous config saved to /var/cache/conftool/dbconfig/20221027-074050-ladsgroup.json
  • 07:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1009.eqiad.wmnet
  • 07:39 dcausse: restarting blazegraph on wdqs1016 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 07:38 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagetcd1005.eqiad.wmnet to drbd
  • 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2023 (T321312)', diff saved to https://phabricator.wikimedia.org/P36655 and previous config saved to /var/cache/conftool/dbconfig/20221027-073612-ladsgroup.json
  • 07:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1009.eqiad.wmnet
  • 07:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T318950)', diff saved to https://phabricator.wikimedia.org/P36654 and previous config saved to /var/cache/conftool/dbconfig/20221027-073014-ladsgroup.json
  • 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1009.eqiad.wmnet with OS bullseye
  • 07:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T318950)', diff saved to https://phabricator.wikimedia.org/P36653 and previous config saved to /var/cache/conftool/dbconfig/20221027-072543-ladsgroup.json
  • 07:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2023 (T321312)', diff saved to https://phabricator.wikimedia.org/P36652 and previous config saved to /var/cache/conftool/dbconfig/20221027-072536-ladsgroup.json
  • 07:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2023.codfw.wmnet with reason: Maintenance
  • 07:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2023.codfw.wmnet with reason: Maintenance
  • 07:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T318950)', diff saved to https://phabricator.wikimedia.org/P36651 and previous config saved to /var/cache/conftool/dbconfig/20221027-072219-ladsgroup.json
  • 07:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 07:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 07:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T318950)', diff saved to https://phabricator.wikimedia.org/P36650 and previous config saved to /var/cache/conftool/dbconfig/20221027-072157-ladsgroup.json
  • 07:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1157 (T318950)', diff saved to https://phabricator.wikimedia.org/P36649 and previous config saved to /var/cache/conftool/dbconfig/20221027-072148-ladsgroup.json
  • 07:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 07:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T318950)', diff saved to https://phabricator.wikimedia.org/P36648 and previous config saved to /var/cache/conftool/dbconfig/20221027-071934-ladsgroup.json
  • 07:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1009.eqiad.wmnet with reason: host reimage
  • 07:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1009.eqiad.wmnet with reason: host reimage
  • 07:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P36647 and previous config saved to /var/cache/conftool/dbconfig/20221027-070644-ladsgroup.json
  • 07:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P36646 and previous config saved to /var/cache/conftool/dbconfig/20221027-070427-ladsgroup.json
  • 06:54 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1009.eqiad.wmnet with OS bullseye
  • 06:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P36645 and previous config saved to /var/cache/conftool/dbconfig/20221027-065138-ladsgroup.json
  • 06:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti1009.eqiad.wmnet with reason: Remove from cluster for eventual reimage
  • 06:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti1009.eqiad.wmnet with reason: Remove from cluster for eventual reimage
  • 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P36644 and previous config saved to /var/cache/conftool/dbconfig/20221027-064921-ladsgroup.json
  • 06:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T318950)', diff saved to https://phabricator.wikimedia.org/P36643 and previous config saved to /var/cache/conftool/dbconfig/20221027-063631-ladsgroup.json
  • 06:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 06:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1100 T321178', diff saved to https://phabricator.wikimedia.org/P36639 and previous config saved to /var/cache/conftool/dbconfig/20221027-060654-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1130 to s5 primary and set section read-write T321178', diff saved to https://phabricator.wikimedia.org/P36638 and previous config saved to /var/cache/conftool/dbconfig/20221027-060137-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T321178', diff saved to https://phabricator.wikimedia.org/P36637 and previous config saved to /var/cache/conftool/dbconfig/20221027-060102-ladsgroup.json
  • 06:00 Amir1: Starting s5 eqiad failover from db1100 to db1130 - T321178
  • 05:35 marostegui: Deploy schema change on x1 T318518
  • 05:28 marostegui: dbmaint Switch x1 to SBR T318518
  • 05:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1130 with weight 0 T321178', diff saved to https://phabricator.wikimedia.org/P36636 and previous config saved to /var/cache/conftool/dbconfig/20221027-052127-ladsgroup.json
  • 05:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T321178
  • 05:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T321178
  • 02:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:56 tstarling@deploy1002: Synchronized wmf-config/UcfirstOverrides.php: T292552 final configuration (duration: 03m 54s)
  • 02:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:30 tstarling@deploy1002: Synchronized wmf-config/UcfirstOverrides.php: T292552 allow title case ligatures (duration: 03m 36s)
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4043.ulsfo.wmnet with OS buster
  • 02:10 tstarling@deploy1002: Synchronized php-1.40.0-wmf.7/includes/language/Language.php: T292552 (duration: 03m 39s)
  • 02:06 tstarling@deploy1002: Synchronized php-1.40.0-wmf.6/includes/language/Language.php: T292552 (duration: 03m 40s)
  • 02:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:47 ejegg: payments-wiki upgraded from 4f923066 to 61cf970b
  • 01:45 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
  • 01:41 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
  • 01:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS buster
  • 01:11 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4043.ulsfo.wmnet with OS buster
  • 00:59 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS buster
  • 00:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS buster
  • 00:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
  • 00:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage

2022-10-26

  • 23:40 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS buster
  • 23:23 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4048.ulsfo.wmnet with OS buster
  • 23:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T321312)', diff saved to https://phabricator.wikimedia.org/P36635 and previous config saved to /var/cache/conftool/dbconfig/20221026-232136-ladsgroup.json
  • 23:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P36634 and previous config saved to /var/cache/conftool/dbconfig/20221026-230630-ladsgroup.json
  • 22:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P36633 and previous config saved to /var/cache/conftool/dbconfig/20221026-225123-ladsgroup.json
  • 22:49 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS buster
  • 22:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T321312)', diff saved to https://phabricator.wikimedia.org/P36632 and previous config saved to /var/cache/conftool/dbconfig/20221026-223617-ladsgroup.json
  • 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T321312)', diff saved to https://phabricator.wikimedia.org/P36631 and previous config saved to /var/cache/conftool/dbconfig/20221026-222956-ladsgroup.json
  • 22:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 22:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T321312)', diff saved to https://phabricator.wikimedia.org/P36630 and previous config saved to /var/cache/conftool/dbconfig/20221026-222932-ladsgroup.json
  • 22:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P36629 and previous config saved to /var/cache/conftool/dbconfig/20221026-221426-ladsgroup.json
  • 21:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P36628 and previous config saved to /var/cache/conftool/dbconfig/20221026-215919-ladsgroup.json
  • 21:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T321312)', diff saved to https://phabricator.wikimedia.org/P36627 and previous config saved to /var/cache/conftool/dbconfig/20221026-214412-ladsgroup.json
  • 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T321312)', diff saved to https://phabricator.wikimedia.org/P36626 and previous config saved to /var/cache/conftool/dbconfig/20221026-213801-ladsgroup.json
  • 21:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 21:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T321312)', diff saved to https://phabricator.wikimedia.org/P36625 and previous config saved to /var/cache/conftool/dbconfig/20221026-213737-ladsgroup.json
  • 21:28 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 21:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P36624 and previous config saved to /var/cache/conftool/dbconfig/20221026-212230-ladsgroup.json
  • 21:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:16 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 21:15 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:08 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 21:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P36623 and previous config saved to /var/cache/conftool/dbconfig/20221026-210724-ladsgroup.json
  • 20:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T321312)', diff saved to https://phabricator.wikimedia.org/P36622 and previous config saved to /var/cache/conftool/dbconfig/20221026-205218-ladsgroup.json
  • 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T321312)', diff saved to https://phabricator.wikimedia.org/P36621 and previous config saved to /var/cache/conftool/dbconfig/20221026-204553-ladsgroup.json
  • 20:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 20:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T321312)', diff saved to https://phabricator.wikimedia.org/P36620 and previous config saved to /var/cache/conftool/dbconfig/20221026-204529-ladsgroup.json
  • 20:42 urbanecm: Deploying security patch for T321733
  • 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P36619 and previous config saved to /var/cache/conftool/dbconfig/20221026-203022-ladsgroup.json
  • 20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P36618 and previous config saved to /var/cache/conftool/dbconfig/20221026-201516-ladsgroup.json
  • 20:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS buster
  • 20:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T321312)', diff saved to https://phabricator.wikimedia.org/P36617 and previous config saved to /var/cache/conftool/dbconfig/20221026-200009-ladsgroup.json
  • 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T321312)', diff saved to https://phabricator.wikimedia.org/P36616 and previous config saved to /var/cache/conftool/dbconfig/20221026-195342-ladsgroup.json
  • 19:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 19:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T321312)', diff saved to https://phabricator.wikimedia.org/P36615 and previous config saved to /var/cache/conftool/dbconfig/20221026-195318-ladsgroup.json
  • 19:42 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
  • 19:39 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
  • 19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P36614 and previous config saved to /var/cache/conftool/dbconfig/20221026-193811-ladsgroup.json
  • 19:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P36613 and previous config saved to /var/cache/conftool/dbconfig/20221026-192305-ladsgroup.json
  • 19:13 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS buster
  • 19:10 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4046.ulsfo.wmnet with OS buster
  • 19:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS buster
  • 19:08 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4046.ulsfo.wmnet with OS buster
  • 19:08 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS buster
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T321312)', diff saved to https://phabricator.wikimedia.org/P36612 and previous config saved to /var/cache/conftool/dbconfig/20221026-190758-ladsgroup.json
  • 19:00 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4046.ulsfo.wmnet with OS buster
  • 18:45 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS buster
  • 18:41 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4046.ulsfo.wmnet with OS buster
  • 18:31 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS buster
  • 18:09 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4046.ulsfo.wmnet with OS buster
  • 18:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T321312)', diff saved to https://phabricator.wikimedia.org/P36611 and previous config saved to /var/cache/conftool/dbconfig/20221026-180742-ladsgroup.json
  • 18:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 18:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 18:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T321312)', diff saved to https://phabricator.wikimedia.org/P36610 and previous config saved to /var/cache/conftool/dbconfig/20221026-180718-ladsgroup.json
  • 18:01 Amir1: dbmaint on s8@eqiad (T321562)
  • 18:01 Amir1: dbmaint on s5@eqiad (T321562)
  • 18:00 Amir1: dbmaint on s3@eqiad (T321562)
  • 18:00 Amir1: dbmaint on s1@eqiad (T321562)
  • 17:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS buster
  • 17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P36609 and previous config saved to /var/cache/conftool/dbconfig/20221026-175212-ladsgroup.json
  • 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T321312)', diff saved to https://phabricator.wikimedia.org/P36607 and previous config saved to /var/cache/conftool/dbconfig/20221026-172159-ladsgroup.json
  • 17:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1157 (T321312)', diff saved to https://phabricator.wikimedia.org/P36606 and previous config saved to /var/cache/conftool/dbconfig/20221026-171534-ladsgroup.json
  • 17:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 17:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 17:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T321312)', diff saved to https://phabricator.wikimedia.org/P36605 and previous config saved to /var/cache/conftool/dbconfig/20221026-171508-ladsgroup.json
  • 17:02 hashar@deploy1002: Finished deploy [releng/phatality@d8dfa72]: Update Phatality on codfw for OpenSearch Dashboard 2.2.0 # T304440 (duration: 00m 27s)
  • 17:01 hashar@deploy1002: Started deploy [releng/phatality@d8dfa72]: Update Phatality on codfw for OpenSearch Dashboard 2.2.0 # T304440
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P36604 and previous config saved to /var/cache/conftool/dbconfig/20221026-170001-ladsgroup.json
  • 16:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS buster
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P36603 and previous config saved to /var/cache/conftool/dbconfig/20221026-164455-ladsgroup.json
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T321312)', diff saved to https://phabricator.wikimedia.org/P36602 and previous config saved to /var/cache/conftool/dbconfig/20221026-162948-ladsgroup.json
  • 16:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T318950)', diff saved to https://phabricator.wikimedia.org/P36601 and previous config saved to /var/cache/conftool/dbconfig/20221026-162549-ladsgroup.json
  • 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T321312)', diff saved to https://phabricator.wikimedia.org/P36600 and previous config saved to /var/cache/conftool/dbconfig/20221026-162316-ladsgroup.json
  • 16:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 16:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 16:11 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4040.ulsfo.wmnet with OS buster
  • 16:11 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM lists1001.wikimedia.org
  • 16:11 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM lists1001.wikimedia.org
  • 16:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P36599 and previous config saved to /var/cache/conftool/dbconfig/20221026-161042-ladsgroup.json
  • 16:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:04 urbanecm@deploy1002: Finished scap: Backport for Revert "kswiki: Switch to wikitext mentor provider back" (T310905) (duration: 04m 16s)
  • 16:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:03 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 16:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 16:03 ladsgroup@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM lists1001.wikimedia.org
  • 16:02 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS buster
  • 16:00 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4040
  • 16:00 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4040
  • 16:00 urbanecm@deploy1002: urbanecm and urbanecm: Backport for Revert "kswiki: Switch to wikitext mentor provider back" (T310905) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 16:00 urbanecm@deploy1002: Started scap: Backport for Revert "kswiki: Switch to wikitext mentor provider back" (T310905)
  • 15:59 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4041.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 15:58 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4040.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T321312)', diff saved to https://phabricator.wikimedia.org/P36598 and previous config saved to /var/cache/conftool/dbconfig/20221026-155738-ladsgroup.json
  • 15:57 volans@cumin2002: START - Cookbook sre.hosts.provision for host cp4040.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P36597 and previous config saved to /var/cache/conftool/dbconfig/20221026-155536-ladsgroup.json
  • 15:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 15:54 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4040.ulsfo.wmnet with OS buster
  • 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
  • 15:51 volans@cumin2002: START - Cookbook sre.hosts.provision for host cp4041.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 15:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
  • 15:47 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4046.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 15:46 volans@cumin2002: START - Cookbook sre.hosts.provision for host cp4046.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 15:29 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 15:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4039.ulsfo.wmnet with OS buster
  • 15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P36591 and previous config saved to /var/cache/conftool/dbconfig/20221026-152724-ladsgroup.json
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P36590 and previous config saved to /var/cache/conftool/dbconfig/20221026-152327-ladsgroup.json
  • 15:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P36589 and previous config saved to /var/cache/conftool/dbconfig/20221026-152259-ladsgroup.json
  • 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T321312)', diff saved to https://phabricator.wikimedia.org/P36588 and previous config saved to /var/cache/conftool/dbconfig/20221026-151216-ladsgroup.json
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P36587 and previous config saved to /var/cache/conftool/dbconfig/20221026-150821-ladsgroup.json
  • 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P36586 and previous config saved to /var/cache/conftool/dbconfig/20221026-150752-ladsgroup.json
  • 15:02 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 14:59 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 14:59 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1137 (T321312)', diff saved to https://phabricator.wikimedia.org/P36585 and previous config saved to /var/cache/conftool/dbconfig/20221026-145924-ladsgroup.json
  • 14:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: Maintenance
  • 14:59 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 14:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: Maintenance
  • 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1103 (T321312)', diff saved to https://phabricator.wikimedia.org/P36584 and previous config saved to /var/cache/conftool/dbconfig/20221026-145848-ladsgroup.json
  • 14:56 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
  • 14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T318950)', diff saved to https://phabricator.wikimedia.org/P36581 and previous config saved to /var/cache/conftool/dbconfig/20221026-145314-ladsgroup.json
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T318950)', diff saved to https://phabricator.wikimedia.org/P36580 and previous config saved to /var/cache/conftool/dbconfig/20221026-145246-ladsgroup.json
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T318950)', diff saved to https://phabricator.wikimedia.org/P36579 and previous config saved to /var/cache/conftool/dbconfig/20221026-145148-ladsgroup.json
  • 14:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 14:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36578 and previous config saved to /var/cache/conftool/dbconfig/20221026-145138-ladsgroup.json
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T318950)', diff saved to https://phabricator.wikimedia.org/P36577 and previous config saved to /var/cache/conftool/dbconfig/20221026-145033-ladsgroup.json
  • 14:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 14:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T318950)', diff saved to https://phabricator.wikimedia.org/P36576 and previous config saved to /var/cache/conftool/dbconfig/20221026-145023-ladsgroup.json
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1103', diff saved to https://phabricator.wikimedia.org/P36575 and previous config saved to /var/cache/conftool/dbconfig/20221026-144341-ladsgroup.json
  • 14:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:38 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 14:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2004.codfw.wmnet
  • 14:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P36574 and previous config saved to /var/cache/conftool/dbconfig/20221026-143631-ladsgroup.json
  • 14:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:36 urbanecm@deploy1002: Finished scap: Backport for kswiki: Switch to wikitext mentor provider back (T310905) (duration: 04m 47s)
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P36573 and previous config saved to /var/cache/conftool/dbconfig/20221026-143516-ladsgroup.json
  • 14:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2004.codfw.wmnet
  • 14:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2003.codfw.wmnet
  • 14:31 urbanecm@deploy1002: urbanecm and urbanecm: Backport for kswiki: Switch to wikitext mentor provider back (T310905) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:31 urbanecm@deploy1002: Started scap: Backport for kswiki: Switch to wikitext mentor provider back (T310905)
  • 14:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS buster
  • 14:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2003.codfw.wmnet
  • 14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1103', diff saved to https://phabricator.wikimedia.org/P36571 and previous config saved to /var/cache/conftool/dbconfig/20221026-142834-ladsgroup.json
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36570 and previous config saved to /var/cache/conftool/dbconfig/20221026-142700-root.json
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36569 and previous config saved to /var/cache/conftool/dbconfig/20221026-142656-root.json
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36568 and previous config saved to /var/cache/conftool/dbconfig/20221026-142651-root.json
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P36567 and previous config saved to /var/cache/conftool/dbconfig/20221026-142125-ladsgroup.json
  • 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P36566 and previous config saved to /var/cache/conftool/dbconfig/20221026-142010-ladsgroup.json
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36565 and previous config saved to /var/cache/conftool/dbconfig/20221026-141833-root.json
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36564 and previous config saved to /var/cache/conftool/dbconfig/20221026-141824-root.json
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36563 and previous config saved to /var/cache/conftool/dbconfig/20221026-141823-root.json
  • 14:14 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4039.ulsfo.wmnet with OS buster
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1103 (T321312)', diff saved to https://phabricator.wikimedia.org/P36562 and previous config saved to /var/cache/conftool/dbconfig/20221026-141328-ladsgroup.json
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36561 and previous config saved to /var/cache/conftool/dbconfig/20221026-141155-root.json
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36560 and previous config saved to /var/cache/conftool/dbconfig/20221026-141151-root.json
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36559 and previous config saved to /var/cache/conftool/dbconfig/20221026-141146-root.json
  • 14:11 papaul: disable interface et-1/0/2 on cr1-eqiad to bounce fpc 1 pic0
  • 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36558 and previous config saved to /var/cache/conftool/dbconfig/20221026-140618-ladsgroup.json
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1103 (T321312)', diff saved to https://phabricator.wikimedia.org/P36557 and previous config saved to /var/cache/conftool/dbconfig/20221026-140510-ladsgroup.json
  • 14:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1103.eqiad.wmnet with reason: Maintenance
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T318950)', diff saved to https://phabricator.wikimedia.org/P36556 and previous config saved to /var/cache/conftool/dbconfig/20221026-140503-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1103.eqiad.wmnet with reason: Maintenance
  • 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36555 and previous config saved to /var/cache/conftool/dbconfig/20221026-140351-ladsgroup.json
  • 14:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 14:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36554 and previous config saved to /var/cache/conftool/dbconfig/20221026-140328-root.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36553 and previous config saved to /var/cache/conftool/dbconfig/20221026-140320-root.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36552 and previous config saved to /var/cache/conftool/dbconfig/20221026-140318-root.json
  • 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T321312)', diff saved to https://phabricator.wikimedia.org/P36551 and previous config saved to /var/cache/conftool/dbconfig/20221026-140312-ladsgroup.json
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T318950)', diff saved to https://phabricator.wikimedia.org/P36550 and previous config saved to /var/cache/conftool/dbconfig/20221026-140250-ladsgroup.json
  • 14:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36549 and previous config saved to /var/cache/conftool/dbconfig/20221026-140219-ladsgroup.json
  • 13:45 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 13:45 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 13:44 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 13:44 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:44 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:44 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 13:43 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 13:43 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36539 and previous config saved to /var/cache/conftool/dbconfig/20221026-134145-root.json
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36538 and previous config saved to /var/cache/conftool/dbconfig/20221026-134141-root.json
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36537 and previous config saved to /var/cache/conftool/dbconfig/20221026-134136-root.json
  • 13:39 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 13:38 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 13:38 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 13:38 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 13:37 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 13:37 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 13:35 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 13:34 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 13:33 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 13:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P36536 and previous config saved to /var/cache/conftool/dbconfig/20221026-133317-ladsgroup.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36535 and previous config saved to /var/cache/conftool/dbconfig/20221026-133312-root.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36534 and previous config saved to /var/cache/conftool/dbconfig/20221026-133308-root.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36533 and previous config saved to /var/cache/conftool/dbconfig/20221026-133303-root.json
  • 13:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P36532 and previous config saved to /var/cache/conftool/dbconfig/20221026-133259-ladsgroup.json
  • 13:33 urbanecm: UTC afternoon B&C window done
  • 13:32 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P36531 and previous config saved to /var/cache/conftool/dbconfig/20221026-133206-ladsgroup.json
  • 13:30 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: Enable link recommendation frontend for 5th round (T304549) (duration: 05m 52s)
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:29 sukhe: sudo ipmitool -I lanplus -H "cp4039.mgmt.ulsfo.wmnet" -U root -E chassis power cycle
  • 13:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:28 moritzm: installing curl security updates on buster
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36530 and previous config saved to /var/cache/conftool/dbconfig/20221026-132640-root.json
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36529 and previous config saved to /var/cache/conftool/dbconfig/20221026-132637-root.json
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36528 and previous config saved to /var/cache/conftool/dbconfig/20221026-132631-root.json
  • 13:25 urbanecm@deploy1002: urbanecm and kharlan: Backport for GrowthExperiments: Enable link recommendation frontend for 5th round (T304549) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:24 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: Enable link recommendation frontend for 5th round (T304549)
  • 13:23 urbanecm@deploy1002: Finished scap: Backport for [Growth] Enable structured mentor list everywhere (T310905) (duration: 10m 28s)
  • 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:22 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4039.ulsfo.wmnet with OS buster
  • 13:21 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 13:21 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T318950)', diff saved to https://phabricator.wikimedia.org/P36527 and previous config saved to /var/cache/conftool/dbconfig/20221026-131810-ladsgroup.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36526 and previous config saved to /var/cache/conftool/dbconfig/20221026-131807-root.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36525 and previous config saved to /var/cache/conftool/dbconfig/20221026-131803-root.json
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36524 and previous config saved to /var/cache/conftool/dbconfig/20221026-131758-root.json
  • 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T321312)', diff saved to https://phabricator.wikimedia.org/P36523 and previous config saved to /var/cache/conftool/dbconfig/20221026-131752-ladsgroup.json
  • 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36522 and previous config saved to /var/cache/conftool/dbconfig/20221026-131659-ladsgroup.json
  • 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T318950)', diff saved to https://phabricator.wikimedia.org/P36521 and previous config saved to /var/cache/conftool/dbconfig/20221026-131544-ladsgroup.json
  • 13:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 13:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36520 and previous config saved to /var/cache/conftool/dbconfig/20221026-131534-ladsgroup.json
  • 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36513 and previous config saved to /var/cache/conftool/dbconfig/20221026-130302-root.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36512 and previous config saved to /var/cache/conftool/dbconfig/20221026-130258-root.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36511 and previous config saved to /var/cache/conftool/dbconfig/20221026-130253-root.json
  • 13:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P36510 and previous config saved to /var/cache/conftool/dbconfig/20221026-130027-ladsgroup.json
  • 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P36509 and previous config saved to /var/cache/conftool/dbconfig/20221026-125918-ladsgroup.json
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36508 and previous config saved to /var/cache/conftool/dbconfig/20221026-125630-root.json
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36507 and previous config saved to /var/cache/conftool/dbconfig/20221026-125626-root.json
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36506 and previous config saved to /var/cache/conftool/dbconfig/20221026-125621-root.json
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P36505 and previous config saved to /var/cache/conftool/dbconfig/20221026-125604-ladsgroup.json
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36504 and previous config saved to /var/cache/conftool/dbconfig/20221026-124757-root.json
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36503 and previous config saved to /var/cache/conftool/dbconfig/20221026-124753-root.json
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36502 and previous config saved to /var/cache/conftool/dbconfig/20221026-124748-root.json
  • 12:47 moritzm: installing isc-dhcp security updates
  • 12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P36501 and previous config saved to /var/cache/conftool/dbconfig/20221026-124521-ladsgroup.json
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P36500 and previous config saved to /var/cache/conftool/dbconfig/20221026-124411-ladsgroup.json
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36499 and previous config saved to /var/cache/conftool/dbconfig/20221026-124125-root.json
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36498 and previous config saved to /var/cache/conftool/dbconfig/20221026-124121-root.json
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36497 and previous config saved to /var/cache/conftool/dbconfig/20221026-124116-root.json
  • 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P36496 and previous config saved to /var/cache/conftool/dbconfig/20221026-124057-ladsgroup.json
  • 12:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to plain
  • 12:38 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to plain
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1034 es1033 es1032', diff saved to https://phabricator.wikimedia.org/P36495 and previous config saved to /var/cache/conftool/dbconfig/20221026-123545-root.json
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36494 and previous config saved to /var/cache/conftool/dbconfig/20221026-123252-root.json
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36493 and previous config saved to /var/cache/conftool/dbconfig/20221026-123248-root.json
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36492 and previous config saved to /var/cache/conftool/dbconfig/20221026-123243-root.json
  • 12:32 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4039.ulsfo.wmnet with OS buster
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36491 and previous config saved to /var/cache/conftool/dbconfig/20221026-123014-ladsgroup.json
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T318950)', diff saved to https://phabricator.wikimedia.org/P36490 and previous config saved to /var/cache/conftool/dbconfig/20221026-122905-ladsgroup.json
  • 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36489 and previous config saved to /var/cache/conftool/dbconfig/20221026-122748-ladsgroup.json
  • 12:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 12:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T318950)', diff saved to https://phabricator.wikimedia.org/P36488 and previous config saved to /var/cache/conftool/dbconfig/20221026-122726-ladsgroup.json
  • 12:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T318950)', diff saved to https://phabricator.wikimedia.org/P36487 and previous config saved to /var/cache/conftool/dbconfig/20221026-122652-ladsgroup.json
  • 12:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 12:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 12:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36486 and previous config saved to /var/cache/conftool/dbconfig/20221026-122641-ladsgroup.json
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2034 es2033 es2032', diff saved to https://phabricator.wikimedia.org/P36485 and previous config saved to /var/cache/conftool/dbconfig/20221026-122632-root.json
  • 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T321312)', diff saved to https://phabricator.wikimedia.org/P36484 and previous config saved to /var/cache/conftool/dbconfig/20221026-122550-ladsgroup.json
  • 12:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to drbd
  • 12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T321312)', diff saved to https://phabricator.wikimedia.org/P36483 and previous config saved to /var/cache/conftool/dbconfig/20221026-121928-ladsgroup.json
  • 12:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 12:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T321312)', diff saved to https://phabricator.wikimedia.org/P36482 and previous config saved to /var/cache/conftool/dbconfig/20221026-121853-ladsgroup.json
  • 12:13 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to drbd
  • 12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P36481 and previous config saved to /var/cache/conftool/dbconfig/20221026-121220-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P36480 and previous config saved to /var/cache/conftool/dbconfig/20221026-121122-ladsgroup.json
  • 12:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS buster
  • 12:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P36479 and previous config saved to /var/cache/conftool/dbconfig/20221026-120346-ladsgroup.json
  • 12:02 moritzm: draining ganeti1009 for eventual reimage T311687
  • 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P36478 and previous config saved to /var/cache/conftool/dbconfig/20221026-115714-ladsgroup.json
  • 11:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P36477 and previous config saved to /var/cache/conftool/dbconfig/20221026-115615-ladsgroup.json
  • 11:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P36476 and previous config saved to /var/cache/conftool/dbconfig/20221026-114840-ladsgroup.json
  • 11:46 sukhe: sudo ipmitool -I lanplus -H "cp4046.mgmt.ulsfo.wmnet" -U root -E chassis power cycle
  • 11:45 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4046.ulsfo.wmnet with OS buster
  • 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T318950)', diff saved to https://phabricator.wikimedia.org/P36475 and previous config saved to /var/cache/conftool/dbconfig/20221026-114207-ladsgroup.json
  • 11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36474 and previous config saved to /var/cache/conftool/dbconfig/20221026-114109-ladsgroup.json
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T318950)', diff saved to https://phabricator.wikimedia.org/P36473 and previous config saved to /var/cache/conftool/dbconfig/20221026-113941-ladsgroup.json
  • 11:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T318950)', diff saved to https://phabricator.wikimedia.org/P36472 and previous config saved to /var/cache/conftool/dbconfig/20221026-113925-ladsgroup.json
  • 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36471 and previous config saved to /var/cache/conftool/dbconfig/20221026-113856-ladsgroup.json
  • 11:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 11:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T318950)', diff saved to https://phabricator.wikimedia.org/P36470 and previous config saved to /var/cache/conftool/dbconfig/20221026-113835-ladsgroup.json
  • 11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T321312)', diff saved to https://phabricator.wikimedia.org/P36469 and previous config saved to /var/cache/conftool/dbconfig/20221026-113333-ladsgroup.json
  • 11:33 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS buster
  • 11:29 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4046
  • 11:29 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4046
  • 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T321312)', diff saved to https://phabricator.wikimedia.org/P36468 and previous config saved to /var/cache/conftool/dbconfig/20221026-112634-ladsgroup.json
  • 11:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T321312)', diff saved to https://phabricator.wikimedia.org/P36467 and previous config saved to /var/cache/conftool/dbconfig/20221026-112609-ladsgroup.json
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P36466 and previous config saved to /var/cache/conftool/dbconfig/20221026-112419-ladsgroup.json
  • 11:24 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4046.ulsfo.wmnet with OS buster
  • 11:23 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1015.eqiad.wmnet to cluster eqiad and group B
  • 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P36465 and previous config saved to /var/cache/conftool/dbconfig/20221026-112328-ladsgroup.json
  • 11:22 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1015.eqiad.wmnet to cluster eqiad and group B
  • 11:22 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1023.eqiad.wmnet to cluster eqiad and group B
  • 11:22 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1023.eqiad.wmnet to cluster eqiad and group B
  • 11:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
  • 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P36461 and previous config saved to /var/cache/conftool/dbconfig/20221026-105556-ladsgroup.json
  • 10:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T318950)', diff saved to https://phabricator.wikimedia.org/P36460 and previous config saved to /var/cache/conftool/dbconfig/20221026-105406-ladsgroup.json
  • 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T318950)', diff saved to https://phabricator.wikimedia.org/P36459 and previous config saved to /var/cache/conftool/dbconfig/20221026-105315-ladsgroup.json
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T318950)', diff saved to https://phabricator.wikimedia.org/P36458 and previous config saved to /var/cache/conftool/dbconfig/20221026-105140-ladsgroup.json
  • 10:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T318950)', diff saved to https://phabricator.wikimedia.org/P36457 and previous config saved to /var/cache/conftool/dbconfig/20221026-105129-ladsgroup.json
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T318950)', diff saved to https://phabricator.wikimedia.org/P36456 and previous config saved to /var/cache/conftool/dbconfig/20221026-105102-ladsgroup.json
  • 10:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 10:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36455 and previous config saved to /var/cache/conftool/dbconfig/20221026-105052-ladsgroup.json
  • 10:50 dcausse: restarting blazegraph on wdqs1007 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1023.eqiad.wmnet to cluster eqiad and group A
  • 10:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1023.eqiad.wmnet to cluster eqiad and group A
  • 10:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T321312)', diff saved to https://phabricator.wikimedia.org/P36454 and previous config saved to /var/cache/conftool/dbconfig/20221026-104050-ladsgroup.json
  • 10:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P36453 and previous config saved to /var/cache/conftool/dbconfig/20221026-103623-ladsgroup.json
  • 10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P36452 and previous config saved to /var/cache/conftool/dbconfig/20221026-103545-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T321312)', diff saved to https://phabricator.wikimedia.org/P36451 and previous config saved to /var/cache/conftool/dbconfig/20221026-103432-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T321312)', diff saved to https://phabricator.wikimedia.org/P36450 and previous config saved to /var/cache/conftool/dbconfig/20221026-103407-ladsgroup.json
  • 10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T321312)', diff saved to https://phabricator.wikimedia.org/P36449 and previous config saved to /var/cache/conftool/dbconfig/20221026-103138-ladsgroup.json
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P36448 and previous config saved to /var/cache/conftool/dbconfig/20221026-102116-ladsgroup.json
  • 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P36447 and previous config saved to /var/cache/conftool/dbconfig/20221026-102039-ladsgroup.json
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P36446 and previous config saved to /var/cache/conftool/dbconfig/20221026-101901-ladsgroup.json
  • 10:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P36445 and previous config saved to /var/cache/conftool/dbconfig/20221026-101631-ladsgroup.json
  • 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T318950)', diff saved to https://phabricator.wikimedia.org/P36444 and previous config saved to /var/cache/conftool/dbconfig/20221026-100610-ladsgroup.json
  • 10:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36443 and previous config saved to /var/cache/conftool/dbconfig/20221026-100532-ladsgroup.json
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P36442 and previous config saved to /var/cache/conftool/dbconfig/20221026-100354-ladsgroup.json
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T318950)', diff saved to https://phabricator.wikimedia.org/P36441 and previous config saved to /var/cache/conftool/dbconfig/20221026-100344-ladsgroup.json
  • 10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36440 and previous config saved to /var/cache/conftool/dbconfig/20221026-100319-ladsgroup.json
  • 10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 10:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 10:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P36439 and previous config saved to /var/cache/conftool/dbconfig/20221026-100125-ladsgroup.json
  • 09:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9038
  • 09:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9038
  • 09:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 59605
  • 09:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 59605
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T321312)', diff saved to https://phabricator.wikimedia.org/P36438 and previous config saved to /var/cache/conftool/dbconfig/20221026-094841-ladsgroup.json
  • 09:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T321312)', diff saved to https://phabricator.wikimedia.org/P36437 and previous config saved to /var/cache/conftool/dbconfig/20221026-094619-ladsgroup.json
  • 09:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T321312)', diff saved to https://phabricator.wikimedia.org/P36436 and previous config saved to /var/cache/conftool/dbconfig/20221026-094226-ladsgroup.json
  • 09:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 09:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T321312)', diff saved to https://phabricator.wikimedia.org/P36435 and previous config saved to /var/cache/conftool/dbconfig/20221026-094154-ladsgroup.json
  • 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2131 (T321312)', diff saved to https://phabricator.wikimedia.org/P36434 and previous config saved to /var/cache/conftool/dbconfig/20221026-093842-ladsgroup.json
  • 09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115 (T321312)', diff saved to https://phabricator.wikimedia.org/P36433 and previous config saved to /var/cache/conftool/dbconfig/20221026-093816-ladsgroup.json
  • 09:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Maint finished', diff saved to https://phabricator.wikimedia.org/P36432 and previous config saved to /var/cache/conftool/dbconfig/20221026-093509-ladsgroup.json
  • 09:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P36431 and previous config saved to /var/cache/conftool/dbconfig/20221026-092647-ladsgroup.json
  • 09:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
  • 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1015.eqiad.wmnet with OS bullseye
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P36430 and previous config saved to /var/cache/conftool/dbconfig/20221026-092310-ladsgroup.json
  • 09:20 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main'.
  • 09:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Maint finished', diff saved to https://phabricator.wikimedia.org/P36429 and previous config saved to /var/cache/conftool/dbconfig/20221026-092004-ladsgroup.json
  • 09:17 jbond: add netbox yes will do thanks
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P36428 and previous config saved to /var/cache/conftool/dbconfig/20221026-091141-ladsgroup.json
  • 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1015.eqiad.wmnet with reason: host reimage
  • 09:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P36427 and previous config saved to /var/cache/conftool/dbconfig/20221026-090803-ladsgroup.json
  • 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1015.eqiad.wmnet with reason: host reimage
  • 09:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Maint finished', diff saved to https://phabricator.wikimedia.org/P36426 and previous config saved to /var/cache/conftool/dbconfig/20221026-090459-ladsgroup.json
  • 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T321312)', diff saved to https://phabricator.wikimedia.org/P36425 and previous config saved to /var/cache/conftool/dbconfig/20221026-085634-ladsgroup.json
  • 08:55 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:55 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115 (T321312)', diff saved to https://phabricator.wikimedia.org/P36424 and previous config saved to /var/cache/conftool/dbconfig/20221026-085257-ladsgroup.json
  • 08:52 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1015.eqiad.wmnet with OS bullseye
  • 08:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T321312)', diff saved to https://phabricator.wikimedia.org/P36423 and previous config saved to /var/cache/conftool/dbconfig/20221026-085022-ladsgroup.json
  • 08:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 08:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: Maint finished', diff saved to https://phabricator.wikimedia.org/P36422 and previous config saved to /var/cache/conftool/dbconfig/20221026-084954-ladsgroup.json
  • 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P36421 and previous config saved to /var/cache/conftool/dbconfig/20221026-084922-ladsgroup.json
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2115 (T321312)', diff saved to https://phabricator.wikimedia.org/P36420 and previous config saved to /var/cache/conftool/dbconfig/20221026-084741-ladsgroup.json
  • 08:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2115.codfw.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2115.codfw.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36418 and previous config saved to /var/cache/conftool/dbconfig/20221026-083157-root.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36417 and previous config saved to /var/cache/conftool/dbconfig/20221026-083149-root.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36416 and previous config saved to /var/cache/conftool/dbconfig/20221026-083142-root.json
  • 08:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti1015.eqiad.wmnet with reason: Remove from cluster for eventual reimage
  • 08:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti1015.eqiad.wmnet with reason: Remove from cluster for eventual reimage
  • 08:20 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 08:19 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 08:18 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 08:18 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 08:18 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 08:17 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36415 and previous config saved to /var/cache/conftool/dbconfig/20221026-081652-root.json
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36414 and previous config saved to /var/cache/conftool/dbconfig/20221026-081644-root.json
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36413 and previous config saved to /var/cache/conftool/dbconfig/20221026-081637-root.json
  • 08:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:15 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 08:14 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:12 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.7 refs T320512 (duration: 03m 46s)
  • 08:12 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 08:11 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:11 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 08:10 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 08:08 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.7 refs T320512
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36412 and previous config saved to /var/cache/conftool/dbconfig/20221026-080643-root.json
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36411 and previous config saved to /var/cache/conftool/dbconfig/20221026-080638-root.json
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36410 and previous config saved to /var/cache/conftool/dbconfig/20221026-080631-root.json
  • 08:02 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 08:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36409 and previous config saved to /var/cache/conftool/dbconfig/20221026-080147-root.json
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36408 and previous config saved to /var/cache/conftool/dbconfig/20221026-080139-root.json
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36407 and previous config saved to /var/cache/conftool/dbconfig/20221026-080132-root.json
  • 08:00 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 08:00 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 08:00 jbond: upload python3.9 packages for buster (component python39)
  • 07:57 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 07:57 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 07:56 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 07:56 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 07:55 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 07:55 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 07:53 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 07:52 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36406 and previous config saved to /var/cache/conftool/dbconfig/20221026-075138-root.json
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36405 and previous config saved to /var/cache/conftool/dbconfig/20221026-075133-root.json
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36404 and previous config saved to /var/cache/conftool/dbconfig/20221026-075126-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36403 and previous config saved to /var/cache/conftool/dbconfig/20221026-074642-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36402 and previous config saved to /var/cache/conftool/dbconfig/20221026-074634-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36401 and previous config saved to /var/cache/conftool/dbconfig/20221026-074627-root.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36400 and previous config saved to /var/cache/conftool/dbconfig/20221026-073633-root.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36399 and previous config saved to /var/cache/conftool/dbconfig/20221026-073628-root.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36398 and previous config saved to /var/cache/conftool/dbconfig/20221026-073621-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36397 and previous config saved to /var/cache/conftool/dbconfig/20221026-073137-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36396 and previous config saved to /var/cache/conftool/dbconfig/20221026-073129-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36395 and previous config saved to /var/cache/conftool/dbconfig/20221026-073122-root.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36394 and previous config saved to /var/cache/conftool/dbconfig/20221026-072128-root.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36393 and previous config saved to /var/cache/conftool/dbconfig/20221026-072123-root.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36392 and previous config saved to /var/cache/conftool/dbconfig/20221026-072116-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36391 and previous config saved to /var/cache/conftool/dbconfig/20221026-071632-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36390 and previous config saved to /var/cache/conftool/dbconfig/20221026-071624-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36389 and previous config saved to /var/cache/conftool/dbconfig/20221026-071617-root.json
  • 07:10 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 07:09 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 07:09 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 07:08 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36388 and previous config saved to /var/cache/conftool/dbconfig/20221026-070623-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36387 and previous config saved to /var/cache/conftool/dbconfig/20221026-070618-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36386 and previous config saved to /var/cache/conftool/dbconfig/20221026-070611-root.json
  • 07:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 25091
  • 07:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 25091
  • 07:04 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'email' for AS: 25091
  • 07:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 25091
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36385 and previous config saved to /var/cache/conftool/dbconfig/20221026-070127-root.json
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36384 and previous config saved to /var/cache/conftool/dbconfig/20221026-070119-root.json
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36383 and previous config saved to /var/cache/conftool/dbconfig/20221026-070112-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36382 and previous config saved to /var/cache/conftool/dbconfig/20221026-065118-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36381 and previous config saved to /var/cache/conftool/dbconfig/20221026-065113-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36380 and previous config saved to /var/cache/conftool/dbconfig/20221026-065106-root.json
  • 06:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 16509
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36379 and previous config saved to /var/cache/conftool/dbconfig/20221026-064622-root.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36378 and previous config saved to /var/cache/conftool/dbconfig/20221026-064614-root.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36377 and previous config saved to /var/cache/conftool/dbconfig/20221026-064607-root.json
  • 06:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 16509
  • 06:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 63949
  • 06:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 63949
  • 06:38 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36376 and previous config saved to /var/cache/conftool/dbconfig/20221026-063613-root.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36375 and previous config saved to /var/cache/conftool/dbconfig/20221026-063608-root.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36374 and previous config saved to /var/cache/conftool/dbconfig/20221026-063601-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2029 es2030 es2031', diff saved to https://phabricator.wikimedia.org/P36373 and previous config saved to /var/cache/conftool/dbconfig/20221026-063524-root.json
  • 06:34 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 06:33 _joe_: build2001:~# docker-registryctl delete-tags docker-registry.discovery.wmnet/httpd-fcgi:2.4.38-7 (to fix the uid issues)
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36372 and previous config saved to /var/cache/conftool/dbconfig/20221026-062108-root.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36371 and previous config saved to /var/cache/conftool/dbconfig/20221026-062103-root.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36370 and previous config saved to /var/cache/conftool/dbconfig/20221026-062056-root.json
  • 06:18 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1029 es1030 es1031', diff saved to https://phabricator.wikimedia.org/P36369 and previous config saved to /var/cache/conftool/dbconfig/20221026-061044-root.json
  • 06:08 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
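
The es1029–es1031 and es2029–es2031 entries above show a staged repool after the upgrade, with the pooled percentage raised in steps (1%, 3%, 5%, 10%, 25%, ...). A minimal sketch of how such a ramp could be extracted from the log text follows. The regular expression and the helper name `repool_steps` are assumptions made for illustration and are not part of dbctl or any Wikimedia tooling. The sample lines are copied from the entries above, trimmed after the quoted commit message.

```python
import re

# Matches the "(re)pooling @ N%" dbctl entries as they appear in this log.
# This pattern is an illustrative assumption, not an official log format spec.
REPOOL = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<who>\S+): dbctl commit \(dc=all\): "
    r"'(?P<host>\S+) \(re\)pooling @ (?P<pct>\d+)%: (?P<reason>[^']*)'"
)


def repool_steps(lines):
    """Return {host: [(time, percent), ...]} sorted by time of day."""
    steps = {}
    for line in lines:
        m = REPOOL.search(line)
        if m:
            steps.setdefault(m["host"], []).append((m["time"], int(m["pct"])))
    # The log lists newest entries first; sorting HH:MM strings restores
    # chronological order, which is only valid within a single day.
    return {host: sorted(s) for host, s in steps.items()}


# Entries for es1029 from 2022-10-26, plus one unrelated line that is skipped.
SAMPLE = [
    "07:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: After upgrade'",
    "07:10 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply",
    "07:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: After upgrade'",
    "06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 5%: After upgrade'",
    "06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 3%: After upgrade'",
    "06:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 1%: After upgrade'",
]

if __name__ == "__main__":
    for host, ramp in repool_steps(SAMPLE).items():
        print(host, ramp)
```

For the sample, the ramp for es1029 comes out in roughly 15-minute steps from 06:20 to 07:21. Entries that span midnight would need the date heading attached to each line before sorting.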

2022-10-25

  • 22:33 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4052.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:32 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4051.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:30 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp4040.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:29 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4040.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:28 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4040
  • 22:28 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4040
  • 22:24 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4009.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:24 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4010.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:24 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp4040.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:20 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs4010.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:19 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs4009.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:18 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4040.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:18 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp4040.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:18 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp4042.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:17 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4042.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:17 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4040.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:17 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4050.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:17 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4048.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:16 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4046.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:04 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4050.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:04 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4048.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:03 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4046.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T321312)', diff saved to https://phabricator.wikimedia.org/P36368 and previous config saved to /var/cache/conftool/dbconfig/20221025-220249-ladsgroup.json
  • 22:01 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:01 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4051
  • 22:00 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4051
  • 21:59 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 21:57 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4044.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:51 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp4035.ulsfo.wmnet
  • 21:51 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:50 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 21:50 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P36367 and previous config saved to /var/cache/conftool/dbconfig/20221025-214743-ladsgroup.json
  • 21:47 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs4010
  • 21:46 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 21:46 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs4010
  • 21:46 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs4009
  • 21:45 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs4009
  • 21:43 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4044.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:43 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4042.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:42 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4035.ulsfo.wmnet
  • 21:41 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:38 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp4033.ulsfo.wmnet
  • 21:38 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:37 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 21:37 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:34 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P36366 and previous config saved to /var/cache/conftool/dbconfig/20221025-213236-ladsgroup.json
  • 21:29 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4033.ulsfo.wmnet
  • 21:28 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4042.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:21 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4040.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 21:20 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 21:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T321312)', diff saved to https://phabricator.wikimedia.org/P36365 and previous config saved to /var/cache/conftool/dbconfig/20221025-211730-ladsgroup.json
  • 21:15 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp4038.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:12 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp4033.ulsfo.wmnet
  • 21:12 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:12 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp4035.ulsfo.wmnet
  • 21:12 robh@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T321312)', diff saved to https://phabricator.wikimedia.org/P36364 and previous config saved to /var/cache/conftool/dbconfig/20221025-211125-ladsgroup.json
  • 21:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 21:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T321312)', diff saved to https://phabricator.wikimedia.org/P36363 and previous config saved to /var/cache/conftool/dbconfig/20221025-211058-ladsgroup.json
  • 21:08 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 21:08 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 21:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:05 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4040.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:03 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4035.ulsfo.wmnet
  • 21:03 volans@cumin2002: START - Cookbook sre.hosts.provision for host cp4038.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:03 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4033.ulsfo.wmnet
  • 21:03 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:01 cjming: end of UTC late backport window
  • 20:59 cjming@deploy1002: Finished scap: Backport for Revert tagline of zhwiki (cont.) (duration: 04m 49s)
  • 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T321312)', diff saved to https://phabricator.wikimedia.org/P36362 and previous config saved to /var/cache/conftool/dbconfig/20221025-205902-ladsgroup.json
  • 20:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P36361 and previous config saved to /var/cache/conftool/dbconfig/20221025-205551-ladsgroup.json
  • 20:55 cjming@deploy1002: cjming and stang: Backport for Revert tagline of zhwiki (cont.) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:55 cjming@deploy1002: Started scap: Backport for Revert tagline of zhwiki (cont.)
  • 20:53 cjming@deploy1002: Finished scap: Backport for Revert tagline of zhwiki (duration: 09m 11s)
  • 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:49 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4052
  • 20:49 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4052
  • 20:49 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4050
  • 20:48 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4050
  • 20:48 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 20:44 cjming@deploy1002: cjming and stang: Backport for Revert tagline of zhwiki synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:44 cjming@deploy1002: Started scap: Backport for Revert tagline of zhwiki
  • 20:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P36360 and previous config saved to /var/cache/conftool/dbconfig/20221025-204356-ladsgroup.json
  • 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P36359 and previous config saved to /var/cache/conftool/dbconfig/20221025-204045-ladsgroup.json
  • 20:37 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp4038.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:29 volans@cumin2002: START - Cookbook sre.hosts.provision for host cp4038.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 20:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P36358 and previous config saved to /var/cache/conftool/dbconfig/20221025-202849-ladsgroup.json
  • 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:28 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4038.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 20:27 cjming@deploy1002: Finished scap: Backport for Update remaining Wikipedia logos (T319223) (duration: 06m 48s)
  • 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T321312)', diff saved to https://phabricator.wikimedia.org/P36357 and previous config saved to /var/cache/conftool/dbconfig/20221025-202538-ladsgroup.json
  • 20:24 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4034.ulsfo.wmnet
  • 20:24 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:23 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4036.ulsfo.wmnet
  • 20:23 robh@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:21 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 20:21 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 20:20 cjming@deploy1002: cjming and jdlrobson: Backport for Update remaining Wikipedia logos (T319223) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:20 cjming@deploy1002: Started scap: Backport for Update remaining Wikipedia logos (T319223)
  • 20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T321312)', diff saved to https://phabricator.wikimedia.org/P36356 and previous config saved to /var/cache/conftool/dbconfig/20221025-201918-ladsgroup.json
  • 20:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 20:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T321312)', diff saved to https://phabricator.wikimedia.org/P36355 and previous config saved to /var/cache/conftool/dbconfig/20221025-201852-ladsgroup.json
  • 20:15 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4036.ulsfo.wmnet
  • 20:14 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4034.ulsfo.wmnet
  • 20:14 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts cp4034.ulsfo.wmnet
  • 20:13 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4034.ulsfo.wmnet
  • 20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T321312)', diff saved to https://phabricator.wikimedia.org/P36354 and previous config saved to /var/cache/conftool/dbconfig/20221025-201343-ladsgroup.json
  • 20:11 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4048
  • 20:11 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4048
  • 20:11 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4046
  • 20:11 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4046
  • 20:11 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4044
  • 20:10 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4044
  • 20:10 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4042
  • 20:10 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4042
  • 20:10 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4040
  • 20:10 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4040
  • 20:09 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T321312)', diff saved to https://phabricator.wikimedia.org/P36353 and previous config saved to /var/cache/conftool/dbconfig/20221025-200746-ladsgroup.json
  • 20:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 20:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 20:07 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 20:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T321312)', diff saved to https://phabricator.wikimedia.org/P36352 and previous config saved to /var/cache/conftool/dbconfig/20221025-200723-ladsgroup.json
  • 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P36351 and previous config saved to /var/cache/conftool/dbconfig/20221025-200344-ladsgroup.json
  • 20:00 volans@cumin2002: START - Cookbook sre.hosts.provision for host cp4038.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 19:57 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4038
  • 19:56 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4038
  • 19:56 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:54 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P36350 and previous config saved to /var/cache/conftool/dbconfig/20221025-195216-ladsgroup.json
  • 19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P36349 and previous config saved to /var/cache/conftool/dbconfig/20221025-194838-ladsgroup.json
  • 19:39 cwhite: logstash opensearch 2.2.0 codfw transition complete T304440
  • 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P36348 and previous config saved to /var/cache/conftool/dbconfig/20221025-193709-ladsgroup.json
  • 19:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T321312)', diff saved to https://phabricator.wikimedia.org/P36347 and previous config saved to /var/cache/conftool/dbconfig/20221025-193331-ladsgroup.json
  • 19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T321312)', diff saved to https://phabricator.wikimedia.org/P36345 and previous config saved to /var/cache/conftool/dbconfig/20221025-192556-ladsgroup.json
  • 19:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 19:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 19:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 19:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T321312)', diff saved to https://phabricator.wikimedia.org/P36344 and previous config saved to /var/cache/conftool/dbconfig/20221025-192526-ladsgroup.json
  • 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T321312)', diff saved to https://phabricator.wikimedia.org/P36343 and previous config saved to /var/cache/conftool/dbconfig/20221025-192203-ladsgroup.json
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T321312)', diff saved to https://phabricator.wikimedia.org/P36342 and previous config saved to /var/cache/conftool/dbconfig/20221025-191552-ladsgroup.json
  • 19:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 19:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T321312)', diff saved to https://phabricator.wikimedia.org/P36341 and previous config saved to /var/cache/conftool/dbconfig/20221025-191527-ladsgroup.json
  • 19:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P36340 and previous config saved to /var/cache/conftool/dbconfig/20221025-191020-ladsgroup.json
  • 19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P36339 and previous config saved to /var/cache/conftool/dbconfig/20221025-190021-ladsgroup.json
  • 18:59 inflatador: bking@elastic2070 'restarting elastic7 services to apply 838141'
  • 18:57 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4032.ulsfo.wmnet
  • 18:57 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:56 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 18:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P36338 and previous config saved to /var/cache/conftool/dbconfig/20221025-185513-ladsgroup.json
  • 18:53 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4028.ulsfo.wmnet
  • 18:53 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:53 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4030.ulsfo.wmnet
  • 18:52 robh@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:50 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 18:50 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 18:49 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4032.ulsfo.wmnet
  • 18:49 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4026.ulsfo.wmnet
  • 18:49 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:46 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 18:45 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4030.ulsfo.wmnet
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P36337 and previous config saved to /var/cache/conftool/dbconfig/20221025-184514-ladsgroup.json
  • 18:44 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4028.ulsfo.wmnet
  • 18:41 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4026.ulsfo.wmnet
  • 18:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T321312)', diff saved to https://phabricator.wikimedia.org/P36336 and previous config saved to /var/cache/conftool/dbconfig/20221025-184006-ladsgroup.json
  • 18:38 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4024.ulsfo.wmnet
  • 18:38 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:37 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4022.ulsfo.wmnet
  • 18:37 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:37 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 18:34 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T321312)', diff saved to https://phabricator.wikimedia.org/P36335 and previous config saved to /var/cache/conftool/dbconfig/20221025-183224-ladsgroup.json
  • 18:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 18:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T321312)', diff saved to https://phabricator.wikimedia.org/P36334 and previous config saved to /var/cache/conftool/dbconfig/20221025-183158-ladsgroup.json
  • 18:31 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4024.ulsfo.wmnet
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T321312)', diff saved to https://phabricator.wikimedia.org/P36333 and previous config saved to /var/cache/conftool/dbconfig/20221025-183008-ladsgroup.json
  • 18:29 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4022.ulsfo.wmnet
  • 18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T321312)', diff saved to https://phabricator.wikimedia.org/P36332 and previous config saved to /var/cache/conftool/dbconfig/20221025-182402-ladsgroup.json
  • 18:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 18:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T321312)', diff saved to https://phabricator.wikimedia.org/P36331 and previous config saved to /var/cache/conftool/dbconfig/20221025-182336-ladsgroup.json
  • 18:22 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4043.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:20 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4039.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:19 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4041.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P36330 and previous config saved to /var/cache/conftool/dbconfig/20221025-181652-ladsgroup.json
  • 18:09 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4043.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P36329 and previous config saved to /var/cache/conftool/dbconfig/20221025-180830-ladsgroup.json
  • 18:07 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4041.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:07 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4039.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:05 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4043
  • 18:05 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4043
  • 18:04 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4041
  • 18:04 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4041
  • 18:04 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4039
  • 18:04 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4039
  • 18:03 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P36328 and previous config saved to /var/cache/conftool/dbconfig/20221025-180145-ladsgroup.json
  • 18:00 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:58 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 17:58 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4023.ulsfo.wmnet
  • 17:58 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:57 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4025.ulsfo.wmnet
  • 17:57 robh@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P36327 and previous config saved to /var/cache/conftool/dbconfig/20221025-175323-ladsgroup.json
  • 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T321312)', diff saved to https://phabricator.wikimedia.org/P36326 and previous config saved to /var/cache/conftool/dbconfig/20221025-174639-ladsgroup.json
  • 17:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T321312)', diff saved to https://phabricator.wikimedia.org/P36325 and previous config saved to /var/cache/conftool/dbconfig/20221025-174013-ladsgroup.json
  • 17:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 17:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T321312)', diff saved to https://phabricator.wikimedia.org/P36324 and previous config saved to /var/cache/conftool/dbconfig/20221025-173817-ladsgroup.json
  • 17:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T321312)', diff saved to https://phabricator.wikimedia.org/P36323 and previous config saved to /var/cache/conftool/dbconfig/20221025-172909-ladsgroup.json
  • 17:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T321312)', diff saved to https://phabricator.wikimedia.org/P36322 and previous config saved to /var/cache/conftool/dbconfig/20221025-172844-ladsgroup.json
  • 17:25 mforns@deploy1002: Finished deploy [analytics/refinery@d3b7785] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d3b7785] (duration: 01m 04s)
  • 17:23 mforns@deploy1002: Started deploy [analytics/refinery@d3b7785] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d3b7785]
  • 17:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P36321 and previous config saved to /var/cache/conftool/dbconfig/20221025-171337-ladsgroup.json
  • 17:12 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:05 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 17:05 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 17:02 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 17:02 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 17:02 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P36320 and previous config saved to /var/cache/conftool/dbconfig/20221025-165831-ladsgroup.json
  • 16:53 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4025.ulsfo.wmnet
  • 16:52 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4023.ulsfo.wmnet
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T321312)', diff saved to https://phabricator.wikimedia.org/P36319 and previous config saved to /var/cache/conftool/dbconfig/20221025-165028-ladsgroup.json
  • 16:47 papaul: disable interface et-1/0/2 on cr2-eqiad to bounce fpc 1 pic0
  • 16:43 volans: uploaded python3-gjson 0.2.1 to apt.wikimedia.org bullseye-wikimedia
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T321312)', diff saved to https://phabricator.wikimedia.org/P36318 and previous config saved to /var/cache/conftool/dbconfig/20221025-164324-ladsgroup.json
  • 16:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T321312)', diff saved to https://phabricator.wikimedia.org/P36317 and previous config saved to /var/cache/conftool/dbconfig/20221025-163719-ladsgroup.json
  • 16:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 16:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P36316 and previous config saved to /var/cache/conftool/dbconfig/20221025-163522-ladsgroup.json
  • 16:33 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P36315 and previous config saved to /var/cache/conftool/dbconfig/20221025-162015-ladsgroup.json
  • 16:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wcqs2002.codfw.wmnet
  • 16:06 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for wcqs2002.codfw.wmnet
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T321312)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20221025-160504-ladsgroup.json
  • 16:04 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 16:04 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 16:04 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T321312)', diff saved to https://phabricator.wikimedia.org/P36313 and previous config saved to /var/cache/conftool/dbconfig/20221025-155855-ladsgroup.json
  • 15:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 15:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T321312)', diff saved to https://phabricator.wikimedia.org/P36312 and previous config saved to /var/cache/conftool/dbconfig/20221025-155828-ladsgroup.json
  • 15:56 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P36311 and previous config saved to /var/cache/conftool/dbconfig/20221025-154321-ladsgroup.json
  • 15:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8399
  • 15:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8399
  • 15:30 claime: added package otelcol-contrib_0.62.1_linux_amd64.deb to component thirdparty/otelcol-contrib for bullseye-wikimedia and buster-wikimedia - T320551
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P36310 and previous config saved to /var/cache/conftool/dbconfig/20221025-152815-ladsgroup.json
  • 15:25 claime: added component thirdparty/otelcol-contrib to apt repository
  • 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T321312)', diff saved to https://phabricator.wikimedia.org/P36309 and previous config saved to /var/cache/conftool/dbconfig/20221025-151308-ladsgroup.json
  • 15:11 moritzm: installing isc-dhcp security updates
  • 15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T321312)', diff saved to https://phabricator.wikimedia.org/P36308 and previous config saved to /var/cache/conftool/dbconfig/20221025-150653-ladsgroup.json
  • 15:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 15:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T321312)', diff saved to https://phabricator.wikimedia.org/P36307 and previous config saved to /var/cache/conftool/dbconfig/20221025-150626-ladsgroup.json
  • 14:54 sukhe: running authdns-update for depooling ulsfo: Gerrit 849105
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P36306 and previous config saved to /var/cache/conftool/dbconfig/20221025-145120-ladsgroup.json
  • 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T321312)', diff saved to https://phabricator.wikimedia.org/P36305 and previous config saved to /var/cache/conftool/dbconfig/20221025-144651-ladsgroup.json
  • 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P36304 and previous config saved to /var/cache/conftool/dbconfig/20221025-143613-ladsgroup.json
  • 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P36303 and previous config saved to /var/cache/conftool/dbconfig/20221025-143144-ladsgroup.json
  • 14:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb1016.eqiad.wmnet with reason: db1154 having hw issues
  • 14:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb1016.eqiad.wmnet with reason: db1154 having hw issues
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T321312)', diff saved to https://phabricator.wikimedia.org/P36301 and previous config saved to /var/cache/conftool/dbconfig/20221025-142106-ladsgroup.json
  • 14:18 hashar: Restarting CI Jenkins
  • 14:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P36300 and previous config saved to /var/cache/conftool/dbconfig/20221025-141638-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T321312)', diff saved to https://phabricator.wikimedia.org/P36299 and previous config saved to /var/cache/conftool/dbconfig/20221025-141440-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 14:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 14:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 14:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T321312)', diff saved to https://phabricator.wikimedia.org/P36298 and previous config saved to /var/cache/conftool/dbconfig/20221025-141358-ladsgroup.json
  • 14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T321312)', diff saved to https://phabricator.wikimedia.org/P36297 and previous config saved to /var/cache/conftool/dbconfig/20221025-140131-ladsgroup.json
  • 13:59 XioNoX: test bouncing VC port on asw2-d-eqiad
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P36296 and previous config saved to /var/cache/conftool/dbconfig/20221025-135852-ladsgroup.json
  • 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T321312)', diff saved to https://phabricator.wikimedia.org/P36295 and previous config saved to /var/cache/conftool/dbconfig/20221025-135515-ladsgroup.json
  • 13:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 13:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T321312)', diff saved to https://phabricator.wikimedia.org/P36294 and previous config saved to /var/cache/conftool/dbconfig/20221025-135451-ladsgroup.json
  • 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1020-1021].eqiad.wmnet with reason: db1154 having hw issues
  • 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1020-1021].eqiad.wmnet with reason: db1154 having hw issues
  • 13:53 jgiannelos@deploy1002: Finished deploy [restbase/deploy@5575605]: Update restbase to c1d391c7 (duration: 18m 14s)
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P36293 and previous config saved to /var/cache/conftool/dbconfig/20221025-134345-ladsgroup.json
  • 13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P36292 and previous config saved to /var/cache/conftool/dbconfig/20221025-133944-ladsgroup.json
  • 13:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1154.eqiad.wmnet with reason: Maintenance
  • 13:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1154.eqiad.wmnet with reason: Maintenance
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain
  • 13:35 jgiannelos@deploy1002: Started deploy [restbase/deploy@5575605]: Update restbase to c1d391c7
  • 13:34 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain
  • 13:34 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:33 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Move wmgSiteLogoVariants to logos.php (T308620 T321519) (duration: 05m 47s)
  • 13:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
  • 13:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
  • 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T321312)', diff saved to https://phabricator.wikimedia.org/P36291 and previous config saved to /var/cache/conftool/dbconfig/20221025-132839-ladsgroup.json
  • 13:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 13:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 13:27 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for Move wmgSiteLogoVariants to logos.php (T308620 T321519) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:27 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Move wmgSiteLogoVariants to logos.php (T308620 T321519)
  • 13:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P36290 and previous config saved to /var/cache/conftool/dbconfig/20221025-132438-ladsgroup.json
  • 13:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd
  • 13:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T321312)', diff saved to https://phabricator.wikimedia.org/P36287 and previous config saved to /var/cache/conftool/dbconfig/20221025-130931-ladsgroup.json
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P36286 and previous config saved to /var/cache/conftool/dbconfig/20221025-130628-ladsgroup.json
  • 13:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T321312)', diff saved to https://phabricator.wikimedia.org/P36285 and previous config saved to /var/cache/conftool/dbconfig/20221025-130314-ladsgroup.json
  • 13:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 13:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T321312)', diff saved to https://phabricator.wikimedia.org/P36284 and previous config saved to /var/cache/conftool/dbconfig/20221025-130249-ladsgroup.json
  • 12:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1023.eqiad.wmnet with reason: host reimage
  • 12:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1023.eqiad.wmnet with reason: host reimage
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P36283 and previous config saved to /var/cache/conftool/dbconfig/20221025-125122-ladsgroup.json
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P36281 and previous config saved to /var/cache/conftool/dbconfig/20221025-124743-ladsgroup.json
  • 12:39 moritzm: drain ganeti1015 for eventual reimage T311687
  • 12:38 hashar: Restarting CI Jenkins
  • 12:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1023.eqiad.wmnet with OS bullseye
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T321312)', diff saved to https://phabricator.wikimedia.org/P36280 and previous config saved to /var/cache/conftool/dbconfig/20221025-123615-ladsgroup.json
  • 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti1023.eqiad.wmnet with reason: Remove from cluster for eventual reimage
  • 12:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti1023.eqiad.wmnet with reason: Remove from cluster for eventual reimage
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P36279 and previous config saved to /var/cache/conftool/dbconfig/20221025-123236-ladsgroup.json
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T321312)', diff saved to https://phabricator.wikimedia.org/P36278 and previous config saved to /var/cache/conftool/dbconfig/20221025-123001-ladsgroup.json
  • 12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 12:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 12:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 12:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2519
  • 12:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2519
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T321312)', diff saved to https://phabricator.wikimedia.org/P36277 and previous config saved to /var/cache/conftool/dbconfig/20221025-122015-ladsgroup.json
  • 12:19 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:19 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T321312)', diff saved to https://phabricator.wikimedia.org/P36276 and previous config saved to /var/cache/conftool/dbconfig/20221025-121730-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T321312)', diff saved to https://phabricator.wikimedia.org/P36275 and previous config saved to /var/cache/conftool/dbconfig/20221025-121111-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T321312)', diff saved to https://phabricator.wikimedia.org/P36274 and previous config saved to /var/cache/conftool/dbconfig/20221025-121047-ladsgroup.json
  • 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti4002.ulsfo.wmnet
  • 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P36273 and previous config saved to /var/cache/conftool/dbconfig/20221025-120509-ladsgroup.json
  • 12:00 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 132203
  • 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P36272 and previous config saved to /var/cache/conftool/dbconfig/20221025-115540-ladsgroup.json
  • 11:55 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4002.ulsfo.wmnet
  • 11:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 132203
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P36271 and previous config saved to /var/cache/conftool/dbconfig/20221025-115002-ladsgroup.json
  • 11:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti4002.ulsfo.wmnet with reason: Remove from cluster for eventual decom
  • 11:43 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti4002.ulsfo.wmnet with reason: Remove from cluster for eventual decom
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P36270 and previous config saved to /var/cache/conftool/dbconfig/20221025-114034-ladsgroup.json
  • 11:38 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp2002.wikimedia.org
  • 11:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2002.wikimedia.org
  • 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T321312)', diff saved to https://phabricator.wikimedia.org/P36269 and previous config saved to /var/cache/conftool/dbconfig/20221025-113455-ladsgroup.json
  • 11:34 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp2002.wikimedia.org
  • 11:33 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp-test2002.wikimedia.org
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T321312)', diff saved to https://phabricator.wikimedia.org/P36268 and previous config saved to /var/cache/conftool/dbconfig/20221025-112848-ladsgroup.json
  • 11:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 11:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36267 and previous config saved to /var/cache/conftool/dbconfig/20221025-112822-ladsgroup.json
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T321312)', diff saved to https://phabricator.wikimedia.org/P36266 and previous config saved to /var/cache/conftool/dbconfig/20221025-112527-ladsgroup.json
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T321312)', diff saved to https://phabricator.wikimedia.org/P36265 and previous config saved to /var/cache/conftool/dbconfig/20221025-111930-ladsgroup.json
  • 11:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T321312)', diff saved to https://phabricator.wikimedia.org/P36264 and previous config saved to /var/cache/conftool/dbconfig/20221025-111906-ladsgroup.json
  • 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P36263 and previous config saved to /var/cache/conftool/dbconfig/20221025-111316-ladsgroup.json
  • 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P36262 and previous config saved to /var/cache/conftool/dbconfig/20221025-110359-ladsgroup.json
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P36261 and previous config saved to /var/cache/conftool/dbconfig/20221025-105810-ladsgroup.json
  • 10:53 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1071.eqiad.wmnet
  • 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P36260 and previous config saved to /var/cache/conftool/dbconfig/20221025-104852-ladsgroup.json
  • 10:43 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1071.eqiad.wmnet
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36259 and previous config saved to /var/cache/conftool/dbconfig/20221025-104303-ladsgroup.json
  • 10:41 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1070.eqiad.wmnet
  • 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P36258 and previous config saved to /var/cache/conftool/dbconfig/20221025-104047-ladsgroup.json
  • 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T321312)', diff saved to https://phabricator.wikimedia.org/P36257 and previous config saved to /var/cache/conftool/dbconfig/20221025-103346-ladsgroup.json
  • 10:32 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1070.eqiad.wmnet
  • 10:31 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1069.eqiad.wmnet
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T321312)', diff saved to https://phabricator.wikimedia.org/P36256 and previous config saved to /var/cache/conftool/dbconfig/20221025-102724-ladsgroup.json
  • 10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T321312)', diff saved to https://phabricator.wikimedia.org/P36255 and previous config saved to /var/cache/conftool/dbconfig/20221025-102642-ladsgroup.json
  • 10:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P36254 and previous config saved to /var/cache/conftool/dbconfig/20221025-102540-ladsgroup.json
  • 10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P36253 and previous config saved to /var/cache/conftool/dbconfig/20221025-101135-ladsgroup.json
  • 10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P36252 and previous config saved to /var/cache/conftool/dbconfig/20221025-101034-ladsgroup.json
  • 09:57 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1069.eqiad.wmnet
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P36251 and previous config saved to /var/cache/conftool/dbconfig/20221025-095629-ladsgroup.json
  • 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P36250 and previous config saved to /var/cache/conftool/dbconfig/20221025-095527-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36249 and previous config saved to /var/cache/conftool/dbconfig/20221025-094921-ladsgroup.json
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P36248 and previous config saved to /var/cache/conftool/dbconfig/20221025-094800-ladsgroup.json
  • 09:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T321312)', diff saved to https://phabricator.wikimedia.org/P36247 and previous config saved to /var/cache/conftool/dbconfig/20221025-094733-ladsgroup.json
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T321312)', diff saved to https://phabricator.wikimedia.org/P36246 and previous config saved to /var/cache/conftool/dbconfig/20221025-094122-ladsgroup.json
  • 09:36 moritzm: drain ganeti4002 for eventual decom T317247
  • 09:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T321312)', diff saved to https://phabricator.wikimedia.org/P36245 and previous config saved to /var/cache/conftool/dbconfig/20221025-093513-ladsgroup.json
  • 09:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 09:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36244 and previous config saved to /var/cache/conftool/dbconfig/20221025-093449-ladsgroup.json
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P36243 and previous config saved to /var/cache/conftool/dbconfig/20221025-093226-ladsgroup.json
  • 09:31 vgutierrez: restart pybal on lvs5003
  • 09:27 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1068.eqiad.wmnet
  • 09:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P36241 and previous config saved to /var/cache/conftool/dbconfig/20221025-091942-ladsgroup.json
  • 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P36240 and previous config saved to /var/cache/conftool/dbconfig/20221025-091720-ladsgroup.json
  • 09:14 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1068.eqiad.wmnet
  • 09:14 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
  • 09:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P36239 and previous config saved to /var/cache/conftool/dbconfig/20221025-090436-ladsgroup.json
  • 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T321312)', diff saved to https://phabricator.wikimedia.org/P36238 and previous config saved to /var/cache/conftool/dbconfig/20221025-090213-ladsgroup.json
  • 08:57 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
  • 08:57 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
  • 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T321312)', diff saved to https://phabricator.wikimedia.org/P36237 and previous config saved to /var/cache/conftool/dbconfig/20221025-085558-ladsgroup.json
  • 08:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 08:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 08:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 08:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T321312)', diff saved to https://phabricator.wikimedia.org/P36236 and previous config saved to /var/cache/conftool/dbconfig/20221025-085527-ladsgroup.json
  • 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36235 and previous config saved to /var/cache/conftool/dbconfig/20221025-084929-ladsgroup.json
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P36234 and previous config saved to /var/cache/conftool/dbconfig/20221025-084713-ladsgroup.json
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1202 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36233 and previous config saved to /var/cache/conftool/dbconfig/20221025-084541-root.json
  • 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P36232 and previous config saved to /var/cache/conftool/dbconfig/20221025-084020-ladsgroup.json
  • 08:36 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P36230 and previous config saved to /var/cache/conftool/dbconfig/20221025-083206-ladsgroup.json
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1202 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36229 and previous config saved to /var/cache/conftool/dbconfig/20221025-083034-root.json
  • 08:29 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be1065.eqiad.wmnet
  • 08:26 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4005.ulsfo.wmnet to cluster ulsfo and group 1
  • 08:26 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4005.ulsfo.wmnet to cluster ulsfo and group 1
  • 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P36228 and previous config saved to /var/cache/conftool/dbconfig/20221025-082514-ladsgroup.json
  • 08:20 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2069.codfw.wmnet
  • 08:19 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
  • 08:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
  • 08:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P36227 and previous config saved to /var/cache/conftool/dbconfig/20221025-081700-ladsgroup.json
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1202 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36226 and previous config saved to /var/cache/conftool/dbconfig/20221025-081529-root.json
  • 08:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:12 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.7 refs T320512
  • 08:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2069.codfw.wmnet
  • 08:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
  • 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T321312)', diff saved to https://phabricator.wikimedia.org/P36225 and previous config saved to /var/cache/conftool/dbconfig/20221025-081007-ladsgroup.json
  • 08:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T321312)', diff saved to https://phabricator.wikimedia.org/P36224 and previous config saved to /var/cache/conftool/dbconfig/20221025-080238-ladsgroup.json
  • 08:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 08:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 08:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T321312)', diff saved to https://phabricator.wikimedia.org/P36223 and previous config saved to /var/cache/conftool/dbconfig/20221025-080212-ladsgroup.json
  • 08:02 moritzm: drain ganeti1023 for eventual reimage T311687
  • 08:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P36222 and previous config saved to /var/cache/conftool/dbconfig/20221025-080153-ladsgroup.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1202 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36221 and previous config saved to /var/cache/conftool/dbconfig/20221025-080024-root.json
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36220 and previous config saved to /var/cache/conftool/dbconfig/20221025-075657-ladsgroup.json
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P36219 and previous config saved to /var/cache/conftool/dbconfig/20221025-075638-ladsgroup.json
  • 07:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36218 and previous config saved to /var/cache/conftool/dbconfig/20221025-075613-ladsgroup.json
  • 07:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P36217 and previous config saved to /var/cache/conftool/dbconfig/20221025-074705-ladsgroup.json
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1202 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36216 and previous config saved to /var/cache/conftool/dbconfig/20221025-074519-root.json
  • 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P36215 and previous config saved to /var/cache/conftool/dbconfig/20221025-074106-ladsgroup.json
  • 07:38 moritzm: installing 5.10.149-2 update on bullseye hosts (regression doesn't concern any of our servers, but still makes sense to have further reboots move to the latest kernel)
  • 07:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P36213 and previous config saved to /var/cache/conftool/dbconfig/20221025-073159-ladsgroup.json
  • 07:31 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4005.ulsfo.wmnet to cluster ulsfo and group 1
  • 07:31 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4005.ulsfo.wmnet to cluster ulsfo and group 1
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1202 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36212 and previous config saved to /var/cache/conftool/dbconfig/20221025-073014-root.json
  • 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4005.ulsfo.wmnet
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P36211 and previous config saved to /var/cache/conftool/dbconfig/20221025-072600-ladsgroup.json
  • 07:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4005.ulsfo.wmnet
  • 07:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T321312)', diff saved to https://phabricator.wikimedia.org/P36210 and previous config saved to /var/cache/conftool/dbconfig/20221025-071652-ladsgroup.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1202 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36209 and previous config saved to /var/cache/conftool/dbconfig/20221025-071509-root.json
  • 07:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36208 and previous config saved to /var/cache/conftool/dbconfig/20221025-071053-ladsgroup.json
  • 07:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T321312)', diff saved to https://phabricator.wikimedia.org/P36207 and previous config saved to /var/cache/conftool/dbconfig/20221025-070922-ladsgroup.json
  • 07:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 07:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T321312)', diff saved to https://phabricator.wikimedia.org/P36206 and previous config saved to /var/cache/conftool/dbconfig/20221025-070856-ladsgroup.json
  • 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P36205 and previous config saved to /var/cache/conftool/dbconfig/20221025-070837-ladsgroup.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1202 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36204 and previous config saved to /var/cache/conftool/dbconfig/20221025-070004-root.json
  • 06:58 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 06:58 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 06:57 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 06:56 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 06:56 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 06:56 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 06:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P36203 and previous config saved to /var/cache/conftool/dbconfig/20221025-065350-ladsgroup.json
  • 06:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P36202 and previous config saved to /var/cache/conftool/dbconfig/20221025-065330-ladsgroup.json
  • 06:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P36201 and previous config saved to /var/cache/conftool/dbconfig/20221025-063843-ladsgroup.json
  • 06:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 7795
  • 06:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P36200 and previous config saved to /var/cache/conftool/dbconfig/20221025-063824-ladsgroup.json
  • 06:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 7795
  • 06:34 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bruno Scarone out of all services on: 1206 hosts
  • 06:33 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Bruno Scarone out of all services on: 1206 hosts
  • 06:33 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bruno Scarone out of all services on: 799 hosts
  • 06:32 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Bruno Scarone out of all services on: 799 hosts
  • 06:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T321312)', diff saved to https://phabricator.wikimedia.org/P36199 and previous config saved to /var/cache/conftool/dbconfig/20221025-062337-ladsgroup.json
  • 06:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P36198 and previous config saved to /var/cache/conftool/dbconfig/20221025-062318-ladsgroup.json
  • 06:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36196 and previous config saved to /var/cache/conftool/dbconfig/20221025-061710-ladsgroup.json
  • 06:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T321312)', diff saved to https://phabricator.wikimedia.org/P36195 and previous config saved to /var/cache/conftool/dbconfig/20221025-061621-ladsgroup.json
  • 06:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 06:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 06:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P36194 and previous config saved to /var/cache/conftool/dbconfig/20221025-061552-ladsgroup.json
  • 06:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 06:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 06:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 06:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 06:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 06:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1160 T321177', diff saved to https://phabricator.wikimedia.org/P36193 and previous config saved to /var/cache/conftool/dbconfig/20221025-060643-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1138 to s4 primary and set section read-write T321177', diff saved to https://phabricator.wikimedia.org/P36192 and previous config saved to /var/cache/conftool/dbconfig/20221025-060118-ladsgroup.json
  • 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T321177', diff saved to https://phabricator.wikimedia.org/P36191 and previous config saved to /var/cache/conftool/dbconfig/20221025-060043-ladsgroup.json
  • 06:00 Amir1: Starting s4 eqiad failover from db1160 to db1138 - T321177
  • 05:56 _joe_: restarting pybal again on lvs1020, again for testing
  • 05:25 _joe_: restarting pybal on lvs1020 to test cookbook mechanism
  • 05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1138 with weight 0 T321177', diff saved to https://phabricator.wikimedia.org/P36190 and previous config saved to /var/cache/conftool/dbconfig/20221025-051933-ladsgroup.json
  • 05:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T321177
  • 05:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T321177
  • 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T321312)', diff saved to https://phabricator.wikimedia.org/P36189 and previous config saved to /var/cache/conftool/dbconfig/20221025-041558-ladsgroup.json
  • 04:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 04:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 04:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 04:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P36188 and previous config saved to /var/cache/conftool/dbconfig/20221025-040052-ladsgroup.json
  • 03:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P36187 and previous config saved to /var/cache/conftool/dbconfig/20221025-034546-ladsgroup.json
  • 03:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.5 (duration: 01m 56s)
  • 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.7 refs T320512 (duration: 35m 54s)
  • 03:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T321312)', diff saved to https://phabricator.wikimedia.org/P36186 and previous config saved to /var/cache/conftool/dbconfig/20221025-033039-ladsgroup.json
  • 03:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T321312)', diff saved to https://phabricator.wikimedia.org/P36185 and previous config saved to /var/cache/conftool/dbconfig/20221025-032316-ladsgroup.json
  • 03:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 03:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 03:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T321312)', diff saved to https://phabricator.wikimedia.org/P36184 and previous config saved to /var/cache/conftool/dbconfig/20221025-032252-ladsgroup.json
  • 03:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P36183 and previous config saved to /var/cache/conftool/dbconfig/20221025-030745-ladsgroup.json
  • 03:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.7 refs T320512
  • 02:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P36182 and previous config saved to /var/cache/conftool/dbconfig/20221025-025239-ladsgroup.json
  • 02:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T321312)', diff saved to https://phabricator.wikimedia.org/P36181 and previous config saved to /var/cache/conftool/dbconfig/20221025-024709-ladsgroup.json
  • 02:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T321312)', diff saved to https://phabricator.wikimedia.org/P36180 and previous config saved to /var/cache/conftool/dbconfig/20221025-023733-ladsgroup.json
  • 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P36179 and previous config saved to /var/cache/conftool/dbconfig/20221025-023203-ladsgroup.json
  • 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T321312)', diff saved to https://phabricator.wikimedia.org/P36178 and previous config saved to /var/cache/conftool/dbconfig/20221025-023120-ladsgroup.json
  • 02:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 02:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 02:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T321312)', diff saved to https://phabricator.wikimedia.org/P36177 and previous config saved to /var/cache/conftool/dbconfig/20221025-023056-ladsgroup.json
  • 02:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P36176 and previous config saved to /var/cache/conftool/dbconfig/20221025-021656-ladsgroup.json
  • 02:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P36175 and previous config saved to /var/cache/conftool/dbconfig/20221025-021550-ladsgroup.json
  • 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T321312)', diff saved to https://phabricator.wikimedia.org/P36174 and previous config saved to /var/cache/conftool/dbconfig/20221025-020150-ladsgroup.json
  • 02:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P36173 and previous config saved to /var/cache/conftool/dbconfig/20221025-020043-ladsgroup.json
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T321312)', diff saved to https://phabricator.wikimedia.org/P36172 and previous config saved to /var/cache/conftool/dbconfig/20221025-015528-ladsgroup.json
  • 01:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P36171 and previous config saved to /var/cache/conftool/dbconfig/20221025-015502-ladsgroup.json
  • 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T321312)', diff saved to https://phabricator.wikimedia.org/P36170 and previous config saved to /var/cache/conftool/dbconfig/20221025-014536-ladsgroup.json
  • 01:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P36169 and previous config saved to /var/cache/conftool/dbconfig/20221025-013956-ladsgroup.json
  • 01:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T321312)', diff saved to https://phabricator.wikimedia.org/P36168 and previous config saved to /var/cache/conftool/dbconfig/20221025-013917-ladsgroup.json
  • 01:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 01:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 01:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P36167 and previous config saved to /var/cache/conftool/dbconfig/20221025-013852-ladsgroup.json
  • 01:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P36166 and previous config saved to /var/cache/conftool/dbconfig/20221025-012449-ladsgroup.json
  • 01:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P36165 and previous config saved to /var/cache/conftool/dbconfig/20221025-012345-ladsgroup.json
  • 01:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P36164 and previous config saved to /var/cache/conftool/dbconfig/20221025-010943-ladsgroup.json
  • 01:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P36163 and previous config saved to /var/cache/conftool/dbconfig/20221025-010839-ladsgroup.json
  • 01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36162 and previous config saved to /var/cache/conftool/dbconfig/20221025-010225-ladsgroup.json
  • 00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P36161 and previous config saved to /var/cache/conftool/dbconfig/20221025-005332-ladsgroup.json
  • 00:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P36160 and previous config saved to /var/cache/conftool/dbconfig/20221025-004817-ladsgroup.json
  • 00:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P36159 and previous config saved to /var/cache/conftool/dbconfig/20221025-004718-ladsgroup.json
  • 00:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P36158 and previous config saved to /var/cache/conftool/dbconfig/20221025-003310-ladsgroup.json
  • 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P36157 and previous config saved to /var/cache/conftool/dbconfig/20221025-003211-ladsgroup.json
  • 00:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P36151 and previous config saved to /var/cache/conftool/dbconfig/20221025-000257-ladsgroup.json

2022-10-24

  • 23:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P36150 and previous config saved to /var/cache/conftool/dbconfig/20221024-235804-ladsgroup.json
  • 23:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P36149 and previous config saved to /var/cache/conftool/dbconfig/20221024-235645-ladsgroup.json
  • 23:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 23:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 23:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T321312)', diff saved to https://phabricator.wikimedia.org/P36148 and previous config saved to /var/cache/conftool/dbconfig/20221024-235618-ladsgroup.json
  • 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P36147 and previous config saved to /var/cache/conftool/dbconfig/20221024-235357-ladsgroup.json
  • 23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P36146 and previous config saved to /var/cache/conftool/dbconfig/20221024-234111-ladsgroup.json
  • 23:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P36145 and previous config saved to /var/cache/conftool/dbconfig/20221024-233849-ladsgroup.json
  • 23:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P36144 and previous config saved to /var/cache/conftool/dbconfig/20221024-232604-ladsgroup.json
  • 23:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T321312)', diff saved to https://phabricator.wikimedia.org/P36143 and previous config saved to /var/cache/conftool/dbconfig/20221024-232343-ladsgroup.json
  • 23:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T321312)', diff saved to https://phabricator.wikimedia.org/P36142 and previous config saved to /var/cache/conftool/dbconfig/20221024-231721-ladsgroup.json
  • 23:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 23:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 23:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 23:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 23:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T321312)', diff saved to https://phabricator.wikimedia.org/P36141 and previous config saved to /var/cache/conftool/dbconfig/20221024-231629-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T321312)', diff saved to https://phabricator.wikimedia.org/P36140 and previous config saved to /var/cache/conftool/dbconfig/20221024-231058-ladsgroup.json
  • 23:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T321312)', diff saved to https://phabricator.wikimedia.org/P36139 and previous config saved to /var/cache/conftool/dbconfig/20221024-230446-ladsgroup.json
  • 23:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 23:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 23:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 23:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 23:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T321312)', diff saved to https://phabricator.wikimedia.org/P36138 and previous config saved to /var/cache/conftool/dbconfig/20221024-230405-ladsgroup.json
  • 23:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P36137 and previous config saved to /var/cache/conftool/dbconfig/20221024-230122-ladsgroup.json
  • 23:00 TimStarling: on mwmaint1002 running renameInvalidUsernames.php for T292552
  • 22:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P36136 and previous config saved to /var/cache/conftool/dbconfig/20221024-224858-ladsgroup.json
  • 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P36135 and previous config saved to /var/cache/conftool/dbconfig/20221024-224616-ladsgroup.json
  • 22:35 cstone: civicrm upgraded from 89a46665 to 4cb2d91e
  • 22:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P36134 and previous config saved to /var/cache/conftool/dbconfig/20221024-223352-ladsgroup.json
  • 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T321312)', diff saved to https://phabricator.wikimedia.org/P36133 and previous config saved to /var/cache/conftool/dbconfig/20221024-223109-ladsgroup.json
  • 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T321312)', diff saved to https://phabricator.wikimedia.org/P36131 and previous config saved to /var/cache/conftool/dbconfig/20221024-222444-ladsgroup.json
  • 22:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 22:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T321312)', diff saved to https://phabricator.wikimedia.org/P36130 and previous config saved to /var/cache/conftool/dbconfig/20221024-222418-ladsgroup.json
  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T321312)', diff saved to https://phabricator.wikimedia.org/P36129 and previous config saved to /var/cache/conftool/dbconfig/20221024-221845-ladsgroup.json
  • 22:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T321312)', diff saved to https://phabricator.wikimedia.org/P36128 and previous config saved to /var/cache/conftool/dbconfig/20221024-221227-ladsgroup.json
  • 22:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 22:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 22:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T321312)', diff saved to https://phabricator.wikimedia.org/P36127 and previous config saved to /var/cache/conftool/dbconfig/20221024-221203-ladsgroup.json
  • 22:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P36126 and previous config saved to /var/cache/conftool/dbconfig/20221024-220912-ladsgroup.json
  • 21:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P36125 and previous config saved to /var/cache/conftool/dbconfig/20221024-215657-ladsgroup.json
  • 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P36124 and previous config saved to /var/cache/conftool/dbconfig/20221024-215405-ladsgroup.json
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P36123 and previous config saved to /var/cache/conftool/dbconfig/20221024-214150-ladsgroup.json
  • 21:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T321312)', diff saved to https://phabricator.wikimedia.org/P36122 and previous config saved to /var/cache/conftool/dbconfig/20221024-213859-ladsgroup.json
  • 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T321312)', diff saved to https://phabricator.wikimedia.org/P36121 and previous config saved to /var/cache/conftool/dbconfig/20221024-213032-ladsgroup.json
  • 21:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 21:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T321312)', diff saved to https://phabricator.wikimedia.org/P36120 and previous config saved to /var/cache/conftool/dbconfig/20221024-213006-ladsgroup.json
  • 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T321312)', diff saved to https://phabricator.wikimedia.org/P36119 and previous config saved to /var/cache/conftool/dbconfig/20221024-212644-ladsgroup.json
  • 21:24 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit1001.wikimedia.org
  • 21:09 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit1003.wikimedia.org
  • 21:06 volans: uploaded python3-gjson_0.2.0 to apt.wikimedia.org bullseye-wikimedia
  • 21:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P36115 and previous config saved to /var/cache/conftool/dbconfig/20221024-210508-ladsgroup.json
  • 21:04 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1007.wikimedia.org
  • 21:02 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1003.wikimedia.org
  • 21:01 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices1004.wikimedia.org
  • 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P36114 and previous config saved to /var/cache/conftool/dbconfig/20221024-205953-ladsgroup.json
  • 20:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P36113 and previous config saved to /var/cache/conftool/dbconfig/20221024-205002-ladsgroup.json
  • 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:49 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
  • 20:49 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.wikimedia.org
  • 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:48 urbanecm: UTC late B&C window completed
  • 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:45 urbanecm@deploy1002: Finished scap: Backport for logos: Automate icon generation (T319223) (duration: 08m 49s)
  • 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T321312)', diff saved to https://phabricator.wikimedia.org/P36112 and previous config saved to /var/cache/conftool/dbconfig/20221024-204446-ladsgroup.json
  • 20:41 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1006.wikimedia.org
  • 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T321312)', diff saved to https://phabricator.wikimedia.org/P36111 and previous config saved to /var/cache/conftool/dbconfig/20221024-203713-ladsgroup.json
  • 20:37 urbanecm@deploy1002: urbanecm and stang: Backport for logos: Automate icon generation (T319223) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 20:36 urbanecm@deploy1002: Started scap: Backport for logos: Automate icon generation (T319223)
  • 20:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T321312)', diff saved to https://phabricator.wikimedia.org/P36110 and previous config saved to /var/cache/conftool/dbconfig/20221024-203647-ladsgroup.json
  • 20:36 urbanecm@deploy1002: Finished scap: Backport for Add wmgSiteLogoVariants support to Chinese projects (T308620) (duration: 07m 02s)
  • 20:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P36108 and previous config saved to /var/cache/conftool/dbconfig/20221024-203455-ladsgroup.json
  • 20:34 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1005.wikimedia.org
  • 20:29 urbanecm@deploy1002: urbanecm and stang: Backport for Add wmgSiteLogoVariants support to Chinese projects (T308620) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:29 urbanecm@deploy1002: Started scap: Backport for Add wmgSiteLogoVariants support to Chinese projects (T308620)
  • 20:28 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1006.wikimedia.org
  • 20:28 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1005.wikimedia.org
  • 20:27 urbanecm@deploy1002: Finished scap: Backport for Allow 'nofollow' on external links in Parsoid output (T321437), Retry without RESTBase when the page/revision seems to be missing (T315688) (duration: 06m 38s)
  • 20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36107 and previous config saved to /var/cache/conftool/dbconfig/20221024-202738-ladsgroup.json
  • 20:26 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1005.wikimedia.org
  • 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:25 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudbackup1002-dev.eqiad.wmnet
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P36106 and previous config saved to /var/cache/conftool/dbconfig/20221024-202141-ladsgroup.json
  • 20:21 urbanecm@deploy1002: urbanecm and matmarex: Backport for Allow 'nofollow' on external links in Parsoid output (T321437), Retry without RESTBase when the page/revision seems to be missing (T315688) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:21 urbanecm@deploy1002: Started scap: Backport for Allow 'nofollow' on external links in Parsoid output (T321437), Retry without RESTBase when the page/revision seems to be missing (T315688)
  • 20:17 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
  • 20:15 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup1002-dev.eqiad.wmnet
  • 20:14 urbanecm@deploy1002: Finished scap: Backport for Promote several Wikipedias to desktop improvements group (T319012) (duration: 05m 53s)
  • 20:13 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup1001-dev.eqiad.wmnet
  • 20:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P36105 and previous config saved to /var/cache/conftool/dbconfig/20221024-201232-ladsgroup.json
  • 20:09 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup1001-dev.eqiad.wmnet
  • 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:09 urbanecm@deploy1002: urbanecm and jdlrobson: Backport for Promote several Wikipedias to desktop improvements group (T319012) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:08 urbanecm@deploy1002: Started scap: Backport for Promote several Wikipedias to desktop improvements group (T319012)
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:08 urbanecm@deploy1002: Finished scap: Backport for Unset some bad logos (duration: 06m 07s)
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P36104 and previous config saved to /var/cache/conftool/dbconfig/20221024-200634-ladsgroup.json
  • 20:02 urbanecm@deploy1002: urbanecm and jdlrobson: Backport for Unset some bad logos synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:01 urbanecm@deploy1002: Started scap: Backport for Unset some bad logos
  • 19:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P36103 and previous config saved to /var/cache/conftool/dbconfig/20221024-195725-ladsgroup.json
  • 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T321312)', diff saved to https://phabricator.wikimedia.org/P36102 and previous config saved to /var/cache/conftool/dbconfig/20221024-195128-ladsgroup.json
  • 19:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T321312)', diff saved to https://phabricator.wikimedia.org/P36101 and previous config saved to /var/cache/conftool/dbconfig/20221024-194452-ladsgroup.json
  • 19:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 19:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 19:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T321312)', diff saved to https://phabricator.wikimedia.org/P36100 and previous config saved to /var/cache/conftool/dbconfig/20221024-194416-ladsgroup.json
  • 19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36099 and previous config saved to /var/cache/conftool/dbconfig/20221024-194219-ladsgroup.json
  • 19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P36098 and previous config saved to /var/cache/conftool/dbconfig/20221024-193610-ladsgroup.json
  • 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36097 and previous config saved to /var/cache/conftool/dbconfig/20221024-193447-ladsgroup.json
  • 19:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P36096 and previous config saved to /var/cache/conftool/dbconfig/20221024-192909-ladsgroup.json
  • 19:27 ladsgroup@deploy1002: Finished scap: Backport for Add 'class' to LBFactory callback config (duration: 05m 20s)
  • 19:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T321312)', diff saved to https://phabricator.wikimedia.org/P36095 and previous config saved to /var/cache/conftool/dbconfig/20221024-192251-ladsgroup.json
  • 19:22 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Add 'class' to LBFactory callback config synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 19:22 ladsgroup@deploy1002: Started scap: Backport for Add 'class' to LBFactory callback config
  • 19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P36094 and previous config saved to /var/cache/conftool/dbconfig/20221024-191403-ladsgroup.json
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P36093 and previous config saved to /var/cache/conftool/dbconfig/20221024-190745-ladsgroup.json
  • 19:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:02 mforns@deploy1002: Finished deploy [analytics/refinery@d3b7785] (thin): Regular analytics weekly train THIN [analytics/refinery@d3b7785] (duration: 00m 07s)
  • 19:02 mforns@deploy1002: Started deploy [analytics/refinery@d3b7785] (thin): Regular analytics weekly train THIN [analytics/refinery@d3b7785]
  • 19:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:00 ladsgroup@deploy1002: Finished scap: Backport for Avoid using DBLoadBalancerFactoryConfigBuilder mw service (T298485) (duration: 06m 55s)
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T321312)', diff saved to https://phabricator.wikimedia.org/P36092 and previous config saved to /var/cache/conftool/dbconfig/20221024-185856-ladsgroup.json
  • 18:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:53 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Avoid using DBLoadBalancerFactoryConfigBuilder mw service (T298485) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 18:53 ladsgroup@deploy1002: Started scap: Backport for Avoid using DBLoadBalancerFactoryConfigBuilder mw service (T298485)
  • 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P36091 and previous config saved to /var/cache/conftool/dbconfig/20221024-185238-ladsgroup.json
  • 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T321312)', diff saved to https://phabricator.wikimedia.org/P36090 and previous config saved to /var/cache/conftool/dbconfig/20221024-185230-ladsgroup.json
  • 18:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 18:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool db1112', diff saved to https://phabricator.wikimedia.org/P36089 and previous config saved to /var/cache/conftool/dbconfig/20221024-184359-ladsgroup.json
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P36088 and previous config saved to /var/cache/conftool/dbconfig/20221024-184239-ladsgroup.json
  • 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T321312)', diff saved to https://phabricator.wikimedia.org/P36087 and previous config saved to /var/cache/conftool/dbconfig/20221024-183732-ladsgroup.json
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T321312)', diff saved to https://phabricator.wikimedia.org/P36086 and previous config saved to /var/cache/conftool/dbconfig/20221024-183015-ladsgroup.json
  • 18:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 18:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 18:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T321312)', diff saved to https://phabricator.wikimedia.org/P36085 and previous config saved to /var/cache/conftool/dbconfig/20221024-182951-ladsgroup.json
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P36084 and previous config saved to /var/cache/conftool/dbconfig/20221024-181444-ladsgroup.json
  • 17:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T321312)', diff saved to https://phabricator.wikimedia.org/P36082 and previous config saved to /var/cache/conftool/dbconfig/20221024-174431-ladsgroup.json
  • 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T321312)', diff saved to https://phabricator.wikimedia.org/P36081 and previous config saved to /var/cache/conftool/dbconfig/20221024-173812-ladsgroup.json
  • 17:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 17:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T321312)', diff saved to https://phabricator.wikimedia.org/P36080 and previous config saved to /var/cache/conftool/dbconfig/20221024-173748-ladsgroup.json
  • 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P36079 and previous config saved to /var/cache/conftool/dbconfig/20221024-172242-ladsgroup.json
  • 17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P36078 and previous config saved to /var/cache/conftool/dbconfig/20221024-170735-ladsgroup.json
  • 16:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T321312)', diff saved to https://phabricator.wikimedia.org/P36077 and previous config saved to /var/cache/conftool/dbconfig/20221024-165229-ladsgroup.json
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T321312)', diff saved to https://phabricator.wikimedia.org/P36076 and previous config saved to /var/cache/conftool/dbconfig/20221024-164510-ladsgroup.json
  • 16:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T321312)', diff saved to https://phabricator.wikimedia.org/P36075 and previous config saved to /var/cache/conftool/dbconfig/20221024-164446-ladsgroup.json
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P36074 and previous config saved to /var/cache/conftool/dbconfig/20221024-162939-ladsgroup.json
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T321312)', diff saved to https://phabricator.wikimedia.org/P36073 and previous config saved to /var/cache/conftool/dbconfig/20221024-162035-ladsgroup.json
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P36072 and previous config saved to /var/cache/conftool/dbconfig/20221024-161432-ladsgroup.json
  • 16:11 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1063.eqiad.wmnet
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P36071 and previous config saved to /var/cache/conftool/dbconfig/20221024-160528-ladsgroup.json
  • 16:03 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2068.codfw.wmnet
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T321312)', diff saved to https://phabricator.wikimedia.org/P36070 and previous config saved to /var/cache/conftool/dbconfig/20221024-155926-ladsgroup.json
  • 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T321312)', diff saved to https://phabricator.wikimedia.org/P36069 and previous config saved to /var/cache/conftool/dbconfig/20221024-155313-ladsgroup.json
  • 15:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 15:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 15:52 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2068.codfw.wmnet
  • 15:51 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2067.codfw.wmnet
  • 15:51 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1063.eqiad.wmnet
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P36068 and previous config saved to /var/cache/conftool/dbconfig/20221024-155022-ladsgroup.json
  • 15:50 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1062.eqiad.wmnet
  • 15:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 15:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 15:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 15:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 15:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T321312)', diff saved to https://phabricator.wikimedia.org/P36067 and previous config saved to /var/cache/conftool/dbconfig/20221024-154543-ladsgroup.json
  • 15:43 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
  • 15:42 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1062.eqiad.wmnet
  • 15:42 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1061.eqiad.wmnet
  • 15:41 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
  • 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T321312)', diff saved to https://phabricator.wikimedia.org/P36066 and previous config saved to /var/cache/conftool/dbconfig/20221024-153515-ladsgroup.json
  • 15:34 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2066.codfw.wmnet
  • 15:32 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
  • 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P36065 and previous config saved to /var/cache/conftool/dbconfig/20221024-153037-ladsgroup.json
  • 15:29 mforns@deploy1002: Finished deploy [airflow-dags/analytics@62b4181]: (no justification provided) (duration: 00m 11s)
  • 15:29 mforns@deploy1002: Started deploy [airflow-dags/analytics@62b4181]: (no justification provided)
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T321312)', diff saved to https://phabricator.wikimedia.org/P36064 and previous config saved to /var/cache/conftool/dbconfig/20221024-152856-ladsgroup.json
  • 15:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T321312)', diff saved to https://phabricator.wikimedia.org/P36063 and previous config saved to /var/cache/conftool/dbconfig/20221024-152830-ladsgroup.json
  • 15:27 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1061.eqiad.wmnet
  • 15:26 XioNoX: drain eqiad-esams transport
  • 15:23 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1060.eqiad.wmnet
  • 15:23 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 15:22 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 15:19 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
  • 15:18 mforns@deploy1002: Finished deploy [analytics/refinery@d3b7785] (thin): Regular analytics weekly train THIN [analytics/refinery@d3b7785] (duration: 00m 09s)
  • 15:18 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2064.codfw.wmnet
  • 15:18 mforns@deploy1002: Started deploy [analytics/refinery@d3b7785] (thin): Regular analytics weekly train THIN [analytics/refinery@d3b7785]
  • 15:18 mforns@deploy1002: Finished deploy [analytics/refinery@d3b7785]: Regular analytics weekly train [analytics/refinery@d3b7785] (duration: 05m 34s)
  • 15:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P36062 and previous config saved to /var/cache/conftool/dbconfig/20221024-151530-ladsgroup.json
  • 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P36061 and previous config saved to /var/cache/conftool/dbconfig/20221024-151324-ladsgroup.json
  • 15:12 mforns@deploy1002: Started deploy [analytics/refinery@d3b7785]: Regular analytics weekly train [analytics/refinery@d3b7785]
  • 15:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
  • 15:09 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1060.eqiad.wmnet
  • 15:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T321312)', diff saved to https://phabricator.wikimedia.org/P36060 and previous config saved to /var/cache/conftool/dbconfig/20221024-150024-ladsgroup.json
  • 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P36059 and previous config saved to /var/cache/conftool/dbconfig/20221024-145817-ladsgroup.json
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T321312)', diff saved to https://phabricator.wikimedia.org/P36058 and previous config saved to /var/cache/conftool/dbconfig/20221024-145511-ladsgroup.json
  • 14:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 14:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T321312)', diff saved to https://phabricator.wikimedia.org/P36057 and previous config saved to /var/cache/conftool/dbconfig/20221024-145436-ladsgroup.json
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T321312)', diff saved to https://phabricator.wikimedia.org/P36056 and previous config saved to /var/cache/conftool/dbconfig/20221024-144311-ladsgroup.json
  • 14:42 XioNoX: drain NTT on cr1-eqiad
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P36055 and previous config saved to /var/cache/conftool/dbconfig/20221024-143930-ladsgroup.json
  • 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T321312)', diff saved to https://phabricator.wikimedia.org/P36054 and previous config saved to /var/cache/conftool/dbconfig/20221024-143650-ladsgroup.json
  • 14:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 14:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T321312)', diff saved to https://phabricator.wikimedia.org/P36053 and previous config saved to /var/cache/conftool/dbconfig/20221024-143625-ladsgroup.json
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P36052 and previous config saved to /var/cache/conftool/dbconfig/20221024-142423-ladsgroup.json
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P36051 and previous config saved to /var/cache/conftool/dbconfig/20221024-142118-ladsgroup.json
  • 14:16 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:15 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T321312)', diff saved to https://phabricator.wikimedia.org/P36050 and previous config saved to /var/cache/conftool/dbconfig/20221024-140917-ladsgroup.json
  • 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P36049 and previous config saved to /var/cache/conftool/dbconfig/20221024-140612-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T321312)', diff saved to https://phabricator.wikimedia.org/P36048 and previous config saved to /var/cache/conftool/dbconfig/20221024-140404-ladsgroup.json
  • 14:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 14:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 13:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1059.eqiad.wmnet
  • 13:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 13:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T321312)', diff saved to https://phabricator.wikimedia.org/P36047 and previous config saved to /var/cache/conftool/dbconfig/20221024-135557-ladsgroup.json
  • 13:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:54 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:53 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for plwikimedia: Enable VisualEditor by default (T321308) (duration: 07m 25s)
  • 13:53 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
  • 13:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T321312)', diff saved to https://phabricator.wikimedia.org/P36046 and previous config saved to /var/cache/conftool/dbconfig/20221024-135105-ladsgroup.json
  • 13:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:48 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1059.eqiad.wmnet
  • 13:47 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1058.eqiad.wmnet
  • 13:46 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for plwikimedia: Enable VisualEditor by default (T321308) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:46 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for plwikimedia: Enable VisualEditor by default (T321308)
  • 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T321312)', diff saved to https://phabricator.wikimedia.org/P36045 and previous config saved to /var/cache/conftool/dbconfig/20221024-134437-ladsgroup.json
  • 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 13:44 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
  • 13:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P36044 and previous config saved to /var/cache/conftool/dbconfig/20221024-134356-ladsgroup.json
  • 13:42 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2062.codfw.wmnet
  • 13:42 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 13:41 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P36043 and previous config saved to /var/cache/conftool/dbconfig/20221024-134050-ladsgroup.json
  • 13:20 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:16 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:16 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P36040 and previous config saved to /var/cache/conftool/dbconfig/20221024-131343-ladsgroup.json
  • 13:12 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:11 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1057.eqiad.wmnet
  • 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T321312)', diff saved to https://phabricator.wikimedia.org/P36039 and previous config saved to /var/cache/conftool/dbconfig/20221024-131037-ladsgroup.json
  • 13:07 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:07 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T321312)', diff saved to https://phabricator.wikimedia.org/P36038 and previous config saved to /var/cache/conftool/dbconfig/20221024-130413-ladsgroup.json
  • 13:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 13:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 13:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T321312)', diff saved to https://phabricator.wikimedia.org/P36037 and previous config saved to /var/cache/conftool/dbconfig/20221024-130349-ladsgroup.json
  • 13:03 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
  • 13:01 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P36036 and previous config saved to /var/cache/conftool/dbconfig/20221024-125836-ladsgroup.json
  • 12:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T321312)', diff saved to https://phabricator.wikimedia.org/P36035 and previous config saved to /var/cache/conftool/dbconfig/20221024-125420-ladsgroup.json
  • 12:52 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
  • 12:52 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1057.eqiad.wmnet
  • 12:49 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync data - jbond@cumin1001"
  • 12:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P36034 and previous config saved to /var/cache/conftool/dbconfig/20221024-124842-ladsgroup.json
  • 12:48 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
  • 12:48 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
  • 12:47 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1056.eqiad.wmnet
  • 12:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P36033 and previous config saved to /var/cache/conftool/dbconfig/20221024-123913-ladsgroup.json
  • 12:35 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1056.eqiad.wmnet
  • 12:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1055.eqiad.wmnet
  • 12:34 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
  • 12:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
  • 12:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P36032 and previous config saved to /var/cache/conftool/dbconfig/20221024-123336-ladsgroup.json
  • 12:25 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
  • 12:24 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
  • 12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P36031 and previous config saved to /var/cache/conftool/dbconfig/20221024-122407-ladsgroup.json
  • 12:23 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1055.eqiad.wmnet
  • 12:22 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1054.eqiad.wmnet
  • 12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T321312)', diff saved to https://phabricator.wikimedia.org/P36030 and previous config saved to /var/cache/conftool/dbconfig/20221024-121829-ladsgroup.json
  • 12:15 dcausse: restarting blazegraph on wdqs1005, wdqs1006, wdqs1012 and wdqs1016 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 12:14 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
  • 12:13 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
  • 12:13 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be1054.eqiad.wmnet
  • 12:13 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
  • 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T321312)', diff saved to https://phabricator.wikimedia.org/P36029 and previous config saved to /var/cache/conftool/dbconfig/20221024-121058-ladsgroup.json
  • 12:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T321312)', diff saved to https://phabricator.wikimedia.org/P36028 and previous config saved to /var/cache/conftool/dbconfig/20221024-121034-ladsgroup.json
  • 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T321312)', diff saved to https://phabricator.wikimedia.org/P36027 and previous config saved to /var/cache/conftool/dbconfig/20221024-120900-ladsgroup.json
  • 12:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P36025 and previous config saved to /var/cache/conftool/dbconfig/20221024-120153-ladsgroup.json
  • 12:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T321312)', diff saved to https://phabricator.wikimedia.org/P36024 and previous config saved to /var/cache/conftool/dbconfig/20221024-120026-ladsgroup.json
  • 12:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T321312)', diff saved to https://phabricator.wikimedia.org/P36023 and previous config saved to /var/cache/conftool/dbconfig/20221024-115959-ladsgroup.json
  • 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P36022 and previous config saved to /var/cache/conftool/dbconfig/20221024-115528-ladsgroup.json
  • 11:54 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
  • 11:54 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P36021 and previous config saved to /var/cache/conftool/dbconfig/20221024-114452-ladsgroup.json
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P36020 and previous config saved to /var/cache/conftool/dbconfig/20221024-114022-ladsgroup.json
  • 11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P36019 and previous config saved to /var/cache/conftool/dbconfig/20221024-112946-ladsgroup.json
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T321312)', diff saved to https://phabricator.wikimedia.org/P36018 and previous config saved to /var/cache/conftool/dbconfig/20221024-112515-ladsgroup.json
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T321312)', diff saved to https://phabricator.wikimedia.org/P36017 and previous config saved to /var/cache/conftool/dbconfig/20221024-111849-ladsgroup.json
  • 11:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 11:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T321312)', diff saved to https://phabricator.wikimedia.org/P36016 and previous config saved to /var/cache/conftool/dbconfig/20221024-111825-ladsgroup.json
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool db1189', diff saved to https://phabricator.wikimedia.org/P36015 and previous config saved to /var/cache/conftool/dbconfig/20221024-111813-ladsgroup.json
  • 11:17 ladsgroup@deploy1002: Finished scap: Backport for Enable LBFactory config callback in CLI (T298485) (duration: 08m 35s)
  • 11:16 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2055.codfw.wmnet
  • 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T321312)', diff saved to https://phabricator.wikimedia.org/P36014 and previous config saved to /var/cache/conftool/dbconfig/20221024-111439-ladsgroup.json
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1189', diff saved to https://phabricator.wikimedia.org/P36013 and previous config saved to /var/cache/conftool/dbconfig/20221024-111121-ladsgroup.json
  • 11:09 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Enable LBFactory config callback in CLI (T298485) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 11:09 ladsgroup@deploy1002: Started scap: Backport for Enable LBFactory config callback in CLI (T298485)
  • 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T321312)', diff saved to https://phabricator.wikimedia.org/P36012 and previous config saved to /var/cache/conftool/dbconfig/20221024-110822-ladsgroup.json
  • 11:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 11:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T321312)', diff saved to https://phabricator.wikimedia.org/P36011 and previous config saved to /var/cache/conftool/dbconfig/20221024-110756-ladsgroup.json
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P36010 and previous config saved to /var/cache/conftool/dbconfig/20221024-110318-ladsgroup.json
  • 10:54 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1053.eqiad.wmnet
  • 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P36009 and previous config saved to /var/cache/conftool/dbconfig/20221024-105250-ladsgroup.json
  • 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P36008 and previous config saved to /var/cache/conftool/dbconfig/20221024-104812-ladsgroup.json
  • 10:47 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1053.eqiad.wmnet
  • 10:47 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1052.eqiad.wmnet
  • 10:43 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2055.codfw.wmnet
  • 10:42 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1052.eqiad.wmnet
  • 10:42 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
  • 10:39 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1051.eqiad.wmnet
  • 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P36007 and previous config saved to /var/cache/conftool/dbconfig/20221024-103743-ladsgroup.json
  • 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T321312)', diff saved to https://phabricator.wikimedia.org/P36006 and previous config saved to /var/cache/conftool/dbconfig/20221024-103305-ladsgroup.json
  • 10:32 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
  • 10:32 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1051.eqiad.wmnet
  • 10:30 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
  • 10:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1050.eqiad.wmnet
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1107 (T321312)', diff saved to https://phabricator.wikimedia.org/P36005 and previous config saved to /var/cache/conftool/dbconfig/20221024-102636-ladsgroup.json
  • 10:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T321312)', diff saved to https://phabricator.wikimedia.org/P36004 and previous config saved to /var/cache/conftool/dbconfig/20221024-102612-ladsgroup.json
  • 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T321312)', diff saved to https://phabricator.wikimedia.org/P36003 and previous config saved to /var/cache/conftool/dbconfig/20221024-102237-ladsgroup.json
  • 10:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1050.eqiad.wmnet
  • 10:16 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1049.eqiad.wmnet
  • 10:15 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
  • 10:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
  • 10:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T321312)', diff saved to https://phabricator.wikimedia.org/P36002 and previous config saved to /var/cache/conftool/dbconfig/20221024-101518-ladsgroup.json
  • 10:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 10:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T321312)', diff saved to https://phabricator.wikimedia.org/P36001 and previous config saved to /var/cache/conftool/dbconfig/20221024-101453-ladsgroup.json
  • 10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P36000 and previous config saved to /var/cache/conftool/dbconfig/20221024-101105-ladsgroup.json
  • 10:07 Emperor: upload wmf-beamer-style 0.3 to apt
  • 10:07 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:06 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:04 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=kubernetes-staging,service=kubemaster
  • 10:01 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1049.eqiad.wmnet
  • 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P35999 and previous config saved to /var/cache/conftool/dbconfig/20221024-095946-ladsgroup.json
  • 09:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1048.eqiad.wmnet
  • 09:57 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P35998 and previous config saved to /var/cache/conftool/dbconfig/20221024-095559-ladsgroup.json
  • 09:53 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
  • 09:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P35997 and previous config saved to /var/cache/conftool/dbconfig/20221024-094440-ladsgroup.json
  • 09:43 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1048.eqiad.wmnet
  • 09:41 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=kubemaster,name=kubemaster2002.codfw.wmnet
  • 09:41 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
  • 09:41 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2002.codfw.wmnet
  • 09:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T321312)', diff saved to https://phabricator.wikimedia.org/P35996 and previous config saved to /var/cache/conftool/dbconfig/20221024-094052-ladsgroup.json
  • 09:35 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster2002.codfw.wmnet
  • 09:35 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes,service=kubemaster,name=kubemaster2002.codfw.wmnet
  • 09:34 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=kubemaster,name=kubemaster2001.codfw.wmnet
  • 09:34 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2001.codfw.wmnet
  • 09:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T321312)', diff saved to https://phabricator.wikimedia.org/P35995 and previous config saved to /var/cache/conftool/dbconfig/20221024-092933-ladsgroup.json
  • 09:27 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster2001.codfw.wmnet
  • 09:27 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes,service=kubemaster,name=kubemaster2001.codfw.wmnet
  • 09:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2016.codfw.wmnet
  • 09:23 oblivian@deploy1002: Finished scap: Backport for Stop assigning the PHP_ENGINE cookie (T271736) (duration: 04m 59s)
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T321312)', diff saved to https://phabricator.wikimedia.org/P35994 and previous config saved to /var/cache/conftool/dbconfig/20221024-092310-ladsgroup.json
  • 09:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:20 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2016.codfw.wmnet
  • 09:19 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes,service=kubesvc,name=kubernetes2016.codfw.wmnet
  • 09:19 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=kubesvc,name=kubernetes2015.codfw.wmnet
  • 09:18 oblivian@deploy1002: oblivian and oblivian: Backport for Stop assigning the PHP_ENGINE cookie (T271736) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 09:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 09:18 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2015.codfw.wmnet
  • 09:18 oblivian@deploy1002: Started scap: Backport for Stop assigning the PHP_ENGINE cookie (T271736)
  • 09:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 09:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T321312)', diff saved to https://phabricator.wikimedia.org/P35993 and previous config saved to /var/cache/conftool/dbconfig/20221024-091801-ladsgroup.json
  • 09:17 claime: kubernetes2015:~$ sudo systemctl reset-failed ifup@ens13.service T273026
  • 09:15 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2051.codfw.wmnet
  • 09:12 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2015.codfw.wmnet
  • 09:11 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes,service=kubesvc,name=kubernetes2015.codfw.wmnet
  • 09:10 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1047.eqiad.wmnet
  • 09:09 claime: Starting October reboots of lingering wikikube codfw hosts
  • 09:08 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=kubernetes,service=kubemaster,name=kubemaster1002.eqiad.wmnet
  • 09:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1002.eqiad.wmnet
  • 09:03 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1047.eqiad.wmnet
  • 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P35991 and previous config saved to /var/cache/conftool/dbconfig/20221024-090255-ladsgroup.json
  • 09:01 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1002.eqiad.wmnet
  • 09:01 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=kubernetes,service=kubemaster,name=kubemaster1002.eqiad.wmnet
  • 09:01 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1046.eqiad.wmnet
  • 09:01 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
  • 09:00 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
  • 09:00 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=kubernetes,service=kubemaster,name=kubemaster1001.eqiad.wmnet
  • 08:59 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1001.eqiad.wmnet
  • 08:53 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1001.eqiad.wmnet
  • 08:52 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=kubernetes,service=kubemaster,name=kubemaster1001.eqiad.wmnet
  • 08:52 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
  • 08:51 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2049.codfw.wmnet
  • 08:48 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P35990 and previous config saved to /var/cache/conftool/dbconfig/20221024-084748-ladsgroup.json
  • 08:45 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=kubernetes,service=kubesvc,name=kubernetes1016.eqiad.wmnet
  • 08:44 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1016.eqiad.wmnet
  • 08:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 199524
  • 08:44 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1045.eqiad.wmnet
  • 08:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 199524
  • 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T321312)', diff saved to https://phabricator.wikimedia.org/P35989 and previous config saved to /var/cache/conftool/dbconfig/20221024-084037-ladsgroup.json
  • 08:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 08:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 08:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 31042
  • 08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P35988 and previous config saved to /var/cache/conftool/dbconfig/20221024-083955-ladsgroup.json
  • 08:39 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1016.eqiad.wmnet
  • 08:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 31042
  • 08:38 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=kubernetes,service=kubesvc,name=kubernetes1016.eqiad.wmnet
  • 08:38 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=kubernetes,service=kubesvc,name=kubernetes1015.eqiad.wmnet
  • 08:37 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2049.codfw.wmnet
  • 08:37 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1015.eqiad.wmnet
  • 08:37 claime: kubernetes1015:~$ sudo systemctl reset-failed ifup@ens13.service T273026
  • 08:36 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1045.eqiad.wmnet
  • 08:35 Emperor: set thanos ring replicas to 3.50 T311690
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T321312)', diff saved to https://phabricator.wikimedia.org/P35987 and previous config saved to /var/cache/conftool/dbconfig/20221024-083242-ladsgroup.json
  • 08:30 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1015.eqiad.wmnet
  • 08:29 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=kubernetes,service=kubesvc,name=kubernetes1015.eqiad.wmnet
  • 08:28 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=kubernetes,service=kubesvc,name=kubernetes1006.eqiad.wmnet
  • 08:28 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1006.eqiad.wmnet
  • 08:27 claime: kubernetes1006:~$ sudo systemctl reset-failed ifup@ens13.service T273026
  • 08:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3303
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T321312)', diff saved to https://phabricator.wikimedia.org/P35986 and previous config saved to /var/cache/conftool/dbconfig/20221024-082605-ladsgroup.json
  • 08:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 08:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T321312)', diff saved to https://phabricator.wikimedia.org/P35985 and previous config saved to /var/cache/conftool/dbconfig/20221024-082540-ladsgroup.json
  • 08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P35984 and previous config saved to /var/cache/conftool/dbconfig/20221024-082448-ladsgroup.json
  • 08:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 3303
  • 08:19 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1006.eqiad.wmnet
  • 08:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13335
  • 08:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13335
  • 08:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8075
  • 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P35983 and previous config saved to /var/cache/conftool/dbconfig/20221024-081033-ladsgroup.json
  • 08:09 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8075
  • 08:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P35982 and previous config saved to /var/cache/conftool/dbconfig/20221024-080942-ladsgroup.json
  • 08:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 132337
  • 08:07 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 132337
  • 08:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32787
  • 08:05 ladsgroup@deploy1002: Finished scap: Backport for Enable source links on Translation ns on enwikisource and thwikisource (T53980) (duration: 09m 18s)
  • 08:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32787
  • 08:01 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=kubernetes,service=kubesvc,name=kubernetes1006.eqiad.wmnet
  • 08:00 claime: Starting October reboots of lingering wikikube eqiad hosts

2022-10-22

  • 03:16 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 00:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T321312)', diff saved to https://phabricator.wikimedia.org/P35965 and previous config saved to /var/cache/conftool/dbconfig/20221022-000300-ladsgroup.json

2022-10-21

  • 23:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P35964 and previous config saved to /var/cache/conftool/dbconfig/20221021-234754-ladsgroup.json
  • 23:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P35963 and previous config saved to /var/cache/conftool/dbconfig/20221021-233247-ladsgroup.json
  • 23:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T321312)', diff saved to https://phabricator.wikimedia.org/P35962 and previous config saved to /var/cache/conftool/dbconfig/20221021-231741-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1203 (T321312)', diff saved to https://phabricator.wikimedia.org/P35961 and previous config saved to /var/cache/conftool/dbconfig/20221021-231026-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 23:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T321312)', diff saved to https://phabricator.wikimedia.org/P35960 and previous config saved to /var/cache/conftool/dbconfig/20221021-231001-ladsgroup.json
  • 22:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P35959 and previous config saved to /var/cache/conftool/dbconfig/20221021-225455-ladsgroup.json
  • 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P35958 and previous config saved to /var/cache/conftool/dbconfig/20221021-223948-ladsgroup.json
  • 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T321312)', diff saved to https://phabricator.wikimedia.org/P35957 and previous config saved to /var/cache/conftool/dbconfig/20221021-222442-ladsgroup.json
  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T321312)', diff saved to https://phabricator.wikimedia.org/P35956 and previous config saved to /var/cache/conftool/dbconfig/20221021-221826-ladsgroup.json
  • 22:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 22:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T321312)', diff saved to https://phabricator.wikimedia.org/P35955 and previous config saved to /var/cache/conftool/dbconfig/20221021-221802-ladsgroup.json
  • 22:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P35954 and previous config saved to /var/cache/conftool/dbconfig/20221021-220256-ladsgroup.json
  • 21:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P35953 and previous config saved to /var/cache/conftool/dbconfig/20221021-214749-ladsgroup.json
  • 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T321312)', diff saved to https://phabricator.wikimedia.org/P35952 and previous config saved to /var/cache/conftool/dbconfig/20221021-213242-ladsgroup.json
  • 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T321312)', diff saved to https://phabricator.wikimedia.org/P35951 and previous config saved to /var/cache/conftool/dbconfig/20221021-212629-ladsgroup.json
  • 21:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 21:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T321312)', diff saved to https://phabricator.wikimedia.org/P35950 and previous config saved to /var/cache/conftool/dbconfig/20221021-212604-ladsgroup.json
  • 21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P35949 and previous config saved to /var/cache/conftool/dbconfig/20221021-211058-ladsgroup.json
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P35948 and previous config saved to /var/cache/conftool/dbconfig/20221021-205551-ladsgroup.json
  • 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T321312)', diff saved to https://phabricator.wikimedia.org/P35947 and previous config saved to /var/cache/conftool/dbconfig/20221021-204045-ladsgroup.json
  • 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T321312)', diff saved to https://phabricator.wikimedia.org/P35946 and previous config saved to /var/cache/conftool/dbconfig/20221021-203430-ladsgroup.json
  • 20:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 20:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T321312)', diff saved to https://phabricator.wikimedia.org/P35945 and previous config saved to /var/cache/conftool/dbconfig/20221021-203406-ladsgroup.json
  • 20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T321312)', diff saved to https://phabricator.wikimedia.org/P35944 and previous config saved to /var/cache/conftool/dbconfig/20221021-202721-ladsgroup.json
  • 20:20 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: apply updates - bking@cumin2002 - T321310
  • 20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P35943 and previous config saved to /var/cache/conftool/dbconfig/20221021-201900-ladsgroup.json
  • 20:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P35942 and previous config saved to /var/cache/conftool/dbconfig/20221021-201214-ladsgroup.json
  • 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P35941 and previous config saved to /var/cache/conftool/dbconfig/20221021-200353-ladsgroup.json
  • 19:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P35940 and previous config saved to /var/cache/conftool/dbconfig/20221021-195708-ladsgroup.json
  • 19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T321312)', diff saved to https://phabricator.wikimedia.org/P35939 and previous config saved to /var/cache/conftool/dbconfig/20221021-194847-ladsgroup.json
  • 19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T321312)', diff saved to https://phabricator.wikimedia.org/P35938 and previous config saved to /var/cache/conftool/dbconfig/20221021-194234-ladsgroup.json
  • 19:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 19:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T321312)', diff saved to https://phabricator.wikimedia.org/P35937 and previous config saved to /var/cache/conftool/dbconfig/20221021-194210-ladsgroup.json
  • 19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T321312)', diff saved to https://phabricator.wikimedia.org/P35936 and previous config saved to /var/cache/conftool/dbconfig/20221021-194201-ladsgroup.json
  • 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T321312)', diff saved to https://phabricator.wikimedia.org/P35935 and previous config saved to /var/cache/conftool/dbconfig/20221021-193550-ladsgroup.json
  • 19:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 19:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35934 and previous config saved to /var/cache/conftool/dbconfig/20221021-193524-ladsgroup.json
  • 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P35933 and previous config saved to /var/cache/conftool/dbconfig/20221021-192704-ladsgroup.json
  • 19:24 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup1002-dev.eqiad.wmnet
  • 19:22 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-puppet-dashboard updates (duration: 02m 55s)
  • 19:21 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol2001-dev.wikimedia.org
  • 19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P35932 and previous config saved to /var/cache/conftool/dbconfig/20221021-192016-ladsgroup.json
  • 19:20 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup1002-dev.eqiad.wmnet
  • 19:19 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-puppet-dashboard updates
  • 19:19 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup1001-dev.eqiad.wmnet
  • 19:18 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-puppet-dashboard updates (duration: 01m 12s)
  • 19:17 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-puppet-dashboard updates
  • 19:15 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup1001-dev.eqiad.wmnet
  • 19:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P35931 and previous config saved to /var/cache/conftool/dbconfig/20221021-191157-ladsgroup.json
  • 19:10 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2001-dev.wikimedia.org
  • 19:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P35930 and previous config saved to /var/cache/conftool/dbconfig/20221021-190509-ladsgroup.json
  • 18:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol2004-dev.wikimedia.org
  • 18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T321312)', diff saved to https://phabricator.wikimedia.org/P35929 and previous config saved to /var/cache/conftool/dbconfig/20221021-185651-ladsgroup.json
  • 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T321312)', diff saved to https://phabricator.wikimedia.org/P35928 and previous config saved to /var/cache/conftool/dbconfig/20221021-185032-ladsgroup.json
  • 18:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 18:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35927 and previous config saved to /var/cache/conftool/dbconfig/20221021-185003-ladsgroup.json
  • 18:49 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices2004-dev.wikimedia.org
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P35926 and previous config saved to /var/cache/conftool/dbconfig/20221021-184747-ladsgroup.json
  • 18:46 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2004-dev.wikimedia.org
  • 18:46 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol2005-dev.wikimedia.org
  • 18:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 18:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T321312)', diff saved to https://phabricator.wikimedia.org/P35925 and previous config saved to /var/cache/conftool/dbconfig/20221021-184547-ladsgroup.json
  • 18:40 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4049.ulsfo.wmnet,service=varnish-fe
  • 18:40 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4049.ulsfo.wmnet,service=ats-tls
  • 18:40 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4049.ulsfo.wmnet,service=ats-be
  • 18:40 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp4049.ulsfo.wmnet,service=varnish-fe
  • 18:40 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp4049.ulsfo.wmnet,service=ats-tls
  • 18:40 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4049.ulsfo.wmnet,service=ats-be
  • 18:39 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4047.ulsfo.wmnet,service=varnish-fe
  • 18:39 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4047.ulsfo.wmnet,service=ats-tls
  • 18:39 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4047.ulsfo.wmnet,service=ats-be
  • 18:39 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp4047.ulsfo.wmnet,service=varnish-fe
  • 18:39 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp4047.ulsfo.wmnet,service=ats-tls
  • 18:39 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4047.ulsfo.wmnet,service=ats-be
  • 18:38 sukhe: pool new host cp4049: T317244
  • 18:38 sukhe: pool new host cp4047: T317244
  • 18:37 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices2004-dev.wikimedia.org
  • 18:33 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices2005-dev.wikimedia.org
  • 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P35924 and previous config saved to /var/cache/conftool/dbconfig/20221021-183241-ladsgroup.json
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P35923 and previous config saved to /var/cache/conftool/dbconfig/20221021-183041-ladsgroup.json
  • 18:24 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
  • 18:24 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
  • 18:22 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2005-dev.wikimedia.org
  • 18:21 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices2005-dev.wikimedia.org
  • 18:19 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: New Kernel
  • 18:19 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: New Kernel
  • 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P35922 and previous config saved to /var/cache/conftool/dbconfig/20221021-181734-ladsgroup.json
  • 18:15 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: New Kernel
  • 18:15 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: New Kernel
  • 18:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P35921 and previous config saved to /var/cache/conftool/dbconfig/20221021-181534-ladsgroup.json
  • 18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P35920 and previous config saved to /var/cache/conftool/dbconfig/20221021-180228-ladsgroup.json
  • 18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T321312)', diff saved to https://phabricator.wikimedia.org/P35919 and previous config saved to /var/cache/conftool/dbconfig/20221021-180028-ladsgroup.json
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35918 and previous config saved to /var/cache/conftool/dbconfig/20221021-175615-ladsgroup.json
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P35917 and previous config saved to /var/cache/conftool/dbconfig/20221021-175453-ladsgroup.json
  • 17:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 17:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35916 and previous config saved to /var/cache/conftool/dbconfig/20221021-175427-ladsgroup.json
  • 17:53 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2003-dev.codfw.wmnet with OS bullseye
  • 17:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P35915 and previous config saved to /var/cache/conftool/dbconfig/20221021-173921-ladsgroup.json
  • 17:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2003-dev.codfw.wmnet with reason: host reimage
  • 17:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2002-dev.codfw.wmnet with OS bullseye
  • 17:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2003-dev.codfw.wmnet with reason: host reimage
  • 17:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P35914 and previous config saved to /var/cache/conftool/dbconfig/20221021-172414-ladsgroup.json
  • 17:20 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: apply updates - bking@cumin2002 - T321310
  • 17:20 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply updates - bking@cumin2002 - T321310
  • 17:10 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2003-dev.codfw.wmnet with OS bullseye
  • 17:09 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-ctrl1002.eqiad.wmnet
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35913 and previous config saved to /var/cache/conftool/dbconfig/20221021-170908-ladsgroup.json
  • 17:07 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2002-dev.codfw.wmnet with reason: host reimage
  • 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T321312)', diff saved to https://phabricator.wikimedia.org/P35912 and previous config saved to /var/cache/conftool/dbconfig/20221021-170551-ladsgroup.json
  • 17:03 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2002-dev.codfw.wmnet with reason: host reimage
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T321312)', diff saved to https://phabricator.wikimedia.org/P35911 and previous config saved to /var/cache/conftool/dbconfig/20221021-170011-ladsgroup.json
  • 17:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T321312)', diff saved to https://phabricator.wikimedia.org/P35910 and previous config saved to /var/cache/conftool/dbconfig/20221021-165930-ladsgroup.json
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P35909 and previous config saved to /var/cache/conftool/dbconfig/20221021-165045-ladsgroup.json
  • 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-ctrl1001.eqiad.wmnet
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P35908 and previous config saved to /var/cache/conftool/dbconfig/20221021-164424-ladsgroup.json
  • 16:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2003-dev']
  • 16:42 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2002-dev.codfw.wmnet with OS bullseye
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P35907 and previous config saved to /var/cache/conftool/dbconfig/20221021-163538-ladsgroup.json
  • 16:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2003-dev']
  • 16:29 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt2003-dev']
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P35906 and previous config saved to /var/cache/conftool/dbconfig/20221021-162917-ladsgroup.json
  • 16:29 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2003-dev']
  • 16:27 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-ctrl1002.eqiad.wmnet on all recursors
  • 16:27 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache aux-k8s-ctrl1002.eqiad.wmnet on all recursors
  • 16:27 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
  • 16:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2003-dev']
  • 16:23 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
  • 16:23 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host aux-k8s-ctrl1002.eqiad.wmnet
  • 16:23 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2003-dev']
  • 16:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt2003-dev']
  • 16:22 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-ctrl1001.eqiad.wmnet on all recursors
  • 16:22 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache aux-k8s-ctrl1001.eqiad.wmnet on all recursors
  • 16:22 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T321312)', diff saved to https://phabricator.wikimedia.org/P35905 and previous config saved to /var/cache/conftool/dbconfig/20221021-162032-ladsgroup.json
  • 16:20 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
  • 16:20 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host aux-k8s-ctrl1001.eqiad.wmnet
  • 16:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 16:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 16:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T321312)', diff saved to https://phabricator.wikimedia.org/P35899 and previous config saved to /var/cache/conftool/dbconfig/20221021-160246-ladsgroup.json
  • 16:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2001-dev.codfw.wmnet with reason: host reimage
  • 15:59 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2002-dev']
  • 15:58 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1044.eqiad.wmnet
  • 15:58 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2048.codfw.wmnet
  • 15:57 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2001-dev.codfw.wmnet with reason: host reimage
  • 15:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P35898 and previous config saved to /var/cache/conftool/dbconfig/20221021-155616-ladsgroup.json
  • 15:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P35897 and previous config saved to /var/cache/conftool/dbconfig/20221021-154740-ladsgroup.json
  • 15:46 cgoubert@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-worker-eqiad
  • 15:45 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4047.ulsfo.wmnet with reason: host reimage
  • 15:41 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4047.ulsfo.wmnet with reason: host reimage
  • 15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P35896 and previous config saved to /var/cache/conftool/dbconfig/20221021-154110-ladsgroup.json
  • 15:40 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
  • 15:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2001-dev']
  • 15:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P35895 and previous config saved to /var/cache/conftool/dbconfig/20221021-153234-ladsgroup.json
  • 15:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T321312)', diff saved to https://phabricator.wikimedia.org/P35894 and previous config saved to /var/cache/conftool/dbconfig/20221021-152603-ladsgroup.json
  • 15:23 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2001-dev']
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T321312)', diff saved to https://phabricator.wikimedia.org/P35893 and previous config saved to /var/cache/conftool/dbconfig/20221021-151945-ladsgroup.json
  • 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T321312)', diff saved to https://phabricator.wikimedia.org/P35892 and previous config saved to /var/cache/conftool/dbconfig/20221021-151920-ladsgroup.json
  • 15:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T321312)', diff saved to https://phabricator.wikimedia.org/P35890 and previous config saved to /var/cache/conftool/dbconfig/20221021-151727-ladsgroup.json
  • 15:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS buster
  • 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T321312)', diff saved to https://phabricator.wikimedia.org/P35889 and previous config saved to /var/cache/conftool/dbconfig/20221021-151104-ladsgroup.json
  • 15:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T321312)', diff saved to https://phabricator.wikimedia.org/P35888 and previous config saved to /var/cache/conftool/dbconfig/20221021-151040-ladsgroup.json
  • 15:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4049.ulsfo.wmnet with OS buster
  • 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P35887 and previous config saved to /var/cache/conftool/dbconfig/20221021-150413-ladsgroup.json
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P35886 and previous config saved to /var/cache/conftool/dbconfig/20221021-145534-ladsgroup.json
  • 14:55 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1043.eqiad.wmnet
  • 14:54 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
  • 14:49 btullis@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker
  • 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P35885 and previous config saved to /var/cache/conftool/dbconfig/20221021-144907-ladsgroup.json
  • 14:48 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
  • 14:47 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2046.codfw.wmnet
  • 14:47 bblack@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4047
  • 14:46 bblack@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp4047
  • 14:44 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply updates - bking@cumin2002 - T321310
  • 14:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage
  • 14:40 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2046.codfw.wmnet
  • 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P35884 and previous config saved to /var/cache/conftool/dbconfig/20221021-144028-ladsgroup.json
  • 14:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage
  • 14:34 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2045.codfw.wmnet
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T321312)', diff saved to https://phabricator.wikimedia.org/P35883 and previous config saved to /var/cache/conftool/dbconfig/20221021-143400-ladsgroup.json
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T321312)', diff saved to https://phabricator.wikimedia.org/P35882 and previous config saved to /var/cache/conftool/dbconfig/20221021-142742-ladsgroup.json
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T321312)', diff saved to https://phabricator.wikimedia.org/P35881 and previous config saved to /var/cache/conftool/dbconfig/20221021-142712-ladsgroup.json
  • 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T321312)', diff saved to https://phabricator.wikimedia.org/P35880 and previous config saved to /var/cache/conftool/dbconfig/20221021-142521-ladsgroup.json
  • 14:23 bblack@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS buster
  • 14:22 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2045.codfw.wmnet
  • 14:22 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1043.eqiad.wmnet
  • 14:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=varnish-fe
  • 14:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=ats-tls
  • 14:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=ats-be
  • 14:22 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp4037.ulsfo.wmnet,service=varnish-fe
  • 14:22 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp4037.ulsfo.wmnet,service=ats-tls
  • 14:22 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4037.ulsfo.wmnet,service=ats-be
  • 14:22 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply updates - bking@cumin2002 - T321310
  • 14:21 sukhe: pool new host cp4037: T317244
  • 14:21 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2044.codfw.wmnet
  • 14:18 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1040.eqiad.wmnet
  • 14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T321312)', diff saved to https://phabricator.wikimedia.org/P35879 and previous config saved to /var/cache/conftool/dbconfig/20221021-141815-ladsgroup.json
  • 14:18 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS buster
  • 14:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 14:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T321312)', diff saved to https://phabricator.wikimedia.org/P35878 and previous config saved to /var/cache/conftool/dbconfig/20221021-141752-ladsgroup.json
  • 14:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:12 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4049.ulsfo.wmnet with OS buster
  • 14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P35877 and previous config saved to /var/cache/conftool/dbconfig/20221021-141206-ladsgroup.json
  • 14:11 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1040.eqiad.wmnet
  • 14:07 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2044.codfw.wmnet
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P35876 and previous config saved to /var/cache/conftool/dbconfig/20221021-140245-ladsgroup.json
  • 14:00 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2043.codfw.wmnet
  • 14:00 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be1042.eqiad.wmnet
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P35875 and previous config saved to /var/cache/conftool/dbconfig/20221021-135659-ladsgroup.json
  • 13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P35874 and previous config saved to /var/cache/conftool/dbconfig/20221021-134737-ladsgroup.json
  • 13:45 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1042.eqiad.wmnet
  • 13:44 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2043.codfw.wmnet
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T321312)', diff saved to https://phabricator.wikimedia.org/P35873 and previous config saved to /var/cache/conftool/dbconfig/20221021-134153-ladsgroup.json
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T321312)', diff saved to https://phabricator.wikimedia.org/P35872 and previous config saved to /var/cache/conftool/dbconfig/20221021-133534-ladsgroup.json
  • 13:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 13:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T321312)', diff saved to https://phabricator.wikimedia.org/P35871 and previous config saved to /var/cache/conftool/dbconfig/20221021-133509-ladsgroup.json
  • 13:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1041.eqiad.wmnet
  • 13:34 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=kubernetes,service=kubesvc,name=kubernetes1005.eqiad.wmnet
  • 13:33 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2042.codfw.wmnet
  • 13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T321312)', diff saved to https://phabricator.wikimedia.org/P35870 and previous config saved to /var/cache/conftool/dbconfig/20221021-133231-ladsgroup.json
  • 13:32 claime: kubernetes1005:~$ sudo systemctl reset-failed ifup@ens13.service T273026
  • 13:31 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply updates - bking@cumin2002 - T321310
  • 13:28 btullis@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
  • 13:27 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1041.eqiad.wmnet
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T321312)', diff saved to https://phabricator.wikimedia.org/P35869 and previous config saved to /var/cache/conftool/dbconfig/20221021-132716-ladsgroup.json
  • 13:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 13:27 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2042.codfw.wmnet
  • 13:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35868 and previous config saved to /var/cache/conftool/dbconfig/20221021-132652-ladsgroup.json
  • 13:25 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2041.codfw.wmnet
  • 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P35867 and previous config saved to /var/cache/conftool/dbconfig/20221021-132003-ladsgroup.json
  • 13:19 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS buster
  • 13:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2041.codfw.wmnet
  • 13:17 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 13:16 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2040.codfw.wmnet
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P35866 and previous config saved to /var/cache/conftool/dbconfig/20221021-131145-ladsgroup.json
  • 13:09 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2040.codfw.wmnet
  • 13:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS buster
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P35865 and previous config saved to /var/cache/conftool/dbconfig/20221021-130456-ladsgroup.json
  • 13:00 cgoubert@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-worker-codfw
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P35864 and previous config saved to /var/cache/conftool/dbconfig/20221021-125639-ladsgroup.json
  • 12:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2024.codfw.wmnet
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T321312)', diff saved to https://phabricator.wikimedia.org/P35863 and previous config saved to /var/cache/conftool/dbconfig/20221021-124950-ladsgroup.json
  • 12:48 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2024.codfw.wmnet
  • 12:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2023.codfw.wmnet
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T321312)', diff saved to https://phabricator.wikimedia.org/P35862 and previous config saved to /var/cache/conftool/dbconfig/20221021-124327-ladsgroup.json
  • 12:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 12:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T321312)', diff saved to https://phabricator.wikimedia.org/P35861 and previous config saved to /var/cache/conftool/dbconfig/20221021-124302-ladsgroup.json
  • 12:42 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2023.codfw.wmnet
  • 12:42 dcausse: restarting blazegraph on wdqs1013 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35860 and previous config saved to /var/cache/conftool/dbconfig/20221021-124132-ladsgroup.json
  • 12:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P35859 and previous config saved to /var/cache/conftool/dbconfig/20221021-123815-ladsgroup.json
  • 12:35 claime: rebooted kubernetes2006.codfw.wmnet manually - root cause T273026
  • 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P35858 and previous config saved to /var/cache/conftool/dbconfig/20221021-122755-ladsgroup.json
  • 12:23 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=kubesvc,name=kubernetes2006.codfw.wmnet
  • 12:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P35857 and previous config saved to /var/cache/conftool/dbconfig/20221021-122308-ladsgroup.json
  • 12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P35856 and previous config saved to /var/cache/conftool/dbconfig/20221021-121249-ladsgroup.json
  • 12:11 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 12:09 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2006.codfw.wmnet
  • 12:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P35855 and previous config saved to /var/cache/conftool/dbconfig/20221021-120802-ladsgroup.json
  • 12:03 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2006.codfw.wmnet
  • 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T321312)', diff saved to https://phabricator.wikimedia.org/P35854 and previous config saved to /var/cache/conftool/dbconfig/20221021-115742-ladsgroup.json
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P35853 and previous config saved to /var/cache/conftool/dbconfig/20221021-115255-ladsgroup.json
  • 11:52 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 11:51 Emperor: rolling reboot of codfw swift frontends re October reboots
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2161 (T321312)', diff saved to https://phabricator.wikimedia.org/P35852 and previous config saved to /var/cache/conftool/dbconfig/20221021-115128-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T321312)', diff saved to https://phabricator.wikimedia.org/P35851 and previous config saved to /var/cache/conftool/dbconfig/20221021-115103-ladsgroup.json
  • 11:48 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 11:47 klausman@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
  • 11:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35850 and previous config saved to /var/cache/conftool/dbconfig/20221021-114553-ladsgroup.json
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P35849 and previous config saved to /var/cache/conftool/dbconfig/20221021-114429-ladsgroup.json
  • 11:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 11:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35848 and previous config saved to /var/cache/conftool/dbconfig/20221021-114405-ladsgroup.json
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P35847 and previous config saved to /var/cache/conftool/dbconfig/20221021-113556-ladsgroup.json
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P35846 and previous config saved to /var/cache/conftool/dbconfig/20221021-112859-ladsgroup.json
  • 11:27 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 11:27 Emperor: rolling reboot of eqiad swift frontends re October reboots
  • 11:22 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P35845 and previous config saved to /var/cache/conftool/dbconfig/20221021-112050-ladsgroup.json
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35840 and previous config saved to /var/cache/conftool/dbconfig/20221021-105845-ladsgroup.json
  • 10:58 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2001.codfw.wmnet
  • 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T321312)', diff saved to https://phabricator.wikimedia.org/P35839 and previous config saved to /var/cache/conftool/dbconfig/20221021-105529-ladsgroup.json
  • 10:49 cgoubert@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-eqiad
  • 10:49 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P35838 and previous config saved to /var/cache/conftool/dbconfig/20221021-104349-ladsgroup.json
  • 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P35837 and previous config saved to /var/cache/conftool/dbconfig/20221021-104022-ladsgroup.json
  • 10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P35836 and previous config saved to /var/cache/conftool/dbconfig/20221021-102842-ladsgroup.json
  • 10:25 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
  • 10:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P35835 and previous config saved to /var/cache/conftool/dbconfig/20221021-102516-ladsgroup.json
  • 10:24 jynus: restart of ms-backup hosts
  • 10:18 cgoubert@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-codfw
  • 10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T321312)', diff saved to https://phabricator.wikimedia.org/P35834 and previous config saved to /var/cache/conftool/dbconfig/20221021-101336-ladsgroup.json
  • 10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T321312)', diff saved to https://phabricator.wikimedia.org/P35833 and previous config saved to /var/cache/conftool/dbconfig/20221021-101009-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T321312)', diff saved to https://phabricator.wikimedia.org/P35832 and previous config saved to /var/cache/conftool/dbconfig/20221021-100813-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 10:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 10:07 btullis@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:dse-k8s-worker
  • 10:07 btullis@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
  • 10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 10:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 10:03 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-staging-worker
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35831 and previous config saved to /var/cache/conftool/dbconfig/20221021-100305-ladsgroup.json
  • 10:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T321312)', diff saved to https://phabricator.wikimedia.org/P35830 and previous config saved to /var/cache/conftool/dbconfig/20221021-100137-ladsgroup.json
  • 10:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 10:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 10:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 10:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 09:56 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
  • 09:55 btullis@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:dse-k8s-worker
  • 09:54 btullis@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
  • 09:54 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
  • 09:46 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
  • 09:43 klausman@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
  • 09:18 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1003.wikimedia.org
  • 09:16 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
  • 09:14 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
  • 09:11 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
  • 09:10 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1003.wikimedia.org
  • 09:10 jynus: finished rolling restart of dbprov hosts
  • 09:09 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
  • 08:52 jynus: finished rolling restart of backup hosts
  • 08:47 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2001.codfw.wmnet
  • 08:40 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
  • 07:37 jynus: start of rolling restart of backup hosts
  • 07:20 oblivian@deploy1002: Finished scap: Backport for Fix broken links (duration: 07m 11s)
  • 07:13 oblivian@deploy1002: oblivian and oblivian: Backport for Fix broken links synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 07:13 oblivian@deploy1002: Started scap: Backport for Fix broken links
  • 07:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36692
  • 06:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36692
  • 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T321312)', diff saved to https://phabricator.wikimedia.org/P35829 and previous config saved to /var/cache/conftool/dbconfig/20221021-062817-ladsgroup.json
  • 06:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P35828 and previous config saved to /var/cache/conftool/dbconfig/20221021-061311-ladsgroup.json
  • 05:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P35827 and previous config saved to /var/cache/conftool/dbconfig/20221021-055804-ladsgroup.json
  • 05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T321312)', diff saved to https://phabricator.wikimedia.org/P35826 and previous config saved to /var/cache/conftool/dbconfig/20221021-054258-ladsgroup.json
  • 05:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T321312)', diff saved to https://phabricator.wikimedia.org/P35825 and previous config saved to /var/cache/conftool/dbconfig/20221021-053636-ladsgroup.json
  • 05:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 05:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 05:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T321312)', diff saved to https://phabricator.wikimedia.org/P35824 and previous config saved to /var/cache/conftool/dbconfig/20221021-053611-ladsgroup.json
  • 05:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P35823 and previous config saved to /var/cache/conftool/dbconfig/20221021-052104-ladsgroup.json
  • 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P35822 and previous config saved to /var/cache/conftool/dbconfig/20221021-050558-ladsgroup.json
  • 04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T321312)', diff saved to https://phabricator.wikimedia.org/P35821 and previous config saved to /var/cache/conftool/dbconfig/20221021-045051-ladsgroup.json
  • 04:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T321312)', diff saved to https://phabricator.wikimedia.org/P35820 and previous config saved to /var/cache/conftool/dbconfig/20221021-044433-ladsgroup.json
  • 04:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 04:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 04:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T321312)', diff saved to https://phabricator.wikimedia.org/P35819 and previous config saved to /var/cache/conftool/dbconfig/20221021-044407-ladsgroup.json
  • 04:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P35818 and previous config saved to /var/cache/conftool/dbconfig/20221021-042901-ladsgroup.json
  • 04:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P35817 and previous config saved to /var/cache/conftool/dbconfig/20221021-041354-ladsgroup.json
  • 03:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T321312)', diff saved to https://phabricator.wikimedia.org/P35816 and previous config saved to /var/cache/conftool/dbconfig/20221021-035848-ladsgroup.json
  • 03:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T321312)', diff saved to https://phabricator.wikimedia.org/P35815 and previous config saved to /var/cache/conftool/dbconfig/20221021-035120-ladsgroup.json
  • 03:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 03:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 03:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 03:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 03:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T321312)', diff saved to https://phabricator.wikimedia.org/P35814 and previous config saved to /var/cache/conftool/dbconfig/20221021-035050-ladsgroup.json
  • 03:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P35813 and previous config saved to /var/cache/conftool/dbconfig/20221021-033544-ladsgroup.json
  • 03:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P35812 and previous config saved to /var/cache/conftool/dbconfig/20221021-032037-ladsgroup.json
  • 02:48 cstone: civicrm upgraded from 3e24d6f7 to 89a46665
  • 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P35808 and previous config saved to /var/cache/conftool/dbconfig/20221021-024303-ladsgroup.json
  • 02:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P35807 and previous config saved to /var/cache/conftool/dbconfig/20221021-022757-ladsgroup.json
  • 02:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T321312)', diff saved to https://phabricator.wikimedia.org/P35806 and previous config saved to /var/cache/conftool/dbconfig/20221021-021250-ladsgroup.json
  • 02:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P35805 and previous config saved to /var/cache/conftool/dbconfig/20221021-020733-ladsgroup.json
  • 01:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T321312)', diff saved to https://phabricator.wikimedia.org/P35804 and previous config saved to /var/cache/conftool/dbconfig/20221021-015503-ladsgroup.json
  • 01:52 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T321310
  • 01:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P35803 and previous config saved to /var/cache/conftool/dbconfig/20221021-015226-ladsgroup.json
  • 01:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P35802 and previous config saved to /var/cache/conftool/dbconfig/20221021-013957-ladsgroup.json
  • 01:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P35801 and previous config saved to /var/cache/conftool/dbconfig/20221021-013720-ladsgroup.json
  • 01:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P35800 and previous config saved to /var/cache/conftool/dbconfig/20221021-012450-ladsgroup.json
  • 01:24 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS buster
  • 01:22 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T321310
  • 01:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P35799 and previous config saved to /var/cache/conftool/dbconfig/20221021-012213-ladsgroup.json
  • 01:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T321312)', diff saved to https://phabricator.wikimedia.org/P35798 and previous config saved to /var/cache/conftool/dbconfig/20221021-011452-ladsgroup.json
  • 01:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P35797 and previous config saved to /var/cache/conftool/dbconfig/20221021-011324-ladsgroup.json
  • 01:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 01:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 01:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P35796 and previous config saved to /var/cache/conftool/dbconfig/20221021-011259-ladsgroup.json
  • 01:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T321312)', diff saved to https://phabricator.wikimedia.org/P35795 and previous config saved to /var/cache/conftool/dbconfig/20221021-010944-ladsgroup.json
  • 01:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1199 (T321312)', diff saved to https://phabricator.wikimedia.org/P35794 and previous config saved to /var/cache/conftool/dbconfig/20221021-010325-ladsgroup.json
  • 01:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 01:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 01:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T321312)', diff saved to https://phabricator.wikimedia.org/P35793 and previous config saved to /var/cache/conftool/dbconfig/20221021-010301-ladsgroup.json
  • 01:01 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS buster
  • 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P35792 and previous config saved to /var/cache/conftool/dbconfig/20221021-005752-ladsgroup.json
  • 00:48 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bullseye
  • 00:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P35791 and previous config saved to /var/cache/conftool/dbconfig/20221021-004754-ladsgroup.json
  • 00:47 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
  • 00:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P35790 and previous config saved to /var/cache/conftool/dbconfig/20221021-004246-ladsgroup.json
  • 00:39 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T321310
  • 00:38 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS buster
  • 00:34 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T321310
  • 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P35789 and previous config saved to /var/cache/conftool/dbconfig/20221021-003247-ladsgroup.json
  • 00:32 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
  • 00:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P35788 and previous config saved to /var/cache/conftool/dbconfig/20221021-002739-ladsgroup.json
  • 00:27 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
  • 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T321312)', diff saved to https://phabricator.wikimedia.org/P35787 and previous config saved to /var/cache/conftool/dbconfig/20221021-002624-ladsgroup.json
  • 00:25 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS buster
  • 00:20 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS buster
  • 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T321312)', diff saved to https://phabricator.wikimedia.org/P35786 and previous config saved to /var/cache/conftool/dbconfig/20221021-001740-ladsgroup.json
  • 00:16 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4005.ulsfo.wmnet with OS bullseye
  • 00:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T321312)', diff saved to https://phabricator.wikimedia.org/P35785 and previous config saved to /var/cache/conftool/dbconfig/20221021-001123-ladsgroup.json
  • 00:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P35784 and previous config saved to /var/cache/conftool/dbconfig/20221021-001117-ladsgroup.json
  • 00:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 00:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 00:07 robh@cumin2002: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bullseye
  • 00:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 00:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T321312)', diff saved to https://phabricator.wikimedia.org/P35783 and previous config saved to /var/cache/conftool/dbconfig/20221021-000636-ladsgroup.json
  • 00:06 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs4008
  • 00:06 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs4008
  • 00:05 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:04 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 00:03 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS buster
  • 00:01 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti4005.ulsfo.wmnet with reason: host reimage
  • 00:00 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4005.ulsfo.wmnet with reason: host reimage
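
The cookbook entries above follow a regular shape (`START - Cookbook <name> ...` and `END (<status>) - Cookbook <name> (exit_code=N) ...`), so they can be filtered mechanically, for example to list the runs that did not pass. The following is a minimal sketch, assuming that line shape as inferred from the entries in this log; the function name `parse_cookbook_entry` and the regular expression are illustrative and are not part of any Wikimedia tooling.

```python
import re
from typing import Optional

# Assumed line shape, inferred from the log entries themselves:
#   "HH:MM user@host: START - Cookbook <name> <details>"
#   "HH:MM user@host: END (PASS|FAIL|ERROR) - Cookbook <name> (exit_code=N) <details>"
# The leading bullet and the trailing details are both optional.
COOKBOOK_RE = re.compile(
    r"^\s*(?:•\s*)?(?P<time>\d{2}:\d{2}) (?P<who>\S+): "
    r"(?P<phase>START|END)(?: \((?P<status>[A-Z]+)\))? - "
    r"Cookbook (?P<name>\S+)(?: \(exit_code=(?P<exit_code>\d+)\))?"
    r"(?: (?P<details>.*))?$"
)


def parse_cookbook_entry(line: str) -> Optional[dict]:
    """Return the fields of a cookbook START/END entry, or None for any other line."""
    match = COOKBOOK_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # exit_code is only present on END entries; convert it when available.
    if entry["exit_code"] is not None:
        entry["exit_code"] = int(entry["exit_code"])
    return entry


def failed_runs(lines):
    """Yield parsed END entries whose status is anything other than PASS."""
    for line in lines:
        entry = parse_cookbook_entry(line)
        if entry and entry["phase"] == "END" and entry["status"] != "PASS":
            yield entry
```

Entries that are not cookbook runs (dbctl commits, free-text notes) return `None` and are skipped, so the whole day's list can be passed to `failed_runs` unchanged.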

2022-10-20

  • 23:58 robh@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs4008
  • 23:58 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs4008
  • 23:41 sukhe: COMPLETED: sudo cumin 'A:installserver' 'run-puppet-agent -q' for Gerrit 845074
  • 23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T321312)', diff saved to https://phabricator.wikimedia.org/P35780 and previous config saved to /var/cache/conftool/dbconfig/20221020-234104-ladsgroup.json
  • 23:39 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4005.ulsfo.wmnet with OS bullseye
  • 23:38 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host ganeti4005.ulsfo.wmnet with OS bullseye
  • 23:38 sukhe: sudo cumin 'A:installserver' 'run-puppet-agent -q' for Gerrit 845074
  • 23:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P35779 and previous config saved to /var/cache/conftool/dbconfig/20221020-233623-ladsgroup.json
  • 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P35778 and previous config saved to /var/cache/conftool/dbconfig/20221020-233452-ladsgroup.json
  • 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T321312)', diff saved to https://phabricator.wikimedia.org/P35777 and previous config saved to /var/cache/conftool/dbconfig/20221020-233325-ladsgroup.json
  • 23:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 23:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T321312)', diff saved to https://phabricator.wikimedia.org/P35776 and previous config saved to /var/cache/conftool/dbconfig/20221020-233300-ladsgroup.json
  • 23:32 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4005.ulsfo.wmnet with OS bullseye
  • 23:31 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 23:31 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 23:29 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 23:25 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 23:25 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 23:23 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 23:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T321312)', diff saved to https://phabricator.wikimedia.org/P35775 and previous config saved to /var/cache/conftool/dbconfig/20221020-232116-ladsgroup.json
  • 23:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P35774 and previous config saved to /var/cache/conftool/dbconfig/20221020-231754-ladsgroup.json
  • 23:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T321312)', diff saved to https://phabricator.wikimedia.org/P35773 and previous config saved to /var/cache/conftool/dbconfig/20221020-231446-ladsgroup.json
  • 23:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 23:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 23:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T321312)', diff saved to https://phabricator.wikimedia.org/P35772 and previous config saved to /var/cache/conftool/dbconfig/20221020-231422-ladsgroup.json
  • 23:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20221020-230242-ladsgroup.json
  • 22:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P35771 and previous config saved to /var/cache/conftool/dbconfig/20221020-225916-ladsgroup.json
  • 22:50 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt2003-dev.codfw.wmnet
  • 22:49 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt2002-dev.codfw.wmnet
  • 22:49 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt2001-dev.codfw.wmnet
  • 22:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt-wdqs1003.eqiad.wmnet
  • 22:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt-wdqs1002.eqiad.wmnet
  • 22:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T321312)', diff saved to https://phabricator.wikimedia.org/P35770 and previous config saved to /var/cache/conftool/dbconfig/20221020-224736-ladsgroup.json
  • 22:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P35769 and previous config saved to /var/cache/conftool/dbconfig/20221020-224409-ladsgroup.json
  • 22:43 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
  • 22:43 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1002.eqiad.wmnet
  • 22:42 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet
  • 22:40 cstone: civicrm upgraded from c96dd3ae to 3e24d6f7
  • 22:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T321312)', diff saved to https://phabricator.wikimedia.org/P35768 and previous config saved to /var/cache/conftool/dbconfig/20221020-224003-ladsgroup.json
  • 22:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 22:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T321312)', diff saved to https://phabricator.wikimedia.org/P35767 and previous config saved to /var/cache/conftool/dbconfig/20221020-223937-ladsgroup.json
  • 22:36 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1001.eqiad.wmnet
  • 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T321312)', diff saved to https://phabricator.wikimedia.org/P35766 and previous config saved to /var/cache/conftool/dbconfig/20221020-222903-ladsgroup.json
  • 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P35765 and previous config saved to /var/cache/conftool/dbconfig/20221020-222431-ladsgroup.json
  • 22:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T321312)', diff saved to https://phabricator.wikimedia.org/P35764 and previous config saved to /var/cache/conftool/dbconfig/20221020-222253-ladsgroup.json
  • 22:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 22:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 22:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T321312)', diff saved to https://phabricator.wikimedia.org/P35763 and previous config saved to /var/cache/conftool/dbconfig/20221020-222229-ladsgroup.json
  • 22:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P35762 and previous config saved to /var/cache/conftool/dbconfig/20221020-220924-ladsgroup.json
  • 22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P35761 and previous config saved to /var/cache/conftool/dbconfig/20221020-220722-ladsgroup.json
  • 21:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS buster
  • 21:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T321310
  • 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T321312)', diff saved to https://phabricator.wikimedia.org/P35760 and previous config saved to /var/cache/conftool/dbconfig/20221020-215418-ladsgroup.json
  • 21:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P35759 and previous config saved to /var/cache/conftool/dbconfig/20221020-215216-ladsgroup.json
  • 21:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T321312)', diff saved to https://phabricator.wikimedia.org/P35758 and previous config saved to /var/cache/conftool/dbconfig/20221020-214750-ladsgroup.json
  • 21:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 21:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 21:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T321312)', diff saved to https://phabricator.wikimedia.org/P35757 and previous config saved to /var/cache/conftool/dbconfig/20221020-214725-ladsgroup.json
  • 21:40 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T321312)', diff saved to https://phabricator.wikimedia.org/P35756 and previous config saved to /var/cache/conftool/dbconfig/20221020-213709-ladsgroup.json
  • 21:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P35755 and previous config saved to /var/cache/conftool/dbconfig/20221020-213218-ladsgroup.json
  • 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T321312)', diff saved to https://phabricator.wikimedia.org/P35754 and previous config saved to /var/cache/conftool/dbconfig/20221020-213050-ladsgroup.json
  • 21:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 21:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T321312)', diff saved to https://phabricator.wikimedia.org/P35753 and previous config saved to /var/cache/conftool/dbconfig/20221020-213025-ladsgroup.json
  • 21:25 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:19 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
  • 21:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P35752 and previous config saved to /var/cache/conftool/dbconfig/20221020-211712-ladsgroup.json
  • 21:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:16 TheresNoTime: close UTC late window
  • 21:16 samtar@deploy1002: Finished scap: Backport for statsv: Add error counters to delete/tags .js (T320543) (duration: 08m 26s)
  • 21:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P35751 and previous config saved to /var/cache/conftool/dbconfig/20221020-211519-ladsgroup.json
  • 21:15 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon1003.wikimedia.org
  • 21:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
  • 21:11 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:10 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
  • 21:08 samtar@deploy1002: samtar and samtar: Backport for statsv: Add error counters to delete/tags .js (T320543) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:08 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1004.eqiad.wmnet
  • 21:07 samtar@deploy1002: Started scap: Backport for statsv: Add error counters to delete/tags .js (T320543)
  • 21:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:06 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:04 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: apply updates - bking@cumin2002 - T321310
  • 21:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T318955)', diff saved to https://phabricator.wikimedia.org/P35750 and previous config saved to /var/cache/conftool/dbconfig/20221020-210308-ladsgroup.json
  • 21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T321312)', diff saved to https://phabricator.wikimedia.org/P35749 and previous config saved to /var/cache/conftool/dbconfig/20221020-210205-ladsgroup.json
  • 21:01 TheresNoTime: extending UTC late backport window
  • 21:01 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
  • 21:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1005.eqiad.wmnet
  • 21:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P35748 and previous config saved to /var/cache/conftool/dbconfig/20221020-210012-ladsgroup.json
  • 20:59 hoo@deploy1002: Finished scap: Backport for Only generate QS maxlag for pooled servers (T315423 T238751) (duration: 07m 12s)
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T321312)', diff saved to https://phabricator.wikimedia.org/P35747 and previous config saved to /var/cache/conftool/dbconfig/20221020-205532-ladsgroup.json
  • 20:37 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2004.codfw.wmnet
  • 20:36 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host graphite2004.codfw.wmnet
  • 20:36 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2004.codfw.wmnet
  • 20:35 samtar@deploy1002: Finished scap: Backport for ReplyLinksController: Skip empty reply buttons container (T321185) (duration: 05m 23s)
  • 20:33 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P35741 and previous config saved to /var/cache/conftool/dbconfig/20221020-203255-ladsgroup.json
  • 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P35739 and previous config saved to /var/cache/conftool/dbconfig/20221020-202344-ladsgroup.json
  • 20:22 thcipriani@deploy1002: Finished scap: Backport for Updates to Wikipedia wordmark/taglines and project icons (T319223) (duration: 14m 12s)
  • 20:22 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
  • 20:21 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 20:21 dzahn@cumin2002: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:eqiad and (A:gitlab-runner)
  • 20:19 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 20:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T318955)', diff saved to https://phabricator.wikimedia.org/P35737 and previous config saved to /var/cache/conftool/dbconfig/20221020-201748-ladsgroup.json
  • 20:15 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
  • 20:14 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4049.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 20:14 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4047.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 20:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T321312)', diff saved to https://phabricator.wikimedia.org/P35736 and previous config saved to /var/cache/conftool/dbconfig/20221020-200947-ladsgroup.json
  • 20:09 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS buster
  • 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P35735 and previous config saved to /var/cache/conftool/dbconfig/20221020-200838-ladsgroup.json
  • 20:08 thcipriani@deploy1002: thcipriani and jdlrobson: Backport for Updates to Wikipedia wordmark/taglines and project icons (T319223) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:08 thcipriani@deploy1002: Started scap: Backport for Updates to Wikipedia wordmark/taglines and project icons (T319223)
  • 20:07 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
  • 20:05 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS buster
  • 20:05 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
  • 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T321312)', diff saved to https://phabricator.wikimedia.org/P35734 and previous config saved to /var/cache/conftool/dbconfig/20221020-200321-ladsgroup.json
  • 20:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 20:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 20:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T318955)', diff saved to https://phabricator.wikimedia.org/P35733 and previous config saved to /var/cache/conftool/dbconfig/20221020-200205-ladsgroup.json
  • 20:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 20:01 dzahn@cumin2002: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:codfw and (A:gitlab-runner)
  • 20:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 20:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T318955)', diff saved to https://phabricator.wikimedia.org/P35732 and previous config saved to /var/cache/conftool/dbconfig/20221020-200143-ladsgroup.json
  • 20:00 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4049.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 20:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 20:00 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4047.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 19:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 19:59 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:57 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 19:57 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
  • 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P35731 and previous config saved to /var/cache/conftool/dbconfig/20221020-195331-ladsgroup.json
  • 19:50 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS buster
  • 19:41 dzahn@cumin2002: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:codfw and (A:gitlab-runner)
  • 19:39 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
  • 19:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 19:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P35727 and previous config saved to /var/cache/conftool/dbconfig/20221020-193756-ladsgroup.json
  • 19:37 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS buster
  • 19:33 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
  • 19:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P35726 and previous config saved to /var/cache/conftool/dbconfig/20221020-193130-ladsgroup.json
  • 19:29 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: apply updates - bking@cumin2002 - T321310
  • 19:28 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS buster
  • 19:27 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS buster
  • 19:26 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
  • 19:23 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
  • 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P35725 and previous config saved to /var/cache/conftool/dbconfig/20221020-192249-ladsgroup.json
  • 19:16 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
  • 19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T318955)', diff saved to https://phabricator.wikimedia.org/P35724 and previous config saved to /var/cache/conftool/dbconfig/20221020-191624-ladsgroup.json
  • 19:15 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:13 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 19:13 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4029.ulsfo.wmnet
  • 19:13 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:13 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS buster
  • 19:12 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS buster
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P35723 and previous config saved to /var/cache/conftool/dbconfig/20221020-190743-ladsgroup.json
  • 19:02 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS buster
  • 19:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T318955)', diff saved to https://phabricator.wikimedia.org/P35722 and previous config saved to /var/cache/conftool/dbconfig/20221020-190101-ladsgroup.json
  • 19:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 19:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T318955)', diff saved to https://phabricator.wikimedia.org/P35721 and previous config saved to /var/cache/conftool/dbconfig/20221020-190039-ladsgroup.json
  • 18:56 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P35720 and previous config saved to /var/cache/conftool/dbconfig/20221020-185236-ladsgroup.json
  • 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T321312)', diff saved to https://phabricator.wikimedia.org/P35719 and previous config saved to /var/cache/conftool/dbconfig/20221020-185021-ladsgroup.json
  • 18:49 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4029.ulsfo.wmnet
  • 18:46 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 18:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T318950)', diff saved to https://phabricator.wikimedia.org/P35718 and previous config saved to /var/cache/conftool/dbconfig/20221020-184547-ladsgroup.json
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P35717 and previous config saved to /var/cache/conftool/dbconfig/20221020-184533-ladsgroup.json
  • 18:42 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:37 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:36 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P35716 and previous config saved to /var/cache/conftool/dbconfig/20221020-183515-ladsgroup.json
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P35715 and previous config saved to /var/cache/conftool/dbconfig/20221020-183040-ladsgroup.json
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P35714 and previous config saved to /var/cache/conftool/dbconfig/20221020-183026-ladsgroup.json
  • 18:28 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:26 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:25 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:21 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P35713 and previous config saved to /var/cache/conftool/dbconfig/20221020-182008-ladsgroup.json
  • 18:18 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:18 samtar@deploy1002: Finished scap: Backport for Hooks: Log to statsd when a page is noindex'd (T310974) (duration: 08m 08s)
  • 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T321312)', diff saved to https://phabricator.wikimedia.org/P35712 and previous config saved to /var/cache/conftool/dbconfig/20221020-181630-ladsgroup.json
  • 18:16 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 18:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P35711 and previous config saved to /var/cache/conftool/dbconfig/20221020-181533-ladsgroup.json
  • 18:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T318955)', diff saved to https://phabricator.wikimedia.org/P35710 and previous config saved to /var/cache/conftool/dbconfig/20221020-181520-ladsgroup.json
  • 18:10 samtar@deploy1002: samtar and samtar: Backport for Hooks: Log to statsd when a page is noindex'd (T310974) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 18:09 samtar@deploy1002: Started scap: Backport for Hooks: Log to statsd when a page is noindex'd (T310974)
  • 18:06 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4037
  • 18:06 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4037
  • 18:06 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4037.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:05 TheresNoTime: Backporting gerrit:845012 for T310974 to wmf.6
  • 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T321312)', diff saved to https://phabricator.wikimedia.org/P35709 and previous config saved to /var/cache/conftool/dbconfig/20221020-180502-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P35708 and previous config saved to /var/cache/conftool/dbconfig/20221020-180123-ladsgroup.json
  • 18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T318950)', diff saved to https://phabricator.wikimedia.org/P35707 and previous config saved to /var/cache/conftool/dbconfig/20221020-180027-ladsgroup.json
  • 18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T318955)', diff saved to https://phabricator.wikimedia.org/P35706 and previous config saved to /var/cache/conftool/dbconfig/20221020-180021-ladsgroup.json
  • 18:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 18:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T318955)', diff saved to https://phabricator.wikimedia.org/P35705 and previous config saved to /var/cache/conftool/dbconfig/20221020-175959-ladsgroup.json
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P35704 and previous config saved to /var/cache/conftool/dbconfig/20221020-175854-ladsgroup.json
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T318950)', diff saved to https://phabricator.wikimedia.org/P35703 and previous config saved to /var/cache/conftool/dbconfig/20221020-175817-ladsgroup.json
  • 17:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T318950)', diff saved to https://phabricator.wikimedia.org/P35702 and previous config saved to /var/cache/conftool/dbconfig/20221020-175755-ladsgroup.json
  • 17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T321312)', diff saved to https://phabricator.wikimedia.org/P35701 and previous config saved to /var/cache/conftool/dbconfig/20221020-175726-ladsgroup.json
  • 17:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 17:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 17:53 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4037.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 17:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T321312)', diff saved to https://phabricator.wikimedia.org/P35700 and previous config saved to /var/cache/conftool/dbconfig/20221020-175244-ladsgroup.json
  • 17:52 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:49 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 17:48 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P35699 and previous config saved to /var/cache/conftool/dbconfig/20221020-174617-ladsgroup.json
  • 17:46 mutante: phabricator - disabling git-ssh URIs for repo 'phabricator-translations' https://phabricator.wikimedia.org/source/phabricator-translation - T296022
  • 17:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P35698 and previous config saved to /var/cache/conftool/dbconfig/20221020-174453-ladsgroup.json
  • 17:44 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 17:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P35697 and previous config saved to /var/cache/conftool/dbconfig/20221020-174248-ladsgroup.json
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P35696 and previous config saved to /var/cache/conftool/dbconfig/20221020-173737-ladsgroup.json
  • 17:36 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:35 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T321312)', diff saved to https://phabricator.wikimedia.org/P35695 and previous config saved to /var/cache/conftool/dbconfig/20221020-173111-ladsgroup.json
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P35693 and previous config saved to /var/cache/conftool/dbconfig/20221020-172741-ladsgroup.json
  • 17:26 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T321312)', diff saved to https://phabricator.wikimedia.org/P35692 and previous config saved to /var/cache/conftool/dbconfig/20221020-172445-ladsgroup.json
  • 17:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 17:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 17:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T321312)', diff saved to https://phabricator.wikimedia.org/P35691 and previous config saved to /var/cache/conftool/dbconfig/20221020-172419-ladsgroup.json
  • 17:23 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P35690 and previous config saved to /var/cache/conftool/dbconfig/20221020-172231-ladsgroup.json
  • 17:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 10310
  • 17:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 10310
  • 17:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T318955)', diff saved to https://phabricator.wikimedia.org/P35689 and previous config saved to /var/cache/conftool/dbconfig/20221020-171439-ladsgroup.json
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T318950)', diff saved to https://phabricator.wikimedia.org/P35688 and previous config saved to /var/cache/conftool/dbconfig/20221020-171234-ladsgroup.json
  • 16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P35679 and previous config saved to /var/cache/conftool/dbconfig/20221020-165456-ladsgroup.json
  • 16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P35678 and previous config saved to /var/cache/conftool/dbconfig/20221020-165406-ladsgroup.json
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P35677 and previous config saved to /var/cache/conftool/dbconfig/20221020-164515-ladsgroup.json
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P35676 and previous config saved to /var/cache/conftool/dbconfig/20221020-164339-ladsgroup.json
  • 16:40 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
  • 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P35675 and previous config saved to /var/cache/conftool/dbconfig/20221020-163950-ladsgroup.json
  • 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T321312)', diff saved to https://phabricator.wikimedia.org/P35674 and previous config saved to /var/cache/conftool/dbconfig/20221020-163900-ladsgroup.json
  • 16:22 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 16:20 mutante: phab1001 (phabricator) - remove LVS IP from loopback - ip addr del 208.80.154.250 dev lo - T296022
  • 16:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36351
  • 16:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36351
  • 16:18 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
  • 16:17 urbanecm@deploy1002: Finished scap: Backport for MenteeOverview: Fix link under "reverted" column (T321321) (duration: 04m 22s)
  • 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P35668 and previous config saved to /var/cache/conftool/dbconfig/20221020-161648-ladsgroup.json
  • 16:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T321312)', diff saved to https://phabricator.wikimedia.org/P35667 and previous config saved to /var/cache/conftool/dbconfig/20221020-161502-ladsgroup.json
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T318955)', diff saved to https://phabricator.wikimedia.org/P35666 and previous config saved to /var/cache/conftool/dbconfig/20221020-161326-ladsgroup.json
  • 16:13 urbanecm@deploy1002: Started scap: Backport for MenteeOverview: Fix link under "reverted" column (T321321)
  • 16:12 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 16:11 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
  • 16:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T318955)', diff saved to https://phabricator.wikimedia.org/P35665 and previous config saved to /var/cache/conftool/dbconfig/20221020-161106-ladsgroup.json
  • 16:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 16:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 16:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 16:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 16:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T318955)', diff saved to https://phabricator.wikimedia.org/P35664 and previous config saved to /var/cache/conftool/dbconfig/20221020-161029-ladsgroup.json
  • 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T321312)', diff saved to https://phabricator.wikimedia.org/P35663 and previous config saved to /var/cache/conftool/dbconfig/20221020-160832-ladsgroup.json
  • 16:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 16:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 16:08 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
  • 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T321312)', diff saved to https://phabricator.wikimedia.org/P35662 and previous config saved to /var/cache/conftool/dbconfig/20221020-160808-ladsgroup.json
  • 16:05 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
  • 16:02 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
  • 16:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P35661 and previous config saved to /var/cache/conftool/dbconfig/20221020-160142-ladsgroup.json
  • 16:01 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
  • 16:00 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host failoid2002.codfw.wmnet
  • 16:00 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
  • 15:59 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
  • 15:55 mutante: phabricator (diffusion) - clicked "disable" and then "deactivate" on Blubber diffusion repo. It's now "inactive"; publishing and syncing have been disabled https://phabricator.wikimedia.org/source/blubber/ T317820
  • 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P35660 and previous config saved to /var/cache/conftool/dbconfig/20221020-155523-ladsgroup.json
  • 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P35659 and previous config saved to /var/cache/conftool/dbconfig/20221020-155302-ladsgroup.json
  • 15:49 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
  • 15:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T318950)', diff saved to https://phabricator.wikimedia.org/P35658 and previous config saved to /var/cache/conftool/dbconfig/20221020-154724-ladsgroup.json
  • 15:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 15:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 15:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35657 and previous config saved to /var/cache/conftool/dbconfig/20221020-154644-ladsgroup.json
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T321312)', diff saved to https://phabricator.wikimedia.org/P35656 and previous config saved to /var/cache/conftool/dbconfig/20221020-154635-ladsgroup.json
  • 15:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
  • 15:41 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P35655 and previous config saved to /var/cache/conftool/dbconfig/20221020-154016-ladsgroup.json
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T321312)', diff saved to https://phabricator.wikimedia.org/P35654 and previous config saved to /var/cache/conftool/dbconfig/20221020-154006-ladsgroup.json
  • 15:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 15:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 15:06 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: Reorder variables, GrowthExperiments: Define wgGEMentorshipUseIsActiveFlag (T318457) (duration: 04m 30s)
  • 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T318955)', diff saved to https://phabricator.wikimedia.org/P35644 and previous config saved to /var/cache/conftool/dbconfig/20221020-150537-ladsgroup.json
  • 15:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 15:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T318955)', diff saved to https://phabricator.wikimedia.org/P35643 and previous config saved to /var/cache/conftool/dbconfig/20221020-150515-ladsgroup.json
  • 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P35642 and previous config saved to /var/cache/conftool/dbconfig/20221020-150329-ladsgroup.json
  • 15:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P35641 and previous config saved to /var/cache/conftool/dbconfig/20221020-150201-ladsgroup.json
  • 15:01 urbanecm@deploy1002: urbanecm and urbanecm: Backport for GrowthExperiments: Reorder variables, GrowthExperiments: Define wgGEMentorshipUseIsActiveFlag (T318457) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 15:01 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: Reorder variables, GrowthExperiments: Define wgGEMentorshipUseIsActiveFlag (T318457)
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35640 and previous config saved to /var/cache/conftool/dbconfig/20221020-150125-ladsgroup.json
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35639 and previous config saved to /var/cache/conftool/dbconfig/20221020-145214-ladsgroup.json
  • 14:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T318950)', diff saved to https://phabricator.wikimedia.org/P35638 and previous config saved to /var/cache/conftool/dbconfig/20221020-145152-ladsgroup.json
  • 14:51 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp-test1002.wikimedia.org
  • 14:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2005']
  • 14:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2005']
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P35637 and previous config saved to /var/cache/conftool/dbconfig/20221020-145009-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T321312)', diff saved to https://phabricator.wikimedia.org/P35636 and previous config saved to /var/cache/conftool/dbconfig/20221020-144823-ladsgroup.json
  • 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P35635 and previous config saved to /var/cache/conftool/dbconfig/20221020-144655-ladsgroup.json
  • 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T321312)', diff saved to https://phabricator.wikimedia.org/P35634 and previous config saved to /var/cache/conftool/dbconfig/20221020-144150-ladsgroup.json
  • 14:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 14:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T321312)', diff saved to https://phabricator.wikimedia.org/P35633 and previous config saved to /var/cache/conftool/dbconfig/20221020-144125-ladsgroup.json
  • 14:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36012
  • 14:37 papaul: powerdown wdqs2005 for maintenance
  • 14:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36012
  • 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P35632 and previous config saved to /var/cache/conftool/dbconfig/20221020-143646-ladsgroup.json
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P35631 and previous config saved to /var/cache/conftool/dbconfig/20221020-143502-ladsgroup.json
  • 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T321312)', diff saved to https://phabricator.wikimedia.org/P35630 and previous config saved to /var/cache/conftool/dbconfig/20221020-143148-ladsgroup.json
  • 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P35629 and previous config saved to /var/cache/conftool/dbconfig/20221020-142618-ladsgroup.json
  • 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T321312)', diff saved to https://phabricator.wikimedia.org/P35628 and previous config saved to /var/cache/conftool/dbconfig/20221020-142331-ladsgroup.json
  • 14:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 14:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P35627 and previous config saved to /var/cache/conftool/dbconfig/20221020-142139-ladsgroup.json
  • 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T318955)', diff saved to https://phabricator.wikimedia.org/P35626 and previous config saved to /var/cache/conftool/dbconfig/20221020-141956-ladsgroup.json
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T318955)', diff saved to https://phabricator.wikimedia.org/P35625 and previous config saved to /var/cache/conftool/dbconfig/20221020-141736-ladsgroup.json
  • 14:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 14:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 14:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host etherpad1003.eqiad.wmnet
  • 14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P35624 and previous config saved to /var/cache/conftool/dbconfig/20221020-141112-ladsgroup.json
  • 14:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1001.eqiad.wmnet
  • 14:09 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host etherpad1003.eqiad.wmnet
  • 14:09 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host etherpad1003.eqiad.wmnet
  • 14:08 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host etherpad1003.eqiad.wmnet
  • 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T318950)', diff saved to https://phabricator.wikimedia.org/P35623 and previous config saved to /var/cache/conftool/dbconfig/20221020-140633-ladsgroup.json
  • 14:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-db1001.eqiad.wmnet
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T318950)', diff saved to https://phabricator.wikimedia.org/P35622 and previous config saved to /var/cache/conftool/dbconfig/20221020-140423-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 14:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 14:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 14:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 13:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2906
  • 13:57 btullis: building production-images on build2001 - to build spark T318730
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T321312)', diff saved to https://phabricator.wikimedia.org/P35621 and previous config saved to /var/cache/conftool/dbconfig/20221020-135605-ladsgroup.json
  • 13:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2906
  • 13:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8966
  • 13:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8966
  • 13:36 urbanecm@deploy1002: Finished scap: Backport for Disable wgParserEnableLegacyMediaDOM on viwiki (T314318) (duration: 06m 59s)
  • 13:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 1239
  • 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 20115
  • 13:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 20115
  • 13:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 7843
  • 13:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 7843
  • 13:29 urbanecm@deploy1002: urbanecm and arlolra: Backport for Disable wgParserEnableLegacyMediaDOM on viwiki (T314318) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:29 urbanecm@deploy1002: Started scap: Backport for Disable wgParserEnableLegacyMediaDOM on viwiki (T314318)
  • 13:28 btullis@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 13:27 btullis@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 13:27 urbanecm@deploy1002: Finished scap: Backport for DataTableCellMentee: Strike-through suppressed mentees (T319185) (duration: 05m 18s)
  • 13:23 btullis@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 13:23 btullis@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 13:22 urbanecm@deploy1002: urbanecm and urbanecm: Backport for DataTableCellMentee: Strike-through suppressed mentees (T319185) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:21 urbanecm@deploy1002: Started scap: Backport for DataTableCellMentee: Strike-through suppressed mentees (T319185)
  • 13:21 btullis@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 13:20 btullis@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 13:19 urbanecm@deploy1002: Finished scap: Backport for zhwiki: Add 20 years logos (T320859), zhwiki: Update 20 years logos in logos.php and IS.php (T320859) (duration: 06m 57s)
  • 13:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 12200
  • 13:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 12200
  • 13:13 urbanecm@deploy1002: urbanecm and stang: Backport for zhwiki: Add 20 years logos (T320859), zhwiki: Update 20 years logos in logos.php and IS.php (T320859) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:12 urbanecm@deploy1002: Started scap: Backport for zhwiki: Add 20 years logos (T320859), zhwiki: Update 20 years logos in logos.php and IS.php (T320859)
  • 13:12 urbanecm@deploy1002: Finished scap: Backport for Fix broken wordmarks/taglines (T320944 T321124 T321258) (duration: 06m 03s)
  • 13:11 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 199524
  • 13:10 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 199524
  • 13:09 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 132203
  • 13:08 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 132203
  • 13:08 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45102
  • 13:06 urbanecm@deploy1002: urbanecm and stang: Backport for Fix broken wordmarks/taglines (T320944 T321124 T321258) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:06 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 45102
  • 13:06 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36692
  • 13:06 urbanecm@deploy1002: Started scap: Backport for Fix broken wordmarks/taglines (T320944 T321124 T321258)
  • 13:05 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 36692
  • 13:05 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36351
  • 13:04 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 36351
  • 13:04 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32787
  • 13:02 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 32787
  • 13:02 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 29791
  • 13:01 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 29791
  • 13:01 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19151
  • 13:00 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 19151
  • 13:00 ayounsi@cumin2002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 16509
  • 12:58 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 16509
  • 12:58 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16276
  • 12:55 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 16276
  • 12:55 ayounsi@cumin2002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 15169
  • 12:52 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 15169
  • 12:52 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14061
  • 12:50 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 14061
  • 12:50 ayounsi@cumin2002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 13335
  • 12:48 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 13335
  • 12:48 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11404
  • 12:47 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 11404
  • 12:47 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11164
  • 12:46 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 11164
  • 12:46 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 10310
  • 12:44 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 10310
  • 12:44 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8966
  • 12:43 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 8966
  • 12:43 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4637
  • 12:42 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 4637
  • 12:42 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3292
  • 12:41 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 3292
  • 12:41 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2647
  • 12:39 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 2647
  • 12:39 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2152
  • 12:39 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 2152
  • 12:39 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 714
  • 12:37 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 714
  • 12:37 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 42
  • 12:35 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 42
  • 12:33 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 26803
  • 12:33 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 26803
  • 12:33 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9505
  • 12:31 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 9505
  • 12:31 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9498
  • 12:31 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 9498
  • 12:31 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8674
  • 12:30 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 8674
  • 12:30 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7713
  • 12:29 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 7713
  • 12:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7575
  • 12:28 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 7575
  • 12:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7091
  • 12:28 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 7091
  • 12:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5650
  • 12:28 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 5650
  • 12:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4780
  • 12:28 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 4780
  • 12:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4766
  • 12:27 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 4766
  • 12:27 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4648
  • 12:26 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 4648
  • 12:26 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3856
  • 12:25 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 3856
  • 12:25 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1280
  • 12:25 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 1280
  • 12:08 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:08 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:07 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:06 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:05 cgoubert@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:05 cgoubert@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 12:05 cgoubert@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:04 claime: Deploying new mw-debug namespace
  • 12:01 cgoubert@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:47 jbond: roll update for libksba
  • 11:16 jbond: upload new pypuppetdb package
  • 09:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 63949
  • 09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 63949
  • 09:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58453
  • 09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58453
  • 09:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36012
  • 09:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36012
  • 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 24429
  • 09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 24429
  • 09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16591
  • 09:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16591
  • 09:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16265
  • 09:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16265
  • 09:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12200
  • 09:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12200
  • 09:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9269
  • 09:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9269
  • 09:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6327
  • 09:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6327
  • 09:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3605
  • 09:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3605
  • 09:49 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 2828
  • 09:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2828
  • 09:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2518
  • 09:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2518
  • 09:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2516
  • 09:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13335
  • 08:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13335
  • 08:35 XioNoX: re-enabling Arelion on cr1-drmrs - T321157
  • 08:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.6 refs T320511
  • 07:52 godog: +40 to k8s-mlserve on prometheus codfw
  • 07:49 apergos: UTC morning backport and config training window closed
  • 07:11 kartik@deploy1002: Finished scap: Backport for testwiki: Enable Section Translation in 15 Wikipedias (T319175 T319176) (duration: 06m 41s)
  • 07:04 kartik@deploy1002: kartik and kartik: Backport for testwiki: Enable Section Translation in 15 Wikipedias (T319175 T319176) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 07:04 kartik@deploy1002: Started scap: Backport for testwiki: Enable Section Translation in 15 Wikipedias (T319175 T319176)
  • 06:53 kart_: Updated cxserver to 2022-10-18-161640-production (T317224, T319175, T319176)
  • 06:52 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:51 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:48 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:47 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:42 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:41 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply

2022-10-19

  • 23:33 wfan: civicrm upgraded from 477323fe to c96dd3ae
  • 23:24 ejegg: re-enabled refund queue consumer job
  • 22:43 ejegg: updated fundraising CiviCRM from 4b9e981a to 477323fe
  • 22:33 ejegg: disabled fundraising refund QC, started language update job
  • 22:02 ejegg: updated standalone SmashPig IPN listener from f36143f0 to 9295dc2a
  • 22:00 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd1003.eqiad.wmnet
  • 21:49 eileen: config revision changed from 903c8ce2 to 4aad20b1
  • 21:37 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd1003.eqiad.wmnet on all recursors
  • 21:36 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd1003.eqiad.wmnet on all recursors
  • 21:36 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:30 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
  • 21:30 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd1003.eqiad.wmnet
  • 21:30 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd1002.eqiad.wmnet
  • 21:06 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd1002.eqiad.wmnet on all recursors
  • 21:06 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd1002.eqiad.wmnet on all recursors
  • 21:06 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:03 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
  • 21:03 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd1002.eqiad.wmnet
  • 20:49 mutante: lvs2010, lvs2008 - systemctl restart pybal.service ; ipvsadm -Dt '208.80.153.250:22' ; ipvsadm -Dt '[2620:0:860:ed1a::3:fa]:22' - T296022
  • 20:44 mutante: lvs1020, lvs1018 - systemctl restart pybal.service ; ipvsadm -Dt '208.80.154.250:22' ; ipvsadm -Dt '[2620:0:861:ed1a::3:16]:22' - T296022
  • 20:38 mutante: puppetmaster1001/puppetmaster2001 - delete .git-*.err files in /var/run/confd-template T296022
  • 20:33 TheresNoTime: closing UTC late backport window
  • 20:31 ejegg: updated payments-wiki from 4e1f308b to 9fa4abd7
  • 20:27 mutante: lvs2010 - restarted pybal, removed git-ssh IP with ipvsadm
  • 20:08 samtar@deploy1002: samtar and samtar: Backport for Hooks: Log to statsd when a page is noindex'd (T310974) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:07 samtar@deploy1002: Started scap: Backport for Hooks: Log to statsd when a page is noindex'd (T310974)
  • 19:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T318955)', diff saved to https://phabricator.wikimedia.org/P35617 and previous config saved to /var/cache/conftool/dbconfig/20221019-194546-ladsgroup.json
  • 19:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P35616 and previous config saved to /var/cache/conftool/dbconfig/20221019-193039-ladsgroup.json
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P35615 and previous config saved to /var/cache/conftool/dbconfig/20221019-191533-ladsgroup.json
  • 19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T318955)', diff saved to https://phabricator.wikimedia.org/P35614 and previous config saved to /var/cache/conftool/dbconfig/20221019-190026-ladsgroup.json
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T318955)', diff saved to https://phabricator.wikimedia.org/P35613 and previous config saved to /var/cache/conftool/dbconfig/20221019-185813-ladsgroup.json
  • 18:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 18:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T318955)', diff saved to https://phabricator.wikimedia.org/P35612 and previous config saved to /var/cache/conftool/dbconfig/20221019-185752-ladsgroup.json
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P35611 and previous config saved to /var/cache/conftool/dbconfig/20221019-184245-ladsgroup.json
  • 18:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P35610 and previous config saved to /var/cache/conftool/dbconfig/20221019-182739-ladsgroup.json
  • 18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T318955)', diff saved to https://phabricator.wikimedia.org/P35609 and previous config saved to /var/cache/conftool/dbconfig/20221019-181232-ladsgroup.json
  • 18:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T318955)', diff saved to https://phabricator.wikimedia.org/P35608 and previous config saved to /var/cache/conftool/dbconfig/20221019-181019-ladsgroup.json
  • 18:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 18:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 18:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T318955)', diff saved to https://phabricator.wikimedia.org/P35607 and previous config saved to /var/cache/conftool/dbconfig/20221019-180958-ladsgroup.json
  • 18:07 mutante: aphlict1001 - manually gzip large logfile, logrotate did not run for a day - T321209
  • 18:03 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:57 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P35606 and previous config saved to /var/cache/conftool/dbconfig/20221019-175451-ladsgroup.json
  • 17:54 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P35605 and previous config saved to /var/cache/conftool/dbconfig/20221019-173945-ladsgroup.json
  • 17:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T318955)', diff saved to https://phabricator.wikimedia.org/P35604 and previous config saved to /var/cache/conftool/dbconfig/20221019-172438-ladsgroup.json
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T318955)', diff saved to https://phabricator.wikimedia.org/P35603 and previous config saved to /var/cache/conftool/dbconfig/20221019-170002-ladsgroup.json
  • 16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 16:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS bullseye
  • 16:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35602 and previous config saved to /var/cache/conftool/dbconfig/20221019-163534-ladsgroup.json
  • 16:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-db1002.eqiad.wmnet with OS bullseye
  • 16:27 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 16:27 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 16:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage
  • 16:22 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 16:21 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 16:20 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P35601 and previous config saved to /var/cache/conftool/dbconfig/20221019-162028-ladsgroup.json
  • 16:19 mutante: wikitech - added herron to 'content administrators'
  • 16:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-db1002.eqiad.wmnet with reason: host reimage
  • 16:14 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-db1002.eqiad.wmnet with reason: host reimage
  • 16:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
  • 16:08 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1004.eqiad.wmnet with OS bullseye
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P35600 and previous config saved to /var/cache/conftool/dbconfig/20221019-160521-ladsgroup.json
  • 16:01 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-db1002.eqiad.wmnet with OS bullseye
  • 15:52 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
  • 15:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1004.eqiad.wmnet with OS bullseye
  • 15:51 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd1001.eqiad.wmnet
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35599 and previous config saved to /var/cache/conftool/dbconfig/20221019-155015-ladsgroup.json
  • 15:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
  • 15:28 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd1001.eqiad.wmnet on all recursors
  • 15:28 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd1001.eqiad.wmnet on all recursors
  • 15:28 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35598 and previous config saved to /var/cache/conftool/dbconfig/20221019-152318-ladsgroup.json
  • 15:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 15:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 15:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T318955)', diff saved to https://phabricator.wikimedia.org/P35597 and previous config saved to /var/cache/conftool/dbconfig/20221019-152256-ladsgroup.json
  • 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T318950)', diff saved to https://phabricator.wikimedia.org/P35596 and previous config saved to /var/cache/conftool/dbconfig/20221019-151204-ladsgroup.json
  • 15:08 bd808: Forcing puppet runs on cloudweb100[34] to deploy a new version of Striker
  • 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P35595 and previous config saved to /var/cache/conftool/dbconfig/20221019-150749-ladsgroup.json
  • 15:00 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 15:00 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 15:00 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 14:59 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 14:59 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 14:57 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
  • 14:57 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd1001.eqiad.wmnet
  • 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P35594 and previous config saved to /var/cache/conftool/dbconfig/20221019-145658-ladsgroup.json
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P35593 and previous config saved to /var/cache/conftool/dbconfig/20221019-145242-ladsgroup.json
  • 14:50 jnuche@deploy1002: Installation of scap version "4.27.1" completed for 1 hosts
  • 14:50 jnuche@deploy1002: Installing scap version "4.27.1" for 1 hosts
  • 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P35592 and previous config saved to /var/cache/conftool/dbconfig/20221019-144150-ladsgroup.json
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T318955)', diff saved to https://phabricator.wikimedia.org/P35591 and previous config saved to /var/cache/conftool/dbconfig/20221019-143736-ladsgroup.json
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T318955)', diff saved to https://phabricator.wikimedia.org/P35590 and previous config saved to /var/cache/conftool/dbconfig/20221019-143523-ladsgroup.json
  • 14:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 14:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35589 and previous config saved to /var/cache/conftool/dbconfig/20221019-143501-ladsgroup.json
  • 14:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2516
  • 14:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2516
  • 14:34 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 1239
  • 14:34 hashar@deploy1002: Finished scap: Backport for Downgrade lcobucci/jwt (4.2.1 => 4.1.5) (T321160) (duration: 04m 25s)
  • 14:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 1239
  • 14:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 701
  • 14:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 701
  • 14:29 hashar@deploy1002: hashar and hashar: Backport for Downgrade lcobucci/jwt (4.2.1 => 4.1.5) (T321160) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 14:29 hashar@deploy1002: Started scap: Backport for Downgrade lcobucci/jwt (4.2.1 => 4.1.5) (T321160)
  • 14:29 hashar@deploy1002: backport aborted: (duration: 05m 09s)
  • 14:29 hashar@deploy1002: sync-world aborted: Backport for Downgrade lcobucci/jwt (4.2.1 => 4.1.5) (T321160) (duration: 03m 27s)
  • 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T318950)', diff saved to https://phabricator.wikimedia.org/P35588 and previous config saved to /var/cache/conftool/dbconfig/20221019-142643-ladsgroup.json
  • 14:25 hashar@deploy1002: hashar and hashar: Backport for Downgrade lcobucci/jwt (4.2.1 => 4.1.5) (T321160) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:25 hashar@deploy1002: Started scap: Backport for Downgrade lcobucci/jwt (4.2.1 => 4.1.5) (T321160)
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T318950)', diff saved to https://phabricator.wikimedia.org/P35587 and previous config saved to /var/cache/conftool/dbconfig/20221019-142433-ladsgroup.json
  • 14:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 14:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T318950)', diff saved to https://phabricator.wikimedia.org/P35586 and previous config saved to /var/cache/conftool/dbconfig/20221019-142411-ladsgroup.json
  • 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P35585 and previous config saved to /var/cache/conftool/dbconfig/20221019-141955-ladsgroup.json
  • 14:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-db1001.eqiad.wmnet with OS bullseye
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P35584 and previous config saved to /var/cache/conftool/dbconfig/20221019-140905-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P35583 and previous config saved to /var/cache/conftool/dbconfig/20221019-140449-ladsgroup.json
  • 14:02 matthiasmullie: UTC afternoon backports done
  • 14:00 mlitn@deploy1002: Finished scap: Backport for Add SearchVue to extension-list and config var (T310367) (duration: 26m 03s)
  • 13:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-db1001.eqiad.wmnet with reason: host reimage
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P35582 and previous config saved to /var/cache/conftool/dbconfig/20221019-135358-ladsgroup.json
  • 13:52 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-db1001.eqiad.wmnet with reason: host reimage
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35581 and previous config saved to /var/cache/conftool/dbconfig/20221019-134942-ladsgroup.json
  • 13:43 mlitn@deploy1002: mlitn and mlitn: Backport for Add SearchVue to extension-list and config var (T310367) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:40 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-db1001.eqiad.wmnet with OS bullseye
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T318950)', diff saved to https://phabricator.wikimedia.org/P35580 and previous config saved to /var/cache/conftool/dbconfig/20221019-133852-ladsgroup.json
  • 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T318950)', diff saved to https://phabricator.wikimedia.org/P35579 and previous config saved to /var/cache/conftool/dbconfig/20221019-133642-ladsgroup.json
  • 13:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 13:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35578 and previous config saved to /var/cache/conftool/dbconfig/20221019-133620-ladsgroup.json
  • 13:34 mlitn@deploy1002: Started scap: Backport for Add SearchVue to extension-list and config var (T310367)
  • 13:32 mlitn@deploy1002: Finished scap: Backport for Add default value for search-thumbnail-extra-namespaces (T320337) (duration: 04m 41s)
  • 13:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8220
  • 13:28 hashar@deploy1002: backport Cancelled
  • 13:28 mlitn@deploy1002: mlitn and mlitn: Backport for Add default value for search-thumbnail-extra-namespaces (T320337) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:27 mlitn@deploy1002: Started scap: Backport for Add default value for search-thumbnail-extra-namespaces (T320337)
  • 13:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8220
  • 13:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35577 and previous config saved to /var/cache/conftool/dbconfig/20221019-132527-ladsgroup.json
  • 13:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 13:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 13:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T318955)', diff saved to https://phabricator.wikimedia.org/P35576 and previous config saved to /var/cache/conftool/dbconfig/20221019-132505-ladsgroup.json
  • 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P35575 and previous config saved to /var/cache/conftool/dbconfig/20221019-132114-ladsgroup.json
  • 13:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P35574 and previous config saved to /var/cache/conftool/dbconfig/20221019-130959-ladsgroup.json
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P35573 and previous config saved to /var/cache/conftool/dbconfig/20221019-130607-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T314041)', diff saved to https://phabricator.wikimedia.org/P35572 and previous config saved to /var/cache/conftool/dbconfig/20221019-130459-ladsgroup.json
  • 12:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P35571 and previous config saved to /var/cache/conftool/dbconfig/20221019-125452-ladsgroup.json
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35570 and previous config saved to /var/cache/conftool/dbconfig/20221019-125101-ladsgroup.json
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P35569 and previous config saved to /var/cache/conftool/dbconfig/20221019-124952-ladsgroup.json
  • 12:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T318955)', diff saved to https://phabricator.wikimedia.org/P35568 and previous config saved to /var/cache/conftool/dbconfig/20221019-123946-ladsgroup.json
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P35567 and previous config saved to /var/cache/conftool/dbconfig/20221019-123446-ladsgroup.json
  • 12:27 XioNoX: remove cr4-ulsfo SV8 RS sessions
  • 12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T314041)', diff saved to https://phabricator.wikimedia.org/P35566 and previous config saved to /var/cache/conftool/dbconfig/20221019-121939-ladsgroup.json
  • 12:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T318955)', diff saved to https://phabricator.wikimedia.org/P35565 and previous config saved to /var/cache/conftool/dbconfig/20221019-121506-ladsgroup.json
  • 12:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 12:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35564 and previous config saved to /var/cache/conftool/dbconfig/20221019-121444-ladsgroup.json
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P35563 and previous config saved to /var/cache/conftool/dbconfig/20221019-115938-ladsgroup.json
  • 11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35562 and previous config saved to /var/cache/conftool/dbconfig/20221019-115443-ladsgroup.json
  • 11:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 11:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35561 and previous config saved to /var/cache/conftool/dbconfig/20221019-115421-ladsgroup.json
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P35560 and previous config saved to /var/cache/conftool/dbconfig/20221019-114431-ladsgroup.json
  • 11:43 jnuche@deploy1002: Installation of scap version "4.27.1" completed for 552 hosts
  • 11:43 jnuche@deploy1002: Installing scap version "4.27.1" for 552 hosts
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P35559 and previous config saved to /var/cache/conftool/dbconfig/20221019-113915-ladsgroup.json
  • 11:30 jnuche@deploy1002: Installing scap version "4.27.1" for 553 hosts
  • 11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35558 and previous config saved to /var/cache/conftool/dbconfig/20221019-112925-ladsgroup.json
  • 11:25 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P35557 and previous config saved to /var/cache/conftool/dbconfig/20221019-112409-ladsgroup.json
  • 11:23 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:22 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:21 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:14 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:13 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:12 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:11 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:10 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:10 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35556 and previous config saved to /var/cache/conftool/dbconfig/20221019-110902-ladsgroup.json
  • 11:07 Emperor: upload wmf-beamer-style 0.2 to apt
  • 11:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T314041)', diff saved to https://phabricator.wikimedia.org/P35555 and previous config saved to /var/cache/conftool/dbconfig/20221019-110635-ladsgroup.json
  • 11:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 11:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35554 and previous config saved to /var/cache/conftool/dbconfig/20221019-110552-ladsgroup.json
  • 11:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 11:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35553 and previous config saved to /var/cache/conftool/dbconfig/20221019-110308-ladsgroup.json
  • 11:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 11:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 10:20 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:18 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:18 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:17 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:17 claime: Deploying mediawiki helm chart v0.2.4 on k8s-experimental mwdebug - T321042
  • 10:08 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:58 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 09:23 urbanecm@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 09:21 urbanecm@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 09:21 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 09:18 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 09:14 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 09:13 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 09:01 XioNoX: remove DHCP server and access zone on mr1-eqiad - T320962
  • 08:15 hashar@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.6 refs T320511 (duration: 03m 37s)
  • 08:12 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.6 refs T320511
  • 07:45 urbanecm@deploy1002: Finished scap: Backport for [growth] Turn mentorship off by default (T321056) (duration: 05m 14s)
  • 07:41 urbanecm@deploy1002: urbanecm and urbanecm: Backport for [growth] Turn mentorship off by default (T321056) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:40 urbanecm@deploy1002: Started scap: Backport for [growth] Turn mentorship off by default (T321056)
  • 07:39 urbanecm@deploy1002: Finished scap: Backport for Remove GEHomepageImpactModuleEnabled (duration: 04m 27s)
  • 07:35 urbanecm@deploy1002: urbanecm and kharlan: Backport for Remove GEHomepageImpactModuleEnabled synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 07:34 urbanecm@deploy1002: Started scap: Backport for Remove GEHomepageImpactModuleEnabled
  • 07:09 kartik@deploy1002: Finished scap: Backport for Enable specialcontribute campaign (T319306) (duration: 06m 11s)
  • 07:03 kartik@deploy1002: kartik and kartik: Backport for Enable specialcontribute campaign (T319306) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 07:03 kartik@deploy1002: Started scap: Backport for Enable specialcontribute campaign (T319306)
  • 06:40 XioNoX: enabled graceful-shutdown on drmrs Arelion BGP
  • 01:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1011.eqiad.wmnet with OS bullseye
  • 01:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1011.eqiad.wmnet with reason: host reimage
  • 01:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1011.eqiad.wmnet with reason: host reimage
  • 01:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1011.eqiad.wmnet with OS bullseye
  • 01:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1009.eqiad.wmnet with OS bullseye
  • 00:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1009.eqiad.wmnet with reason: host reimage
  • 00:46 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1009.eqiad.wmnet with reason: host reimage
  • 00:34 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1009.eqiad.wmnet with OS bullseye
  • 00:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1007.eqiad.wmnet with OS bullseye
  • 00:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1007.eqiad.wmnet with reason: host reimage
  • 00:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1007.eqiad.wmnet with reason: host reimage

2022-10-18

  • 23:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS bullseye
  • 22:18 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:17 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 21:11 mutante: otrs1001 - emptied exim paniclog
  • 21:03 TheresNoTime: closing UTC late backport window
  • 21:02 samtar@deploy1002: Finished scap: Backport for arwiki: Fix editeditorprotected restriction level (T321111) (duration: 07m 08s)
  • 20:56 samtar@deploy1002: samtar and stang: Backport for arwiki: Fix editeditorprotected restriction level (T321111) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:55 samtar@deploy1002: Started scap: Backport for arwiki: Fix editeditorprotected restriction level (T321111)
  • 20:54 samtar@deploy1002: Finished scap: Backport for tumwiki: Update project logo (T320473) (duration: 05m 19s)
  • 20:50 mutante: phabricator - on new machines, find / -uid 497 -exec chown phd {} \; to fix ownership (and then the same for -gid 498). The user phd used to be 497:498 (uid:gid) on old hosts but has been replaced with a proper systemd system user using 920:920 T313360
  • 20:49 samtar@deploy1002: samtar and stang: Backport for tumwiki: Update project logo (T320473) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:48 samtar@deploy1002: Started scap: Backport for tumwiki: Update project logo (T320473)
  • 20:46 samtar@deploy1002: Finished scap: Backport for i18n: Fix typo and simplify preference description (T321038) (duration: 16m 31s)
  • 20:34 samtar@deploy1002: samtar and jdlrobson: Backport for i18n: Fix typo and simplify preference description (T321038) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:29 samtar@deploy1002: Started scap: Backport for i18n: Fix typo and simplify preference description (T321038)
  • 20:14 samtar@deploy1002: Finished scap: Backport for Move icons to dedicated folder, Standardize wordmark names (duration: 11m 07s)
  • 20:03 samtar@deploy1002: samtar and jdlrobson: Backport for Move icons to dedicated folder, Standardize wordmark names synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:02 samtar@deploy1002: Started scap: Backport for Move icons to dedicated folder, Standardize wordmark names
  • 18:58 mutante: rsyncing phab dump file - pull from phab1000 to all other hosts T313360
  • 16:44 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:43 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 16:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 5650
  • 16:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 5650
  • 16:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P35549 and previous config saved to /var/cache/conftool/dbconfig/20221018-163219-ladsgroup.json
  • 16:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P35548 and previous config saved to /var/cache/conftool/dbconfig/20221018-161714-ladsgroup.json
  • 16:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P35547 and previous config saved to /var/cache/conftool/dbconfig/20221018-160209-ladsgroup.json
  • 15:55 hashar: Stopping Gerrit due to a mistake in deploying plugins (forgot to reinstall the built-in plugins)
  • 15:51 hashar@deploy1002: Finished deploy [gerrit/gerrit@da5de16]: gerrit1001: remove motd plugin and its config # T321075 (duration: 00m 08s)
  • 15:51 hashar@deploy1002: Started deploy [gerrit/gerrit@da5de16]: gerrit1001: remove motd plugin and its config # T321075
  • 15:50 hashar@deploy1002: Finished deploy [gerrit/gerrit@da5de16]: gerrit2002: remove motd plugin and its config # T321075 (duration: 00m 10s)
  • 15:49 hashar@deploy1002: Started deploy [gerrit/gerrit@da5de16]: gerrit2002: remove motd plugin and its config # T321075
  • 15:15 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:53 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:59 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:57 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:53 kharlan@deploy1002: Finished scap: Backport for GrowthExperiments: Enable AddLink backend for be_x_oldwiki (T304549) (duration: 04m 56s)
  • 13:48 kharlan@deploy1002: kharlan and kharlan: Backport for GrowthExperiments: Enable AddLink backend for be_x_oldwiki (T304549) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:48 kharlan@deploy1002: Started scap: Backport for GrowthExperiments: Enable AddLink backend for be_x_oldwiki (T304549)
  • 13:38 kostajh: UTC afternoon backport+config window done \o/
  • 13:35 kharlan@deploy1002: Finished scap: Backport for GrowthExperiments: Enable AddLink backend for bat_smg (T304549) (duration: 04m 50s)
  • 13:30 kharlan@deploy1002: kharlan and kharlan: Backport for GrowthExperiments: Enable AddLink backend for bat_smg (T304549) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:30 kharlan@deploy1002: Started scap: Backport for GrowthExperiments: Enable AddLink backend for bat_smg (T304549)
  • 13:07 kharlan@deploy1002: kharlan and matmarex: Backport for Add "Clear Affordances" to DiscussionTools beta feature on most wikis (T320683) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:06 kharlan@deploy1002: Started scap: Backport for Add "Clear Affordances" to DiscussionTools beta feature on most wikis (T320683)
  • 11:57 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 11:55 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 11:55 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 11:52 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 11:51 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 11:50 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 11:37 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.6 refs T320511
  • 11:08 claime: Nutcrackerd disabled on k8s-experimental mwdebug - T321042
  • 11:06 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:06 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:05 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:59 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:57 claime: Disabling nutcracker on k8s-experimental mwdebug - T321042
  • 10:17 urbanecm@deploy1002: Finished scap: Backport for Revert "Add multiple integration tests for Hooks.php" (T321041) (duration: 06m 24s)
  • 10:11 urbanecm@deploy1002: urbanecm and urbanecm: Backport for Revert "Add multiple integration tests for Hooks.php" (T321041) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 10:11 urbanecm@deploy1002: Started scap: Backport for Revert "Add multiple integration tests for Hooks.php" (T321041)
  • 08:50 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.40.0-wmf.6" # T320511
  • 08:35 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.6 refs T320511
  • 08:28 hashar@deploy1002: Pruned MediaWiki: 1.40.0-wmf.4 (duration: 02m 11s)
  • 08:26 hashar: scap clean auto # T320511
  • 08:23 hashar@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.6 refs T320511 (duration: 36m 04s)
  • 07:47 hashar@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.6 refs T320511
  • 07:40 hashar: `scap stage-train 1.40.0-wmf.6` # T320511
  • 07:37 hashar: Scratched /srv/mediawiki-staging/php-1.40.0-wmf.6 entirely and doing `scap prep` instead
  • 07:35 hashar: Rebased /srv/mediawiki-staging/php-1.40.0-wmf.6 for de15f77 ( T321021 ) and 0f8be84 ( T319447 )

2022-10-17

  • 23:16 bblack@puppetmaster2001: conftool action : set/pooled=yes; selector: service=git-ssh
  • 23:16 bblack@puppetmaster2001: conftool action : set/weight=100; selector: service=git-ssh
  • 22:55 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for otrs1001.eqiad.wmnet
  • 22:55 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for otrs1001.eqiad.wmnet
  • 22:41 mutante: otrs1001 - systemctl reset-failed (clear alert for ifup@ens13.service)
  • 22:36 bblack: ganeti1027 - gnt-instance reboot otrs1001.eqiad.wmnet
  • 22:36 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on otrs1001.eqiad.wmnet with reason: reboot
  • 22:35 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on otrs1001.eqiad.wmnet with reason: reboot
  • 22:34 bblack: ganeti1027: executing gnt-instance modify -B maxmem=8192 -B memory=8192 otrs1001.eqiad.wmnet
  • 21:33 mutante: otrs1001 - after local exim queue has been drained, set MaxThreads for clamav to 12 again, restarted clamav
  • 21:33 mstyles@deploy1002: Synchronized php-1.40.0-wmf.5/extensions/CheckUser/src/Api/ApiQueryCheckUser.php: (no justification provided) (duration: 03m 37s)
  • 21:20 mutante: otrs1001 - re-enabling puppet, running puppet
  • 21:09 mutante: otrs1001 - changing MaxThreads from 6 to 1 in /etc/clamav/clamd.conf, starting clamav
  • 21:02 mutante: otrs1001 - temp disabled puppet, changing MaxThreads from 12 to 6 in /etc/clamav/clamd.conf
  • 20:40 mutante: mx1001 - exim4 -qf - trying to re-deliver mail in queue for info@ OTRS queue
  • 20:18 urbanecm@deploy1002: Finished scap: 6762292a4: e320d48c8: 6762292a4: DiscussionTools/WikimediaEvents backports (T315688, T315689, T320938) (duration: 04m 35s)
  • 20:13 urbanecm@deploy1002: Started scap: 6762292a4: e320d48c8: 6762292a4: DiscussionTools/WikimediaEvents backports (T315688, T315689, T320938)
  • 19:58 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=phab1001-vcs.eqiad.wmnet
  • 19:57 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=phab2001-vcs.codfw.wmnet
  • 19:20 mutante: otrs1001 - started failed clamav-daemon service
  • 18:57 mutante: puppetmaster2001 - deleted confd-template .err files
  • 18:56 mutante: puppetmaster1001 - deleted confd-template .err files
  • 18:49 dzahn@cumin2002: conftool action : set/pooled=inactive; selector: name=phab1001-vcs.eqiad.wmnet
  • 18:48 dzahn@cumin2002: conftool action : set/pooled=inactive; selector: name=phab2001-vcs.codfw.wmnet
  • 18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T318955)', diff saved to https://phabricator.wikimedia.org/P35544 and previous config saved to /var/cache/conftool/dbconfig/20221017-181217-ladsgroup.json
  • 17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P35543 and previous config saved to /var/cache/conftool/dbconfig/20221017-175711-ladsgroup.json
  • 17:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P35542 and previous config saved to /var/cache/conftool/dbconfig/20221017-174204-ladsgroup.json
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T318955)', diff saved to https://phabricator.wikimedia.org/P35541 and previous config saved to /var/cache/conftool/dbconfig/20221017-172658-ladsgroup.json
  • 17:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32787
  • 17:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32787
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T318955)', diff saved to https://phabricator.wikimedia.org/P35540 and previous config saved to /var/cache/conftool/dbconfig/20221017-171229-ladsgroup.json
  • 17:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 17:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T318955)', diff saved to https://phabricator.wikimedia.org/P35539 and previous config saved to /var/cache/conftool/dbconfig/20221017-171156-ladsgroup.json
  • 16:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P35538 and previous config saved to /var/cache/conftool/dbconfig/20221017-165649-ladsgroup.json
  • 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P35537 and previous config saved to /var/cache/conftool/dbconfig/20221017-164143-ladsgroup.json
  • 16:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T318955)', diff saved to https://phabricator.wikimedia.org/P35536 and previous config saved to /var/cache/conftool/dbconfig/20221017-162636-ladsgroup.json
  • 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T318955)', diff saved to https://phabricator.wikimedia.org/P35535 and previous config saved to /var/cache/conftool/dbconfig/20221017-161843-ladsgroup.json
  • 16:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 16:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 16:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 16:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T318955)', diff saved to https://phabricator.wikimedia.org/P35534 and previous config saved to /var/cache/conftool/dbconfig/20221017-161806-ladsgroup.json
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T318950)', diff saved to https://phabricator.wikimedia.org/P35533 and previous config saved to /var/cache/conftool/dbconfig/20221017-161330-ladsgroup.json
  • 16:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P35532 and previous config saved to /var/cache/conftool/dbconfig/20221017-160259-ladsgroup.json
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P35531 and previous config saved to /var/cache/conftool/dbconfig/20221017-155823-ladsgroup.json
  • 15:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P35530 and previous config saved to /var/cache/conftool/dbconfig/20221017-154753-ladsgroup.json
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P35529 and previous config saved to /var/cache/conftool/dbconfig/20221017-154317-ladsgroup.json
  • 15:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T318955)', diff saved to https://phabricator.wikimedia.org/P35528 and previous config saved to /var/cache/conftool/dbconfig/20221017-153246-ladsgroup.json
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T318950)', diff saved to https://phabricator.wikimedia.org/P35527 and previous config saved to /var/cache/conftool/dbconfig/20221017-152810-ladsgroup.json
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T318950)', diff saved to https://phabricator.wikimedia.org/P35526 and previous config saved to /var/cache/conftool/dbconfig/20221017-152552-ladsgroup.json
  • 15:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 15:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35525 and previous config saved to /var/cache/conftool/dbconfig/20221017-152531-ladsgroup.json
  • 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T318955)', diff saved to https://phabricator.wikimedia.org/P35524 and previous config saved to /var/cache/conftool/dbconfig/20221017-151808-ladsgroup.json
  • 15:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 15:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P35523 and previous config saved to /var/cache/conftool/dbconfig/20221017-151024-ladsgroup.json
  • 15:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 15:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T318955)', diff saved to https://phabricator.wikimedia.org/P35522 and previous config saved to /var/cache/conftool/dbconfig/20221017-150440-ladsgroup.json
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P35521 and previous config saved to /var/cache/conftool/dbconfig/20221017-145517-ladsgroup.json
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35518 and previous config saved to /var/cache/conftool/dbconfig/20221017-143753-ladsgroup.json
  • 14:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35517 and previous config saved to /var/cache/conftool/dbconfig/20221017-143731-ladsgroup.json
  • 14:37 papaul: ongoing maintenance on cr1-eqiad
  • 14:36 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main,name=eqiad
  • 14:36 claime: Depooling eventgate-main in eqiad - T303543
  • 14:35 claime: Repooling eventgate-main in codfw - T303543
  • 14:34 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main,name=codfw
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P35516 and previous config saved to /var/cache/conftool/dbconfig/20221017-143427-ladsgroup.json
  • 14:34 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 14:33 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 14:29 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main,name=codfw
  • 14:29 claime: Depooling eventgate-main in codfw - T303543
  • 14:28 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 14:27 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 14:25 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:24 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:23 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics-external,name=eqiad
  • 14:23 claime: Repooling eventgate-analytics-external in eqiad - T303543
  • 14:22 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P35515 and previous config saved to /var/cache/conftool/dbconfig/20221017-142224-ladsgroup.json
  • 14:22 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:22 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:21 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:20 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:20 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:20 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:20 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T318955)', diff saved to https://phabricator.wikimedia.org/P35514 and previous config saved to /var/cache/conftool/dbconfig/20221017-141921-ladsgroup.json
  • 14:16 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-analytics-external,name=eqiad
  • 14:16 claime: Depooling eventgate-analytics-external in eqiad - T303543
  • 14:14 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics-external,name=codfw
  • 14:14 claime: Repooling eventgate-analytics-external in codfw - T303543
  • 14:13 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:09 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-analytics-external,name=codfw
  • 14:09 claime: Depooling eventgate-analytics-external in codfw - T303543
  • 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P35513 and previous config saved to /var/cache/conftool/dbconfig/20221017-140717-ladsgroup.json
  • 14:05 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics,name=eqiad
  • 14:05 claime: Repooling eventgate-analytics in eqiad - T303543
  • 14:04 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T318955)', diff saved to https://phabricator.wikimedia.org/P35512 and previous config saved to /var/cache/conftool/dbconfig/20221017-140452-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 14:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T318955)', diff saved to https://phabricator.wikimedia.org/P35511 and previous config saved to /var/cache/conftool/dbconfig/20221017-140430-ladsgroup.json
  • 14:04 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 14:00 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-analytics,name=eqiad
  • 14:00 claime: Depooling eventgate-analytics in eqiad - T303543
  • 13:57 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics,name=codfw
  • 13:56 claime: Repooling eventgate-analytics in codfw - T303543
  • 13:56 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 13:56 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35510 and previous config saved to /var/cache/conftool/dbconfig/20221017-135211-ladsgroup.json
  • 13:51 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-analytics,name=codfw
  • 13:50 claime: Depooling eventgate-analytics in codfw - T303543
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35509 and previous config saved to /var/cache/conftool/dbconfig/20221017-134953-ladsgroup.json
  • 13:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 13:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T318950)', diff saved to https://phabricator.wikimedia.org/P35508 and previous config saved to /var/cache/conftool/dbconfig/20221017-134931-ladsgroup.json
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P35507 and previous config saved to /var/cache/conftool/dbconfig/20221017-134924-ladsgroup.json
  • 13:49 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:48 claime: Repooled eventgate-logging-external in eqiad - T303543
  • 13:47 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 13:47 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:46 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 13:29 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: enable the vue version of mentee overview in all wikis (T300532) (duration: 06m 24s)
  • 13:28 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-logging-external,name=codfw
  • 13:27 claime: Depooling eventgate-logging-external in codfw - T303543
  • 13:23 urbanecm@deploy1002: urbanecm and sgimeno: Backport for GrowthExperiments: enable the vue version of mentee overview in all wikis (T300532) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:23 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: enable the vue version of mentee overview in all wikis (T300532)
  • 13:22 urbanecm@deploy1002: Finished scap: Backport for Enable Sandbox Extension at Bengali Wikiquote (T320903) (duration: 04m 54s)
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P35504 and previous config saved to /var/cache/conftool/dbconfig/20221017-131918-ladsgroup.json
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T318955)', diff saved to https://phabricator.wikimedia.org/P35503 and previous config saved to /var/cache/conftool/dbconfig/20221017-131911-ladsgroup.json
  • 13:18 urbanecm@deploy1002: urbanecm and mdsshakil: Backport for Enable Sandbox Extension at Bengali Wikiquote (T320903) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:17 urbanecm@deploy1002: Started scap: Backport for Enable Sandbox Extension at Bengali Wikiquote (T320903)
  • 13:17 urbanecm@deploy1002: Finished scap: 52821e09c: 35000a4b: dewiktionary: Update logo (T320891) (duration: 04m 03s)
  • 13:13 urbanecm@deploy1002: Started scap: 52821e09c: 35000a4b: dewiktionary: Update logo (T320891)
  • 13:10 urbanecm@deploy1002: Finished scap: b434c5a84: 9d10a60ea: Wordmark changes (T320944, T320840) (duration: 04m 32s)
  • 13:06 root@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 13:06 root@cumin1001: START - Cookbook sre.discovery.service-route
  • 13:05 urbanecm@deploy1002: Started scap: b434c5a84: 9d10a60ea: Wordmark changes (T320944, T320840)
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 (T318955)', diff saved to https://phabricator.wikimedia.org/P35502 and previous config saved to /var/cache/conftool/dbconfig/20221017-130440-ladsgroup.json
  • 13:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:04 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T318950)', diff saved to https://phabricator.wikimedia.org/P35501 and previous config saved to /var/cache/conftool/dbconfig/20221017-130412-ladsgroup.json
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T318950)', diff saved to https://phabricator.wikimedia.org/P35500 and previous config saved to /var/cache/conftool/dbconfig/20221017-130154-ladsgroup.json
  • 13:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 13:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 13:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 13:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 12:36 XioNoX: re-enable BGP between cr1 and lsw1-e1 - T320566
  • 11:41 topranks: moving port et-2/0/49 out of ae1 bundle asw2-d-eqiad
  • 11:38 topranks: moving et-1/1/3 out of ae bundle on cr1-eqiad
  • 11:11 XioNoX: cr1-eqiad> request chassis fpc slot 1 offline - T320566
  • 11:09 topranks: shutting down BGP sessions from cr1-eqiad to lsw1-e1-eqiad in advance of linecard reboot
  • 10:27 XioNoX: disable cr1-eqiad:ae4 for recabling and troubleshooting - T320566
  • 09:48 jynus: powercycle db1202
  • 09:24 XioNoX: de-pref eqiad-drmrs GTT VPLS (latency between eqiad and drmrs will increase) - T320566
  • 09:09 XioNoX: de-pref cr1-eqiad wavelength transports (to codfw and drmrs) - T320566
  • 08:55 XioNoX: Move all eqiad VRRP mastership to cr2 - T320566
  • 08:03 jynus: restarting several bacula-related daemons to update its configuration
  • 07:57 urbanecm@deploy1002: Finished scap: Backport for Mentee filters: always use mw.user.options values to initialise the mentees store (T320728) (duration: 07m 22s)
  • 07:50 urbanecm@deploy1002: urbanecm and urbanecm: Backport for Mentee filters: always use mw.user.options values to initialise the mentees store (T320728) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:50 urbanecm@deploy1002: Started scap: Backport for Mentee filters: always use mw.user.options values to initialise the mentees store (T320728)
  • 07:38 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:38 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:38 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:37 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:37 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 07:37 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 07:34 Emperor: set thanos ring replicas to 3.60 T311690
  • 07:01 elukey: powercycle parse1002 - serial console's tty not responding, OEM events registered in `racadm getsel`
  • 06:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 16276
  • 06:08 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 16276

2022-10-15

  • 23:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1131 T320879', diff saved to https://phabricator.wikimedia.org/P35497 and previous config saved to /var/cache/conftool/dbconfig/20221015-232716-ladsgroup.json
  • 23:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T320879', diff saved to https://phabricator.wikimedia.org/P35495 and previous config saved to /var/cache/conftool/dbconfig/20221015-232320-ladsgroup.json
  • 23:22 Amir1: Starting s6 eqiad failover from db1131 to db1173 - T320879
  • 23:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s6 T320879
  • 23:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s6 T320879

2022-10-14

  • 22:56 mutante: pcc-worker1003.puppet-diffs.eqiad1.wikimedia.cloud - out of disk space again - deleted 3.5GB job "1460" to unblock puppet compiling
  • 20:48 jhathaway@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 20:48 jhathaway@cumin1001: START - Cookbook sre.network.cf
  • 19:57 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 19:57 oblivian@cumin1001: START - Cookbook sre.network.cf
  • 19:55 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 19:55 oblivian@cumin1001: START - Cookbook sre.network.cf
  • 18:08 mutante: contint* - temp disabled puppet, deploying gerrit:834400, docker version upgrade on CI servers (T318382)
  • 15:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:46 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 15:45 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 15:44 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:43 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:40 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:48 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:47 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:47 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:43 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:42 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:40 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:32 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:31 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:29 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:29 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:28 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:27 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:27 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:27 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:27 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:22 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:21 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:19 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:17 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:09 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:09 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:06 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:59 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:57 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:57 jclark@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:55 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 12:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1202 - Degraded RAID (T320786)', diff saved to https://phabricator.wikimedia.org/P35487 and previous config saved to /var/cache/conftool/dbconfig/20221014-120155-ladsgroup.json
  • 10:22 godog: upgrade grafana to 8.5.14
  • 10:15 dcausse: Deployed patch for T320785
  • 08:47 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts elastic[2028-2030].codfw.wmnet
  • 08:47 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:46 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 08:31 ryankemper@cumin2002: START - Cookbook sre.hosts.decommission for hosts elastic[2028-2030].codfw.wmnet
  • 08:29 moritzm: installing git security updates on buster
  • 08:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1008.eqiad.wmnet
  • 08:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:15 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts elastic[2025-2027].codfw.wmnet
  • 08:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:14 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 08:12 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:07 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1008.eqiad.wmnet
  • 08:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1007.eqiad.wmnet
  • 08:05 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:02 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1007.eqiad.wmnet
  • 07:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1006.eqiad.wmnet
  • 07:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:54 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1006.eqiad.wmnet
  • 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1005.eqiad.wmnet
  • 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:41 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:37 ryankemper@cumin2002: START - Cookbook sre.hosts.decommission for hosts elastic[2025-2027].codfw.wmnet
  • 07:36 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1005.eqiad.wmnet
  • 07:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti1008.eqiad.wmnet with reason: Remove from cluster for eventual decom
  • 07:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti1008.eqiad.wmnet with reason: Remove from cluster for eventual decom
  • 06:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 7843
  • 06:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 7843
  • 06:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Not working well
  • 06:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Not working well
  • 03:42 oblivian@cumin1001: dbctl commit (dc=all): 'depool db1143, lagging', diff saved to https://phabricator.wikimedia.org/P35485 and previous config saved to /var/cache/conftool/dbconfig/20221014-034223-oblivian.json
  • 02:24 tstarling@deploy1002: Synchronized wmf-config: clean up deleted file (duration: 03m 46s)
  • 02:11 ryankemper: T300943 Decom of elastic20[25-36] complete. Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/842547. This is done
  • 02:11 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts elastic[2034,2036].codfw.wmnet
  • 02:10 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:05 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 02:01 ryankemper: T300943 Final batch of decom'ing `elastic20[25-36]` => already decommissioned rows A/B/C; starting final row D (corresponding to `203[4,6]`)
  • 01:59 ryankemper@cumin2002: START - Cookbook sre.hosts.decommission for hosts elastic[2034,2036].codfw.wmnet
  • 01:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts elastic[2031-2033].codfw.wmnet
  • 01:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:40 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 01:32 ryankemper@cumin2002: START - Cookbook sre.hosts.decommission for hosts elastic[2031-2033].codfw.wmnet
  • 01:26 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 03m 36s)
  • 01:20 tstarling@deploy1002: Synchronized wmf-config/UcfirstOverrides.php: for T292552, should have no effect at this stage (duration: 03m 46s)
  • 01:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts elastic[2028-2030]
  • 01:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:13 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 00:50 ryankemper@cumin2002: START - Cookbook sre.hosts.decommission for hosts elastic[2028-2030]
  • 00:49 ryankemper: [Elastic] `ryankemper@elastic1083:~$ sudo systemctl restart elasticsearch_7*` to clear `CirrusSearchJVMGCYoungPoolInsufficient`
  • 00:48 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts elastic[2025-2027]
  • 00:48 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:45 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=no; selector: name=elastic2026.codfw.wmnet
  • 00:44 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 00:43 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=no; selector: name=elastic2025*
  • 00:36 ryankemper: T300943 Decom'ing elastic20[25-36]. Decommissioning in batches by row, starting with row A (2025-27)
  • 00:29 ryankemper@cumin2002: START - Cookbook sre.hosts.decommission for hosts elastic[2025-2027]

2022-10-13

  • 22:02 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:35 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@b5b51fa]: 0.3.117 and adding eu knowledge graph to whitelist (duration: 12m 02s)
  • 20:33 TheresNoTime: close UTC late backport window
  • 20:31 samtar@deploy1002: Finished scap: Backport for testcommonswiki: Add editcontentmodel to interface-admin (T320752) (duration: 05m 24s)
  • 20:26 samtar@deploy1002: samtar and lucaswerkmeister: Backport for testcommonswiki: Add editcontentmodel to interface-admin (T320752) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:25 samtar@deploy1002: Started scap: Backport for testcommonswiki: Add editcontentmodel to interface-admin (T320752)
  • 20:23 samtar@deploy1002: Finished scap: Backport for commonswiki: add editcontentmodel right to interface-admin group (T320752) (duration: 05m 03s)
  • 20:22 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@b5b51fa]: 0.3.117 and adding eu knowledge graph to whitelist
  • 20:19 samtar@deploy1002: samtar and nn1l2: Backport for commonswiki: add editcontentmodel right to interface-admin group (T320752) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:18 samtar@deploy1002: Started scap: Backport for commonswiki: add editcontentmodel right to interface-admin group (T320752)
  • 20:13 samtar@deploy1002: Finished scap: Backport for cirrus: Drop client side connect timeout config (T143553), cirrus: remove cross-dc poolcounter increases (duration: 05m 31s)
  • 20:08 samtar@deploy1002: samtar and ebernhardson: Backport for cirrus: Drop client side connect timeout config (T143553), cirrus: remove cross-dc poolcounter increases synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:08 samtar@deploy1002: Started scap: Backport for cirrus: Drop client side connect timeout config (T143553), cirrus: remove cross-dc poolcounter increases
  • 19:38 mutante: rsyncing /srv/repos from phab1001 to 3 other phab servers (with bw limit) - T313360
  • 18:08 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.5 refs T314194
  • 17:12 dduvall: disabling puppet on gitlab-runner1002 to debug jwt auth failure
  • 17:11 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:10 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:10 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:10 sukhe: disable Puppet and stop Pybal on lvs1017: T286881
  • 17:10 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:09 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:09 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:50 sukhe: disable Puppet and stop Pybal on lvs1020: T286881
  • 16:26 moritzm: draining ganeti1008 T320419
  • 16:17 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:59 topranks: Adjusting MTU on link from lsw1-e3-eqiad to lsw1-f1-eqiad (drained in advance)
  • 15:29 vgutierrez: partitioning the ATS cache in cp[2027-2028], cp[1075-1076], cp5007, cp[3050-3051] - T317748
  • 15:24 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:19 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:19 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:03 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 15:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:59 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=varnish-fe
  • 14:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=ats-tls
  • 14:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=ats-be
  • 14:36 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp4045.ulsfo.wmnet,service=varnish-fe
  • 14:36 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp4045.ulsfo.wmnet,service=ats-tls
  • 14:36 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4045.ulsfo.wmnet,service=ats-be
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T318950)', diff saved to https://phabricator.wikimedia.org/P35482 and previous config saved to /var/cache/conftool/dbconfig/20221013-142730-ladsgroup.json
  • 14:26 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P35481 and previous config saved to /var/cache/conftool/dbconfig/20221013-141224-ladsgroup.json
  • 14:06 sukhe: running puppet/utils/pcc_update_facts.py to update nodes
  • 14:06 urbanecm@deploy1002: backport aborted: (duration: 00m 04s)
  • 14:04 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:04 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Revert "GrowthExperiments: enable the Vue version of the mentee overview in all wikis" (duration: 05m 49s)
  • 13:59 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and trainbranchbot: Backport for Revert "GrowthExperiments: enable the Vue version of the mentee overview in all wikis" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:58 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Revert "GrowthExperiments: enable the Vue version of the mentee overview in all wikis"
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P35480 and previous config saved to /var/cache/conftool/dbconfig/20221013-135718-ladsgroup.json
  • 13:56 lucaswerkmeister-wmde@deploy1002: Sync cancelled.
  • 13:53 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 13:52 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 13:47 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and sgimeno: Backport for GrowthExperiments: enable the Vue version of the mentee overview in all wikis (T300532) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:47 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for GrowthExperiments: enable the Vue version of the mentee overview in all wikis (T300532)
  • 13:45 mlitn@deploy1002: Finished scap: Backport for Commons files can have thumbnails too (duration: 04m 53s)
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T318950)', diff saved to https://phabricator.wikimedia.org/P35479 and previous config saved to /var/cache/conftool/dbconfig/20221013-134211-ladsgroup.json
  • 13:41 mlitn@deploy1002: mlitn and mlitn: Backport for Commons files can have thumbnails too synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:40 mlitn@deploy1002: Started scap: Backport for Commons files can have thumbnails too
  • 13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T318950)', diff saved to https://phabricator.wikimedia.org/P35478 and previous config saved to /var/cache/conftool/dbconfig/20221013-133953-ladsgroup.json
  • 13:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 13:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T318950)', diff saved to https://phabricator.wikimedia.org/P35477 and previous config saved to /var/cache/conftool/dbconfig/20221013-133931-ladsgroup.json
  • 13:27 mlitn@deploy1002: Finished scap: Backport for Commons files can have thumbnails too (duration: 05m 15s)
  • 13:24 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 13:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P35473 and previous config saved to /var/cache/conftool/dbconfig/20221013-132425-ladsgroup.json
  • 13:24 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 13:22 mlitn@deploy1002: mlitn and mlitn: Backport for Commons files can have thumbnails too synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:22 mlitn@deploy1002: Started scap: Backport for Commons files can have thumbnails too
  • 13:16 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 13:16 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 13:15 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 13:15 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 13:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P35472 and previous config saved to /var/cache/conftool/dbconfig/20221013-130918-ladsgroup.json
  • 12:57 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:56 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T318950)', diff saved to https://phabricator.wikimedia.org/P35471 and previous config saved to /var/cache/conftool/dbconfig/20221013-125412-ladsgroup.json
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T318950)', diff saved to https://phabricator.wikimedia.org/P35470 and previous config saved to /var/cache/conftool/dbconfig/20221013-125154-ladsgroup.json
  • 12:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 12:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T318950)', diff saved to https://phabricator.wikimedia.org/P35469 and previous config saved to /var/cache/conftool/dbconfig/20221013-125133-ladsgroup.json
  • 12:45 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync data - jbond@cumin1001"
  • 12:43 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
  • 12:37 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
  • 12:37 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P35468 and previous config saved to /var/cache/conftool/dbconfig/20221013-123626-ladsgroup.json
  • 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P35467 and previous config saved to /var/cache/conftool/dbconfig/20221013-122120-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T318950)', diff saved to https://phabricator.wikimedia.org/P35466 and previous config saved to /var/cache/conftool/dbconfig/20221013-120613-ladsgroup.json
  • 12:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T318950)', diff saved to https://phabricator.wikimedia.org/P35465 and previous config saved to /var/cache/conftool/dbconfig/20221013-120356-ladsgroup.json
  • 12:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 12:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 12:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T318950)', diff saved to https://phabricator.wikimedia.org/P35464 and previous config saved to /var/cache/conftool/dbconfig/20221013-120334-ladsgroup.json
  • 12:02 moritzm: restarting FPM/Apache on mediawiki canaries to pick up new curl
  • 11:58 moritzm: installing curl security updates on buster
  • 11:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P35462 and previous config saved to /var/cache/conftool/dbconfig/20221013-114827-ladsgroup.json
  • 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubetcd1005.eqiad.wmnet to plain
  • 11:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubetcd1005.eqiad.wmnet to plain
  • 11:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubetcd1005.eqiad.wmnet to drbd
  • 11:37 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Deploy ntc-netbox-plugin-metrics-ext and updated wheels (duration: 04m 41s)
  • 11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P35461 and previous config saved to /var/cache/conftool/dbconfig/20221013-113320-ladsgroup.json
  • 11:33 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: Deploy ntc-netbox-plugin-metrics-ext and updated wheels
  • 11:31 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Deploy ntc-netbox-plugin-metrics-ext and updated wheels (duration: 03m 12s)
  • 11:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubetcd1005.eqiad.wmnet to drbd
  • 11:28 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: Deploy ntc-netbox-plugin-metrics-ext and updated wheels
  • 11:28 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Deploy ntc-netbox-plugin-metrics-ext and updated wheels (duration: 03m 34s)
  • 11:27 Lucas_WMDE: 11:18 ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2120 (T318950)', diff saved to https://phabricator.wikimedia.org/P35460 and previous config saved to /var/cache/conftool/dbconfig/20221013-111814-ladsgroup.json
  • 11:27 Lucas_WMDE: 11:15 ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2120 (T318950)', diff saved to https://phabricator.wikimedia.org/P35458 and previous config saved to /var/cache/conftool/dbconfig/20221013-111556-ladsgroup.json
  • 11:27 Lucas_WMDE: 11:15 ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 11:27 Lucas_WMDE: 11:15 ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 11:27 Lucas_WMDE: 11:15 ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2108 (T318950)', diff saved to https://phabricator.wikimedia.org/P35457 and previous config saved to /var/cache/conftool/dbconfig/20221013-111534-ladsgroup.json
  • 11:26 Lucas_WMDE: repeating five messages that got missed due to stashbot quit
  • 11:24 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: Deploy ntc-netbox-plugin-metrics-ext and updated wheels
  • 11:24 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Deploy ntc-netbox-plugin-metrics-ext and updated wheels (duration: 18m 30s)
  • 11:06 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: Deploy ntc-netbox-plugin-metrics-ext and updated wheels
  • 11:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P35455 and previous config saved to /var/cache/conftool/dbconfig/20221013-110028-ladsgroup.json
  • 10:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P35454 and previous config saved to /var/cache/conftool/dbconfig/20221013-104521-ladsgroup.json
  • 10:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T318950)', diff saved to https://phabricator.wikimedia.org/P35453 and previous config saved to /var/cache/conftool/dbconfig/20221013-103015-ladsgroup.json
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T318950)', diff saved to https://phabricator.wikimedia.org/P35452 and previous config saved to /var/cache/conftool/dbconfig/20221013-102757-ladsgroup.json
  • 10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 10:16 moritzm: draining ganeti1008 T320419
  • 09:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1017.eqiad.wmnet to cluster eqiad and group B
  • 09:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1017.eqiad.wmnet to cluster eqiad and group B
  • 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1017.eqiad.wmnet
  • 09:28 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:28 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:25 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:24 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1017.eqiad.wmnet
  • 09:18 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:16 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:15 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:15 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:15 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Deploy ntc-netbox-plugin-metrics-ext (duration: 02m 04s)
  • 09:14 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:14 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:13 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:13 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: Deploy ntc-netbox-plugin-metrics-ext
  • 09:11 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1017.eqiad.wmnet with OS bullseye
  • 08:48 vgutierrez: partitioning the ATS cache in cp[2029-2030], cp[6001,6009], cp[1077-1078], cp[5002,5008], cp[3052-3053], cp4022 - T317748
  • 08:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1017.eqiad.wmnet with reason: host reimage
  • 08:30 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1017.eqiad.wmnet with reason: host reimage
  • 08:28 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 08:28 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 08:27 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1017.eqiad.wmnet with OS bullseye
  • 08:13 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS buster
  • 08:13 oblivian@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:12 oblivian@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 08:11 oblivian@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti1017.eqiad.wmnet with reason: Remove from cluster for reimage
  • 08:04 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti1017.eqiad.wmnet with reason: Remove from cluster for reimage
  • 08:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti1007.eqiad.wmnet with reason: Remove from cluster for eventual decom
  • 08:02 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti1007.eqiad.wmnet with reason: Remove from cluster for eventual decom
  • 07:50 matthiasmullie: UTC morning backports done
  • 07:37 mlitn@deploy1002: Finished scap: Backport for Enable NS_MAIN thumbnails only on wikipedias (T320510) (duration: 08m 24s)
  • 07:29 mlitn@deploy1002: mlitn and mlitn: Backport for Enable NS_MAIN thumbnails only on wikipedias (T320510) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:29 mlitn@deploy1002: Started scap: Backport for Enable NS_MAIN thumbnails only on wikipedias (T320510)
  • 00:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4045.ulsfo.wmnet with OS buster
  • 00:38 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
  • 00:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
  • 00:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS buster
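
Entries in this log share a common `HH:MM actor: message` shape, and cookbook runs end with an `END (STATUS) - Cookbook NAME (exit_code=N)` message. The following is a minimal sketch, assuming only the line shapes visible on this page, of how such entries could be split into fields and how a cookbook outcome could be read from them. The names `Entry`, `parse_entry` and `cookbook_result` are hypothetical helpers for illustration and are not part of any Wikimedia tooling.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Assumed entry shape: optional bullet, "HH:MM", actor up to the first colon, then the message.
ENTRY_RE = re.compile(
    r"^\s*(?:•\s*)?(?P<time>\d{2}:\d{2})\s+(?P<actor>[^:\s]+):\s*(?P<message>.*)$"
)

# Assumed cookbook end shape: "END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) ..."
COOKBOOK_END_RE = re.compile(
    r"^END \((?P<status>[A-Z]+)\) - Cookbook (?P<name>\S+) \(exit_code=(?P<code>\d+)\)"
)


class Entry(NamedTuple):
    time: str     # "HH:MM", UTC as logged
    actor: str    # e.g. "sukhe" or "ladsgroup@cumin1001"
    message: str  # everything after the first "actor:" separator


def parse_entry(line: str) -> Optional[Entry]:
    """Return the parsed entry, or None for lines that are not log entries (e.g. date headings)."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    return Entry(match["time"], match["actor"], match["message"].strip())


def cookbook_result(entry: Entry) -> Optional[Tuple[str, str, int]]:
    """Return (cookbook name, status, exit code) for cookbook END entries, else None."""
    match = COOKBOOK_END_RE.match(entry.message)
    if match is None:
        return None
    return match["name"], match["status"], int(match["code"])
```

Applied to the 16:17 entry above, `parse_entry` would yield the actor `pt1979@cumin2002`, and `cookbook_result` would report `sre.hosts.provision` with status `FAIL` and exit code 99. Entries relogged by a person (such as the 11:27 lines) keep the original timestamp and actor inside the message, so this sketch attributes them to the person who repeated them.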

2022-10-12

  • 21:06 cwhite: clean up old db backups on grafana2001
  • 20:41 TheresNoTime: closing UTC late backport window
  • 20:40 samtar@deploy1002: Finished scap: Backport for yiwiktionary: Adjust width-height ratio of logo to fix display issue (T310961) (duration: 05m 17s)
  • 20:35 samtar@deploy1002: samtar and stang: Backport for yiwiktionary: Adjust width-height ratio of logo to fix display issue (T310961) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:35 samtar@deploy1002: Started scap: Backport for yiwiktionary: Adjust width-height ratio of logo to fix display issue (T310961)
  • 20:21 samtar@deploy1002: Finished scap: Backport for Drop unused wordmark/tagline (T307705), Re-download and optimize wordmark/tagline svg file (T307705) (duration: 04m 53s)
  • 20:16 samtar@deploy1002: samtar and stang: Backport for Drop unused wordmark/tagline (T307705), Re-download and optimize wordmark/tagline svg file (T307705) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:16 samtar@deploy1002: Started scap: Backport for Drop unused wordmark/tagline (T307705), Re-download and optimize wordmark/tagline svg file (T307705)
  • 20:14 samtar@deploy1002: Finished scap: Backport for Set $wgSitename for bnwikiquote (T319183) (duration: 04m 40s)
  • 20:10 samtar@deploy1002: samtar and zabe: Backport for Set $wgSitename for bnwikiquote (T319183) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:09 samtar@deploy1002: Started scap: Backport for Set $wgSitename for bnwikiquote (T319183)
  • 20:08 samtar@deploy1002: Finished scap: Backport for Register the editattempt_block schema (T310390) (duration: 05m 42s)
  • 20:03 samtar@deploy1002: samtar and kemayo: Backport for Register the editattempt_block schema (T310390) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:02 samtar@deploy1002: Started scap: Backport for Register the editattempt_block schema (T310390)
  • 19:29 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS buster
  • 19:16 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS buster
  • 19:15 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4045.ulsfo.wmnet with OS buster
  • 18:41 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS buster
  • 18:40 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS buster
  • 18:26 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS buster
  • 18:12 dduvall@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.5 refs T314194 (duration: 03m 38s)
  • 18:09 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.5 refs T314194
  • 18:03 dduvall@deploy1002: deploy-promote aborted: (duration: 00m 07s)
  • 17:07 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
  • 17:07 dduvall@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
  • 17:06 dduvall@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
  • 17:06 dduvall@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
  • 17:02 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 17:00 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 17:00 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:55 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 16:55 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 16:55 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 16:55 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 16:19 volans@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:46 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host cp4045.ulsfo.wmnet with OS buster
  • 15:45 vgutierrez: partitioning the ATS cache in cp[2031-2032], cp[6002,6010], cp[1079-1080], cp[5003,5009], cp[3054-3055], cp[4023,4032] - T317748
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T318955)', diff saved to https://phabricator.wikimedia.org/P35445 and previous config saved to /var/cache/conftool/dbconfig/20221012-154230-ladsgroup.json
  • 15:37 urbanecm@deploy1002: Finished scap: Backport for eswiki: Deploy mentorship to only 15% of users (T285235) (duration: 04m 23s)
  • 15:33 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventstreams-internal,name=eqiad
  • 15:33 urbanecm@deploy1002: urbanecm and urbanecm: Backport for eswiki: Deploy mentorship to only 15% of users (T285235) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 15:33 urbanecm@deploy1002: Started scap: Backport for eswiki: Deploy mentorship to only 15% of users (T285235)
  • 15:31 hnowlan@deploy1002: Finished deploy [restbase/deploy@2d002b3]: Add ig,bcl,bn,tl wikiquote, ig wiktionary T314641 (duration: 16m 02s)
  • 15:30 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 15:30 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 15:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
  • 15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P35444 and previous config saved to /var/cache/conftool/dbconfig/20221012-152724-ladsgroup.json
  • 15:26 ottomata: remove materialized .json files from schemas/event/primary - this should be a no-op as no clients should actually be using the json files. - T315674
  • 15:25 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
  • 15:24 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 15:24 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 15:23 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 15:23 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 15:23 claime: redeploying eventstreams-internal eqiad - T310721
  • 15:16 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventstreams-internal,name=eqiad
  • 15:16 claime: depooling eventstreams-internal eqiad - T310721
  • 15:15 hnowlan@deploy1002: Started deploy [restbase/deploy@2d002b3]: Add ig,bcl,bn,tl wikiquote, ig wiktionary T314641
  • 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P35443 and previous config saved to /var/cache/conftool/dbconfig/20221012-151217-ladsgroup.json
  • 15:09 claime: repooled eventstreams-internal codfw - T310721
  • 15:09 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventstreams-internal,name=codfw
  • 15:07 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 15:07 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 15:07 claime: redeploying eventstreams-internal codfw - T310721
  • 15:03 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS buster
  • 14:57 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventstreams-internal,name=codfw
  • 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T318955)', diff saved to https://phabricator.wikimedia.org/P35442 and previous config saved to /var/cache/conftool/dbconfig/20221012-145711-ladsgroup.json
  • 14:57 claime: depooling eventstreams-internal codfw - T310721
  • 14:56 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS buster
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T318955)', diff saved to https://phabricator.wikimedia.org/P35441 and previous config saved to /var/cache/conftool/dbconfig/20221012-145445-ladsgroup.json
  • 14:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 14:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35440 and previous config saved to /var/cache/conftool/dbconfig/20221012-145423-ladsgroup.json
  • 14:39 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P35439 and previous config saved to /var/cache/conftool/dbconfig/20221012-143917-ladsgroup.json
  • 14:39 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 14:35 ladsgroup@deploy1002: Finished scap: Backport for Revert "rdbms: Instead of reconfiguring all of LB, just remove depooled db" (duration: 04m 37s)
  • 14:31 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Revert "rdbms: Instead of reconfiguring all of LB, just remove depooled db" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 14:30 ladsgroup@deploy1002: Started scap: Backport for Revert "rdbms: Instead of reconfiguring all of LB, just remove depooled db"
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P35438 and previous config saved to /var/cache/conftool/dbconfig/20221012-142410-ladsgroup.json
  • 14:19 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS buster
  • 14:18 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS buster
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35436 and previous config saved to /var/cache/conftool/dbconfig/20221012-140903-ladsgroup.json
  • 14:08 ladsgroup@deploy1002: Sync cancelled.
  • 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool db1175', diff saved to https://phabricator.wikimedia.org/P35435 and previous config saved to /var/cache/conftool/dbconfig/20221012-140746-ladsgroup.json
  • 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1175', diff saved to https://phabricator.wikimedia.org/P35434 and previous config saved to /var/cache/conftool/dbconfig/20221012-140626-ladsgroup.json
  • 14:04 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 13:53 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for rdbms: Instead of reconfiguring all of LB, just remove depooled db (T298485) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 13:53 ladsgroup@deploy1002: Started scap: Backport for rdbms: Instead of reconfiguring all of LB, just remove depooled db (T298485)
  • 13:49 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS buster
  • 13:47 volans@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35433 and previous config saved to /var/cache/conftool/dbconfig/20221012-134306-ladsgroup.json
  • 13:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 13:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T318955)', diff saved to https://phabricator.wikimedia.org/P35432 and previous config saved to /var/cache/conftool/dbconfig/20221012-134245-ladsgroup.json
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P35431 and previous config saved to /var/cache/conftool/dbconfig/20221012-132738-ladsgroup.json
  • 13:27 urbanecm@deploy1002: Finished scap: Backport for Remove Research Incentive survey from eswiki (T318331) (duration: 05m 21s)
  • 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts d-i-test.eqiad.wmnet
  • 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:22 urbanecm@deploy1002: urbanecm and dani: Backport for Remove Research Incentive survey from eswiki (T318331) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:21 urbanecm@deploy1002: Started scap: Backport for Remove Research Incentive survey from eswiki (T318331)
  • 13:21 urbanecm@deploy1002: Finished scap: Backport for Move wmgSiteLogoWordmark and wmgSiteLogoTagline to logos.php (T307705) (duration: 07m 06s)
  • 13:18 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:14 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts d-i-test.eqiad.wmnet
  • 13:14 urbanecm@deploy1002: urbanecm and stang: Backport for Move wmgSiteLogoWordmark and wmgSiteLogoTagline to logos.php (T307705) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:14 urbanecm@deploy1002: Started scap: Backport for Move wmgSiteLogoWordmark and wmgSiteLogoTagline to logos.php (T307705)
  • 13:13 urbanecm@deploy1002: Finished scap: Backport for Enable show nearby feature on a small group of wikis (T316782) (duration: 07m 03s)
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P35430 and previous config saved to /var/cache/conftool/dbconfig/20221012-131232-ladsgroup.json
  • 13:09 moritzm: draining ganeti1007 T320419
  • 13:06 urbanecm@deploy1002: urbanecm and wmde-fisch: Backport for Enable show nearby feature on a small group of wikis (T316782) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:06 urbanecm@deploy1002: Started scap: Backport for Enable show nearby feature on a small group of wikis (T316782)
  • 13:05 urbanecm@deploy1002: backport aborted: (duration: 00m 09s)
  • 13:04 urbanecm@deploy1002: Backport cancelled.
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T318955)', diff saved to https://phabricator.wikimedia.org/P35429 and previous config saved to /var/cache/conftool/dbconfig/20221012-125725-ladsgroup.json
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T318955)', diff saved to https://phabricator.wikimedia.org/P35428 and previous config saved to /var/cache/conftool/dbconfig/20221012-123223-ladsgroup.json
  • 12:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 12:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35427 and previous config saved to /var/cache/conftool/dbconfig/20221012-123201-ladsgroup.json
  • 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 12:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 12:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P35426 and previous config saved to /var/cache/conftool/dbconfig/20221012-121655-ladsgroup.json
  • 12:12 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS buster
  • 12:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P35425 and previous config saved to /var/cache/conftool/dbconfig/20221012-120148-ladsgroup.json
  • 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti1005.eqiad.wmnet with reason: Remove from cluster for eventual decom
  • 11:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti1005.eqiad.wmnet with reason: Remove from cluster for eventual decom
  • 11:51 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventstreams,name=eqiad
  • 11:50 claime: repooling eventstreams in eqiad - T310721
  • 11:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35424 and previous config saved to /var/cache/conftool/dbconfig/20221012-114642-ladsgroup.json
  • 11:46 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 11:45 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 11:44 claime: redeploying eventstreams eqiad - T310721
  • 11:24 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventstreams,name=eqiad
  • 11:24 claime: depooling eventstreams in eqiad - T310721
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35423 and previous config saved to /var/cache/conftool/dbconfig/20221012-112146-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T318955)', diff saved to https://phabricator.wikimedia.org/P35422 and previous config saved to /var/cache/conftool/dbconfig/20221012-112124-ladsgroup.json
  • 11:11 moritzm: installing bind9 security updates on buster (client side tools/libs)
  • 11:07 jgiannelos@deploy1002: Finished deploy [restbase/deploy@0474832]: Update restbase to 1a02cdfb (duration: 25m 48s)
  • 11:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P35421 and previous config saved to /var/cache/conftool/dbconfig/20221012-110617-ladsgroup.json
  • 11:02 claime: repooled eventstreams in codfw - T310721
  • 11:01 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventstreams,name=codfw
  • 10:58 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 10:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 10:57 claime: redeploying eventstreams codfw - T310721
  • 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P35420 and previous config saved to /var/cache/conftool/dbconfig/20221012-105111-ladsgroup.json
  • 10:49 moritzm: installing dbus security updates
  • 10:41 jgiannelos@deploy1002: Started deploy [restbase/deploy@0474832]: Update restbase to 1a02cdfb
  • 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T318955)', diff saved to https://phabricator.wikimedia.org/P35419 and previous config saved to /var/cache/conftool/dbconfig/20221012-103604-ladsgroup.json
  • 10:35 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 10:33 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventstreams,name=codfw
  • 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T318955)', diff saved to https://phabricator.wikimedia.org/P35418 and previous config saved to /var/cache/conftool/dbconfig/20221012-103338-ladsgroup.json
  • 10:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 10:33 claime: depooling eventstreams in codfw - T310721
  • 10:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 10:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 10:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 10:22 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 10:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 20115
  • 10:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 20115
  • 10:01 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 10:01 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 09:26 moritzm: draining ganeti1017 T311687
  • 09:25 urbanecm@deploy1002: Finished scap: Backport for Replace wordmark/tagline with correct naming style (T307705) (duration: 04m 20s)
  • 09:21 urbanecm@deploy1002: urbanecm and stang: Backport for Replace wordmark/tagline with correct naming style (T307705) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 09:21 urbanecm@deploy1002: Started scap: Backport for Replace wordmark/tagline with correct naming style (T307705)
  • 09:12 jayme: re-enabled puppet on all kubernetes masters (incl. ml & dse)
  • 09:11 urbanecm@deploy1002: Finished scap: Backport for SVG resources: Run svgo (T320447) (duration: 04m 38s)
  • 09:07 urbanecm@deploy1002: urbanecm and urbanecm: Backport for SVG resources: Run svgo (T320447) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 09:07 urbanecm@deploy1002: Started scap: Backport for SVG resources: Run svgo (T320447)
  • 09:05 jayme: disabling puppet on all kubernetes masters (incl. ml & dse)
  • 08:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1001.eqiad.wmnet to plain
  • 08:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1001.eqiad.wmnet to plain
  • 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagetcd1004.eqiad.wmnet to plain
  • 08:53 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagetcd1004.eqiad.wmnet to plain
  • 08:52 vgutierrez: partitioning the ATS cache in cp[2033-2034], cp[6003,6011], cp[1081-1082], cp[5004,5010], cp[3056-3057], cp[4024,4028] - T317748
  • 08:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1001.eqiad.wmnet to drbd
  • 08:33 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1001.eqiad.wmnet to drbd
  • 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagetcd1004.eqiad.wmnet to drbd
  • 08:18 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagetcd1004.eqiad.wmnet to drbd
  • 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of testvm2001.codfw.wmnet to drbd
  • 08:07 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of testvm2001.codfw.wmnet to drbd
  • 08:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of testvm2001.codfw.wmnet to plain
  • 08:01 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of testvm2001.codfw.wmnet to plain
  • 07:25 matthiasmullie: UTC morning backports done
  • 07:25 mlitn@deploy1002: Finished scap: Backport for Rescale images based on width alone (T320406) (duration: 05m 19s)
  • 07:20 mlitn@deploy1002: mlitn and mlitn: Backport for Rescale images based on width alone (T320406) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:19 mlitn@deploy1002: Started scap: Backport for Rescale images based on width alone (T320406)
  • 06:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 4826
  • 06:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4826

2022-10-11

  • 21:36 mutante: phab1001 / phab2001 - temp. disabled puppet; stopped ssh-phab service; scheduled icinga downtimes for ssh-phab pybal backend alerts - effectively "soft shutting down" the service - T296022
  • 21:22 mutante: phab2001 - systemctl stop ssh-phab; temp disable puppet
  • 21:12 mutante: puppetmaster1001: rm .*.err in /var/run/confd-template
  • 21:10 mutante: puppetmaster2001: rm .*.err in /var/run/confd-template
  • 20:56 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=phab2001-vcs.codfw.wmnet
  • 20:35 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=phab1001-vcs.eqiad.wmnet
  • 20:35 dzahn@cumin2002: conftool action : set/pooled=inactive; selector: dc=codfw,name=phab2001-vcs.codfw.wmnet
  • 20:27 mutante: depooling git-ssh service backends with confctl - T296022
  • 20:26 TheresNoTime: close UTC late backport window
  • 20:26 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=phab2001-vcs.codfw.wmnet
  • 20:25 mutante: depooling git-ssh service backends - checking if monitoring will alert
  • 20:25 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=phab1001-vcs.eqiad.wmnet
  • 20:11 samtar@deploy1002: Finished scap: Backport for Undeploy the GDI wave 3 survey from PROD (T320495) (duration: 06m 29s)
  • 20:05 samtar@deploy1002: samtar and essexigyan: Backport for Undeploy the GDI wave 3 survey from PROD (T320495) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:05 samtar@deploy1002: Started scap: Backport for Undeploy the GDI wave 3 survey from PROD (T320495)
  • 19:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:39 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:16 dcausse: restarting blazegraph on wdqs1013 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 18:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T318959)', diff saved to https://phabricator.wikimedia.org/P35414 and previous config saved to /var/cache/conftool/dbconfig/20221011-181348-ladsgroup.json
  • 18:11 XioNoX: re-enable cr1-eqiad<->asw2-d-eqiad link for re-cabling - T313463
  • 18:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti4001.ulsfo.wmnet
  • 18:09 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:07 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.5 refs T314194
  • 18:07 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 18:05 ejegg: updated fundraising python tools from 14d60435 to 4c143d97
  • 18:01 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4001.ulsfo.wmnet
  • 18:00 sukhe: sudo gnt-node remove ganeti4001.ulsfo.wmnet
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P35413 and previous config saved to /var/cache/conftool/dbconfig/20221011-175842-ladsgroup.json
  • 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T318955)', diff saved to https://phabricator.wikimedia.org/P35412 and previous config saved to /var/cache/conftool/dbconfig/20221011-174641-ladsgroup.json
  • 17:37 sukhe: completed homer run for "cr*-ulsfo*" commit 841533
  • 17:35 sukhe: running homer "cr*-ulsfo*" commit "Gerrit 841533: sites.yaml: add dns4004 to anycast_neighbors"
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P35411 and previous config saved to /var/cache/conftool/dbconfig/20221011-173134-ladsgroup.json
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T318959)', diff saved to https://phabricator.wikimedia.org/P35410 and previous config saved to /var/cache/conftool/dbconfig/20221011-172822-ladsgroup.json
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T318959)', diff saved to https://phabricator.wikimedia.org/P35409 and previous config saved to /var/cache/conftool/dbconfig/20221011-172608-ladsgroup.json
  • 17:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 17:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P35408 and previous config saved to /var/cache/conftool/dbconfig/20221011-171627-ladsgroup.json
  • 17:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T318955)', diff saved to https://phabricator.wikimedia.org/P35407 and previous config saved to /var/cache/conftool/dbconfig/20221011-170121-ladsgroup.json
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T318955)', diff saved to https://phabricator.wikimedia.org/P35406 and previous config saved to /var/cache/conftool/dbconfig/20221011-165955-ladsgroup.json
  • 16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T318955)', diff saved to https://phabricator.wikimedia.org/P35405 and previous config saved to /var/cache/conftool/dbconfig/20221011-165933-ladsgroup.json
  • 16:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:55 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on elastic2052.codfw.wmnet with reason: T320482
  • 16:54 sukhe: depool and reboot doh1001
  • 16:54 bking@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on elastic2052.codfw.wmnet with reason: T320482
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P35404 and previous config saved to /var/cache/conftool/dbconfig/20221011-164427-ladsgroup.json
  • 16:40 rzl: gitlab-runner[1002-1004,2002-2004] - systemctl restart buildkitd - T317997
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P35403 and previous config saved to /var/cache/conftool/dbconfig/20221011-162920-ladsgroup.json
  • 16:26 dduvall@deploy1002: Pruned MediaWiki: 1.40.0-wmf.3 (duration: 02m 00s)
  • 16:23 dduvall@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.5 refs T314194 (duration: 33m 55s)
  • 16:23 volans@cumin2002: conftool action : set/pooled=no; selector: name=elastic2052..*
  • 16:16 ebernhardson: depool elastic2052. failing to join cluster due to `PROBLEM - MD RAID on elastic2052 is CRITICAL: CRITICAL: State: degraded, Active: 5, Working: 5, Failed: 1, Spare: 0`
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T318955)', diff saved to https://phabricator.wikimedia.org/P35402 and previous config saved to /var/cache/conftool/dbconfig/20221011-161414-ladsgroup.json
  • 16:00 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:50 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host dns4004.wikimedia.org with OS buster
  • 15:50 dduvall@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.5 refs T314194
  • 15:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T318955)', diff saved to https://phabricator.wikimedia.org/P35401 and previous config saved to /var/cache/conftool/dbconfig/20221011-154934-ladsgroup.json
  • 15:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 15:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 15:44 ottomata: remove materialized .json files from schemas/event/secondary - this should be a no-op as no clients should actually be using the json files. - T315674
  • 15:38 sukhe: sudo gnt-node evacuate -s ganeti4001.ulsfo.wmnet
  • 15:35 sukhe: sudo gnt-node migrate -f ganeti4001.ulsfo.wmnet
  • 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:31 TheresNoTime: deployed beta cluster only change, gerrit:841547, for T314294
  • 15:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 15:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 15:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 15:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo and group 1
  • 15:23 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 15:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:09 XioNoX: disable cr1-eqiad<->asw2-d-eqiad link for re-cabling - T313463
  • 15:05 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS buster
  • 15:04 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo and group 1
  • 15:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:56 XioNoX: re-enable cr1-eqiad<->asw2-c-eqiad link after optic replacement
  • 14:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo and group 1
  • 14:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo and group 1
  • 14:50 XioNoX: disable cr1-eqiad<->asw2-c-eqiad link for optic replacement
  • 14:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
  • 14:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
  • 14:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:15 sukhe: completed homer run for Gerrit 841501
  • 14:14 sukhe: homer "cr*-ulsfo*" commit "Gerrit 841501: sites.yaml: decom dns4002"
  • 14:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:01 hoo@deploy1002: Finished scap: Backport for updateQueryServiceLag: Add lb(-pool) options for forward compatibility (T315423 T238751) (duration: 04m 57s)
  • 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:57 hoo@deploy1002: hoo and hoo: Backport for updateQueryServiceLag: Add lb(-pool) options for forward compatibility (T315423 T238751) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:56 hoo@deploy1002: Started scap: Backport for updateQueryServiceLag: Add lb(-pool) options for forward compatibility (T315423 T238751)
  • 13:50 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 13:19 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 13:18 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 13:18 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 13:17 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 13:17 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 13:17 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 13:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:14 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.4/extensions/Wikistories/extension.json: Backport: Make discovery mode config default to 'off' (T314582) (duration: 03m 48s)
  • 13:14 volans@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 13:13 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:02 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 13:01 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 13:01 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 13:00 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 12:59 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:58 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:46 vgutierrez: partitioning the ATS cache in cp[2035-2036], cp[6004,6012], cp[1083-1084], cp[5005,5011], cp[3058-3059], cp[4025,4029] - T317748
  • 12:39 volans@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T314041)', diff saved to https://phabricator.wikimedia.org/P35397 and previous config saved to /var/cache/conftool/dbconfig/20221011-120514-ladsgroup.json
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P35396 and previous config saved to /var/cache/conftool/dbconfig/20221011-115007-ladsgroup.json
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P35395 and previous config saved to /var/cache/conftool/dbconfig/20221011-113501-ladsgroup.json
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1032.eqiad.wmnet to cluster eqiad and group A
  • 11:26 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1032.eqiad.wmnet to cluster eqiad and group A
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T314041)', diff saved to https://phabricator.wikimedia.org/P35394 and previous config saved to /var/cache/conftool/dbconfig/20221011-111954-ladsgroup.json
  • 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
  • 11:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
  • 10:41 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 10:13 volans@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 10:12 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 10:08 volans@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 10:07 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 10:06 volans@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 10:02 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 09:57 volans@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 09:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti1006.eqiad.wmnet with reason: Remove from cluster for decom
  • 09:44 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti1006.eqiad.wmnet with reason: Remove from cluster for decom
  • 08:53 vgutierrez: partitioning the ATS cache in cp1085, cp1086, cp2037, cp2038, cp3060, cp3061, cp4026, cp4030, cp5006, cp5012, cp6005, cp6013 - T317748
  • 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti4008.ulsfo.wmnet
  • 07:41 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 07:40 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 07:31 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 07:30 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 07:24 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 07:22 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 07:21 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 07:21 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 07:18 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 07:18 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 07:17 ryankemper: [Elastic] Forcing recheck of elastic settings check alerts; expecting a bit of noise as the alerts resolve (hopefully)
  • 07:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
  • 07:17 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 07:16 ryankemper: [Elastic] Updated cross-cluster remote seeds (masters): `ryankemper@mwmaint1002:~/elastic$ python push_cross_cluster_conf.py https://search.svc.eqiad.wmnet:9[2,4,6]43/_cluster/settings --ccc chi=chi_eqiad_masters.lst psi=psi_eqiad_masters.lst omega=omega_eqiad_masters.lst`
  • 07:15 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 07:12 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 07:11 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 07:09 kartik@deploy1002: Finished scap: Backport for ContentTranslation: Make Mongolian Wikipedia MT stricter by 10% (T319156) (duration: 08m 56s)
  • 07:02 kartik@deploy1002: kartik and kartik: Backport for ContentTranslation: Make Mongolian Wikipedia MT stricter by 10% (T319156) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:01 kartik@deploy1002: Started scap: Backport for ContentTranslation: Make Mongolian Wikipedia MT stricter by 10% (T319156)
  • 06:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 06:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 06:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:44 elukey: kill leftover process of jmads on stat1005 to allow user cleanup via puppet
  • 06:43 elukey: kill leftover process of nokafor on stat1004 to allow user cleanup via puppet
  • 06:37 elukey: kill leftover process of bmansurov on stat1007 to allow user cleanup via puppet
  • 06:35 XioNoX: delete now unused VC ports on asw2-c4-eqiad - T313384
  • 06:34 elukey: kill leftover process of bmansurov on an-airflow1002 to allow user cleanup via puppet
  • 03:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
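
Several entries above abbreviate host lists with bracket notation, for example `cp[2035-2036]` and `cp[6004,6012]`. The following is a minimal sketch of how such a pattern could be expanded into individual host names. The function name `expand_hosts` is hypothetical and is not part of Cumin or any other tool named in this log, and the sketch assumes at most one bracket group per pattern.

```python
import re

# Matches the first "[...]" group in a host pattern.
_BRACKET = re.compile(r"\[([^\]]+)\]")


def expand_hosts(pattern: str) -> list[str]:
    """Expand one bracket group such as 'cp[2035-2036]' or 'cp[6004,6012]'.

    Hypothetical helper for illustration only; a pattern with no bracket
    group is returned unchanged as a one-element list.
    """
    match = _BRACKET.search(pattern)
    if match is None:
        return [pattern]
    prefix = pattern[:match.start()]
    suffix = pattern[match.end():]
    hosts = []
    for part in match.group(1).split(","):
        if "-" in part:
            # Inclusive numeric range, keeping the zero-padded width of the lower bound.
            low, high = part.split("-", 1)
            width = len(low)
            for number in range(int(low), int(high) + 1):
                hosts.append(f"{prefix}{number:0{width}d}{suffix}")
        else:
            hosts.append(f"{prefix}{part}{suffix}")
    return hosts


if __name__ == "__main__":
    for example in ("cp[2035-2036]", "cp[6004,6012]", "contint[1001,2001].wikimedia.org"):
        print(example, "->", expand_hosts(example))
```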

2022-10-10

  • 21:19 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bullseye
  • 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:44 urbanecm@deploy1002: Finished scap: Backport for Resize wordmark and tagline of Bengali Wikibooks (T319320) (duration: 07m 29s)
  • 19:16 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:14 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 19:14 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns4004
  • 19:14 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4004.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 19:13 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns4004
  • 19:07 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4004.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:39 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns4002.wikimedia.org
  • 18:39 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:38 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bullseye
  • 18:35 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 18:30 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns4002.wikimedia.org
  • 18:21 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
  • 18:18 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
  • 17:20 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:17 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:17 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:09 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:01 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ganeti4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:00 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 16:58 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:56 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 16:56 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti4008
  • 16:55 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti4008
  • 16:08 urbanecm@deploy1002: urbanecm and urbanecm: Backport for eswiki: Enable Growth mentorship for 25% of new accounts (T285235) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 16:08 urbanecm@deploy1002: Started scap: Backport for eswiki: Enable Growth mentorship for 25% of new accounts (T285235)
  • 15:34 mforns@deploy1002: Finished deploy [airflow-dags/analytics@60aa96c]: (no justification provided) (duration: 00m 12s)
  • 15:33 mforns@deploy1002: Started deploy [airflow-dags/analytics@60aa96c]: (no justification provided)
  • 14:56 claime: Updating helm3 to 3.9.4-1 on chartmuseum2001.codfw.wmnet,chartmuseum1001.eqiad.wmnet,contint[1001,2001].wikimedia.org,deploy2002.codfw.wmnet,deploy1002.eqiad.wmnet,releases2002.codfw.wmnet,releases1002.eqiad.wmnet
  • 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T314041)', diff saved to https://phabricator.wikimedia.org/P35389 and previous config saved to /var/cache/conftool/dbconfig/20221010-140635-ladsgroup.json
  • 14:02 Lucas_WMDE: UTC afternoon backport+config window done # likewise
  • 14:01 Lucas_WMDE: 13:51 ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P35388 and previous config saved to /var/cache/conftool/dbconfig/20221010-135128-ladsgroup.json # re-logging due to stashbot issue
  • 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P35387 and previous config saved to /var/cache/conftool/dbconfig/20221010-133621-ladsgroup.json
  • 13:31 vgutierrez: partitioning the ATS cache in cp1087, cp1088, cp2039, cp2040, cp3062, cp3063, cp4033, cp4035, cp5013, cp5015, cp6006, cp6014 - T317748
  • 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:21 moritzm: draining ganeti1006 T320419
  • 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T314041)', diff saved to https://phabricator.wikimedia.org/P35386 and previous config saved to /var/cache/conftool/dbconfig/20221010-132115-ladsgroup.json
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:20 samtar@deploy1002: Finished scap: Backport for trwikivoyage: Install WikiLove extension (T319537) (duration: 06m 55s)
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:13 samtar@deploy1002: samtar and stang: Backport for trwikivoyage: Install WikiLove extension (T319537) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:13 samtar@deploy1002: Started scap: Backport for trwikivoyage: Install WikiLove extension (T319537)
  • 13:11 TheresNoTime: [samtar@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php trwikivoyage wikilove
  • 12:35 moritzm: installing puma security updates
  • 12:28 moritzm: installing jetty9 security updates
  • 12:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1031.eqiad.wmnet to cluster eqiad and group A
  • 12:18 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1031.eqiad.wmnet to cluster eqiad and group A
  • 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1031.eqiad.wmnet
  • 12:14 moritzm: installing ruby-rack security updates
  • 12:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1031.eqiad.wmnet
  • 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1014.eqiad.wmnet to cluster eqiad and group B
  • 11:23 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1014.eqiad.wmnet to cluster eqiad and group B
  • 11:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:01 urbanecm@deploy1002: Finished scap: Backport for Update interwiki cache (duration: 04m 13s)
  • 11:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
  • 11:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:57 urbanecm@deploy1002: urbanecm and urbanecm: Backport for Update interwiki cache synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 10:57 urbanecm@deploy1002: Started scap: Backport for Update interwiki cache
  • 10:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:55 urbanecm@deploy1002: Finished scap: Creating igwiktionary (T314635) (duration: 04m 13s)
  • 10:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
  • 10:51 urbanecm@deploy1002: Started scap: Creating igwiktionary (T314635)
  • 10:49 urbanecm@deploy1002: Finished scap: Creating igwikiquote (T314636) (duration: 04m 24s)
  • 10:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:44 urbanecm@deploy1002: Started scap: Creating igwikiquote (T314636)
  • 10:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:40 urbanecm@deploy1002: Finished scap: Creating bclwikiquote (T316453) (duration: 04m 11s)
  • 10:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:36 urbanecm@deploy1002: Started scap: Creating bclwikiquote (T316453)
  • 10:35 urbanecm@deploy1002: scap failed: FileNotFoundError [Errno 2] Invalid/unavailable version dir: '/srv/mediawiki-staging/php-1.40-0-wmf.4' (duration: 00m 00s)
  • 10:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:32 urbanecm@deploy1002: Finished scap: Creating tlwikiquote (T317107) (duration: 04m 04s)
  • 10:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:28 urbanecm@deploy1002: Started scap: Creating tlwikiquote (T317107)
  • 10:24 urbanecm@deploy1002: Finished scap: Creating bnwikiquote (T319183) (duration: 04m 56s)
  • 10:19 urbanecm@deploy1002: Started scap: Creating bnwikiquote (T319183)
  • 10:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:04 vgutierrez: rolling upgrade to HAProxy 2.4.19 on both text and upload caching clusters
  • 09:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1026.eqiad.wmnet to cluster eqiad and group A
  • 09:43 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1026.eqiad.wmnet to cluster eqiad and group A
  • 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
  • 09:35 claime: Imported helm3 3.9.4-1 to buster-wikimedia and bullseye-wikimedia
  • 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T314041)', diff saved to https://phabricator.wikimedia.org/P35384 and previous config saved to /var/cache/conftool/dbconfig/20221010-093334-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 09:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T314041)', diff saved to https://phabricator.wikimedia.org/P35383 and previous config saved to /var/cache/conftool/dbconfig/20221010-093041-ladsgroup.json
  • 09:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 09:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 09:26 vgutierrez: partitioning the ATS cache in cp1089, cp1090, cp2041, cp2042, cp3064, cp3065, cp4034, cp4036, cp5014, cp5016, cp6007, cp6015 - T317748
  • 08:28 Emperor: set thanos ring replicas to 3.68 T311690
  • 08:23 jynus: online resizefs of backup1003 bacula partition
  • 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1030.eqiad.wmnet to cluster eqiad and group A
  • 08:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1030.eqiad.wmnet to cluster eqiad and group A
  • 08:09 jynus: online resizefs of backup2003 bacula partition
  • 08:05 jynus: restarting db2100:s7 to apply new buffer pool config
  • 07:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1030.eqiad.wmnet
  • 07:51 jayme: imported kubernetes 1.23.12 to component/kubernetes123 for buster-wikimedia, bullseye-wikimedia - T307943
  • 07:45 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 07:45 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 07:43 godog: bounce thanos-compact on thanos-fe2001
  • 07:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet
  • 07:39 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 07:37 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 07:35 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 07:34 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 07:31 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 07:26 elukey: kill hanging process for user bmansurov on deploy1002 to allow proper user cleanup
  • 06:58 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bmansurov out of all services on: 1211 hosts
  • 06:58 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Bmansurov out of all services on: 1211 hosts
  • 06:56 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bmansurov out of all services on: 797 hosts
  • 06:56 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Bmansurov out of all services on: 797 hosts
  • 06:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 397715
  • 06:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 397715
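
Cookbook runs in this log end with a line of the form `END (PASS|FAIL|ERROR) - Cookbook <name> (exit_code=N)`. As a rough sketch, assuming that shape holds for every END line, the outcomes could be tallied per cookbook as shown here. The function name `tally_cookbook_results` is hypothetical and the regular expression is inferred only from the entries above.

```python
import re
from collections import Counter

# Pattern inferred from the END lines in this log; an assumption, not a documented format.
_END = re.compile(r"END \((PASS|FAIL|ERROR)\) - Cookbook (\S+) \(exit_code=(\d+)\)")


def tally_cookbook_results(lines):
    """Count (cookbook, status) pairs across END lines; other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = _END.search(line)
        if match:
            status, cookbook, _exit_code = match.groups()
            counts[(cookbook, status)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "17:20 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with reboot policy FORCED",
        "17:17 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with reboot policy FORCED",
        "17:17 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with reboot policy FORCED",
        "17:01 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ganeti4008.mgmt.ulsfo.wmnet with reboot policy FORCED",
    ]
    for (cookbook, status), count in sorted(tally_cookbook_results(sample).items()):
        print(cookbook, status, count)
```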

2022-10-08

  • 06:56 hashar: Restarting Gerrit to fix up replication to GitHub - T320305

2022-10-07

  • 21:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: debugging
  • 21:28 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: debugging
  • 19:46 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts ganeti4004.ulsfo.wmnet
  • 19:46 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:42 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 19:37 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4004.ulsfo.wmnet
  • 19:07 sukhe: decommission ganeti4004.ulsfo.wmnet: T317249
  • 19:05 sukhe: sudo gnt-node remove ganeti4004.ulsfo.wmnet T317249
  • 17:51 ryankemper: [Elastic] Updated list of cross-cluster remote seeds for all eqiad/codfw elastic clusters; should resolve `ElasticSearch setting check` alerts
  • 17:20 sukhe: sudo gnt-node evacuate -s ganeti4004.ulsfo.wmnet
  • 17:13 sukhe: migrate ganeti4004: T317249
  • 17:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:59 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.4 refs T314193
  • 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:50 brennen@deploy1002: Finished scap: Backport for RecentSignificantEditStore: Force section titles to be an index array (T319799) (duration: 06m 41s)
  • 16:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:44 brennen@deploy1002: brennen and kartik: Backport for RecentSignificantEditStore: Force section titles to be an index array (T319799) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 16:43 brennen@deploy1002: Started scap: Backport for RecentSignificantEditStore: Force section titles to be an index array (T319799)
  • 16:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:42 brennen@deploy1002: Finished scap: Backport for Check whether title actually exists (T319798) (duration: 05m 47s)
  • 16:36 brennen@deploy1002: brennen and brennen: Backport for Check whether title actually exists (T319798) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 16:36 brennen@deploy1002: Started scap: Backport for Check whether title actually exists (T319798)
  • 16:15 brennen: train 1.40.0-wmf.4 (T314193) blockers have patches; after discussion in releng, going ahead with friday deploy in interest of avoiding a scramble during the coming holiday week
  • 15:09 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 14:57 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 14:35 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS buster
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 13:24 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 13:11 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS buster
  • 13:08 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS buster
  • 12:40 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
  • 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
  • 12:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 12:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 12:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
  • 11:57 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS buster
  • 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
  • 11:56 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS buster
  • 11:51 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
  • 11:50 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1001.eqiad.wmnet with OS buster
  • 11:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
  • 11:50 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1001.eqiad.wmnet with OS buster
  • 11:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
  • 11:49 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS buster
  • 11:27 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
  • 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1001.eqiad.wmnet with OS bullseye
  • 10:49 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 10:48 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS bullseye
  • 10:41 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 09:26 elukey: delete calico pods in CrashLoop on dse-k8s-codfw (probably due to the incorrect docker settings)
  • 08:59 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 08:52 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 08:44 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 08:43 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:39 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 08:37 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:36 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:35 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 08:35 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:33 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:33 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1029.eqiad.wmnet to cluster eqiad and group A
  • 08:26 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1029.eqiad.wmnet to cluster eqiad and group A
  • 08:23 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudnet1004.eqiad.wmnet with OS bullseye
  • 08:23 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudnet1003.eqiad.wmnet with OS bullseye
  • 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1014.eqiad.wmnet with OS bullseye
  • 08:22 vgutierrez: partition ats-be cache in cp6016 - T317748
  • 08:21 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 08:20 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 08:19 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 08:19 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:19 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 08:19 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:11 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 08:11 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 08:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1014.eqiad.wmnet with reason: host reimage
  • 08:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1014.eqiad.wmnet with reason: host reimage
  • 07:54 elukey: re-initialize docker on dse-k8s-worker1004 - wrong storage type set (devicemapper instead of overlay2)
  • 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1014.eqiad.wmnet with OS bullseye
  • 07:49 elukey: re-initialize docker on dse-k8s-worker100[5-8] - wrong storage type set (devicemapper instead of overlay2)
  • 07:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1026.eqiad.wmnet with OS bullseye
  • 07:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1026.eqiad.wmnet with reason: host reimage
  • 07:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1026.eqiad.wmnet with reason: host reimage
  • 07:22 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1029.eqiad.wmnet to cluster eqiad and group A
  • 07:22 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1029.eqiad.wmnet to cluster eqiad and group A
  • 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1014.eqiad.wmnet with reason: Downtime for removal from Ganeti cluster and eventual bullseye reimage
  • 07:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1014.eqiad.wmnet with reason: Downtime for removal from Ganeti cluster and eventual bullseye reimage
  • 07:11 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1026.eqiad.wmnet with OS bullseye
  • 03:54 ejegg: civicrm upgraded from 6156f7cc to 4b9e981a
  • 03:52 ejegg: updated SmashPig standalone deployment from a8bb2212 to f36143f0

2022-10-06

  • 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:08 thcipriani@deploy1002: Finished scap: Backport for Skin: Map namespaces to associated pages inside runOnSkinTemplateNavigationHooks (T319396) (duration: 06m 08s)
  • 21:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudnet1004.eqiad.wmnet
  • 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:02 thcipriani@deploy1002: thcipriani and jdlrobson: Backport for Skin: Map namespaces to associated pages inside runOnSkinTemplateNavigationHooks (T319396) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:01 thcipriani@deploy1002: Started scap: Backport for Skin: Map namespaces to associated pages inside runOnSkinTemplateNavigationHooks (T319396)
  • 20:58 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:45 samtar@deploy1002: Finished scap: Backport for Replace promise handling when AfD'ing pages (T238025), Replace promise handling when AfD'ing pages (T238025) (duration: 07m 56s)
  • 20:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:39 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudnet1004.eqiad.wmnet
  • 20:37 samtar@deploy1002: samtar and samtar: Backport for Replace promise handling when AfD'ing pages (T238025), Replace promise handling when AfD'ing pages (T238025) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:37 samtar@deploy1002: Started scap: Backport for Replace promise handling when AfD'ing pages (T238025), Replace promise handling when AfD'ing pages (T238025)
  • 20:36 samtar@deploy1002: Backport cancelled.
  • 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:34 thcipriani@deploy1002: Finished scap: Backport for Skin: Map namespaces to associated pages inside runOnSkinTemplateNavigationHooks (T319396) (duration: 09m 51s)
  • 20:33 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudnet1003.eqiad.wmnet
  • 20:33 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:32 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:27 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudnet1003.eqiad.wmnet
  • 20:25 thcipriani@deploy1002: thcipriani and jdlrobson: Backport for Skin: Map namespaces to associated pages inside runOnSkinTemplateNavigationHooks (T319396) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:24 thcipriani@deploy1002: Started scap: Backport for Skin: Map namespaces to associated pages inside runOnSkinTemplateNavigationHooks (T319396)
  • 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:05 samtar@deploy1002: backport aborted: (duration: 03m 13s)
  • 20:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:51 SandraEbele: Started airflow projectview_hourly_dag
  • 19:50 SandraEbele: killed Oozie projectview-hourly job
  • 19:41 SandraEbele: deployed airflow to fix projectview_hourly_dag
  • 19:34 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@cbdc509]: (no justification provided) (duration: 00m 14s)
  • 19:34 ebysans@deploy1002: Started deploy [airflow-dags/analytics@cbdc509]: (no justification provided)
  • 19:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:29 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 6 hosts with reason: T313431
  • 19:28 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 6 hosts with reason: T313431
  • 19:28 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.3 refs T314193
  • 19:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:21 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.4 refs T314193
  • 19:15 brennen: train 1.40.0-wmf.4 (T314193) no current blockers, rolling train to all wikis
  • 19:03 inflatador: bking@elastic restarted elastic2025, 2031, 2061, 2084 T313431
  • 18:52 gehel@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on elastic[2025,2031].codfw.wmnet with reason: restarting for config reload - T313431
  • 18:52 gehel@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on elastic[2025,2031].codfw.wmnet with reason: restarting for config reload - T313431
  • 18:51 gehel@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on elastic2084.codfw.wmnet with reason: restarting for config reload - T313431
  • 18:50 gehel@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on elastic2084.codfw.wmnet with reason: restarting for config reload - T313431
  • 18:50 gehel@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on elastic2061.codfw.wmnet with reason: restarting for config reload - T313431
  • 18:50 gehel@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on elastic2061.codfw.wmnet with reason: restarting for config reload - T313431
  • 18:39 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudnet1003.eqiad.wmnet
  • 18:39 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:35 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 18:29 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudnet1003.eqiad.wmnet
  • 16:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
  • 16:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
  • 15:57 topranks: Applying explicit BFD mode configuration to cr4-ulsfo for Anycast BGP groups.
  • 15:53 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 15:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:49 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:48 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1004.eqiad.wmnet with OS bullseye
  • 15:47 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1003.eqiad.wmnet with OS bullseye
  • 15:45 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:44 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:28 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 15:22 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:21 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1009.eqiad.wmnet
  • 15:19 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:17 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 15:16 jynus: reload haproxy config on dbproxy1016, dbproxy1017
  • 15:11 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts aqs1009.eqiad.wmnet
  • 15:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1008.eqiad.wmnet
  • 15:10 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:08 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:08 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:08 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 15:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
  • 15:01 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts aqs1008.eqiad.wmnet
  • 15:01 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
  • 14:56 bblack: eqiad front edge depooled in DNS
  • 14:49 XioNoX: move asw2-d-eqiad<->cr1 link to new 40G link - T313385
  • 14:45 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 14:43 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudnet1005.eqiad.wmnet on all recursors
  • 14:43 cmooney@cumin1001: START - Cookbook sre.dns.wipe-cache cloudnet1005.eqiad.wmnet on all recursors
  • 14:42 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:40 volans@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) failoid2001.codfw.wmnet on codfw recursors
  • 14:40 volans@cumin1001: START - Cookbook sre.dns.wipe-cache failoid2001.codfw.wmnet on codfw recursors
  • 14:40 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 14:30 XioNoX: moving eqiad row C vrrp mastership to cr1-eqiad
  • 14:28 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 14:16 hashar: Gerrit upgraded from 3.4.5 to 3.4.6 # T319513
  • 14:13 XioNoX: move asw2-c-eqiad<->cr1 link to new 40G link - T313385
  • 14:12 hashar@deploy1002: Finished deploy [gerrit/gerrit@132ac68]: Gerrit to 3.4.6 on gerrit1001 (duration: 00m 08s)
  • 14:12 hashar@deploy1002: Started deploy [gerrit/gerrit@132ac68]: Gerrit to 3.4.6 on gerrit1001
  • 14:12 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 14:12 hashar: Upgrading primary Gerrit # T319513
  • 14:08 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 14:08 hashar@deploy1002: Finished deploy [gerrit/gerrit@132ac68]: Gerrit to 3.4.6 on gerrit2002 (duration: 00m 10s)
  • 14:08 hashar@deploy1002: Started deploy [gerrit/gerrit@132ac68]: Gerrit to 3.4.6 on gerrit2002
  • 14:07 vgutierrez: updating HAProxy to version 2.4.19 in ulsfo
  • 14:03 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts aqs1007.eqiad.wmnet
  • 14:03 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:01 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 13:48 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts aqs1007.eqiad.wmnet
  • 13:41 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
  • 13:20 urbanecm: UTC afternoon backport window done
  • 13:20 moritzm: draining ganeti1014 T311687
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
  • 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:18 urbanecm@deploy1002: Finished scap: Backport for Show thumbnails on Special:Search for NS_FILE + PageImages (T306883) (duration: 05m 12s)
  • 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:17 vgutierrez: partition ats-be cache in cp6008 - T317748
  • 13:16 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 13:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1006.eqiad.wmnet
  • 13:16 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:15 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 13:14 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:14 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:13 urbanecm@deploy1002: urbanecm and mlitn: Backport for Show thumbnails on Special:Search for NS_FILE + PageImages (T306883) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:13 urbanecm@deploy1002: Started scap: Backport for Show thumbnails on Special:Search for NS_FILE + PageImages (T306883)
  • 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:12 urbanecm@deploy1002: Finished scap: Backport for Explicit config for Wikistories discovery module (T314582) (duration: 06m 37s)
  • 13:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
  • 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:08 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 13:06 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 13:06 urbanecm@deploy1002: urbanecm and sbisson: Backport for Explicit config for Wikistories discovery module (T314582) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:06 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 13:05 urbanecm@deploy1002: Started scap: Backport for Explicit config for Wikistories discovery module (T314582)
  • 12:59 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 12:58 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1026.eqiad.wmnet with reason: Downtime for removal from Ganeti cluster and eventual bullseye reimage
  • 12:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1026.eqiad.wmnet with reason: Downtime for removal from Ganeti cluster and eventual bullseye reimage
  • 12:54 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts aqs1006.eqiad.wmnet
  • 12:45 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti1029.eqiad.wmnet
  • 12:43 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 12:42 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 12:40 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
  • 12:39 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:36 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:34 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 12:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
  • 12:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1005.eqiad.wmnet
  • 12:24 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:21 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 12:15 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts aqs1005.eqiad.wmnet
  • 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1012.eqiad.wmnet to cluster eqiad and group C
  • 11:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1004.eqiad.wmnet
  • 11:32 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:28 jbond: enable puppet post deploy puppetdb change 814824
  • 11:27 jbond: switch puppetdb replication to use replication slots
  • 11:27 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 11:27 btullis: cold-reset the BMC on analytics1076
  • 11:22 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts aqs1004.eqiad.wmnet
  • 10:58 jbond: disable puppet temporarily to deploy a puppetdb change 814824
  • 10:51 _joe_: installing the upgraded php package everywhere, T318918
  • 10:30 elukey: restart kafka on kafka-logging1003 to reload the config (cleanup old super.users related to past keystore)
  • 10:16 moritzm: installing ruby-rack security updates
  • 10:11 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all remaining wikis
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging NOkafor out of all services on: 1213 hosts
  • 10:07 jmm@cumin2002: START - Cookbook sre.idm.logout Logging NOkafor out of all services on: 1213 hosts
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging NOkafor out of all services on: 799 hosts
  • 10:06 jmm@cumin2002: START - Cookbook sre.idm.logout Logging NOkafor out of all services on: 799 hosts
  • 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jmads out of all services on: 799 hosts
  • 10:05 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Jmads out of all services on: 799 hosts
  • 10:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:02 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for all wikis (duration: 03m 39s)
  • 10:01 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jmads out of all services on: 1213 hosts
  • 10:00 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Jmads out of all services on: 1213 hosts
  • 09:57 moritzm: installing glib2.0 security updates on buster
  • 09:52 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for itwiki, arzwiki, ptwiki
  • 09:41 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1005.eqiad.wmnet
  • 09:34 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
  • 09:32 moritzm: installing python-oslo.utils security updates
  • 09:28 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for viwiki, metawiki, frwiktionary
  • 09:22 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for nlwiktionary, ruwiki, jawiki
  • 09:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:21 _joe_: installed the upgraded php package to mw1414, T318918
  • 09:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:18 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for nine wikis (duration: 03m 41s)
  • 09:05 topranks: re-pooling esams after cr2-esams line card reboot
  • 09:04 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for cebwiki
  • 09:04 hoo: Ran extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for specieswiki
  • 09:04 hoo: Ran extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for ruwiktionary
  • 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:59 _joe_: uploaded new php 7.4 packages T318918
  • 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:54 topranks: rebooting line card fpc 0 on cr2-esams (T318783)
  • 08:53 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for three wikis (duration: 04m 03s)
  • 08:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:48 moritzm: installing jetty9 security updates
  • 08:42 moritzm: installing rails security updates
  • 08:37 moritzm: installing puma security updates
  • 08:27 topranks: disabling OSPF on cr2-esams
  • 08:24 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr2-esams,cr2-esams IPv6,re0.cr2-esams.mgmt with reason: line card reboot
  • 08:24 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cr2-esams,cr2-esams IPv6,re0.cr2-esams.mgmt with reason: line card reboot
  • 08:21 topranks: disabling external BGP sessions on cr2-esams prior to line card reboot
  • 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
  • 08:10 elukey: restart kafka on kafka-logging1002 to reload the config (cleanup old super.users related to past keystore)
  • 08:10 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-eqiad
  • 08:09 elukey: kafka logging old cert cleanup - `cumin 'A:kafka-logging' 'rm -f /etc/kafka/ssl/kafka_logging-eqiad_broker.keystore.jks'`
  • 08:01 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1012.eqiad.wmnet to cluster eqiad and group C
  • 08:00 elukey: delete /etc/kafka/ssl/kafka_logging-eqiad_broker.keystore.jks on kafka-logging1001 and restart (old puppet cert + settings deleted)
  • 07:50 topranks: De-pooling esams in advance of cr2-esams line card reboot
  • 07:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
  • 07:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
  • 07:36 moritzm: draining ganeti1026 T311687
  • 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1012.eqiad.wmnet with OS bullseye
  • 07:15 moritzm: draining ganeti1005 T311687
  • 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1012.eqiad.wmnet with reason: host reimage
  • 07:11 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1012.eqiad.wmnet with reason: host reimage
  • 06:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1012.eqiad.wmnet with OS bullseye
  • 06:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 6079
  • 06:25 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 6079
  • 06:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 22616
  • 06:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 22616
  • 01:12 reedy@deploy1002: Finished deploy [integration/docroot@dc380cb]: Update jQuery (duration: 00m 11s)
  • 01:12 reedy@deploy1002: Started deploy [integration/docroot@dc380cb]: Update jQuery
  • 01:03 reedy@deploy1002: Finished deploy [integration/docroot@5cd2243]: Minor fixes (duration: 00m 12s)
  • 01:03 reedy@deploy1002: Started deploy [integration/docroot@5cd2243]: Minor fixes
  • 00:35 reedy@deploy1002: Finished deploy [integration/docroot@13687ed]: More minor updates (duration: 00m 30s)
  • 00:35 reedy@deploy1002: Started deploy [integration/docroot@13687ed]: More minor updates

2022-10-05

  • 22:27 reedy@deploy1002: Finished deploy [integration/docroot@a136ce6]: Cleanup and timestamps (duration: 00m 07s)
  • 22:27 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: Cleanup and timestamps
  • 22:21 reedy@deploy1002: Finished deploy [integration/docroot@a136ce6]: (no justification provided) (duration: 00m 06s)
  • 22:21 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: (no justification provided)
  • 22:19 reedy@deploy1002: deploy aborted: Cleanup and timestamps (duration: 00m 22s)
  • 22:19 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: Cleanup and timestamps
  • 22:18 dancy@deploy1002: Finished deploy [integration/docroot@a136ce6]: (no justification provided) (duration: 00m 10s)
  • 22:17 dancy@deploy1002: Started deploy [integration/docroot@a136ce6]: (no justification provided)
  • 22:17 dancy@deploy1002: Installation of scap version "4.27.0" completed for 559 hosts
  • 22:17 dancy@deploy1002: Installing scap version "4.27.0" for 559 hosts
  • 21:41 dancy@deploy1002: Installation of scap version "4.26.0" completed for 559 hosts
  • 21:41 dancy@deploy1002: Installing scap version "4.26.0" for 559 hosts
  • 20:33 reedy@deploy1002: Finished deploy [integration/docroot@a136ce6]: More minor cleanup (duration: 01m 05s)
  • 20:32 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: More minor cleanup
  • 20:27 sukhe: running authdns-update for CR 838882
  • 20:26 reedy@deploy1002: Finished deploy [integration/docroot@a136ce6]: More minor cleanup (duration: 00m 10s)
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:26 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: More minor cleanup
  • 20:25 sukhe: homer "cr*-ulsfo*" commit "Gerrit 838239: sites.yaml: add dns4003 to anycast_neighbors"
  • 20:24 reedy@deploy1002: Finished deploy [integration/docroot@a136ce6]: More minor cleanup (duration: 00m 06s)
  • 20:23 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: More minor cleanup
  • 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:22 reedy@deploy1002: Finished deploy [integration/docroot@a136ce6]: More minor cleanup (duration: 00m 31s)
  • 20:22 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: More minor cleanup
  • 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:19 reedy@deploy1002: Finished deploy [integration/docroot@a136ce6]: More minor cleanup (duration: 00m 42s)
  • 20:19 urbanecm@deploy1002: Finished scap: Backport for Remove Research Incentive survey from arwiki (T318328) (duration: 05m 13s)
  • 20:19 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: More minor cleanup
  • 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:14 urbanecm@deploy1002: urbanecm and dani: Backport for Remove Research Incentive survey from arwiki (T318328) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:14 urbanecm@deploy1002: Started scap: Backport for Remove Research Incentive survey from arwiki (T318328)
  • 20:11 urbanecm@deploy1002: Finished scap: Backport for Deploy Research Incentive survey on eswiki (T318331) (duration: 06m 51s)
  • 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:05 urbanecm@deploy1002: urbanecm and dani: Backport for Deploy Research Incentive survey on eswiki (T318331) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:05 urbanecm@deploy1002: Started scap: Backport for Deploy Research Incentive survey on eswiki (T318331)
  • 20:03 mutante: registry* (4 servers) - disabling puppet, deploying gerrit:838859 - T308501
  • 19:57 reedy@deploy1002: Finished deploy [integration/docroot@09eb565]: T319461 and cleanup (duration: 00m 10s)
  • 19:56 reedy@deploy1002: Started deploy [integration/docroot@09eb565]: T319461 and cleanup
  • 18:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4003.wikimedia.org with OS buster
  • 18:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:27 brennen@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.4 refs T314193 (duration: 03m 40s)
  • 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:23 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.4 refs T314193
  • 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:18 brennen: train 1.40.0-wmf.4 (T314193) no current blockers, rolling train to group1
  • 18:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
  • 18:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
  • 17:54 ejegg: payments-wiki upgraded from aeee9676 to 4e1f308b
  • 17:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS buster
  • 17:20 mforns@deploy1002: Finished deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a] (duration: 00m 14s)
  • 17:20 mforns@deploy1002: Started deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a]
  • 17:18 mforns@deploy1002: Finished deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a] (duration: 00m 18s)
  • 17:18 mforns@deploy1002: Started deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a]
  • 17:17 mforns@deploy1002: Finished deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a] (duration: 04m 24s)
  • 17:12 mforns@deploy1002: Started deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a]
  • 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:53 cjming: deployed labs-only config
  • 15:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1012.eqiad.wmnet with reason: Downtime for removal from Ganeti cluster and eventual bullseye reimage
  • 15:39 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1012.eqiad.wmnet with reason: Downtime for removal from Ganeti cluster and eventual bullseye reimage
  • 15:29 moritzm: installing gdal security updates
  • 15:27 SandraEbele: deployed refinery source
  • 14:51 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 15 days, 0:00:00 on cloudnet[1005-1006].eqiad.wmnet with reason: migrating
  • 14:39 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudnet1004.eqiad.wmnet with reason: decom
  • 14:38 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudnet1004.eqiad.wmnet with reason: decom
  • 14:38 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudnet1003.eqiad.wmnet with reason: decom
  • 14:38 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudnet1003.eqiad.wmnet with reason: decom
  • 14:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8359
  • 14:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8359
  • 14:30 papaul: ongoing maintenance on msw1-eqiad
  • 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1032.eqiad.wmnet with OS bullseye
  • 14:20 mforns@deploy1002: Finished deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a] (duration: 04m 24s)
  • 14:16 mforns@deploy1002: Started deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a]
  • 14:16 mforns@deploy1002: Finished deploy [analytics/refinery@7e16d2a]: Regular analytics weekly train [analytics/refinery@7e16d2a] (duration: 10m 27s)
  • 14:15 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 14:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1032.eqiad.wmnet with reason: host reimage
  • 14:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1032.eqiad.wmnet with reason: host reimage
  • 14:07 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:06 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:05 mforns@deploy1002: Started deploy [analytics/refinery@7e16d2a]: Regular analytics weekly train [analytics/refinery@7e16d2a]
  • 13:55 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1032.eqiad.wmnet with OS bullseye
  • 13:37 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@f7a68c2]: (no justification provided) (duration: 00m 12s)
  • 13:36 ebysans@deploy1002: Started deploy [airflow-dags/analytics@f7a68c2]: (no justification provided)
  • 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:22 SandraEbele: deploying fix for projectview dags on airflow
  • 13:21 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for enwiktionary/frwiki (duration: 03m 38s)
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1031.eqiad.wmnet with OS bullseye
  • 13:07 moritzm: draining ganeti1012 T311687
  • 13:04 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for zhwiki
  • 13:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1031.eqiad.wmnet with reason: host reimage
  • 13:00 vgutierrez: test HAProxy 2.4.19 in cp4026 && cp4032
  • 12:59 vgutierrez: vgutierrez@apt1001:~$ sudo -i reprepro --component thirdparty/haproxy24 update buster-wikimedia # fetch HAProxy 2.4.19
  • 12:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1031.eqiad.wmnet with reason: host reimage
  • 12:48 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 12:47 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1031.eqiad.wmnet with OS bullseye
  • 12:47 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1031.eqiad.wmnet with OS bullseye
  • 12:46 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1031.eqiad.wmnet with OS bullseye
  • 12:45 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 12:41 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for enwiki
  • 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1030.eqiad.wmnet with OS bullseye
  • 12:30 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 12:28 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for enwiki/zhwiki (duration: 03m 46s)
  • 12:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1030.eqiad.wmnet with reason: host reimage
  • 12:16 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 12:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1030.eqiad.wmnet with reason: host reimage
  • 12:13 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 12:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1030.eqiad.wmnet with OS bullseye
  • 11:54 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 11:53 XioNoX: fix MTU between eqiad core routers and cloudsw - T315838
  • 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1029.eqiad.wmnet with OS bullseye
  • 11:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
  • 11:49 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 11:49 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
  • 11:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1029.eqiad.wmnet with reason: host reimage
  • 11:33 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 11:33 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 11:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1029.eqiad.wmnet with reason: host reimage
  • 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1029.eqiad.wmnet with OS bullseye
  • 11:04 moritzm: running "gnt-cluster upgrade --to 3.0" for ganeti/eqiad T311687
  • 11:01 vgutierrez: repool cp2036 - T319394
  • 10:53 vgutierrez: powercycle cp2036 - T319394
  • 10:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:48 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2036.codfw.wmnet
  • 10:46 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for commonswiki
  • 10:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:44 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for commonswiki (duration: 03m 51s)
  • 10:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:36 moritzm: installing gdk-pixbuf security updates
  • 09:52 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of ruwikinews
  • 09:51 hoo: Ran extensions/Wikibase/client/maintenance/PopulateUnexpectedUnconnectedPagePageProp.php for all of arwiki
  • 09:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:31 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for ruwikinews (duration: 03m 39s)
  • 09:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:21 moritzm: upgrading ganeti/eqiad nodes to Ganeti 3 T311687
  • 09:20 dcausse: restarting blazegraph on wdqs1014 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 09:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:09 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for arwiki (duration: 03m 49s)
  • 09:06 moritzm: reimport ganeti 3.0.1-1~bpo10+1 to component/ganeti3 (it was removed as a side effect of a reprepro bug/misfeature when the bullseye component was removed)
  • 07:54 elukey: restart kafka on kafka-logging1003 to pick up new PKI TLS settings
  • 07:50 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging1003.eqiad.wmnet with reason: Kafka PKI upgrade
  • 07:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging1003.eqiad.wmnet with reason: Kafka PKI upgrade
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35360 and previous config saved to /var/cache/conftool/dbconfig/20221005-065519-root.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35359 and previous config saved to /var/cache/conftool/dbconfig/20221005-064014-root.json
  • 06:30 elukey: restart kafka on kafka-logging1002 to pick up the new cert+settings for PKI
  • 06:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging1002.eqiad.wmnet with reason: Kafka PKI upgrade
  • 06:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging1002.eqiad.wmnet with reason: Kafka PKI upgrade
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35358 and previous config saved to /var/cache/conftool/dbconfig/20221005-062509-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35357 and previous config saved to /var/cache/conftool/dbconfig/20221005-061004-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35356 and previous config saved to /var/cache/conftool/dbconfig/20221005-055459-root.json
  • 05:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 62044
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35355 and previous config saved to /var/cache/conftool/dbconfig/20221005-053954-root.json
  • 05:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 62044
  • 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35354 and previous config saved to /var/cache/conftool/dbconfig/20221005-052449-root.json
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35353 and previous config saved to /var/cache/conftool/dbconfig/20221005-050944-root.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2030', diff saved to https://phabricator.wikimedia.org/P35352 and previous config saved to /var/cache/conftool/dbconfig/20221005-050018-root.json
  • 02:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt1023.eqiad.wmnet
  • 02:21 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudvirt1023.eqiad.wmnet
  • 02:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt1023.eqiad.wmnet
  • 02:20 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudvirt1023.eqiad.wmnet
  • 02:19 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudvirt1023.eqiad.wmnet
  • 02:19 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudvirt1023.eqiad.wmnet
  • 00:05 sukhe: disable puppet on dns4003 till we resolve the puppet failures

2022-10-04

  • 23:09 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 22:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 21:28 cjming: end of UTC late backport window
  • 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:25 cjming@deploy1002: Finished scap: Backport for Revert "Revert "Add wordmark and tagline for Bengali Wikibooks"" (duration: 05m 06s)
  • 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:21 cjming@deploy1002: cjming and cjming: Backport for Revert "Revert "Add wordmark and tagline for Bengali Wikibooks"" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:20 cjming@deploy1002: Started scap: Backport for Revert "Revert "Add wordmark and tagline for Bengali Wikibooks""
  • 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:07 cjming@deploy1002: Finished scap: Backport for Enable wgMinervaEnableSiteNotice for bnwikibooks (T319317) (duration: 05m 40s)
  • 21:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:01 cjming@deploy1002: cjming and mdsshakil: Backport for Enable wgMinervaEnableSiteNotice for bnwikibooks (T319317) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:01 cjming@deploy1002: Started scap: Backport for Enable wgMinervaEnableSiteNotice for bnwikibooks (T319317)
  • 20:59 cjming@deploy1002: Finished scap: Backport for Revert "Add wordmark and tagline for Bengali Wikibooks" (duration: 06m 35s)
  • 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:53 cjming@deploy1002: cjming and trainbranchbot: Backport for Revert "Add wordmark and tagline for Bengali Wikibooks" synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:52 cjming@deploy1002: Started scap: Backport for Revert "Add wordmark and tagline for Bengali Wikibooks"
  • 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:49 cjming@deploy1002: Sync cancelled.
  • 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:42 cjming@deploy1002: cjming and aishik: Backport for Add wordmark and tagline for Bengali Wikibooks (T319320) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:41 cjming@deploy1002: Started scap: Backport for Add wordmark and tagline for Bengali Wikibooks (T319320)
  • 20:39 cjming@deploy1002: Finished scap: Backport for ParsoidHandler: use metrics from SiteConfig (duration: 14m 29s)
  • 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:25 cjming@deploy1002: cjming and d3r1ck01: Backport for ParsoidHandler: use metrics from SiteConfig synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:25 cjming@deploy1002: Started scap: Backport for ParsoidHandler: use metrics from SiteConfig
  • 19:54 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS buster
  • 18:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
  • 18:48 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
  • 18:34 mutante: gerrit - deploying puppet refactoring change
  • 18:34 tzatziki: removing 1 file for legal compliance
  • 18:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS buster
  • 18:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:24 tzatziki: removing 1 file for legal compliance
  • 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:21 moritzm: installing gdk-pixbuf security updates
  • 18:19 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.4 refs T314193
  • 18:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:59 ejegg: turned fundraising scheduled jobs back on
  • 17:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:57 urbanecm@deploy1002: Finished scap: Backport for Mentee table: fix wrong less import (T319321) (duration: 06m 58s)
  • 17:55 moritzm: installing libsndfile security updates
  • 17:50 urbanecm@deploy1002: urbanecm and urbanecm: Backport for Mentee table: fix wrong less import (T319321) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 17:50 urbanecm@deploy1002: Started scap: Backport for Mentee table: fix wrong less import (T319321)
  • 17:49 ejegg: turned off fundraising scheduled jobs for civi deploy
  • 17:28 tzatziki: removing 4 files for legal compliance
  • 17:04 mutante: gerrit - deployed 832345 - scap and daemon users became decoupled (T317412)
  • 17:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:36 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:33 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:25 brennen@deploy1002: Pruned MediaWiki: 1.40.0-wmf.2 (duration: 02m 02s)
  • 16:24 brennen@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.4 refs T314193 (duration: 28m 55s)
  • 16:21 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host dns4003.wikimedia.org with OS bullseye
  • 16:03 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
  • 16:00 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
  • 16:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:54 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2003.codfw.wmnet with OS buster
  • 15:54 brennen@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.4 refs T314193
  • 15:53 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
  • 15:53 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
  • 15:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
  • 15:51 brennen: restarting `/usr/bin/scap stage-train --yes auto` after failed staging (T314193), cc: ^demon
  • 15:48 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
  • 15:47 sukhe: disable Puppet on A:cp and A:eqiad for T309651
  • 15:42 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
  • 15:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2003.codfw.wmnet with reason: host reimage
  • 15:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2003.codfw.wmnet with reason: host reimage
  • 15:25 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:25 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:16 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2003.codfw.wmnet with OS buster
  • 15:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore2003.codfw.wmnet with reason: Prep for reimage
  • 15:10 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore2003.codfw.wmnet with reason: Prep for reimage
  • 15:10 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
  • 15:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2002.codfw.wmnet with OS buster
  • 15:09 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
  • 15:08 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
  • 15:06 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
  • 15:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:02 moritzm: installing snakeyaml security updates
  • 14:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 14:55 papaul: maintenance complete on msw1-codfw
  • 14:51 sukhe: disable Puppet on A:cp and A:esams for T309651
  • 14:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2002.codfw.wmnet with reason: host reimage
  • 14:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2002.codfw.wmnet with reason: host reimage
  • 14:40 moritzm: installing maven-shared-utils security updates
  • 14:34 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2002.codfw.wmnet with OS buster
  • 14:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore2002.codfw.wmnet with reason: Prep for reimage
  • 14:32 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore2002.codfw.wmnet with reason: Prep for reimage
  • 14:30 papaul: ongoing maintenance on msw1-codfw
  • 14:29 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 14:27 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 14:22 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
  • 14:14 XioNoX: netbox - Move VRRP IPs to FHRP group feature - T311218
  • 14:13 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
  • 14:12 filippo@cumin1001: END (ERROR) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=97) generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
  • 14:12 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.4/tests/phpunit/: Backport: Revert "Introduce LanguageVariantConverter" (T319282) (2/2; no wikis use wmf.4 yet, but the code exists, so the change needs to be synced) (duration: 03m 52s)
  • 14:12 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
  • 14:08 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.4/includes/: Backport: Revert "Introduce LanguageVariantConverter" (T319282) (1/2; no wikis use wmf.4 yet, but the code exists, so the change needs to be synced) (duration: 03m 43s)
  • 14:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:03 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.4/extensions/Kartographer/modules/dialog: Backport: Log basic nearby and fullscreen events (T315972, T318678) (no wikis use wmf.4 yet, but the code exists, so the change needs to be synced) (duration: 03m 42s)
  • 14:02 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:55 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
  • 13:54 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
  • 13:54 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
  • 13:49 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35347 and previous config saved to /var/cache/conftool/dbconfig/20221004-134947-root.json
  • 13:49 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
  • 13:48 sukhe: disable Puppet on A:cp and A:eqsin for T309651
  • 13:47 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
  • 13:42 awight: EU backport window finished.
  • 13:40 filippo@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
  • 13:38 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
  • 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
  • 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:36 awight@deploy1002: Finished scap: Backport for Wire new event stream for maps interactions (T315972 T318678) (duration: 06m 49s)
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:35 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-eqiad
  • 13:35 filippo@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "filippo test - filippo@cumin1001"
  • 13:34 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "filippo test - filippo@cumin1001"
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35346 and previous config saved to /var/cache/conftool/dbconfig/20221004-133442-root.json
  • 13:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbox - try 2 - CR826559 - ayounsi@cumin1001
  • 13:31 jbond: re-enable puppet post deploy a puppetmaster change 838144
  • 13:30 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbox - try 2 - CR826559 - ayounsi@cumin1001
  • 13:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbx CR826559 - ayounsi@cumin1001
  • 13:30 awight@deploy1002: awight and awight: Backport for Wire new event stream for maps interactions (T315972 T318678) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:29 awight@deploy1002: Started scap: Backport for Wire new event stream for maps interactions (T315972 T318678)
  • 13:28 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbx CR826559 - ayounsi@cumin1001
  • 13:27 awight@deploy1002: Finished scap: Backport for ukwiki: Create flood group (T319243) (duration: 05m 16s)
  • 13:24 jbond: disable puppet to deploy a puppetmaster change 838144
  • 13:22 awight@deploy1002: awight and stang: Backport for ukwiki: Create flood group (T319243) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:21 awight@deploy1002: Started scap: Backport for ukwiki: Create flood group (T319243)
  • 13:21 awight@deploy1002: Finished scap: Backport for throttle: Add throttle rule for 2022-10-13 (T319244) (duration: 12m 48s)
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35345 and previous config saved to /var/cache/conftool/dbconfig/20221004-131937-root.json
  • 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:11 awight@deploy1002: awight and stang: Backport for throttle: Add throttle rule for 2022-10-13 (T319244) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:08 awight@deploy1002: Started scap: Backport for throttle: Add throttle rule for 2022-10-13 (T319244)
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35343 and previous config saved to /var/cache/conftool/dbconfig/20221004-130432-root.json
  • 12:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 12:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
  • 12:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 12:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35342 and previous config saved to /var/cache/conftool/dbconfig/20221004-124927-root.json
  • 12:37 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 12:37 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35341 and previous config saved to /var/cache/conftool/dbconfig/20221004-123422-root.json
  • 12:31 cgoubert@deploy1002: Finished deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # T310458 (duration: 00m 58s)
  • 12:30 cgoubert@deploy1002: Started deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # T310458
  • 12:29 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
  • 12:26 cgoubert@deploy1002: Finished deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # T310458 (duration: 00m 14s)
  • 12:26 cgoubert@deploy1002: Started deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # T310458
  • 12:21 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35340 and previous config saved to /var/cache/conftool/dbconfig/20221004-121917-root.json
  • 12:14 volans: uploaded python3-gjson_0.1.0 to apt.wikimedia.org bullseye-wikimedia
  • 12:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 12:10 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 12:09 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 12:08 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host sessionstore2001.codfw.wmnet with OS buster
  • 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35339 and previous config saved to /var/cache/conftool/dbconfig/20221004-120413-root.json
  • 11:55 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
  • 11:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2001.codfw.wmnet with reason: host reimage
  • 11:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2001.codfw.wmnet with reason: host reimage
  • 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
  • 11:22 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
  • 11:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
  • 11:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
  • 11:05 jayme: published calico 3.23.3 debian packages in bullseye component/calico323 as well as corresponding docker images - T307943
  • 11:04 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 10:58 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 10:58 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 10:56 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS buster
  • 10:55 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 10:54 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 10:54 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2001.codfw.wmnet with OS buster
  • 10:53 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 10:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 135158
  • 10:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 135158
  • 10:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 9119
  • 10:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 9119
  • 10:41 moritzm: installing expat security updates
  • 09:59 jmm@cumin2002: END (FAIL) - Cookbook sre.maps.roll-restart (exit_code=1) rolling restart_daemons on A:maps-codfw
  • 09:47 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 09:46 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 09:46 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 09:46 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 09:45 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 09:44 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 09:44 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 09:43 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 09:42 jayme: deployed istio-ingressgateway with additional envoy native metrics to wikikube codfw and eqiad
  • 09:40 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS buster
  • 09:37 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-codfw
  • 09:36 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore2001.codfw.wmnet with reason: Prep for reimage
  • 09:36 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore2001.codfw.wmnet with reason: Prep for reimage
  • 09:36 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 20 hosts
  • 09:35 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for 20 hosts
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35338 and previous config saved to /var/cache/conftool/dbconfig/20221004-093530-root.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35337 and previous config saved to /var/cache/conftool/dbconfig/20221004-092025-root.json
  • 09:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
  • 09:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
  • 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35336 and previous config saved to /var/cache/conftool/dbconfig/20221004-090520-root.json
  • 08:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: php7.2 removal
  • 08:55 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: php7.2 removal
  • 08:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
  • 08:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35335 and previous config saved to /var/cache/conftool/dbconfig/20221004-085015-root.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35334 and previous config saved to /var/cache/conftool/dbconfig/20221004-083511-root.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35333 and previous config saved to /var/cache/conftool/dbconfig/20221004-082005-root.json
  • 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2181.codfw.wmnet with reason: Upgrading
  • 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2181.codfw.wmnet with reason: Upgrading
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35332 and previous config saved to /var/cache/conftool/dbconfig/20221004-080500-root.json
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2181', diff saved to https://phabricator.wikimedia.org/P35331 and previous config saved to /var/cache/conftool/dbconfig/20221004-080338-root.json
  • 07:52 moritzm: installing libdatetime-timezone-perl updates (catching up with latest timezone changes)
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35330 and previous config saved to /var/cache/conftool/dbconfig/20221004-074955-root.json
  • 07:36 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
  • 07:36 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35329 and previous config saved to /var/cache/conftool/dbconfig/20221004-072158-root.json
  • 07:16 elukey: restart kafka on kafka-logging1001 to pick up its new PKI TLS cert
  • 07:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging1001.eqiad.wmnet with reason: Kafka PKI upgrade
  • 07:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging1001.eqiad.wmnet with reason: Kafka PKI upgrade
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35328 and previous config saved to /var/cache/conftool/dbconfig/20221004-070653-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35327 and previous config saved to /var/cache/conftool/dbconfig/20221004-065148-root.json
  • 06:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 06:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 06:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35326 and previous config saved to /var/cache/conftool/dbconfig/20221004-063643-root.json
  • 06:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 25885
  • 06:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 25885
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35325 and previous config saved to /var/cache/conftool/dbconfig/20221004-062138-root.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35324 and previous config saved to /var/cache/conftool/dbconfig/20221004-060633-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 3%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35323 and previous config saved to /var/cache/conftool/dbconfig/20221004-055128-root.json
  • 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35322 and previous config saved to /var/cache/conftool/dbconfig/20221004-053623-root.json
  • 03:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-10-03

  • 21:45 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:44 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 21:44 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
  • 21:18 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
  • 19:41 ryankemper: [Elastic] Unbanned `elastic1066`
  • 19:37 ryankemper: [Elastic] Restarted psi on `elastic1066`; will unban host after process is up and running
  • 19:32 robh: msw1-ulsfo swap successful, mgmt recovering in icinga and tested connection with 3 servers all work
  • 19:25 robh: msw1-ulsfo swap, some mgmt flapping expected, swap complete but not powered back up yet
  • 19:22 ryankemper: [Elastic] Banned `elastic1066` (`curl -H 'Content-Type: application/json' -XPUT http://localhost:9600/_cluster/settings -d '{"transient":{"cluster.routing.allocation.exclude":{"_host": "","_name": "elastic1066-production-search-psi-eqiad"}}}'`); will restart elasticsearch-psi after shards drain
  • 19:15 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
  • 18:48 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
  • 18:41 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
  • 18:34 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
  • 18:30 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:30 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4045.ulsfo.wmnet with OS buster
  • 18:21 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:12 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:06 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:04 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:00 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:52 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:42 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:41 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns4003
  • 17:41 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns4003
  • 17:40 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:37 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 17:29 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS buster
  • 17:29 sukhe: running homer "cr*-ulsfo*" commit "Gerrit 837727: remove dns4001 for anycast neighbors."
  • 17:13 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns4001.wikimedia.org
  • 17:13 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:08 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 17:04 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns4001.wikimedia.org
  • 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 30781
  • 16:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 30781
  • 16:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:24 urbanecm@deploy1002: Finished scap: Backport for throttle: Remove out of date rules (duration: 04m 16s)
  • 16:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:20 urbanecm@deploy1002: urbanecm and urbanecm: Backport for throttle: Remove out of date rules synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 16:20 urbanecm@deploy1002: Started scap: Backport for throttle: Remove out of date rules
  • 16:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cae49b8: throttle: Add throttle rule for 2022-10-06 (T319212) (duration: 04m 21s)
  • 16:14 sukhe: disable Puppet on cp hosts in codfw: rolling out T309651
  • 15:15 sukhe: disable Puppet on cp hosts in ulsfo: rolling out T309651
  • 15:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35320 and previous config saved to /var/cache/conftool/dbconfig/20221003-151438-root.json
  • 15:06 papaul: maintenance complete on mr1-esams
  • 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35319 and previous config saved to /var/cache/conftool/dbconfig/20221003-145933-root.json
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35318 and previous config saved to /var/cache/conftool/dbconfig/20221003-144428-root.json
  • 14:35 sukhe: upgrade A:cp and A:drmrs to ATS 9.1.3-1wm2 from 9.1.3-1wm1: T309651
  • 14:31 papaul: ongoing maintenance on mr1-esams
  • 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35317 and previous config saved to /var/cache/conftool/dbconfig/20221003-142923-root.json
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35316 and previous config saved to /var/cache/conftool/dbconfig/20221003-141417-root.json
  • 14:08 sukhe: upgrade cp4026, cp4032 to ATS 9.1.3-1wm2 from 9.1.3-1wm1: T309651
  • 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35315 and previous config saved to /var/cache/conftool/dbconfig/20221003-135912-root.json
  • 13:57 sukhe: reprepro -C component/trafficserver9 include buster-wikimedia trafficserver_9.1.3-1wm2_amd64.changes: T309651
  • 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35314 and previous config saved to /var/cache/conftool/dbconfig/20221003-134407-root.json
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35313 and previous config saved to /var/cache/conftool/dbconfig/20221003-134024-root.json
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35312 and previous config saved to /var/cache/conftool/dbconfig/20221003-132902-root.json
  • 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35311 and previous config saved to /var/cache/conftool/dbconfig/20221003-132519-root.json
  • 13:18 vgutierrez: enforcing origin-form|asterisk-form for request-target on varnish (could trigger spikes of HTTP 400 errors) - T318676
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35310 and previous config saved to /var/cache/conftool/dbconfig/20221003-131014-root.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35308 and previous config saved to /var/cache/conftool/dbconfig/20221003-125509-root.json
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35307 and previous config saved to /var/cache/conftool/dbconfig/20221003-124004-root.json
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35306 and previous config saved to /var/cache/conftool/dbconfig/20221003-122459-root.json
  • 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35305 and previous config saved to /var/cache/conftool/dbconfig/20221003-120954-root.json
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2123', diff saved to https://phabricator.wikimedia.org/P35303 and previous config saved to /var/cache/conftool/dbconfig/20221003-120208-root.json
  • 12:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2123.codfw.wmnet with reason: Cloning
  • 12:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2123.codfw.wmnet with reason: Cloning
  • 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1116.eqiad.wmnet with reason: Reboot
  • 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1116.eqiad.wmnet with reason: Reboot
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35302 and previous config saved to /var/cache/conftool/dbconfig/20221003-115449-root.json
  • 11:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1117.eqiad.wmnet with reason: Reboot
  • 11:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1117.eqiad.wmnet with reason: Reboot
  • 11:28 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
  • 11:28 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
  • 11:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1003.eqiad.wmnet with OS buster
  • 11:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
  • 11:20 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=eqiad
  • 11:08 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1003.eqiad.wmnet with reason: host reimage
  • 11:04 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1003.eqiad.wmnet with reason: host reimage
  • 10:52 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1003.eqiad.wmnet with OS buster
  • 10:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1003.eqiad.wmnet with reason: Prep for reimage
  • 10:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1003.eqiad.wmnet with reason: Prep for reimage
  • 10:41 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
  • 10:41 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1002.eqiad.wmnet with OS buster
  • 10:40 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
  • 10:40 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
  • 10:39 hnowlan: starting cassandra on reimaged sessionstore1002
  • 10:37 _joe_: remove stale druid.svc.eqiad.wmnet certificate from the puppetmaster CA; it had expired anyway
  • 10:32 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=eqiad
  • 10:31 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
  • 10:31 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
  • 10:19 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1002.eqiad.wmnet with reason: host reimage
  • 10:16 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1002.eqiad.wmnet with reason: host reimage
  • 10:05 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1002.eqiad.wmnet with OS buster
  • 10:00 hnowlan: c-foreach-nt drain on sessionstore1002
  • 10:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1002.eqiad.wmnet with reason: Prep for reimage
  • 10:00 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1002.eqiad.wmnet with reason: Prep for reimage
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35300 and previous config saved to /var/cache/conftool/dbconfig/20221003-092519-root.json
  • 09:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 31133
  • 09:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 31133
  • 09:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62044
  • 09:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62044
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35299 and previous config saved to /var/cache/conftool/dbconfig/20221003-091014-root.json
  • 08:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db[2157,2178].codfw.wmnet with reason: Reclone
  • 08:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db[2157,2178].codfw.wmnet with reason: Reclone
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2157', diff saved to https://phabricator.wikimedia.org/P35297 and previous config saved to /var/cache/conftool/dbconfig/20221003-085840-root.json
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35296 and previous config saved to /var/cache/conftool/dbconfig/20221003-085509-root.json
  • 08:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 12975
  • 08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 12975
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35295 and previous config saved to /var/cache/conftool/dbconfig/20221003-085007-root.json
  • 08:40 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp5001.eqsin.wmnet
  • 08:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35294 and previous config saved to /var/cache/conftool/dbconfig/20221003-084004-root.json
  • 08:39 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 3303
  • 08:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3303
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35293 and previous config saved to /var/cache/conftool/dbconfig/20221003-083729-root.json
  • 08:36 vgutierrez@cumin1001: START - Cookbook sre.dns.netbox
  • 08:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12956
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35292 and previous config saved to /var/cache/conftool/dbconfig/20221003-083502-root.json
  • 08:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12956
  • 08:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission for hosts cp5001.eqsin.wmnet
  • 08:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15557
  • 08:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15557
  • 08:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12975
  • 08:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12975
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35291 and previous config saved to /var/cache/conftool/dbconfig/20221003-082459-root.json
  • 08:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 30781
  • 08:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 30781
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35290 and previous config saved to /var/cache/conftool/dbconfig/20221003-082224-root.json
  • 08:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 39386
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35289 and previous config saved to /var/cache/conftool/dbconfig/20221003-081955-root.json
  • 08:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 39386
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35288 and previous config saved to /var/cache/conftool/dbconfig/20221003-080954-root.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35287 and previous config saved to /var/cache/conftool/dbconfig/20221003-080719-root.json
  • 08:06 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'email' for AS: 16509
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35286 and previous config saved to /var/cache/conftool/dbconfig/20221003-080556-root.json
  • 08:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16509
  • 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
  • 08:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35285 and previous config saved to /var/cache/conftool/dbconfig/20221003-080451-root.json
  • 07:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
  • 07:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2178', diff saved to https://phabricator.wikimedia.org/P35284 and previous config saved to /var/cache/conftool/dbconfig/20221003-075643-root.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35283 and previous config saved to /var/cache/conftool/dbconfig/20221003-075449-root.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35282 and previous config saved to /var/cache/conftool/dbconfig/20221003-075214-root.json
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35281 and previous config saved to /var/cache/conftool/dbconfig/20221003-075051-root.json
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35280 and previous config saved to /var/cache/conftool/dbconfig/20221003-074946-root.json
  • 07:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16637
  • 07:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16637
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35279 and previous config saved to /var/cache/conftool/dbconfig/20221003-073944-root.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35278 and previous config saved to /var/cache/conftool/dbconfig/20221003-073709-root.json
  • 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1200.eqiad.wmnet with reason: Upgrade to 10.6
  • 07:36 XioNoX: cr2-drmrs# set chassis fpc 0 sampling-instance pmacct
  • 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1200.eqiad.wmnet with reason: Upgrade to 10.6
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35277 and previous config saved to /var/cache/conftool/dbconfig/20221003-073627-root.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1200', diff saved to https://phabricator.wikimedia.org/P35276 and previous config saved to /var/cache/conftool/dbconfig/20221003-073556-root.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35275 and previous config saved to /var/cache/conftool/dbconfig/20221003-073546-root.json
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35274 and previous config saved to /var/cache/conftool/dbconfig/20221003-073441-root.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35273 and previous config saved to /var/cache/conftool/dbconfig/20221003-072741-root.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35272 and previous config saved to /var/cache/conftool/dbconfig/20221003-072204-root.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35271 and previous config saved to /var/cache/conftool/dbconfig/20221003-072122-root.json
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35270 and previous config saved to /var/cache/conftool/dbconfig/20221003-072041-root.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35269 and previous config saved to /var/cache/conftool/dbconfig/20221003-071936-root.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35268 and previous config saved to /var/cache/conftool/dbconfig/20221003-071236-root.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 3%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35267 and previous config saved to /var/cache/conftool/dbconfig/20221003-070659-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35266 and previous config saved to /var/cache/conftool/dbconfig/20221003-070617-root.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35265 and previous config saved to /var/cache/conftool/dbconfig/20221003-070536-root.json
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35264 and previous config saved to /var/cache/conftool/dbconfig/20221003-070431-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2175', diff saved to https://phabricator.wikimedia.org/P35263 and previous config saved to /var/cache/conftool/dbconfig/20221003-065844-root.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35262 and previous config saved to /var/cache/conftool/dbconfig/20221003-065731-root.json
  • 06:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 6128
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 1%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35261 and previous config saved to /var/cache/conftool/dbconfig/20221003-065154-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35260 and previous config saved to /var/cache/conftool/dbconfig/20221003-065112-root.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35259 and previous config saved to /var/cache/conftool/dbconfig/20221003-065031-root.json
  • 06:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 6128
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182', diff saved to https://phabricator.wikimedia.org/P35258 and previous config saved to /var/cache/conftool/dbconfig/20221003-064638-root.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35257 and previous config saved to /var/cache/conftool/dbconfig/20221003-064226-root.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35256 and previous config saved to /var/cache/conftool/dbconfig/20221003-063607-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35255 and previous config saved to /var/cache/conftool/dbconfig/20221003-063527-root.json
  • 06:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 11039
  • 06:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 11039
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35254 and previous config saved to /var/cache/conftool/dbconfig/20221003-062721-root.json
  • 06:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 5400
  • 06:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 5400
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35253 and previous config saved to /var/cache/conftool/dbconfig/20221003-062102-root.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35252 and previous config saved to /var/cache/conftool/dbconfig/20221003-062022-root.json
  • 06:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3300
  • 06:13 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 3300
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35251 and previous config saved to /var/cache/conftool/dbconfig/20221003-061216-root.json
  • 06:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15133
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35250 and previous config saved to /var/cache/conftool/dbconfig/20221003-060557-root.json
  • 06:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15133
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35249 and previous config saved to /var/cache/conftool/dbconfig/20221003-055711-root.json
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P35248 and previous config saved to /var/cache/conftool/dbconfig/20221003-055401-root.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35247 and previous config saved to /var/cache/conftool/dbconfig/20221003-055052-root.json
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167', diff saved to https://phabricator.wikimedia.org/P35246 and previous config saved to /var/cache/conftool/dbconfig/20221003-054245-root.json
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35245 and previous config saved to /var/cache/conftool/dbconfig/20221003-054206-root.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P35244 and previous config saved to /var/cache/conftool/dbconfig/20221003-052927-root.json

2022-10-02

  • 08:13 elukey: `apt-get clean` on an-airflow1001 to free some space on the root partition

2022-10-01

  • 13:24 fab@deploy1002: Finished deploy [airflow-dags/research@44a1158]: (no justification provided) (duration: 00m 08s)
  • 13:24 fab@deploy1002: Started deploy [airflow-dags/research@44a1158]: (no justification provided)
  • 13:12 fab@deploy1002: Finished deploy [airflow-dags/research@d6b3e82]: (no justification provided) (duration: 03m 35s)
  • 13:08 fab@deploy1002: Started deploy [airflow-dags/research@d6b3e82]: (no justification provided)
