Server Admin Log/Archive 43

From Wikitech

2020-12-29

  • 15:52 vgutierrez: reloading nginx on cloudelastic1005 and cloudelastic1006
  • 15:48 vgutierrez: triggering a puppet run on cp nodes
  • 15:45 vgutierrez: restarting acme-chief on acmechief1001

2020-12-28

  • 09:54 elukey: reboot an-coord1002 (puppet in D state after issues with broken disk - host in standby, no traffic)

2020-12-24

  • 12:41 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:37 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 12:34 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:23 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 11:22 volans: running on cumin1001: homer asw2-*-eqiad.mgmt.eqiad.wmnet commit "Fix numbering of an-worker hosts - T260445"
  • 11:08 hashar: gerrit2001 (replica) restarting Gerrit server
  • 00:45 legoktm: reset maxmind password

2020-12-23

  • 21:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:51 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:44 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:15 cdanis: disabling puppet on alert1001 for klaxon rollout
  • 09:59 hashar: gerrit: removed old gerrit directory /srv/var-lib-gerrit2-cobalt.wikimedia.org/.gerritcodereview/ (was some tmp dirs for Gerrit jars)
  • 09:54 volans: upgraded python3-wmflib to 0.0.5 on cumin1001
  • 05:54 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix typo in autoreview right of eliminators in fawiki (duration: 00m 57s)

2020-12-22

  • 21:57 mutante: apt1001 - sudo systemctl status rsync-aptrepo-apt2001.wikimedia.org.service - confirmed timer job is working like the cron before
  • 21:31 mutante: deploy1002/deploy2002 - apt-get remove --purge php-readline and let puppet reinstall it (7.2 vs 7.3 after gerrit 651158) T265963
  • 21:26 andrewbogott: upgrading wikitech-static: mediawiki to 1.35.1 and general apt upgrade
  • 20:26 eileen: civicrm revision changed from e86e756807 to 6150267979, config revision is 52f1cbc5dd
  • 19:34 mutante: restarting gerrit to pick up config change in gitiles for T269300
  • 18:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on labstore1004.eqiad.wmnet with reason: REIMAGE
  • 18:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on labstore1004.eqiad.wmnet with reason: REIMAGE
  • 17:27 andrewbogott: shutting down labstore1004 in preparation for move and reimage
  • 16:51 mforns@deploy1001: Finished deploy [analytics/refinery@21c0c89] (thin): Regular analytics weekly train THIN [analytics/refinery@Ie7bce02179547ee4c6756d52f9956f492c5b4df6] (duration: 00m 08s)
  • 16:51 mforns@deploy1001: Started deploy [analytics/refinery@21c0c89] (thin): Regular analytics weekly train THIN [analytics/refinery@Ie7bce02179547ee4c6756d52f9956f492c5b4df6]
  • 16:48 volans: restarted ferm on ms-be1026 (failed with DNS query for 'ms-be1055.eqiad.wmnet' failed: query timed out )
  • 16:15 bstorm: downtimed and stopped puppet on labstore1004 and labstore1005 for failover T266202
  • 15:23 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:12 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:08 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 11:52 marostegui: Set db1151 to writable T269324
  • 11:10 jbond42: upload puppet 5.5.22 to jessie-wikimedia
  • 11:02 jbond42: upload puppet 5.5.22 to stretch-wikimedia
  • 10:51 volans@cumin2001: test SAL message from wmflib, please ignore
  • 10:06 volans: upgraded python3-wmflib to 0.0.5 on cumin2001
  • 08:52 hashar: gerrit: running jhat heap analyzer on gerrit2001 # T263008
  • 07:27 elukey: reboot stat100[4-8] (analytics hadoop clients) for kernel upgrades
  • 00:20 crusnov@deploy1001: Finished deploy [netbox/deploy@b17db99]: Redeploy of 2.9.10 to netbox-dev for dep test (duration: 00m 54s)
  • 00:19 crusnov@deploy1001: Started deploy [netbox/deploy@b17db99]: Redeploy of 2.9.10 to netbox-dev for dep test

2020-12-21

  • 23:20 legoktm@deploy1001: Synchronized docroot/noc/conf/index.php: noc: Fix "Currently active MediaWiki versions" (T235338) (duration: 00m 54s)
  • 22:26 crusnov@deploy1001: Finished deploy [netbox/deploy@0362a12]: Deploy of 2.9.10 to netbox-dev for script testing p2 (duration: 00m 05s)
  • 22:26 crusnov@deploy1001: Started deploy [netbox/deploy@0362a12]: Deploy of 2.9.10 to netbox-dev for script testing p2
  • 22:26 crusnov@deploy1001: Finished deploy [netbox/deploy@0362a12]: Deploy of 2.9.10 to netbox-dev for script testing (duration: 01m 01s)
  • 22:25 sbassett: Deployed security patch T270453
  • 22:25 crusnov@deploy1001: Started deploy [netbox/deploy@0362a12]: Deploy of 2.9.10 to netbox-dev for script testing
  • 22:18 chaomodus: Re-enabling puppet on Netbox production instances after having tested netbox2001 with new puppet code T266487
  • 21:42 legoktm: manually imported debs to buster-wikimedia thirdparty/pyall component (T241195)
  • 21:09 chaomodus: merging change 643354 for Netbox 2.9 support, puppet disabled on production machines until testing completed T266487
  • 19:47 dcausse: repool wdqs1011
  • 18:30 dancy@deploy1001: Finished scap: Backport of l10n changes for T270619 (duration: 21m 12s)
  • 18:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1011.eqiad.wmnet with reason: REIMAGE
  • 18:26 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1011.eqiad.wmnet with reason: REIMAGE
  • 18:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1012.eqiad.wmnet with reason: REIMAGE
  • 18:16 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1012.eqiad.wmnet with reason: REIMAGE
  • 18:14 volans: uploaded python3-wmflib_0.0.5 to apt.wikimedia.org buster-wikimedia
  • 18:09 dancy@deploy1001: Started scap: Backport of l10n changes for T270619
  • 17:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2024.codfw.wmnet with reason: REIMAGE
  • 17:56 legoktm@deploy1001: Synchronized /srv/mediawiki-staging/php-1.36.0-wmf.22/extensions/FeaturedFeeds/includes/FeaturedFeeds.php: Don't load entire feed just to output the link to it (T266900) (duration: 01m 01s)
  • 17:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1024.eqiad.wmnet with reason: REIMAGE
  • 17:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2024.codfw.wmnet with reason: REIMAGE
  • 17:54 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1024.eqiad.wmnet with reason: REIMAGE
  • 17:33 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on rdb1012.eqiad.wmnet with reason: REIMAGE
  • 17:31 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1012.eqiad.wmnet with reason: REIMAGE
  • 14:43 jbond42: disable puppet to upgrade puppet master packages
  • 14:43 jbond42: upload puppet_5.5.22-1 to wikimedia-buster
  • 14:20 jbond42: update puppet on puppetmaster1001
  • 14:16 jbond42: update puppet on puppetmaster1003
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: After cloning db1154:3313', diff saved to https://phabricator.wikimedia.org/P13616 and previous config saved to /var/cache/conftool/dbconfig/20201221-141555-root.json
  • 14:15 moritzm: installing sleuthkit security updates on buster
  • 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: After cloning db1154:3313', diff saved to https://phabricator.wikimedia.org/P13615 and previous config saved to /var/cache/conftool/dbconfig/20201221-140051-root.json
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: After cloning db1154:3313', diff saved to https://phabricator.wikimedia.org/P13614 and previous config saved to /var/cache/conftool/dbconfig/20201221-134548-root.json
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: After cloning db1154:3313', diff saved to https://phabricator.wikimedia.org/P13613 and previous config saved to /var/cache/conftool/dbconfig/20201221-133044-root.json
  • 12:57 hashar: Gerrit briefly paused due to erroneous run of `jmap -clstats`
  • 12:22 hashar: Running jhat on gerrit1001 to analyze a heap dump, expect CPU usage
  • 11:48 moritzm: installing libxstream-java security updates on buster
  • 11:31 moritzm: installing php-pear security updates on buster
  • 09:47 _joe_: logging out of the long-running root screen session on maps1010
  • 09:46 _joe_: logging out of the long-running root screen session on maps1001
  • 09:46 _joe_: systemctl reset-failed on deneb, timeout downloading a docker image from the registry
  • 09:24 dcausse: depooling wdqs1011 (lag)
  • 09:19 _joe_: powercycling wdqs1011, unresponsive to ssh
  • 08:31 dcausse@deploy1001: Finished deploy [wdqs/wdqs@512d713]: GUI updates (T269224+i18n updates) (duration: 08m 57s)
  • 08:22 dcausse@deploy1001: Started deploy [wdqs/wdqs@512d713]: GUI updates (T269224+i18n updates)
  • 08:15 marostegui: Add ips to the x2 instances on dbctl T269324
  • 07:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2023.codfw.wmnet with reason: REIMAGE
  • 07:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1023.eqiad.wmnet with reason: REIMAGE
  • 07:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2023.codfw.wmnet with reason: REIMAGE
  • 07:47 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1023.eqiad.wmnet with reason: REIMAGE
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 T268742 ', diff saved to https://phabricator.wikimedia.org/P13609 and previous config saved to /var/cache/conftool/dbconfig/20201221-070748-marostegui.json
  • 06:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:35 marostegui: Compress clouddb1017:3313 clouddb1013:3313 T270473

2020-12-18

  • 21:52 thcipriani: flushing gerrit project cache
  • 20:50 ejegg: updated payments-wiki from 3d3055c478 to c3e6c5f9f4
  • 19:41 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
  • 19:41 nskaggs@cumin1001: Added views for new wiki: skrwiki T268412
  • 19:31 mutante: restarted wikibugs (phab, gerrit and irc jobs)
  • 19:30 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 19:24 mutante: tools.wikibugs@tools-sgebastion-07:~/wikibugs2$ qdel 1766104
  • 19:22 mutante: bug: wikibugs stopped reporting bugs, attempting to restart bug bot to continue reporting bugs
  • 19:07 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
  • 19:07 nskaggs@cumin1001: Added views for new wiki: madwiki T269440
  • 18:56 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 18:39 andrew@deploy1001: Finished deploy [horizon/deploy@89b308c]: update codfw1dev deploy (duration: 02m 17s)
  • 18:36 andrew@deploy1001: Started deploy [horizon/deploy@89b308c]: update codfw1dev deploy
  • 18:36 andrew@deploy1001: Finished deploy [horizon/deploy@89b308c]: update codfw1dev deploy (duration: 00m 09s)
  • 18:36 andrew@deploy1001: Started deploy [horizon/deploy@89b308c]: update codfw1dev deploy
  • 18:07 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on labstore1005.eqiad.wmnet with reason: REIMAGE
  • 18:05 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on labstore1005.eqiad.wmnet with reason: REIMAGE
  • 17:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:40 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb2001-dev.wikimedia.org with reason: REIMAGE
  • 17:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb2001-dev.wikimedia.org with reason: REIMAGE
  • 16:19 shdubsh: restart logstash on logstash2004
  • 16:00 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 13:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2021.codfw.wmnet with reason: REIMAGE
  • 13:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1021.eqiad.wmnet with reason: REIMAGE
  • 13:31 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2021.codfw.wmnet with reason: REIMAGE
  • 13:29 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1021.eqiad.wmnet with reason: REIMAGE
  • 13:22 marostegui: Compress clouddb1018:3312 clouddb1014:3312 T270473
  • 10:59 jynus: starting test swift backup of enwiki on a single thread towards dbstore2003 T264189
  • 10:53 jynus: returning db2102 to its original state
  • 10:20 hashar@deploy1001: Finished deploy [integration/docroot@1166384]: noop: clear out proper env variable in tests (duration: 00m 07s)
  • 10:20 hashar@deploy1001: Started deploy [integration/docroot@1166384]: noop: clear out proper env variable in tests
  • 09:13 marostegui: Compress clouddb1018:3317 clouddb1014:3317 T270473
  • 08:26 jynus: temporarily taking db2102 offline for mysql testing
  • 07:54 elukey: on kafka-test10[08-10] - "ip addr flush dev ens5; systemctl restart ifup@ens5.service"
  • 07:06 marostegui: Stop mysql on db1124:3313 T268742
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1013 from dbctl T268436', diff saved to https://phabricator.wikimedia.org/P13600 and previous config saved to /var/cache/conftool/dbconfig/20201218-070235-marostegui.json
  • 07:00 marostegui: Compress clouddb1019:3316 clouddb1015:3316 T270473
  • 06:53 marostegui: Compress clouddb1020:3315 clouddb1016:3315 T270473
  • 01:34 legoktm: restarted gerrit (T270451)
  • 01:33 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on gerrit1001.wikimedia.org with reason: OOM
  • 01:33 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on gerrit1001.wikimedia.org with reason: OOM
  • 01:27 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.22/includes/EditPage.php: 4c224bb: SECURITY: Act like users dont exist if hidden from viewer (T120883) (duration: 00m 53s)
  • 01:22 mutante: signing puppet certs and installing buster on doc1002/doc2001 with "insetup" role
  • 00:51 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.22/resources/src/vue/index.js: ed8212b: Revert "vue: Log component errors" (duration: 00m 55s)
  • 00:30 foks: reset email for Sutton12
  • 00:23 mutante: DNS - new project language 'nia' added - The Nias language is an Austronesian language spoken on Nias Island and the Batu Islands off the west coast of Sumatra in Indonesia.
  • 00:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 3cc5caa: Undeploy graphoid for arwiki. Phase 4. (T270443) (duration: 00m 55s)
  • 00:11 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)

2020-12-17

  • 23:39 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 23:04 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 22:51 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 22:47 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 22:09 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 21:06 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 21:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 21:06 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 21:01 ema: cp3052: ban 'req.http.host == "docker-registry.wikimedia.org"' T270270
  • 20:24 marxarelli: all wikis to 1.36.0-wmf.22 complete. no new errors or concerning rates (refs T267415)
  • 20:16 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.22
  • 19:41 Urbanecm: Morning B&C window done
  • 19:39 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.22/includes/page/Article.php: 6c97eed: Article: view from old revision cache - set correct revId (T270361) (duration: 01m 04s)
  • 19:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 87cce84: add wikitech to mediawiki import sources (T270284) (duration: 01m 04s)
  • 19:12 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 2c1b339: Load IPInfo extension in CommonSettings.php [noop for prod] (T260599) (duration: 01m 04s)
  • 19:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5404687: Add IPInfo config to InitialiseSettings.php [noop for prod] (T260599) (duration: 01m 03s)
  • 19:08 urbanecm@deploy1001: Synchronized wmf-config/extension-list: 5180f12: extension-list: Add IPInfo extension (T260599) (duration: 01m 03s)
  • 19:07 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
  • 19:07 nskaggs@cumin1001: Added views for new wiki: eowikivoyage T269427
  • 19:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: b24b6f5: Configure GrowthExperiments on Bangla Wikipedia [noop for prod] (T266020) (duration: 01m 03s)
  • 18:56 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 18:37 mutante: doc1001 rm -rf /srv/docroot
  • 17:55 mutante: doc1001 systemctl restart apache2
  • 17:55 mutante: doc1001 - mv /srv/docroot/org/wikimedia/doc/ /srv/doc
  • 17:55 mutante: doc1001 - systemctl restart rsync
  • 17:40 mutante: restarted apache2 on doc1001
  • 17:38 mutante: restarted apache2 on doc1001
  • 16:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2020.codfw.wmnet with reason: REIMAGE
  • 16:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2020.codfw.wmnet with reason: REIMAGE
  • 16:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1020.eqiad.wmnet with reason: REIMAGE
  • 16:36 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1020.eqiad.wmnet with reason: REIMAGE
  • 16:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1092 with its original weight into API', diff saved to https://phabricator.wikimedia.org/P13590 and previous config saved to /var/cache/conftool/dbconfig/20201217-161453-marostegui.json
  • 16:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1092 into API', diff saved to https://phabricator.wikimedia.org/P13589 and previous config saved to /var/cache/conftool/dbconfig/20201217-161052-marostegui.json
  • 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1092 (re)pooling @ 100%: Repooling after cloning db1092 after serving as vslow/dump', diff saved to https://phabricator.wikimedia.org/P13588 and previous config saved to /var/cache/conftool/dbconfig/20201217-160930-root.json
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1092 (re)pooling @ 75%: Repooling after cloning db1092 after serving as vslow/dump', diff saved to https://phabricator.wikimedia.org/P13586 and previous config saved to /var/cache/conftool/dbconfig/20201217-155427-root.json
  • 15:54 ppchelko@deploy1001: Finished deploy [restbase/deploy@4c2e0b6]: Add various wikis T268459 T269428 T269433 T268413 T269441 (duration: 17m 51s)
  • 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1092 (re)pooling @ 50%: Repooling after cloning db1092 after serving as vslow/dump', diff saved to https://phabricator.wikimedia.org/P13585 and previous config saved to /var/cache/conftool/dbconfig/20201217-153923-root.json
  • 15:36 ppchelko@deploy1001: Started deploy [restbase/deploy@4c2e0b6]: Add various wikis T268459 T269428 T269433 T268413 T269441
  • 15:32 ppchelko@deploy1001: Finished deploy [restbase/deploy@8e9253b]: Add various wikis T268459 T269428 T269433 T268413 T269441 (duration: 31m 57s)
  • 15:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1092 (re)pooling @ 25%: Repooling after cloning db1092 after serving as vslow/dump', diff saved to https://phabricator.wikimedia.org/P13584 and previous config saved to /var/cache/conftool/dbconfig/20201217-152420-root.json
  • 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'Fully depool db1092', diff saved to https://phabricator.wikimedia.org/P13583 and previous config saved to /var/cache/conftool/dbconfig/20201217-152347-marostegui.json
  • 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 after cloning db1154:3318', diff saved to https://phabricator.wikimedia.org/P13582 and previous config saved to /var/cache/conftool/dbconfig/20201217-152233-marostegui.json
  • 15:00 ppchelko@deploy1001: Started deploy [restbase/deploy@8e9253b]: Add various wikis T268459 T269428 T269433 T268413 T269441
  • 14:08 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php wikidatawiki --property-id P920 --new-data-type external-id # T269205
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 into API', diff saved to https://phabricator.wikimedia.org/P13579 and previous config saved to /var/cache/conftool/dbconfig/20201217-134513-marostegui.json
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1089 (re)pooling @ 100%: Repool db1089 after helping out on db1106', diff saved to https://phabricator.wikimedia.org/P13578 and previous config saved to /var/cache/conftool/dbconfig/20201217-134134-root.json
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1089 (re)pooling @ 75%: Repool db1089 after helping out on db1106', diff saved to https://phabricator.wikimedia.org/P13577 and previous config saved to /var/cache/conftool/dbconfig/20201217-132631-root.json
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Repooling after cloning db1154:3315 as sanitarium T268742', diff saved to https://phabricator.wikimedia.org/P13576 and previous config saved to /var/cache/conftool/dbconfig/20201217-132603-root.json
  • 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1089 (re)pooling @ 50%: Repool db1089 after helping out on db1106', diff saved to https://phabricator.wikimedia.org/P13575 and previous config saved to /var/cache/conftool/dbconfig/20201217-131127-root.json
  • 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: Repooling after cloning db1154:3315 as sanitarium T268742', diff saved to https://phabricator.wikimedia.org/P13574 and previous config saved to /var/cache/conftool/dbconfig/20201217-131059-root.json
  • 13:01 marostegui: Stop mysql on db1087 to clone db1154
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 to clone db1154:3318 add db1092 as vslow,dump service for s8 T268742 ', diff saved to https://phabricator.wikimedia.org/P13571 and previous config saved to /var/cache/conftool/dbconfig/20201217-130101-marostegui.json
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1089 (re)pooling @ 25%: Repool db1089 after helping out on db1106', diff saved to https://phabricator.wikimedia.org/P13570 and previous config saved to /var/cache/conftool/dbconfig/20201217-125624-root.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Repooling after cloning db1154:3315 as sanitarium T268742', diff saved to https://phabricator.wikimedia.org/P13569 and previous config saved to /var/cache/conftool/dbconfig/20201217-125556-root.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Change db1089 weights', diff saved to https://phabricator.wikimedia.org/P13568 and previous config saved to /var/cache/conftool/dbconfig/20201217-125535-marostegui.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 after cloning db1154:3311 as sanitarium T268742', diff saved to https://phabricator.wikimedia.org/P13567 and previous config saved to /var/cache/conftool/dbconfig/20201217-125446-marostegui.json
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Repooling after cloning db1154:3315 as sanitarium T268742', diff saved to https://phabricator.wikimedia.org/P13566 and previous config saved to /var/cache/conftool/dbconfig/20201217-124052-root.json
  • 12:36 jbond42: disable puppet fleet-wide for config master vhost change
  • 12:23 matthiasmullie: EU backport+config window done
  • 12:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: REIMAGE
  • 12:22 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: f3a50cb06: Enable ContentTranslation as default tool for ceb, km, mg, tg and yi WPs (duration: 01m 02s)
  • 12:21 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: REIMAGE
  • 12:17 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a29fec312: Add Wikidocumentaries campaign for ContentTranslation (duration: 01m 02s)
  • 12:07 mlitn@deploy1001: Synchronized wmf-config/SearchSettingsForSDC.php: 68ac6fa61: Media Search: Remove license map from config (duration: 01m 04s)
  • 11:38 kart_: Updated cxserver to 2020-12-17-111820-production (T262192)
  • 11:36 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 11:34 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 11:32 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 11:27 godog: bounce apache2 on grafana1002
  • 11:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: REIMAGE
  • 11:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE
  • 11:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1002.eqiad.wmnet with reason: REIMAGE
  • 11:21 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE
  • 11:21 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: REIMAGE
  • 11:20 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1002.eqiad.wmnet with reason: REIMAGE
  • 11:20 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE
  • 11:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE
  • 11:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE
  • 11:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE
  • 11:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:08 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:50 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
  • 10:45 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
  • 10:21 jbond42: updating RemoteIP on phabricator https://gerrit.wikimedia.org/r/c/operations/puppet/+/649872
  • 09:57 vgutierrez: repool ats-tls on cp5011
  • 09:00 marostegui: Sanitize s1 and s5 on db1154 T268742
  • 08:30 godog: swift codfw-prod: more weight to ms-be20[58-61] - T269337
  • 07:49 ryankemper: [wdqs deploy] (wdqs deploy complete)
  • 07:19 marostegui: Stop mysql on db1082 to clone db1154
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 for cloning db1154:3315 T268742 ', diff saved to https://phabricator.wikimedia.org/P13563 and previous config saved to /var/cache/conftool/dbconfig/20201217-071903-marostegui.json
  • 07:18 elukey: reboot an-airflow1001 for kernel upgrades
  • 07:08 elukey: update analytics-in4 filter on cr1/cr2-eqiad for https://gerrit.wikimedia.org/r/c/operations/homer/public/+/649706
  • 07:08 ryankemper: [wdqs] depooled `wdqs1013` while it catches up on lag
  • 07:06 ryankemper: [wdqs deploy] Restarting `wdqs-categories` across all wdqs instances, one host at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 07:05 ryankemper: [wdqs deploy] Restarting `wdqs-categories` across all test instances: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 07:05 ryankemper: [wdqs-deploy] Restarting `wdqs-updater` across all instances, 4 instances at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 07:04 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@90f9bdd]: 0.3.56 (duration: 10m 39s)
  • 06:54 ryankemper: [wdqs deploy] Tests passing on canary instance `wdqs1003` following canary deploy, proceeding to rest of fleet
  • 06:53 ryankemper@deploy1001: Started deploy [wdqs/wdqs@90f9bdd]: 0.3.56
  • 06:53 ryankemper: [wdqs deploy] All tests passing on canary instance `wdqs1003` prior to deploy
  • 06:52 kart_: Updated cxserver to 2020-12-16-164911-production (T234220, T269437)
  • 06:52 kart_: Updated cxserver to 2020-12-16-164911-production (T234220, T234220)
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1013 for decommissioning T268436', diff saved to https://phabricator.wikimedia.org/P13562 and previous config saved to /var/cache/conftool/dbconfig/20201217-062249-marostegui.json
  • 06:22 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 06:19 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 06:17 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 06:13 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 06:05 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:56 marostegui: Stop mysql on db1106 to clone db1154
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 for cloning db1154:3311 T268742 ', diff saved to https://phabricator.wikimedia.org/P13560 and previous config saved to /var/cache/conftool/dbconfig/20201217-055556-marostegui.json
  • 01:35 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1019.eqiad.wmnet with reason: REIMAGE
  • 01:33 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1019.eqiad.wmnet with reason: REIMAGE
  • 01:01 twentyafterfour: preparing to update phabricator translations
  • 00:22 mutante: running puppet on mw2266, mw2370, mw2354

2020-12-16

  • 23:56 bstorm: bootstrapped meta_p database for the new s7 replicas T269427
  • 20:12 marxarelli: group1 to 1.36.0-wmf.22 complete. no new errors or concerning rates (refs T267415)
  • 20:06 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.22 (duration: 01m 01s)
  • 20:05 legoktm: added myself to the ops LDAP group
  • 20:05 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.22
  • 19:23 dcausse: Morning backport window deploy done
  • 19:21 dcausse@deploy1001: Synchronized php-1.36.0-wmf.22/extensions/WikimediaEvents/: T266027: Revert [cirrus] setup perfield builder A/B test on spaceless languages (duration: 01m 03s)
  • 19:18 dcausse@deploy1001: Synchronized php-1.36.0-wmf.21/extensions/WikimediaEvents/: T266027: Revert [cirrus] setup perfield builder A/B test on spaceless languages (duration: 01m 03s)
  • 19:09 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T266359: wgMinervaCountErrors config was removed (duration: 01m 03s)
  • 17:52 effie: uploading python-thumbor-wikimedia_2.9-1 to stretch-wikimedia/component/thumbor
  • 16:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2019.codfw.wmnet with reason: REIMAGE
  • 16:38 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2019.codfw.wmnet with reason: REIMAGE
  • 16:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1019.eqiad.wmnet with reason: REIMAGE
  • 16:36 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1019.eqiad.wmnet with reason: REIMAGE
  • 16:32 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
  • 16:32 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 16:32 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
  • 16:32 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 16:23 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 16:03 jiji@deploy1001: Synchronized wmf-config/ProductionServices.php: Swap Redis lock managers with upgraded ones - T265643 (duration: 01m 03s)
  • 15:55 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 15:55 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 15:53 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 15:43 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 15:43 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 15:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 15:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 15:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 15:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 15:41 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 15:41 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 15:40 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 15:40 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 15:40 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 15:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 15:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 15:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 15:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 15:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
  • 15:36 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'plain' .
  • 15:36 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 15:36 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 15:36 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 15:21 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:21 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:11 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
  • 15:11 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 15:10 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 15:10 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 15:10 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 15:10 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 15:09 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 15:09 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'canary' .
  • 15:07 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 15:07 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 14:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 14:54 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:46 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
  • 14:46 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 14:46 akosiaris: deploy all services to staging codfw cluster
  • 14:45 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 14:45 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 14:02 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5c9c6c1: wawikisource: Add English aliases for Author NS (T269431) (duration: 01m 02s)
  • 13:59 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ebfd84b: wawikisource: Translate NS_PROJECT (T269431) (duration: 01m 01s)
  • 13:57 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 80dc989: wawikisource: Add author NS (T269431) (duration: 01m 02s)
  • 13:05 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=maps,service=kartotherian-ssl,name=maps2010.codfw.wmnet
  • 13:05 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=maps,service=kartotherian-ssl,name=maps2009.codfw.wmnet
  • 13:05 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=maps,service=kartotherian-ssl,name=maps2008.codfw.wmnet
  • 13:05 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=maps,service=kartotherian-ssl,name=maps2006.codfw.wmnet
  • 13:04 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=maps,service=kartotherian-ssl,name=maps2005.codfw.wmnet
  • 12:48 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1010.eqiad.wmnet
  • 12:48 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1008.eqiad.wmnet
  • 12:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1007.eqiad.wmnet
  • 12:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1006.eqiad.wmnet
  • 12:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1265.eqiad.wmnet
  • 12:21 urbanecm@deploy1001: Synchronized wmf-config/MetaContactPages.php: 0c651a6: MetaContactPages: Remove licenseabuse contact page (T269781) (duration: 01m 03s)
  • 12:21 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1265.eqiad.wmnet
  • 12:20 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: dc=eqiad,cluster=appserver,service=apache2,name=mw1265.eqiad.wmnet
  • 12:20 jayme: imported kubernetes 1.16.15-2 into component/kubernetes-future stretch-wikimedia
  • 11:52 marostegui: Stop s1, s3, s5 and s8 on db1124 to copy it to db1154 (this will generate lag on wikireplicas) T268742
  • 11:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
  • 11:19 jiji@deploy1001: Synchronized wmf-config/ProductionServices.php: Swap mc1019 with mc1031 for Redis lock manager - T265643 (duration: 01m 17s)
  • 11:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2022.codfw.wmnet with reason: REIMAGE
  • 11:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1022.eqiad.wmnet with reason: REIMAGE
  • 11:15 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2022.codfw.wmnet with reason: REIMAGE
  • 11:14 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
  • 11:13 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1022.eqiad.wmnet with reason: REIMAGE
  • 11:10 jynus: stopping and restarting dbstore1004 to mitigate (short term) T270112
  • 10:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
  • 10:35 jbond42: reboot rpki2001
  • 10:35 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
  • 10:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:30 jbond42: reboot rpki1001
  • 10:30 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:05 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 10:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
  • 10:00 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
  • 09:49 godog: swift eqiad-prod: add weight to ms-be106[0-3] - T268435
  • 09:32 _joe_: reset-failed for docker report jobs on deneb, failed because of a registry gateway timeout
  • 09:29 elukey: force execution of cumin-check-aliases.service on cumin[12]001 hosts to clear alarms
  • 08:35 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 08:23 vgutierrez: acme-chief and acme-chief-api restarts for openssl upgrades (CVE-2020-1971)
  • 07:55 gehel: depool wdqs1005 (catching up on lag)
  • 07:20 marostegui: Stop mysql on db2142 to clone db1151 - T269324

2020-12-15

  • 23:47 dduvall@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 23:45 dduvall@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 23:34 dduvall@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 22:10 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEvents: Promote SessionTick to group1 T248987 (duration: 01m 04s)
  • 20:29 marxarelli: group0 to 1.36.0-wmf.22 complete. no new errors or concerning rates (refs T267415)
  • 20:26 tzatziki: reset email for User:Cnk1220
  • 20:06 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.22
  • 19:32 joal@deploy1001: Finished deploy [analytics/refinery@2202db5] (thin): Regular analytics weekly train - THIN [analytics/refinery@2202db5] (duration: 00m 08s)
  • 19:32 joal@deploy1001: Started deploy [analytics/refinery@2202db5] (thin): Regular analytics weekly train - THIN [analytics/refinery@2202db5]
  • 19:31 joal@deploy1001: Finished deploy [analytics/refinery@2202db5]: Regular analytics weekly train [analytics/refinery@2202db5] (duration: 16m 36s)
  • 19:14 joal@deploy1001: Started deploy [analytics/refinery@2202db5]: Regular analytics weekly train [analytics/refinery@2202db5]
  • 18:48 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.20 (duration: 04m 19s)
  • 18:41 dduvall@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.22 (duration: 46m 41s)
  • 17:55 dduvall@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.22
  • 16:47 ottomata: bumped eventgate-main memory limits from 300M to 600M - T249745
  • 16:47 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 16:47 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 16:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1265.eqiad.wmnet with reason: REIMAGE
  • 16:44 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 16:44 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 16:43 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1265.eqiad.wmnet with reason: REIMAGE
  • 16:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 16:31 Amir1: end of rebuilding sites table across wikis (T269443 T269435 T269430 T268461 T268415)
  • 16:18 hnowlan: reimaging mw1265 to buster
  • 16:12 Amir1: start of rebuilding sites table across wikis (T269443 T269435 T269430 T268461 T268415)
  • 15:43 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:649432 T268075 Enable old revision parser cache on all wikis IS.php (duration: 00m 54s)
  • 15:41 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:649432 T268075 Enable old revision parser cache on all wikis CS.php (duration: 00m 56s)
  • 15:39 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 15:37 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 15:27 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 57s)
  • 15:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating skrwiktionary (T268448) (duration: 00m 54s)
  • 15:23 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating skrwiktionary (T268448) (duration: 00m 54s)
  • 15:21 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating skrwiktionary (T268448)
  • 15:20 urbanecm@deploy1001: Synchronized dblists: Creating skrwiktionary (T268448) (duration: 00m 57s)
  • 15:19 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating skrwiktionary (T268448) (duration: 00m 54s)
  • 15:18 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating skrwiktionary (T268448) (duration: 00m 55s)
  • 15:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix timezone config for skrwiki to unbreak it (T268410) (duration: 00m 54s)
  • 15:05 urbanecm@deploy1001: Synchronized langlist: Creating skrwiki (T268410) (duration: 00m 54s)
  • 15:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating skrwiki (T268410) (duration: 00m 54s)
  • 15:03 godog: reboot ms-be2031 - T269337
  • 15:03 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating skrwiki (T268410) (duration: 00m 54s)
  • 15:02 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating skrwiki (T268410)
  • 15:00 urbanecm@deploy1001: Synchronized dblists: Creating skrwiki (T268410) (duration: 00m 55s)
  • 14:59 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating skrwiki (T268410) (duration: 00m 54s)
  • 14:58 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating skrwiki (T268410) (duration: 00m 54s)
  • 14:53 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating eowikivoyage (T269426) (duration: 00m 54s)
  • 14:52 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating eowikivoyage (T269426)
  • 14:51 urbanecm@deploy1001: Synchronized dblists: Creating eowikivoyage (T269426) (duration: 00m 54s)
  • 14:50 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating eowikivoyage (T269426) (duration: 00m 54s)
  • 14:48 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating eowikivoyage (T269426) (duration: 00m 58s)
  • 14:40 marostegui: Configure replication on x2 codfw hosts T269324
  • 14:25 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 14:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating wawikisource (T269431) (duration: 00m 54s)
  • 14:12 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating wawikisource (T269431) (duration: 00m 54s)
  • 14:11 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating wawikisource (T269431)
  • 14:09 urbanecm@deploy1001: Synchronized dblists: Creating wawikisource (T269431) (duration: 00m 54s)
  • 14:08 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating wawikisource (T269431) (duration: 00m 54s)
  • 14:07 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating wawikisource (T269431) (duration: 00m 55s)
  • 13:58 urbanecm@deploy1001: Synchronized langlist: Creating madwiki (T269437) (duration: 00m 54s)
  • 13:56 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating madwiki (T269437) (duration: 00m 53s)
  • 13:55 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating madwiki (T269437)
  • 13:41 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: REVERT: Creating madwiki (T269437)
  • 13:40 urbanecm@deploy1001: Scap failed!: 5/8 canaries failed their endpoint checks(https://en.wikipedia.org)
  • 13:40 urbanecm@deploy1001: Synchronized dblists: Creating madwiki (T269437) (duration: 00m 54s)
  • 13:39 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating madwiki (T269437) (duration: 00m 54s)
  • 13:36 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating madwiki (T269437) (duration: 00m 55s)
  • 13:12 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 12:59 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 12:59 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 12:57 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 12:57 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 12:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 12:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 12:55 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
  • 12:55 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'plain' .
  • 12:55 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 12:46 akosiaris: deploy zotero to staging-codfw for testing purposes
  • 12:45 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 12:45 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 12:45 akosiaris: Add tiller ClusterRole to staging-codfw
  • 12:45 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:44 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:33 Lucas_WMDE: EU backport+config window done
  • 12:32 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Remove propagatePageDeletion setting (2/2) (duration: 00m 54s)
  • 12:30 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Remove propagatePageDeletion setting (1/2) (duration: 00m 54s)
  • 12:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 2058fa5: Enable RC patrol for papwiki (T268924) (duration: 00m 54s)
  • 12:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0861bbb: Add wordmark and tagline for kawiki (T267776; 3/3) (duration: 00m 54s)
  • 12:15 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ka.svg: 0861bbb: Add wordmark and tagline for kawiki (T267776; 2/3) (duration: 00m 54s)
  • 12:14 Urbanecm: Purge tiwiki and tiwiktionary logos (T263504)
  • 12:12 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-tagline-ka.svg: 0861bbb: Add wordmark and tagline for kawiki (T267776; 1/3) (duration: 00m 55s)
  • 12:10 urbanecm@deploy1001: Synchronized static/images/project-logos/: b7756de: Update tiwiktionary logos (T263504) (duration: 00m 55s)
  • 12:06 urbanecm@deploy1001: Synchronized static/images/project-logos/: e1bd906: Update tiwiki logos (T263504) (duration: 00m 54s)
  • 11:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2033.codfw.wmnet with reason: REIMAGE
  • 11:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1033.eqiad.wmnet with reason: REIMAGE
  • 11:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2033.codfw.wmnet with reason: REIMAGE
  • 11:37 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1033.eqiad.wmnet with reason: REIMAGE
  • 11:17 effie: disable puppet on mc1033, mc2033 for memcached upgrade
  • 11:09 marostegui: Create fake db to trigger data checks alerts for clouddb hosts T267090
  • 11:06 jiji@deploy1001: Synchronized wmf-config/ProductionServices.php: Swap mc1033 with mc1034 for Redis lock manager - T265643 (duration: 00m 59s)
  • 09:28 marostegui: Stop mysql on es1019 - T270159
  • 09:13 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:09 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1019 from dbctl', diff saved to https://phabricator.wikimedia.org/P13549 and previous config saved to /var/cache/conftool/dbconfig/20201215-075220-marostegui.json
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1019 for decommissioning', diff saved to https://phabricator.wikimedia.org/P13548 and previous config saved to /var/cache/conftool/dbconfig/20201215-074924-marostegui.json
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2131', diff saved to https://phabricator.wikimedia.org/P13547 and previous config saved to /var/cache/conftool/dbconfig/20201215-072513-marostegui.json
  • 06:23 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:16 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 05:29 kart_: Updated cxserver to 2020-12-12-101743-production (T268309)
  • 05:23 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:17 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:14 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 00:13 ejegg: updated payments-wiki from 63ae7413a8 to 3d3055c478

2020-12-14

  • 22:39 sbassett: Deployed security patch for T120883 (v8) to wmf.21
  • 21:05 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add analytics event stream mediawiki.mediasearch_interaction T258183 (duration: 00m 56s)
  • 20:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2031.codfw.wmnet with reason: REIMAGE
  • 20:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2031.codfw.wmnet with reason: REIMAGE
  • 20:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1031.eqiad.wmnet with reason: REIMAGE
  • 20:09 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1031.eqiad.wmnet with reason: REIMAGE
  • 19:50 effie: disable puppet on mc1031, mc2031 to install buster
  • 19:45 mutante: mwdebug1003 - removing zero.wikimedia.org include for testing
  • 19:43 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 3b5974f: zhwikinews: Grant suppressredirect to autoconfirmed (T270023) (duration: 00m 55s)
  • 19:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cf36ad6: hrwiki: Add draft namespace (T268740) (duration: 00m 56s)
  • 19:31 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:649359 group1: Enable OldRevisionParserCache (duration: 00m 55s)
  • 19:28 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:644317 Remove wgParserCacheUseJson setting (duration: 00m 56s)
  • 19:24 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.21/extensions/Popups: Backport gerrit:649408 Revert Remove title attributes at init (duration: 00m 59s)
  • 18:25 ryankemper: T269204 Restarting `wdqs-blazegraph` prometheus exporter across all wdqs instances:`sudo cumin -b 12 'P{wdqs*}' 'sudo systemctl restart prometheus-blazegraph-exporter-wdqs-blazegraph.service'`
  • 18:05 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1265.eqiad.wmnet
  • 18:04 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Redeploy Netbox 2.8 to netbox-next T266488 p2 (duration: 00m 05s)
  • 18:04 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Redeploy Netbox 2.8 to netbox-next T266488 p2
  • 18:04 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Redeploy Netbox 2.8 to netbox-next T266488 p1 (duration: 00m 33s)
  • 18:03 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Redeploy Netbox 2.8 to netbox-next T266488 p1
  • 17:59 hnowlan: depooled mw1265 for reimaging
  • 16:55 jayme@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:55 jayme@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 14:57 root@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:54 root@cumin1001: START - Cookbook sre.dns.netbox
  • 14:52 root@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:46 root@cumin1001: START - Cookbook sre.dns.netbox
  • 14:09 jbond42: upload modified golang-cfssl to apt
  • 13:45 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Wikibase Repo ID generator logging on Wikidata (T268625) (duration: 00m 55s)
  • 12:55 Lucas_WMDE: EU backport+config window done
  • 12:54 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Enable Wikibase Repo ID generator logging on Test Wikidata (T268625) (2/2) (duration: 00m 54s)
  • 12:53 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Wikibase Repo ID generator logging on Test Wikidata (T268625) (1/2) (duration: 00m 54s)
  • 12:45 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Add log channel Wikibase.IdGenerator (T268625) (Beta-only sync to avoid drift) (duration: 00m 55s)
  • 12:43 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add log channel Wikibase.IdGenerator (T268625) (duration: 00m 54s)
  • 12:39 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable QuickSurveys on commonswiki (T258419) (duration: 00m 55s)
  • 12:09 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add Media Search survey (T258419) (duration: 00m 55s)
  • 11:34 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 54s)
  • 11:33 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 10:34 godog: add 100G to prometheus 'global' in codfw
  • 10:32 akosiaris: Adding kubernetes codfw staging cluster configuration to cr*-codfw
  • 10:17 marostegui: Stop mysql on db2131 to clone db2142
  • 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2131 to clone db2142', diff saved to https://phabricator.wikimedia.org/P13542 and previous config saved to /var/cache/conftool/dbconfig/20201214-101611-marostegui.json
  • 10:12 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.21/extensions/Wikibase/client/includes: Avoid loading the whole item in every client page view (T269960) (duration: 00m 25s)
  • 10:03 ladsgroup@deploy1001: Scap failed!: 4/9 canaries failed their endpoint checks(https://en.wikipedia.org)
  • 09:51 godog: swift codfw-prod: more weight to ms-be20[58-61] - T269337
  • 09:45 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on cloudvirt1024.eqiad.wmnet with reason: T269419
  • 09:45 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on cloudvirt1024.eqiad.wmnet with reason: T269419
  • 08:40 godog: swift eqiad-prod: add weight to ms-be106[0-3] - T268435

2020-12-11

  • 22:05 dduvall@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 22:02 dduvall@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 21:59 dduvall@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 21:57 akosiaris: add docker-ce_18.06.3~ce~3-0~debian_amd64.deb to apt.wikimedia.org stretch-wikimedia/thirdparty/k8s
  • 21:46 Amir1: Running schema changes on wikitech database for T269348
  • 21:45 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 21:42 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 21:41 dduvall@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 21:38 dduvall@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 21:35 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 21:33 dduvall@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 20:27 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 20:11 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Un-migrate Growth EventLogging schema HomepageVisit back to EventLogging-backend on all wikis (this is a server side event which is not yet ready to migrate) - T267333 (duration: 00m 58s)
  • 19:28 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 19:18 razzi@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 18:47 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 18:30 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.upgrade-bigtop-distro (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
  • 18:19 elukey@cumin1001: START - Cookbook sre.hadoop.upgrade-bigtop-distro for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
  • 18:19 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
  • 18:13 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
  • 18:13 mutante: doc1001 restarted apache2 just in case after DOC_PATH change
  • 17:53 razzi@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:52 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 17:48 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.upgrade-bigtop-distro (exit_code=99) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
  • 17:41 elukey@cumin1001: START - Cookbook sre.hadoop.upgrade-bigtop-distro for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
  • 16:40 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.upgrade-bigtop-distro (exit_code=99) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
  • 16:28 elukey@cumin1001: START - Cookbook sre.hadoop.upgrade-bigtop-distro for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
  • 16:15 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
  • 16:10 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
  • 15:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
  • 15:33 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
  • 15:20 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.upgrade-bigtop-distro (exit_code=99) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
  • 15:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
  • 15:12 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
  • 15:10 jayme@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:06 elukey@cumin1001: START - Cookbook sre.hadoop.upgrade-bigtop-distro for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
  • 14:59 jayme@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 14:45 jayme@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 14:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
  • 14:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
  • 14:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.upgrade-bigtop-distro (exit_code=99) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
  • 14:23 elukey@cumin1001: START - Cookbook sre.hadoop.upgrade-bigtop-distro for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
  • 14:16 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.upgrade-bigtop-distro (exit_code=99) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
  • 14:04 elukey@cumin1001: START - Cookbook sre.hadoop.upgrade-bigtop-distro for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
  • 14:03 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
  • 14:00 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
  • 13:58 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
  • 13:57 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
  • 13:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
  • 13:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
  • 12:02 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
  • 12:00 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
  • 09:57 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubestage2001.codfw.wmnet with reason: REIMAGE
  • 09:55 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubestage2002.codfw.wmnet with reason: REIMAGE
  • 09:54 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2001.codfw.wmnet with reason: REIMAGE
  • 09:53 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: REIMAGE
  • 09:26 elukey: add thirdparty/bigtop15 to buster-wikimedia
  • 08:13 elukey: restart memcached on mwdebug1002 to pick up the correct port (11210 instead of the default 11211)
  • 07:12 elukey@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0)
  • 07:04 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers
  • 01:24 ejegg: updated payments-wiki from df80a99b40 to 63ae7413a8

2020-12-10

  • 23:35 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript resetAuthenticationThrottle.php --wiki=enwiki --login --ip 'REDACTED' --user 'WP 1.0 bot' # T269898
  • 23:15 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.21
  • 23:06 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 23:01 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate Growth EventLogging schemas to Event Platform on all wikis - T267333 (duration: 01m 09s)
  • 22:32 twentyafterfour@deploy1001: Synchronized php-1.36.0-wmf.21/resources/lib/ooui/oojs-ui-widgets-wikimediaui.css: sync https://gerrit.wikimedia.org/r/c/mediawiki/core/+/647641 to fix T269477 and unblock T264801 (duration: 01m 04s)
  • 22:24 sbassett: Deployed security patch for T120883 (v7) to wmf.21
  • 22:23 sbassett: Deployed security patch for T120883 (v7) to wmf.20
  • 22:03 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 20:54 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate Growth EventLogging schemas to Event Platform on testwiki - T267333 (duration: 01m 03s)
  • 20:25 hashar@deploy1001: Finished deploy [integration/docroot@fdf0917]: (no justification provided) (duration: 00m 06s)
  • 20:25 hashar@deploy1001: Started deploy [integration/docroot@fdf0917]: (no justification provided)
  • 20:07 catrope@deploy1001: Synchronized php-1.36.0-wmf.21/extensions/GrowthExperiments/: Add banner module to the homepage (T269804) (duration: 01m 03s)
  • 20:06 catrope@deploy1001: Synchronized php-1.36.0-wmf.21/extensions/FlaggedRevs/: Guard more singleton() calls with globalArticleInstance() checks (T269608, to unbreak CI in wmf.21) (duration: 01m 04s)
  • 19:41 catrope@deploy1001: Synchronized php-1.36.0-wmf.21/resources/src/mediawiki.rcfilters/styles/mw.rcfilters.ui.FilterTagMultiselectWidget.less: Work around OOUI bug breaking RCFilters UI (T269477) (duration: 01m 04s)
  • 19:24 catrope@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: Revert PoolCounter settings for DPL (T263220) (duration: 01m 03s)
  • 19:20 ejegg: updated CiviCRM from acb87a092d to aefd10c4e6
  • 19:15 catrope@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: Add PoolCounter settings for DPL (T263220) (duration: 01m 05s)
  • 19:07 mutante: doc1001 - restarted apache after docroot change
  • 17:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1032.eqiad.wmnet with reason: REIMAGE
  • 17:54 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:647751 T269809 (duration: 01m 05s)
  • 17:54 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1032.eqiad.wmnet with reason: REIMAGE
  • 17:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2032.codfw.wmnet with reason: REIMAGE
  • 17:13 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2032.codfw.wmnet with reason: REIMAGE
  • 17:03 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:01 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:59 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:54 effie: upgrade mc1032, mc2032 to buster - T213089
  • 16:53 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:50 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:28 godog: power reset ms-be1022 - stuck after boot - T267870
  • 16:27 gehel: depooling wdqs1011, issues with categories endpoint
  • 16:13 elukey: add thirdparty/bigtop15 packages to stretch-wikimedia
  • 16:13 volans@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on cumin2001.codfw.wmnet with reason: volans's test
  • 16:13 volans@cumin2001: START - Cookbook sre.hosts.downtime for 0:10:00 on cumin2001.codfw.wmnet with reason: volans's test
  • 15:58 mutante: mw2243 pooled - first jobrunner on buster
  • 15:55 cicalese@deploy1001: Synchronized wmf-config/CommonSettings.php: 645308 CommonSettings: OAuth 2.0 refresh tokens expire after 1 minute (duration: 01m 02s)
  • 15:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:54 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mwdebug1003.eqiad.wmnet
  • 15:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mwdebug1003.eqiad.wmnet
  • 15:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mwdebug1002.eqiad.wmnet
  • 15:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:53 moritzm: rebooting planet1002 (planet.wikimedia.org) for kernel update
  • 15:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2243.codfw.wmnet
  • 15:50 cicalese@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 646862 Configure API Portal permissions for launch (duration: 01m 03s)
  • 15:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:45 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:12 moritzm: restarting turnilo and hue to pick up OpenSSL security updates
  • 15:07 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:04 moritzm: restarting slapd on ldap replicas to pick up OpenSSL updates
  • 15:04 jbond42: reboot deneb.codfw.wmnet
  • 15:03 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:03 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 14:50 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:47 jbond42: re-enable puppet fleet wide
  • 14:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:38 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:33 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:17 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:10 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:10 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 14:10 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:07 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:50 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 13:50 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:46 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 13:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:32 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 13:26 jbond42: disable puppet fleet wide to reboot puppet management infrastructure
  • 12:43 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:41 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:33 Lucas_WMDE: EU backport+config window done
  • 12:31 kharlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 12:30 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.21/includes/specials/SpecialWhatLinksHere.php: Backport: Fix prev/next links on Special:WhatLinksHere (T269830) (duration: 01m 04s)
  • 12:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 042dd03: [huwiki] Set wgFlaggedRevsOverride back to true per community vote (T210224) (duration: 01m 07s)
  • 12:13 kharlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 11:21 effie: upload prometheus-redis-exporter_0.13-1 to buster-wikimedia main
  • 11:20 moritzm: installing apt security updates on buster/stretch
  • 11:13 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:07 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 11:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:05 moritzm: rebooting failoid1001
  • 11:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:44 godog: swift eqiad-prod: add weight to ms-be106[0-3] - T268435
  • 10:42 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:38 effie: uploading prometheus-redis-exporter_0.13-1 in component/redis2 for buster
  • 10:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 10:37 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 10:33 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:33 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 10:33 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:32 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 10:29 volans: upgraded spicerack to 0.0.46 on cumin[12]001
  • 10:16 volans: uploaded spicerack_0.0.46 to apt.wikimedia.org buster-wikimedia
  • 09:57 ema: A:cp rolling ats-{tls,backend}-restart for openssl upgrades (CVE-2020-1971)
  • 09:28 ema: cp3054: downgrade varnish to 6.0.0-1wm1 T264398
  • 09:28 effie: disable puppet on all hosts running nutcracker for 647204 - T265643
  • 09:26 effie: disable puppet on all mw* hosts for 647204 - T265643
  • 09:11 effie: disable puppet on all hosts running redis - T265643
  • 08:22 godog: swift codfw-prod: more weight to ms-be20[58-61] - T269337
  • 06:38 kart_: Upgraded Apertium to 2020-12-09-115733-production
  • 06:35 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 06:35 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'plain' .
  • 06:30 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'plain' .
  • 06:30 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 06:27 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
  • 00:42 ejegg: updated payments-wiki from 756c2f7ce0 to df80a99b40
  • 00:26 robh: cr2-eqsin bad fan being swapped via T267544

2020-12-09

  • 23:21 mutante: repooling parse2001 after buster reimage - T268524
  • 23:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2243.codfw.wmnet
  • 23:16 mutante: repooling parse2001 after buster reimage - T245757
  • 23:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=parse2001.codfw.wmnet
  • 23:04 mutante: zero.wikimedia.beta.wmflabs.org removed from beta_sites (deployment-prep) T187716
  • 22:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:40 bstorm: shutting down labstore1006 for maintenance T268285
  • 20:27 mutante: mw1281,mw1282,mw1283 - scap pull
  • 20:26 mutante: repooling mw1281,mw1282,mw1283 - now in rack A8
  • 20:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw128[1-3].eqiad.wmnet
  • 20:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1283.eqiad.wmnet
  • 20:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1282.eqiad.wmnet
  • 20:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1281.eqiad.wmnet
  • 20:18 twentyafterfour: wmf.21 looks good on group1 wikis. Still seeing T269603 but not at an increased rate. (refs T264801)
  • 20:13 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.21 (duration: 01m 02s)
  • 20:12 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.21
  • 19:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ce01bbe: Enable ArticlePlaceholder at papwiki (T223693) (duration: 01m 02s)
  • 19:17 twentyafterfour@deploy1001: Synchronized php-1.36.0-wmf.21/includes/page/Article.php: deploy 0d99fe6d54 Article::view - remove the old subtitle from doOutputFromParserCache. Bug: T269727 (duration: 01m 04s)
  • 18:59 mutante: testreduce1001 - installed make
  • 18:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:17 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2243.codfw.wmnet
  • 18:16 mutante: depooling mw2243 (jobrunner) for reimaging (T245757)
  • 18:15 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2243.codfw.wmnet
  • 18:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:05 mutante: mw1281,mw1282,mw1283 shut down for T266164
  • 18:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:59 clarakosi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 17:58 clarakosi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 17:57 clarakosi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 17:49 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw128[1-3].eqiad.wmnet
  • 17:24 mutante: depooling 3 API appservers in eqiad to physically move to another rack
  • 17:23 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[1-3].eqiad.wmnet
  • 16:52 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 16:49 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 16:48 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 16:10 ema: deployment-cache-text06: deploy varnish 6.0.0-1wm1 T264398
  • 16:06 moritzm: updating mwdebug1003, parse2001, deploy1002, deploy2002 to wikidiff2 1.10.0-1~wmf1+buster1
  • 16:05 moritzm: importing wikidiff2 1.10.0-1~wmf1+buster1 to component/php72 T250515
  • 15:56 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1001.eqiad.wmnet
  • 15:47 hnowlan: reimaging restbase2009 after disk replacement
  • 15:29 moritzm: restarting nginx on htmldump1001 to pick up OpenSSL security updates
  • 13:54 godog: experiment with rsync.service increased niceness on ms-be2057 - T269337
  • 13:27 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 13:25 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:03 XioNoX: standardize Private-Peer BGP group on all cr*
  • 12:30 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1001.eqiad.wmnet
  • 12:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:24 Urbanecm: EU B&C window done
  • 12:23 urbanecm@deploy1001: Synchronized w/static.php: cfb3602: Remove unsupported arg in MediaWiki::doPostOutputShutdown() call (duration: 01m 02s)
  • 12:20 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=restbase,service=restbase,name=restbase2009.codfw.wmnet
  • 12:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 3414289: Add extended-confirmed group and restriction level for bgwiki (T269709) (duration: 01m 19s)
  • 11:06 godog: reboot ms-be1019 / ms-be1020 - T268435
  • 10:56 godog: change librenms alerts and transport groups to use alertmanager - T267018
  • 10:45 moritzm: installing openssl updates on Buster
  • 09:24 jbond42: make message mandatory for disable-puppet
  • 09:03 godog: swift codfw-prod: add ms-be20[58-61] - T269337
  • 01:07 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Explicitly set wgAbuseFilterAflFilterMigrationStage ahead of train roll-out T269712 (duration: 01m 03s)
  • 00:53 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.20/vendor/: 3278ffd: Bump wikimedia/parsoid to v0.13.0-a19 (T269685) (duration: 01m 16s)

2020-12-08

  • 23:24 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEvents: Enable SessionTick on group0 T248987 (duration: 02m 00s)
  • 22:46 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:17 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.21
  • 22:10 twentyafterfour@deploy1001: Pruned MediaWiki: 1.36.0-wmf.18 (duration: 02m 00s)
  • 21:58 twentyafterfour@deploy1001: Pruned MediaWiki: 1.36.0-wmf.16 (duration: 02m 43s)
  • 21:51 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 21:36 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:28 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 21:25 reedy@deploy1001: Pruned MediaWiki: 1.36.0-wmf.14 [keeping static files] (duration: 02m 44s)
  • 21:21 reedy@deploy1001: Pruned MediaWiki: 1.36.0-wmf.13 [keeping static files] (duration: 03m 59s)
  • 21:05 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.21 (duration: 39m 02s)
  • 20:41 ejegg: updated standalone SmashPig listener from e3103b96ca to 5a69abd40f
  • 20:28 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.21
  • 20:16 twentyafterfour: syncing new branch 1.36.0-wmf.21
  • 18:47 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 18:44 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 18:42 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 18:26 cstone: civicrm revision changed from b0ffb87c5d to 7d4b71d8b5
  • 17:00 thcipriani@deploy1001: Synchronized README: no-op sync for Drop Scap plugins now that 3.16.0 is installed (duration: 00m 59s)
  • 16:40 papaul: poweroff ms-be2058 for DIMM replacement
  • 16:31 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:29 dcaro@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:03 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:44 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 00m 58s)
  • 15:28 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 01m 00s)
  • 14:18 moritzm: installing puma security updates
  • 14:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:06 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:06 akosiaris: switch over all apertium traffic to kubernetes
  • 14:05 akosiaris@cumin1001: conftool action : set/pooled=no; selector: service=apertium,name=scb.*
  • 13:56 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: service=apertium,name=kubernetes.*
  • 13:54 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: service=apertium,name=kubernetes1001.*
  • 13:53 akosiaris: pooling of apertium kubernetes service, take #3
  • 13:51 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'plain' .
  • 13:51 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 13:50 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'staging' .
  • 13:50 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'plain' .
  • 13:50 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 13:50 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'staging' .
  • 13:48 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
  • 13:48 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 13:48 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'plain' .
  • 13:44 godog: bounce logstash on logstash1023
  • 13:44 godog: bounce logstash on logstash102
  • 13:42 effie: disable puppet on mc1034/mc2034 and reimage to buster - T213089
  • 13:28 hnowlan@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
  • 13:28 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 13:19 moritzm: installing xen security updates (client side libs only)
  • 12:57 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'plain' .
  • 12:45 akosiaris@cumin1001: conftool action : set/pooled=inactive; selector: service=apertium,name=kubernetes.*
  • 12:43 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: service=apertium,name=kubernetes1001*
  • 12:35 dcausse: EU B&C done
  • 12:34 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:32 dcaro@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:30 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 12:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:28 dcausse@deploy1001: Synchronized php-1.36.0-wmf.20/extensions/WikimediaEvents/: T266027: [cirrus] setup perfield builder A/B test on spaceless languages (duration: 01m 00s)
  • 12:27 akosiaris@cumin1001: conftool action : set/pooled=no; selector: service=apertium,name=kubernetes.*
  • 12:23 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: service=apertium,name=kubernetes.*
  • 12:23 akosiaris: pooling of apertium kubernetes service, take #2
  • 12:08 akosiaris: rollback pooling of apertium kubernetes service.
  • 12:08 akosiaris@cumin1001: conftool action : set/pooled=no; selector: service=apertium,name=kubernetes.*
  • 12:03 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: service=apertium,name=kubernetes.*
  • 12:01 akosiaris@cumin1001: conftool action : set/weight=1; selector: service=apertium,name=kubernetes.*
  • 12:01 moritzm: installing jasper security updates
  • 11:51 moritzm: uploaded jasper 1.900.1-debian1-2.4+deb8u6+wmf1 to apt.wikimedia.org / jessie-wikimedia
  • 11:45 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'plain' .
  • 11:44 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 11:43 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'staging' .
  • 11:43 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 11:42 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'plain' .
  • 11:42 akosiaris: add a TLS enabled release for apertium
  • 11:42 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 11:41 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'staging' .
  • 11:41 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'plain' .
  • 10:45 ema: A:cp remove libvmod-tbf -- see https://gerrit.wikimedia.org/r/c/operations/puppet/+/623583/
  • 10:33 ema: cp4032: restart ats-{tls,be}
  • 10:29 godog: reboot ms-be1021 - T268435
  • 10:13 ema: cp3054: upgrade varnish back to 6.0.7-1wm1 T264398
  • 09:56 ema: cp3054: downgrade varnish to 5.2.1-1wm1 T264398
  • 09:06 effie: upgrade scap to 3.16 everywhere - T268634
  • 08:16 godog: upgrade hw raid firmware on ms-be1032 and reboot - T269624
  • 08:09 godog: power reset ms-be1032
  • 03:25 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 03:16 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 03:05 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 03:03 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:30 ejegg: deployed queue consumers for new Adyen capture jobs
  • 01:57 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:55 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 01:54 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 01:53 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:52 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:51 ryankemper: T246345 Brought two new public wdqs nodes `wdqs101[2,3]` into service: `sudo confctl select 'name=wdqs1012.eqiad.wmnet|wdqs1013.eqiad.wmnet' set/pooled=yes:weight=10`
  • 01:50 ryankemper@mwmaint1002: conftool action : set/pooled=yes:weight=10; selector: name=wdqs1012.eqiad.wmnet|wdqs1013.eqiad.wmnet
  • 01:49 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:49 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:43 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 01:43 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 01:43 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:43 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:23 ryankemper@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:21 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:18 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime
  • 01:18 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime
  • 01:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:12 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:11 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:56 ryankemper: T269204 reimaging the following instances to debian buster: `eqiad public`->`wdqs1007`, `codfw public`->`wdqs2007`, `codfw internal`->`wdqs2008`, `test`->`wdqs1010`

2020-12-07

  • 23:44 eileen: process-control config revision is b3bc1959cd
  • 23:13 eileen: process-control config revision is f48cdb9184
  • 23:13 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:12 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:07 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 23:07 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 23:07 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 23:07 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 23:07 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 23:07 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:48 sbassett: Re-deployed security patch for T120883 (v2)
  • 22:39 sbassett: Undeployed security patch for T120883 as it caused several errors
  • 22:35 sbassett: Deployed security patch for T120883
  • 22:30 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:30 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:21 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:17 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:01 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:59 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 20:58 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 20:58 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:58 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:58 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:27 mutante: mwmaint1002 - systemctl reset-failed to clear icinga alert
  • 20:16 mutante: mwmaint1002 - mediawiki_job_wikidata-updateQueryServiceLag job failed to run
  • 20:00 ryankemper@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:58 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:57 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:55 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime
  • 19:55 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:55 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime
  • 19:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:38 ryankemper: T269204 reimaging the following instances to debian buster => `eqiad public`:`wdqs1006`, `codfw public`:`wdqs2003`, `codfw internal`:`wdqs2006`, `test`:`wdqs1009`
  • 19:37 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:23 ejegg: updated standalone SmashPig payments listener from 3029b07004 to e3103b96ca
  • 19:21 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:646679 Enable OldRevisionParserCache on labs and group0, CS.php (duration: 00m 59s)
  • 19:19 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:646679 Enable OldRevisionParserCache on labs and group0 (duration: 01m 00s)
  • 19:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 3ba3af9: Remove Growth Study Screener Quick Survey Config (T269369) (duration: 01m 02s)
  • 18:57 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 18:57 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 18:57 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 18:57 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 18:52 herron: systemctl restart icinga on alert1001 T269560
  • 18:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:51 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 18:51 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 18:51 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:51 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:50 ryankemper: T246345 Brought new `wdqs-internal` node `wdqs1011` into service: `sudo confctl select 'name=wdqs1011.eqiad.wmnet' set/pooled=yes:weight=10`
  • 18:49 ryankemper@mwmaint1002: conftool action : set/pooled=yes:weight=10; selector: name=wdqs1011.eqiad.wmnet
  • 18:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:48 ryankemper@mwmaint1002: conftool action : set/pooled=yes:weight=10; selector: service=wdqs-internal,name=wdqs1011.eqiad.wmnet
  • 18:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:26 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:12 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 00m 57s)
  • 18:02 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 00m 58s)
  • 17:54 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 00m 59s)
  • 17:49 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 01m 01s)
  • 17:37 akosiaris: cleanup the old recommendation-api non TLS LVS service
  • 17:18 effie: disable puppet on mc1035, mc2035 for 646751
  • 16:45 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 16:44 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 16:43 ema: deployment-cache-text06: downgrade varnish to 5.2.1-1wm1 T264398
  • 16:42 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:40 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:38 dcaro@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:35 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 16:35 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 16:16 moritzm: updated buster installation image to 10.7 T269558
  • 16:04 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 16:04 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:04 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:59 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 15:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:41 moritzm: installing vips security updates
  • 14:50 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.20
  • 14:46 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.20
  • 14:38 hashar@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.20 (duration: 01m 06s)
  • 14:37 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.20
  • 14:33 hashar@deploy1001: Synchronized php-1.36.0-wmf.20/includes: Applying https://gerrit.wikimedia.org/r/c/mediawiki/core/+/645312 T2569396 (duration: 01m 15s)
  • 13:38 kart_: Deployed apertium service to eqiad and codfw (T255672)
  • 13:23 hashar: Stopping CI Jenkins for upgrade
  • 13:20 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 13:18 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 13:12 hashar: Upgrading Jenkins 2.252 > 2.263.1 on contint2001 / contint1001
  • 12:57 Urbanecm: EU B&C window done
  • 12:53 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5691a39: Revoke urlshortener-create-url from sysops (T229633) (duration: 01m 06s)
  • 12:47 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ee1e400: Assign urlshortener-create-url permission (T229633) (duration: 01m 06s)
  • 12:28 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T266027: [cirrus] A/B test perfield build on spaceless languages (duration: 01m 07s)
  • 12:27 effie: rollout scap 3.16.0-1 to canaries - T268634
  • 12:26 effie: rollout scap 3.16.0-1 to canaries
  • 12:21 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: [cirrus] flip activation of MLR rescore window using supported_syntax (duration: 01m 06s)
  • 12:15 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] cleanup mediasearch commons A/B test (duration: 01m 06s)
  • 12:04 moritzm: installing Linux 4.19.160 updates from Buster point release (initially only package updates, no reboots yet)
  • 11:35 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 06s)
  • 11:34 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 40s)
  • 09:02 godog: bounce apache2 on prometheus1003
  • 08:47 godog: add 300G to prometheus global (eqiad)
  • 08:04 godog: swift eqiad-prod: add weight to ms-be106[0-3] - T268435

2020-12-05

  • 16:25 godog: swift disable sdg1 on ms-be1054
  • 06:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 06:11 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 06:09 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 06:01 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 05:49 ryankemper: restarted pybal on `lvs1015` per the instructions in https://wikitech.wikimedia.org/wiki/PyBal#Services_known_to_PyBal_but_not_to_IPVS
  • 05:39 ryankemper: restarted pybal on `lvs1016` per the instructions in https://wikitech.wikimedia.org/wiki/PyBal#Services_known_to_PyBal_but_not_to_IPVS
  • 04:44 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 04:44 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 04:44 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 04:44 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 04:28 ryankemper@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 04:26 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 04:25 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 04:23 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime
  • 04:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 04:22 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime
  • 04:21 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime
  • 04:21 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime
  • 04:05 ryankemper: T269204 reimaging the following instances to debian buster (one each from `[public, internal] x [eqiad, codfw]`): `wdqs1005`, `wdqs2002`, `wdqs1008`, `wdqs2005`
  • 03:49 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 03:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 03:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 03:44 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 03:43 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 03:43 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 02:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:12 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 02:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
  • 01:58 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:09 eileen: civicrm revision changed from 5fa107d32a to b0ffb87c5d, config revision is ffe0a99133
  • 00:45 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 00:45 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:45 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:40 Urbanecm: End of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=eswiki; T246539)
  • 00:35 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 00:35 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 00:34 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:32 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:30 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:28 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:27 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime
  • 00:27 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:26 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:17 Urbanecm: deploy1001 staging dir is DIRTY: /srv/mediawiki-staging (master u+1): last commit bce4125, author Mukunda Modell (cc twentyafterfour)
  • 00:09 ryankemper: T269204 reimaging the following instances to debian buster: `wdqs1004`, `wdqs2001`, `wdqs1003`

2020-12-04

  • 17:22 Urbanecm: [urbanecm@mwmaint1002 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Wilfredor . # T269452
  • 15:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:45 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:15 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:14 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:14 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 13:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
  • 13:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 13:07 akosiaris: create apertium namespace on k8s clusters. T255672
  • 11:24 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:24 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:31 jynus: setting db1133 as read-write for backup testing
  • 10:28 moritzm: resetting cumin-check-aliases.service on cumin* hosts
  • 09:54 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:54 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:30 moritzm: installing zsh security updates on stretch
  • 09:26 moritzm: installing mutt security updates
  • 08:58 moritzm: installing lxml security updates
  • 07:09 marostegui: Stop mysql on clouddb1016 to clone clouddb1020 T267090
  • 07:02 marostegui: Increase pvs on db[1151-1155] T269324 T268742
  • 02:16 eileen: civicrm revision changed from 913ccdfd2b to 5fa107d32a, config revision is ffe0a99133
  • 01:43 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:42 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:04 ryankemper: T269406 https://grafana.wikimedia.org/d/000000305/maps-performances?viewPanel=11&orgId=1&var-cluster=maps1&from=1606827063027&to=1607043666975 shows that the normal daily dropoff in lag did not occur today, leading to the criticals. It's possible some sort of daily job has failed
  • 00:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:12 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:06 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:04 andrew@cumin1001: START - Cookbook sre.hosts.downtime

2020-12-03

  • 23:47 ejegg: adjusted timings for donations queue consumer and thank you mailer
  • 23:44 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:39 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 22:50 ejegg: updated standalone SmashPig IPN listener from 63dffcb11f to 3029b07004
  • 22:49 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:49 razzi@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:15 shdubsh: restart elasticsearch on logstash1010 - gc issues
  • 22:15 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:46 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback to wmf.18
  • 21:39 twentyafterfour: rolling back wmf.20 due to T269396 refs T263186
  • 21:37 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=parse2001.codfw.wmnet
  • 21:26 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:26 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:25 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:23 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:23 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:21 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:21 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:21 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:21 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:21 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:05 bstorm: running maintain-dbusers harvest-replicas to populate the user accounts on new wikireplicas servers
  • 20:46 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 20:44 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:43 shdubsh: kill slapd on serpens and restart it
  • 20:42 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:40 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:39 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:38 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:38 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:38 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:38 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:28 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:26 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:12 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:10 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:08 ryankemper: T269204 Re-imaging `wdqs2004` to upgrade it to buster: `sudo -i wmf-auto-reimage-host --conftool -p T269204 wdqs2004.codfw.wmnet`
  • 20:03 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.20
  • 19:58 milimetric@deploy1001: Finished deploy [analytics/aqs/deploy@95d6432]: (no justification provided) (duration: 00m 23s)
  • 19:58 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@95d6432]: (no justification provided)
  • 19:57 milimetric@deploy1001: Finished deploy [analytics/aqs/deploy@95d6432]: (no justification provided) (duration: 00m 19s)
  • 19:57 shdubsh: restart logstash kafka in codfw - java updates
  • 19:57 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@95d6432]: (no justification provided)
  • 19:57 hashar@deploy1001: Synchronized php-1.36.0-wmf.20/extensions/AbuseFilter/includes/FilterLookup.php: Use 'default' as default group when reading filters from history - T269314 (duration: 01m 05s)
  • 19:56 milimetric@deploy1001: Started restart [analytics/aqs/deploy@95d6432]: (no justification provided)
  • 19:56 milimetric@deploy1001: Started restart [analytics/aqs/deploy@95d6432]: (no justification provided)
  • 19:55 milimetric@deploy1001: Started restart [analytics/aqs/deploy@95d6432]: (no justification provided)
  • 19:55 milimetric@deploy1001: Started restart [analytics/aqs/deploy@95d6432]: (no justification provided)
  • 19:53 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:51 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 19:44 shdubsh: restart logstash kafka in eqiad - java updates
  • 19:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:23 Urbanecm: mwscript namespaceDupes.php --wiki=kuwiktionary --fix (T269319)
  • 19:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 6be070c: Kurdish Wiktionary: Add WF namespace alias to NS_PROJECT (T269319) (duration: 01m 08s)
  • 19:17 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=parse2001.codfw.wmnet
  • 19:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: b7b946a: Enable NewUserMessage for ptwiki (T269290) (duration: 01m 08s)
  • 19:11 mutante: depooling parse2001 and repeating auto-reimage to see if ferm issue is repeatable (T268524)
  • 19:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=parse2001.codfw.wmnet
  • 19:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 06d9e8d: Undeploy graphoid for phase 3 wikis (T259207) (duration: 01m 08s)
  • 18:41 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:39 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:17 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:04 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 18:03 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:56 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:50 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 17:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:34 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:33 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:32 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 17:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:57 volans@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:55 volans@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:27 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:52 effie: upgrading labweb* to ICU 63 - T264991
  • 15:47 moritzm: updated thirdparty/postgres96 to 9.6.20-1.pgdg100+1 9.6.17-2.pgdg100+1
  • 15:46 elukey: moved conf1005 to rack B3 - T267065
  • 15:06 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:06 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:59 jbond42: disable puppet fleet wide to roll out puppetdb node-ttl change
  • 14:46 effie: rolling depool and pool of parsoid servers
  • 14:34 elukey: stop zookeeper and etcd on conf1005 as prep-step before rack move
  • 14:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1089', diff saved to https://phabricator.wikimedia.org/P13523 and previous config saved to /var/cache/conftool/dbconfig/20201203-134724-marostegui.json
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P13522 and previous config saved to /var/cache/conftool/dbconfig/20201203-133953-marostegui.json
  • 13:30 effie: puppet enabled on jobrunners
  • 13:29 hashar: Upgraded Jenkins on releases1002 # T269352
  • 13:24 hashar: Upgraded Jenkins on releases2002 (spare server) # T269352
  • 13:09 moritzm: uploaded jenkins 2.263.1 to apt.wikimedia.org component/ci
  • 13:00 elukey: move db1108 to C3 - T267065
  • 12:37 moritzm: installing jupyter-notebook security updates on Stretch
  • 12:17 elukey: move aqs1006 to rack D6 - T267065
  • 12:10 effie: disable puppet on jobrunners and parsoid - T244340
  • 12:09 Lucas_WMDE: EU backport+config window done
  • 12:07 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Enable implicit description usage (T267745) (duration: 01m 12s)
  • 11:57 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=enwiki; T246539)
  • 11:52 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=eswiki; T246539)
  • 11:46 elukey: move druid1001 to rack A1 - T267065
  • 11:31 volans@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on cumin2001.codfw.wmnet with reason: volans's test
  • 11:31 volans@cumin2001: START - Cookbook sre.hosts.downtime for 0:10:00 on cumin2001.codfw.wmnet with reason: volans's test
  • 09:40 ema: A:cp start rolling varnish upgrade to 6.0.7-1wm1 T268736
  • 09:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:34 moritzm: gnt-instance reboot ldap-replica2003 to validate new qemu
  • 09:14 moritzm: installing qemu security updates on Stretch
  • 06:06 marostegui: Create sockpuppet database on m2 T268505
  • 04:13 ejegg: updated fundraising CiviCRM from a2979cbba1 to 913ccdfd2b
  • 03:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:24 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:54 eileen: process-control config revision is f863b32627
  • 01:21 mutante: lists1001 - remove "delete_held_messages" cronjob from root crontab - replaced by systemd timer - systemctl start delete_held_messages.service and confirmed it succeeded

2020-12-02

  • 23:42 reedy@deploy1001: Synchronized php-1.36.0-wmf.20/includes/debug/logger/monolog/LogstashFormatter.php: T269286 (duration: 01m 07s)
  • 22:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:47 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:43 twentyafterfour@deploy1001: Synchronized php-1.36.0-wmf.20/extensions/CategoryTree/: Deploying backport f6c2d74259b9 to wmf.20, bug: T269235 refs T263186 (duration: 01m 07s)
  • 22:38 twentyafterfour@deploy1001: Synchronized php-1.36.0-wmf.20/includes/parser/: Deploying backports for wmf.20 refs T263186 (duration: 01m 08s)
  • 21:23 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 21:23 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 21:20 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 21:20 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 21:17 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 21:16 clarakosi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 21:14 clarakosi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 21:11 clarakosi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 21:03 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:56 twentyafterfour: deploying backports for 1.36.0-wmf.20 refs T263186
  • 20:52 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:50 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:49 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 20:49 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 20:42 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 20:41 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 20:38 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 20:32 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:32 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:27 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:27 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:25 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 20:22 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 20:21 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 20:20 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 20:14 mutante: sodium - started update-ubuntu-mirror systemd timer - debugging why it fails; manually syncing with sudo -u mirror /usr/local/sbin/update-ubuntu-mirror
  • 20:10 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:08 mutante: sodium systemctl reset-failed
  • 20:08 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 19:56 mutante: sodium - systemctl restart update-tails-mirror.timer
  • 19:20 mforns: restarted turnilo to clear deleted datasource
  • 19:17 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add EventStream config for link recommendations (T261407) (duration: 01m 06s)
  • 18:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:20 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:00 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.20/extensions/UrlShortener/includes/UrlShortenerUtils.php: gerrit:644879 Remove var_dump() left by mistake (duration: 01m 09s)
  • 17:58 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 24da542: Add all subdomains of artsdatabanken.no to the wgCopyUploadsDomains allowlist for commonswiki (T267784) (duration: 01m 06s)
  • 17:53 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:53 mutante: sodium - commenting "sync ubuntu mirror / sync tails mirror" cronjobs in the crontab of user 'mirror' after they were replaced by systemd timers by gerrit:636082
  • 17:51 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 17:25 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1010.eqiad.wmnet
  • 17:11 effie: uploading scap 3.16.0-1
  • 16:25 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:23 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:29 moritzm: installing libproxy security updates on Buster
  • 15:27 moritzm: restarting turnilo
  • 15:00 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.20
  • 14:56 hashar: Promoting group0 to 1.36.0-wmf.20 since I didn't do so yesterday :-\ # T263186
  • 14:48 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Event Platform: Rename mw_session_tick stream to mediawiki.client.session_tick (duration: 01m 07s)
  • 14:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:17 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:12 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=commonswiki; T246539)
  • 14:10 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=ptwiki; T246539)
  • 14:07 hashar@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.20 (duration: 01m 18s)
  • 14:06 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.20
  • 13:11 moritzm: installing brotli security updates
  • 13:07 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
  • 13:06 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
  • 13:06 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
  • 11:34 XioNoX: add Lumen transit to cr3-ulsfo - T268691
  • 11:16 jayme: updated docker-report to 0.0.9-1 on chartmuseum* and deneb
  • 11:11 jayme: imported docker-report 0.0.9-1 to buster-wikimedia
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13516 and previous config saved to /var/cache/conftool/dbconfig/20201202-102348-root.json
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13515 and previous config saved to /var/cache/conftool/dbconfig/20201202-100845-root.json
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13513 and previous config saved to /var/cache/conftool/dbconfig/20201202-095341-root.json
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13512 and previous config saved to /var/cache/conftool/dbconfig/20201202-093838-root.json
  • 08:55 godog: swift eqiad-prod: add weight to ms-be106[0-3] - T268435
  • 06:54 marostegui: Remove es1017 from tendril and zarcillo T268825
  • 06:32 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 06:22 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:22 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
  • 05:44 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
  • 05:44 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
  • 05:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
  • 04:34 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 04:32 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 04:10 ryankemper: T259588 Beginning wdqs categories data-reload on the following instances (one each from `[public, internal] x [eqiad, codfw]`): `wdqs1006`, `wdqs2003`, `wdqs1011`, `wdqs2006`
  • 04:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
  • 04:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
  • 04:09 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 04:09 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 00:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
  • 00:48 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
  • 00:43 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
  • 00:41 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
  • 00:16 Urbanecm: Evening B&C window done
  • 00:14 bstorm: created views and wikireplicas indexes on clouddb10[13-19] sans s1 T268312
  • 00:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c73f0bf: Enable watchlist expiry feature on all wikis (T266875) (duration: 01m 07s)
  • 00:03 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 00:01 pt1979@cumin2001: START - Cookbook sre.hosts.downtime

2020-12-01

  • 23:15 ryankemper: T259588 Beginning wdqs categories data-reload on the following instances (one each from `[public, internal] x [eqiad, codfw]`): `wdqs1005`, `wdqs2002`, `wdqs1008`, `wdqs2005`
  • 23:13 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
  • 23:13 razzi@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 23:13 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 23:13 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
  • 23:13 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 23:12 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
  • 23:12 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 23:12 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 22:41 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=arwiki; T246539)
  • 22:15 rzl@cumin2001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 22:13 rzl@cumin2001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 21:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:53 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:39 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 21:21 hashar: gerrit2001: restarting Gerrit to take into account a config change in the daemon (--replica moved to daemonOpt config file)
  • 21:18 mutante: applied deployment_server role on deploy2002, added mcrouter cert, initial puppet run pulls mediawiki-config and other repos, downtimed in Icinga for 40 days (T265963)
  • 21:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:39 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:19 eileen: civicrm revision changed from fb0ad7f39b to a2979cbba1, config revision is 111cf0d63d
  • 19:56 razzi@deploy1001: Finished deploy [analytics/refinery@41c60d9] (thin): Regular analytics weekly train [analytics/refinery@3e42f46c62722256a1678809097114740806a184] (duration: 00m 07s)
  • 19:56 razzi@deploy1001: Started deploy [analytics/refinery@41c60d9] (thin): Regular analytics weekly train [analytics/refinery@3e42f46c62722256a1678809097114740806a184]
  • 19:54 razzi@deploy1001: Finished deploy [analytics/refinery@41c60d9]: Regular analytics weekly train [analytics/refinery@3e42f46c62722256a1678809097114740806a184] (duration: 08m 45s)
  • 19:51 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:45 razzi@deploy1001: Started deploy [analytics/refinery@41c60d9]: Regular analytics weekly train [analytics/refinery@3e42f46c62722256a1678809097114740806a184]
  • 19:44 razzi: deploy refinery with refinery-source v0.0.140
  • 19:40 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 19:40 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 19:37 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:36 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 19:35 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:35 ejegg: updated payments-wiki from 8612ed1002 to 756c2f7ce0
  • 19:33 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 19:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 19:31 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 19:26 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 19:07 razzi@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 18:45 razzi@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:40 bpirkle@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Enable WikimediaApiPortalOAuth on apiportalwiki gerrit:644305 (duration: 01m 06s)
  • 18:38 bpirkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikimediaApiPortalOAuth on apiportalwiki gerrit:644305 (duration: 01m 06s)
  • 18:03 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable session length instrument on officewiki T267494 (duration: 01m 06s)
  • 17:57 mholloway-shell@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/WikimediaEvents: Backport: sessionTick: Update stream name to mw_session_tick (duration: 01m 04s)
  • 17:54 mholloway-shell@deploy1001: Synchronized php-1.36.0-wmf.20/extensions/WikimediaEvents: Backport: sessionTick: Update stream name to mw_session_tick (duration: 01m 07s)
  • 17:34 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add event stream config for android.user_contributions_screen T228179 (duration: 01m 07s)
  • 17:19 marostegui: Sanitize s1 on clouddb1013 and clouddb1017 - T267090
  • 16:09 moritzm: installing vips security updates
  • 15:24 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:22 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:10 hashar@deploy1001: Finished scap: (no justification provided) (duration: 44m 20s)
  • 14:59 jbond42: install libonig updates to scp
  • 14:51 jbond42: install lxml updates
  • 14:26 hashar@deploy1001: Started scap: (no justification provided)
  • 14:24 hashar@deploy1001: sync-world aborted: testwikis wikis to 1.36.0-wmf.20 (duration: 74m 55s)
  • 14:08 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:05 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port set pic-slot 0 member 2 port 53 - T268808
  • 14:00 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port delete pic-slot 0 member 2 port 53 - T268808
  • 13:53 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 to clone clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13507 and previous config saved to /var/cache/conftool/dbconfig/20201201-133917-marostegui.json
  • 13:10 hashar@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.20
  • 12:57 hashar: Preparing deployment of 1.36.0-wmf.20 # T263186
  • 12:38 moritzm: uploaded libonig 5.9.5-3.2+deb8u4+wmf1 to apt.wikimedia.org/jessie-wikimedia
  • 12:33 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 12:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:05 arturo: [11:53 moritzm] uploaded lxml 3.4.0-1+deb8u1+wmf1 to apt.wikimedia.org/jessie-wikimedia
  • 11:48 marostegui: Install bsd-mailx on the new clouddb hosts (needed for the check private data) T267090 T268725
  • 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13506 and previous config saved to /var/cache/conftool/dbconfig/20201201-110214-root.json
  • 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13505 and previous config saved to /var/cache/conftool/dbconfig/20201201-104710-root.json
  • 10:45 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 10:38 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 10:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13503 and previous config saved to /var/cache/conftool/dbconfig/20201201-103207-root.json
  • 10:31 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 10:30 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13501 and previous config saved to /var/cache/conftool/dbconfig/20201201-101703-root.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 for schema change', diff saved to https://phabricator.wikimedia.org/P13500 and previous config saved to /var/cache/conftool/dbconfig/20201201-101346-marostegui.json
  • 10:08 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 10:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13499 and previous config saved to /var/cache/conftool/dbconfig/20201201-100541-root.json
  • 10:02 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:59 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13498 and previous config saved to /var/cache/conftool/dbconfig/20201201-095037-root.json
  • 09:49 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13497 and previous config saved to /var/cache/conftool/dbconfig/20201201-093534-root.json
  • 09:35 volans: upgrading spicerack to 0.0.45 on cumin1001
  • 09:32 volans@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:21 volans@cumin2001: START - Cookbook sre.hosts.decommission
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13496 and previous config saved to /var/cache/conftool/dbconfig/20201201-092030-root.json
  • 09:05 moritzm: removing obsolete resources on idp* and idp-test* hosts after going active-active
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 for schema change', diff saved to https://phabricator.wikimedia.org/P13495 and previous config saved to /var/cache/conftool/dbconfig/20201201-085916-marostegui.json
  • 08:18 volans@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:11 volans@cumin2001: START - Cookbook sre.dns.netbox
  • 08:10 volans: upgrading spicerack to 0.0.45 on cumin2001
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13494 and previous config saved to /var/cache/conftool/dbconfig/20201201-081002-root.json
  • 08:05 marostegui: Create database mwaddlink on m2 - T267214
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13493 and previous config saved to /var/cache/conftool/dbconfig/20201201-075458-root.json
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13492 and previous config saved to /var/cache/conftool/dbconfig/20201201-073955-root.json
  • 07:31 marostegui: Deploy "_p" databases to all clouddb hosts (except clouddb1020*) T268312
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13491 and previous config saved to /var/cache/conftool/dbconfig/20201201-072451-root.json
  • 07:15 marostegui: Deploy labsdb role on all clouddb instances (except clouddb1020*) T268312
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1017 from dbctl T268825', diff saved to https://phabricator.wikimedia.org/P13490 and previous config saved to /var/cache/conftool/dbconfig/20201201-065419-marostegui.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P13489 and previous config saved to /var/cache/conftool/dbconfig/20201201-065125-marostegui.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1018 from dbctl T269069', diff saved to https://phabricator.wikimedia.org/P13488 and previous config saved to /var/cache/conftool/dbconfig/20201201-061321-marostegui.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1017 and es1018 for reboot', diff saved to https://phabricator.wikimedia.org/P13487 and previous config saved to /var/cache/conftool/dbconfig/20201201-060313-marostegui.json
  • 04:13 legoktm: resetting elukey's jenkins API token (T268978)
  • 01:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
  • 01:55 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
  • 01:49 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
  • 01:48 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
  • 01:18 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 00:22 ryankemper: T259588 Beginning wdqs categories data-reload on the following instances (one each from `[public, internal] x [eqiad, codfw]`): `wdqs1004`, `wdqs2001`, `wdqs1003`, `wdqs2004`
  • 00:20 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
  • 00:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 00:17 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
  • 00:17 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 00:17 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
  • 00:16 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload

