Server Admin Log/Archive 68

From Wikitech

2023-07-31

  • 23:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T342617)', diff saved to https://phabricator.wikimedia.org/P49860 and previous config saved to /var/cache/conftool/dbconfig/20230731-235442-ladsgroup.json
  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T342617)', diff saved to https://phabricator.wikimedia.org/P49859 and previous config saved to /var/cache/conftool/dbconfig/20230731-233039-ladsgroup.json
  • 23:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T342617)', diff saved to https://phabricator.wikimedia.org/P49858 and previous config saved to /var/cache/conftool/dbconfig/20230731-233018-ladsgroup.json
  • 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P49857 and previous config saved to /var/cache/conftool/dbconfig/20230731-231512-ladsgroup.json
  • 23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P49856 and previous config saved to /var/cache/conftool/dbconfig/20230731-230006-ladsgroup.json
  • 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T342617)', diff saved to https://phabricator.wikimedia.org/P49855 and previous config saved to /var/cache/conftool/dbconfig/20230731-224500-ladsgroup.json
  • 22:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1223 (T342617)', diff saved to https://phabricator.wikimedia.org/P49854 and previous config saved to /var/cache/conftool/dbconfig/20230731-223547-ladsgroup.json
  • 22:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 22:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 22:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T342617)', diff saved to https://phabricator.wikimedia.org/P49853 and previous config saved to /var/cache/conftool/dbconfig/20230731-223526-ladsgroup.json
  • 22:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P49852 and previous config saved to /var/cache/conftool/dbconfig/20230731-222020-ladsgroup.json
  • 22:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P49851 and previous config saved to /var/cache/conftool/dbconfig/20230731-220514-ladsgroup.json
  • 21:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T342617)', diff saved to https://phabricator.wikimedia.org/P49850 and previous config saved to /var/cache/conftool/dbconfig/20230731-215008-ladsgroup.json
  • 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T342617)', diff saved to https://phabricator.wikimedia.org/P49849 and previous config saved to /var/cache/conftool/dbconfig/20230731-213017-ladsgroup.json
  • 21:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 21:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 21:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 21:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 21:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T342617)', diff saved to https://phabricator.wikimedia.org/P49848 and previous config saved to /var/cache/conftool/dbconfig/20230731-212941-ladsgroup.json
  • 21:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P49847 and previous config saved to /var/cache/conftool/dbconfig/20230731-211435-ladsgroup.json
  • 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P49846 and previous config saved to /var/cache/conftool/dbconfig/20230731-205928-ladsgroup.json
  • 20:45 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T342617)', diff saved to https://phabricator.wikimedia.org/P49845 and previous config saved to /var/cache/conftool/dbconfig/20230731-204422-ladsgroup.json
  • 20:37 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1212 (T342617)', diff saved to https://phabricator.wikimedia.org/P49844 and previous config saved to /var/cache/conftool/dbconfig/20230731-203451-ladsgroup.json
  • 20:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 20:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 20:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 20:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T342617)', diff saved to https://phabricator.wikimedia.org/P49843 and previous config saved to /var/cache/conftool/dbconfig/20230731-203413-ladsgroup.json
  • 20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P49842 and previous config saved to /var/cache/conftool/dbconfig/20230731-201907-ladsgroup.json
  • 20:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P49841 and previous config saved to /var/cache/conftool/dbconfig/20230731-200401-ladsgroup.json
  • 19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T342617)', diff saved to https://phabricator.wikimedia.org/P49840 and previous config saved to /var/cache/conftool/dbconfig/20230731-194854-ladsgroup.json
  • 19:03 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@47f9458]: (no justification provided) (duration: 00m 16s)
  • 19:03 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@47f9458]: (no justification provided)
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T342617)', diff saved to https://phabricator.wikimedia.org/P49839 and previous config saved to /var/cache/conftool/dbconfig/20230731-184200-ladsgroup.json
  • 18:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 18:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T342617)', diff saved to https://phabricator.wikimedia.org/P49838 and previous config saved to /var/cache/conftool/dbconfig/20230731-184140-ladsgroup.json
  • 18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P49837 and previous config saved to /var/cache/conftool/dbconfig/20230731-182633-ladsgroup.json
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T342617)', diff saved to https://phabricator.wikimedia.org/P49836 and previous config saved to /var/cache/conftool/dbconfig/20230731-182114-ladsgroup.json
  • 18:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P49835 and previous config saved to /var/cache/conftool/dbconfig/20230731-181127-ladsgroup.json
  • 18:00 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 17:59 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 17:59 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 17:57 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 17:57 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:56 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T342617)', diff saved to https://phabricator.wikimedia.org/P49834 and previous config saved to /var/cache/conftool/dbconfig/20230731-175621-ladsgroup.json
  • 17:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1108.eqiad.wmnet
  • 17:04 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:04 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1108.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1001"
  • 17:02 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1108.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1001"
  • 17:00 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 17:00 btullis@cumin1001: Added views for new wiki: gpewiki T338678
  • 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T342617)', diff saved to https://phabricator.wikimedia.org/P49833 and previous config saved to /var/cache/conftool/dbconfig/20230731-164759-ladsgroup.json
  • 16:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 16:47 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 16:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T342617)', diff saved to https://phabricator.wikimedia.org/P49832 and previous config saved to /var/cache/conftool/dbconfig/20230731-164738-ladsgroup.json
  • 16:42 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1108.eqiad.wmnet
  • 16:34 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 16:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P49831 and previous config saved to /var/cache/conftool/dbconfig/20230731-163232-ladsgroup.json
  • 16:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-airflow1003.eqiad.wmnet
  • 16:30 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:30 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1001"
  • 16:28 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1001"
  • 16:25 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 16:19 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts an-airflow1003.eqiad.wmnet
  • 16:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P49830 and previous config saved to /var/cache/conftool/dbconfig/20230731-161726-ladsgroup.json
  • 16:08 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 24s)
  • 16:07 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 16:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 16:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T342617)', diff saved to https://phabricator.wikimedia.org/P49829 and previous config saved to /var/cache/conftool/dbconfig/20230731-160500-ladsgroup.json
  • 16:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T342617)', diff saved to https://phabricator.wikimedia.org/P49828 and previous config saved to /var/cache/conftool/dbconfig/20230731-160220-ladsgroup.json
  • 16:01 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 44s)
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P49827 and previous config saved to /var/cache/conftool/dbconfig/20230731-154954-ladsgroup.json
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P49826 and previous config saved to /var/cache/conftool/dbconfig/20230731-153448-ladsgroup.json
  • 15:20 volans: deploying python3-wmflib fleet wide
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T342617)', diff saved to https://phabricator.wikimedia.org/P49825 and previous config saved to /var/cache/conftool/dbconfig/20230731-151942-ladsgroup.json
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T342617)', diff saved to https://phabricator.wikimedia.org/P49824 and previous config saved to /var/cache/conftool/dbconfig/20230731-145252-ladsgroup.json
  • 14:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 14:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T342617)', diff saved to https://phabricator.wikimedia.org/P49823 and previous config saved to /var/cache/conftool/dbconfig/20230731-145232-ladsgroup.json
  • 14:47 sukhe: finished rolling out gdnsd 3.99.0~alpha2 upgrade
  • 14:45 fabfur: imported prometheus-rdkafka-exporter package into bookworm-wikimedia (https://gerrit.wikimedia.org/r/c/operations/software/prometheus-rdkafka-exporter/+/942613) T342154
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P49821 and previous config saved to /var/cache/conftool/dbconfig/20230731-143725-ladsgroup.json
  • 14:32 fabfur: imported file-read-backwards package into bookworm-wikimedia (https://gerrit.wikimedia.org/r/c/operations/debs/file-read-backwards/+/942491) T342154
  • 14:31 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 14:26 volans: uploaded python3-wmflib_1.2.3 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia,bookworm-wikimedia
  • 14:25 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P49820 and previous config saved to /var/cache/conftool/dbconfig/20230731-142220-ladsgroup.json
  • 14:21 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 14:10 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:09 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Add WF as alias for NS_PROJECT (and WT for its talk) (T342964) (duration: 07m 27s)
  • 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T342617)', diff saved to https://phabricator.wikimedia.org/P49819 and previous config saved to /var/cache/conftool/dbconfig/20230731-140713-ladsgroup.json
  • 14:05 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:03 jforrester@deploy1002: jforrester: Continuing with sync
  • 14:03 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Add WF as alias for NS_PROJECT (and WT for its talk) (T342964) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:02 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Add WF as alias for NS_PROJECT (and WT for its talk) (T342964)
  • 13:59 sukhe: reprepro -C component/dnsdist include bookworm-wikimedia dnsdist_1.8.0-1+wmf12u1_amd64.changes: T342154
  • 13:57 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Disable the Collection extension for now, broken (T342931) (duration: 07m 38s)
  • 13:57 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 13:52 sukhe: reprepro -C main include bullseye-wikimedia gdnsd_3.99.0~alpha2-1_amd64.changes
  • 13:51 jforrester@deploy1002: jforrester: Continuing with sync
  • 13:51 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Disable the Collection extension for now, broken (T342931) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:50 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Disable the Collection extension for now, broken (T342931)
  • 13:49 jforrester@deploy1002: Synchronized wmf-config/interwiki.php: T325908 (duration: 06m 25s)
  • 13:45 moritzm: install gtk+3.0 bugfix updates from Bullseye 11.7 point release
  • 13:44 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:42 fabfur: imported fifo-log-demux package into bookworm-wikimedia (https://gerrit.wikimedia.org/r/c/operations/software/fifo-log-demux/+/942414) T342154
  • 13:38 jforrester@deploy1002: Finished scap: Backport for Remove F: namespace alias (T325910) (duration: 24m 24s)
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 100%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49818 and previous config saved to /var/cache/conftool/dbconfig/20230731-133707-root.json
  • 13:29 jforrester@deploy1002: jforrester and epicpupper: Continuing with sync
  • 13:29 jforrester@deploy1002: jforrester and epicpupper: Backport for Remove F: namespace alias (T325910) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:24 moritzm: imported jenkins 2.401.3 to thirdparty/ci for bullseye-wikimedia T342572
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 75%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49817 and previous config saved to /var/cache/conftool/dbconfig/20230731-132201-root.json
  • 13:14 jforrester@deploy1002: Started scap: Backport for Remove F: namespace alias (T325910)
  • 13:13 James_F: WikiLambda backport verified for T342891 T342687 T341500 T343006 T342901 and T343041
  • 13:09 jforrester@deploy1002: Synchronized php-1.41.0-wmf.19/extensions/WikiLambda/: (no justification provided) (duration: 07m 16s)
  • 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 50%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49816 and previous config saved to /var/cache/conftool/dbconfig/20230731-130657-root.json
  • 13:00 jnuche: CI Jenkins upgraded to 2.401.3: https://phabricator.wikimedia.org/T342572
  • 12:57 moritzm: installing 6.1.38 kernels on Bookworm hosts
  • 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T342617)', diff saved to https://phabricator.wikimedia.org/P49815 and previous config saved to /var/cache/conftool/dbconfig/20230731-125513-ladsgroup.json
  • 12:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1196.eqiad.wmnet with reason: Maint
  • 12:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1196.eqiad.wmnet with reason: Maint
  • 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1196 T342284', diff saved to https://phabricator.wikimedia.org/P49814 and previous config saved to /var/cache/conftool/dbconfig/20230731-125252-ladsgroup.json
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 25%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49813 and previous config saved to /var/cache/conftool/dbconfig/20230731-125152-root.json
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T342617)', diff saved to https://phabricator.wikimedia.org/P49812 and previous config saved to /var/cache/conftool/dbconfig/20230731-124912-ladsgroup.json
  • 12:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 12:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 12:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T342617)', diff saved to https://phabricator.wikimedia.org/P49811 and previous config saved to /var/cache/conftool/dbconfig/20230731-124851-ladsgroup.json
  • 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 10%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49810 and previous config saved to /var/cache/conftool/dbconfig/20230731-123647-root.json
  • 12:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P49809 and previous config saved to /var/cache/conftool/dbconfig/20230731-123345-ladsgroup.json
  • 12:32 moritzm: installing xapian-core bugfix updates on Bullseye
  • 12:23 moritzm: installing mariadb-10.5 updates from Bullseye 11.7 point release (libs/tools, unrelated to wmf-mariadb packages)
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 5%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49808 and previous config saved to /var/cache/conftool/dbconfig/20230731-122142-root.json
  • 12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P49807 and previous config saved to /var/cache/conftool/dbconfig/20230731-121839-ladsgroup.json
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 3%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49806 and previous config saved to /var/cache/conftool/dbconfig/20230731-120638-root.json
  • 12:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T342617)', diff saved to https://phabricator.wikimedia.org/P49805 and previous config saved to /var/cache/conftool/dbconfig/20230731-120332-ladsgroup.json
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 1%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49804 and previous config saved to /var/cache/conftool/dbconfig/20230731-115133-root.json
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2114 T334650', diff saved to https://phabricator.wikimedia.org/P49803 and previous config saved to /var/cache/conftool/dbconfig/20230731-114645-root.json
  • 11:23 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:11 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 11:11 btullis@cumin1001: Added views for new wiki: wikifunctionswiki T289316
  • 11:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:59 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:55 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:45 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:45 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 10:36 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,name=parse1002.eqiad.wmnet
  • 10:36 claime: Repooling parse1002 following CPU replacement - T339340
  • 10:34 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1002.eqiad.wmnet
  • 10:34 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1002.eqiad.wmnet
  • 10:28 _joe_: disabling puppet on mwdebug2002, testing noc.wikimedia.org
  • 10:20 moritzm: installing bind9 security updates (client-side tools/libs)
  • 10:11 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1016.eqiad.wmnet with OS bookworm
  • 10:02 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 10:02 btullis@cumin1001: Added views for new wiki: btmwiktionary T342670
  • 09:56 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage
  • 09:54 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage
  • 09:51 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: enable Lift Wing for most wikis (T342115) (duration: 23m 00s)
  • 09:50 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:45 ladsgroup@deploy1002: ladsgroup and isaranto: Continuing with sync
  • 09:41 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,name=parse1002.eqiad.wmnet
  • 09:37 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
  • 09:36 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 09:32 urbanecm: Unblock stuck global rename by running `extensions/CentralAuth/maintenance/fixStuckGlobalRename.php` (T343099)
  • 09:29 ladsgroup@deploy1002: ladsgroup and isaranto: Backport for ores-extension: enable Lift Wing for most wikis (T342115) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 09:29 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1016.eqiad.wmnet with OS bookworm
  • 09:29 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 09:28 ladsgroup@deploy1002: Started scap: Backport for ores-extension: enable Lift Wing for most wikis (T342115)
  • 09:28 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 09:27 ladsgroup@deploy1002: Finished scap: Backport for Remove ak from wgImportSources (T333765) (duration: 08m 10s)
  • 09:21 ladsgroup@deploy1002: amire80 and ladsgroup: Continuing with sync
  • 09:20 ladsgroup@deploy1002: amire80 and ladsgroup: Backport for Remove ak from wgImportSources (T333765) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 09:19 ladsgroup@deploy1002: Started scap: Backport for Remove ak from wgImportSources (T333765)
  • 09:13 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage
  • 09:10 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage
  • 08:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 08:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 08:53 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
  • 08:52 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1016.eqiad.wmnet with OS bookworm
  • 08:42 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
  • 08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T342617)', diff saved to https://phabricator.wikimedia.org/P49802 and previous config saved to /var/cache/conftool/dbconfig/20230731-083941-ladsgroup.json
  • 08:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 08:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 08:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 08:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 07:21 taavi@deploy1002: Finished scap: Backport for ruwikibooks: Set wgRestrictDisplayTitle to false (T342800) (duration: 17m 02s)
  • 07:15 taavi@deploy1002: anzx and taavi: Continuing with sync
  • 07:13 taavi@deploy1002: anzx and taavi: Backport for ruwikibooks: Set wgRestrictDisplayTitle to false (T342800) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:04 taavi@deploy1002: Started scap: Backport for ruwikibooks: Set wgRestrictDisplayTitle to false (T342800)
  • 06:06 moritzm: imported jenkins 2.401.3 to thirdparty/ci for buster-wikimedia T342572

2023-07-29

  • 16:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 T343077', diff saved to https://phabricator.wikimedia.org/P49801 and previous config saved to /var/cache/conftool/dbconfig/20230729-165954-root.json
  • 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1183 to s5 primary T343077', diff saved to https://phabricator.wikimedia.org/P49800 and previous config saved to /var/cache/conftool/dbconfig/20230729-165813-root.json
  • 16:57 marostegui@cumin1001: dbctl commit (dc=all): 'Emergency switchover T343077', diff saved to https://phabricator.wikimedia.org/P49799 and previous config saved to /var/cache/conftool/dbconfig/20230729-165748-root.json
  • 16:57 marostegui: Starting emergency s5 eqiad failover from db1130 to db1183 - T343077 T343076
  • 16:36 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1183 with weight 0 T343077', diff saved to https://phabricator.wikimedia.org/P49798 and previous config saved to /var/cache/conftool/dbconfig/20230729-163621-root.json
  • 16:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T343077
  • 16:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T343077
  • 16:19 _joe_: set read_only=0 on db1130
  • 16:15 _joe_: systemctl start mariadb.service on db1130

2023-07-28

  • 22:17 milimetric@deploy1002: Finished deploy [airflow-dags/analytics@1ff1629]: Updating webrequest refine to include wikifunctions (duration: 00m 21s)
  • 22:16 milimetric@deploy1002: Started deploy [airflow-dags/analytics@1ff1629]: Updating webrequest refine to include wikifunctions
  • 22:03 milimetric@deploy1002: Finished deploy [analytics/refinery@f7e74ae] (thin): Fix wikifunction special page (duration: 00m 03s)
  • 22:03 milimetric@deploy1002: Started deploy [analytics/refinery@f7e74ae] (thin): Fix wikifunction special page
  • 22:00 milimetric@deploy1002: Finished deploy [analytics/refinery@f7e74ae]: Fix wikifunction special page (duration: 10m 18s)
  • 21:50 milimetric@deploy1002: Started deploy [analytics/refinery@f7e74ae]: Fix wikifunction special page
  • 20:12 milimetric@deploy1002: Finished deploy [analytics/refinery@53db2ca]: Publish refinery-source-0.2.19 (duration: 16m 53s)
  • 19:55 milimetric@deploy1002: Started deploy [analytics/refinery@53db2ca]: Publish refinery-source-0.2.19
  • 19:37 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@4d8c3db]: Deploying T342926 and https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/469 (duration: 00m 14s)
  • 19:37 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@4d8c3db]: Deploying T342926 and https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/469
  • 18:12 zabe: zabe@mwmaint1002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=brwikimedia --logwiki=metawiki 'Viniciuspontesoficial' 'Eusouvinipontes' # T343013
  • 16:20 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 16:20 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:54 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 15:44 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 15:42 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 15:34 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 15:34 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:32 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 15:27 kamila@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:26 kamila@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:26 kamila_: k8s: delete and recreate the benthos-cache-invalidator namespace
  • 15:25 kamila@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:25 kamila@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
  • 14:25 milimetric@deploy1002: Finished deploy [analytics/refinery@1523f12] (thin): Patch sqoop of wikifunctions (duration: 00m 03s)
  • 14:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 14:25 milimetric@deploy1002: Started deploy [analytics/refinery@1523f12] (thin): Patch sqoop of wikifunctions
  • 14:23 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:23 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:22 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:22 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 14:21 milimetric@deploy1002: Finished deploy [analytics/refinery@1523f12]: Patch sqoop of wikifunctions (duration: 06m 11s)
  • 14:15 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 14:15 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 14:15 milimetric@deploy1002: Started deploy [analytics/refinery@1523f12]: Patch sqoop of wikifunctions
  • 14:14 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 14:08 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 14:03 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 13:30 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:28 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 12:28 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 10:51 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:50 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:45 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:45 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:44 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:42 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 10:41 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 10:41 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:41 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 10:39 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:38 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 10:36 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 10:36 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 10:33 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 10:33 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 10:29 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:28 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:25 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:18 dcausse: T342924: created search indices for wikifunctions
  • 10:00 aikochou@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:57 aikochou@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:07 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1108.eqiad.wmnet with reason: db1108 has been replaced with db1208 - leaving for a few days before decom
  • 09:07 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1108.eqiad.wmnet with reason: db1108 has been replaced with db1208 - leaving for a few days before decom
  • 00:20 tgr@deploy1002: Finished scap: Backport for help: Fix navigation in the help panel (T342927) (duration: 10m 09s)
  • 00:14 tgr@deploy1002: tgr: Continuing with sync
  • 00:11 tgr@deploy1002: tgr: Backport for help: Fix navigation in the help panel (T342927) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 00:10 tgr@deploy1002: Started scap: Backport for help: Fix navigation in the help panel (T342927)

2023-07-27

  • 21:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 21:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 21:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T342617)', diff saved to https://phabricator.wikimedia.org/P49790 and previous config saved to /var/cache/conftool/dbconfig/20230727-214302-ladsgroup.json
  • 21:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P49789 and previous config saved to /var/cache/conftool/dbconfig/20230727-212756-ladsgroup.json
  • 21:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P49788 and previous config saved to /var/cache/conftool/dbconfig/20230727-211250-ladsgroup.json
  • 20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T342617)', diff saved to https://phabricator.wikimedia.org/P49787 and previous config saved to /var/cache/conftool/dbconfig/20230727-205744-ladsgroup.json
  • 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T342617)', diff saved to https://phabricator.wikimedia.org/P49786 and previous config saved to /var/cache/conftool/dbconfig/20230727-203435-ladsgroup.json
  • 20:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 20:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49785 and previous config saved to /var/cache/conftool/dbconfig/20230727-203415-ladsgroup.json
  • 20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P49784 and previous config saved to /var/cache/conftool/dbconfig/20230727-201908-ladsgroup.json
  • 20:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P49783 and previous config saved to /var/cache/conftool/dbconfig/20230727-200402-ladsgroup.json
  • 19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49782 and previous config saved to /var/cache/conftool/dbconfig/20230727-194856-ladsgroup.json
  • 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb1014.eqiad.wmnet with OS bullseye
  • 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 19:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 19:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1014.eqiad.wmnet with reason: host reimage
  • 19:15 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1014.eqiad.wmnet with reason: host reimage
  • 19:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49781 and previous config saved to /var/cache/conftool/dbconfig/20230727-190637-ladsgroup.json
  • 19:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 19:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 19:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T342617)', diff saved to https://phabricator.wikimedia.org/P49780 and previous config saved to /var/cache/conftool/dbconfig/20230727-190617-ladsgroup.json
  • 18:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P49779 and previous config saved to /var/cache/conftool/dbconfig/20230727-185110-ladsgroup.json
  • 18:41 milimetric@deploy1002: Finished deploy [analytics/refinery@1af57de] (thin): Deploying to sync script updates and static files (duration: 00m 04s)
  • 18:41 milimetric@deploy1002: Started deploy [analytics/refinery@1af57de] (thin): Deploying to sync script updates and static files
  • 18:41 milimetric@deploy1002: Finished deploy [analytics/refinery@1af57de]: Deploying to sync script updates and static files (duration: 08m 25s)
  • 18:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P49778 and previous config saved to /var/cache/conftool/dbconfig/20230727-183604-ladsgroup.json
  • 18:33 milimetric@deploy1002: Started deploy [analytics/refinery@1af57de]: Deploying to sync script updates and static files
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T342617)', diff saved to https://phabricator.wikimedia.org/P49777 and previous config saved to /var/cache/conftool/dbconfig/20230727-182058-ladsgroup.json
  • 18:13 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb1014.eqiad.wmnet with OS bullseye
  • 18:12 krinkle@deploy1002: Finished deploy [performance/navtiming@c868e79]: Rename FID labels (Ibab711), Remove QuickSurveys (T336169), Add Vietnam (T340714) (duration: 00m 05s)
  • 18:12 krinkle@deploy1002: Started deploy [performance/navtiming@c868e79]: Rename FID labels (Ibab711), Remove QuickSurveys (T336169), Add Vietnam (T340714)
  • 18:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb1013.eqiad.wmnet with OS bullseye
  • 18:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T342617)', diff saved to https://phabricator.wikimedia.org/P49776 and previous config saved to /var/cache/conftool/dbconfig/20230727-175659-ladsgroup.json
  • 17:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 17:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T342617)', diff saved to https://phabricator.wikimedia.org/P49775 and previous config saved to /var/cache/conftool/dbconfig/20230727-175638-ladsgroup.json
  • 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1013.eqiad.wmnet with reason: host reimage
  • 17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P49774 and previous config saved to /var/cache/conftool/dbconfig/20230727-174132-ladsgroup.json
  • 17:41 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1013.eqiad.wmnet with reason: host reimage
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P49773 and previous config saved to /var/cache/conftool/dbconfig/20230727-172626-ladsgroup.json
  • 17:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye
  • 17:04 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:03 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:03 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:02 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:02 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:01 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:52 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: enable lw on itwiki and hewiki (T342115) (duration: 20m 53s)
  • 16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T342617)', diff saved to https://phabricator.wikimedia.org/P49771 and previous config saved to /var/cache/conftool/dbconfig/20230727-164711-ladsgroup.json
  • 16:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 16:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T342617)', diff saved to https://phabricator.wikimedia.org/P49770 and previous config saved to /var/cache/conftool/dbconfig/20230727-164650-ladsgroup.json
  • 16:45 dancy@deploy1002: Installation of scap version "4.57.0" completed for 600 hosts
  • 16:44 dancy@deploy1002: Installing scap version "4.57.0" for 600 hosts
  • 16:41 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 16:41 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 16:34 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 16:34 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 16:33 ladsgroup@deploy1002: isaranto and ladsgroup: Backport for ores-extension: enable lw on itwiki and hewiki (T342115) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 16:31 ladsgroup@deploy1002: Started scap: Backport for ores-extension: enable lw on itwiki and hewiki (T342115)
  • 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P49769 and previous config saved to /var/cache/conftool/dbconfig/20230727-163144-ladsgroup.json
  • 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P49768 and previous config saved to /var/cache/conftool/dbconfig/20230727-161638-ladsgroup.json
  • 16:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T342617)', diff saved to https://phabricator.wikimedia.org/P49766 and previous config saved to /var/cache/conftool/dbconfig/20230727-160132-ladsgroup.json
  • 15:56 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 15:56 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 15:49 jynus: restart db2097
  • 15:49 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 15:49 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 15:45 zabe@deploy1002: Finished scap: Backport for Revert "CheckUser event table migration: Write new on group0" (T342902) (duration: 07m 43s)
  • 15:44 kamila@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:42 kamila@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:40 kamila@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:40 kamila@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:40 kamila@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:40 kamila@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:38 zabe@deploy1002: zabe and dreamyjazz: Backport for Revert "CheckUser event table migration: Write new on group0" (T342902) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 15:37 zabe@deploy1002: Started scap: Backport for Revert "CheckUser event table migration: Write new on group0" (T342902)
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T342617)', diff saved to https://phabricator.wikimedia.org/P49764 and previous config saved to /var/cache/conftool/dbconfig/20230727-153649-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 15:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T342617)', diff saved to https://phabricator.wikimedia.org/P49763 and previous config saved to /var/cache/conftool/dbconfig/20230727-153629-ladsgroup.json
  • 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P49762 and previous config saved to /var/cache/conftool/dbconfig/20230727-152123-ladsgroup.json
  • 15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P49761 and previous config saved to /var/cache/conftool/dbconfig/20230727-150616-ladsgroup.json
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T342617)', diff saved to https://phabricator.wikimedia.org/P49759 and previous config saved to /var/cache/conftool/dbconfig/20230727-145110-ladsgroup.json
  • 14:33 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:27 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T342617)', diff saved to https://phabricator.wikimedia.org/P49758 and previous config saved to /var/cache/conftool/dbconfig/20230727-142721-ladsgroup.json
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T342617)', diff saved to https://phabricator.wikimedia.org/P49757 and previous config saved to /var/cache/conftool/dbconfig/20230727-142700-ladsgroup.json
  • 14:26 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 14:25 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:25 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 14:24 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 14:22 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 14:22 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 14:20 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:19 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 14:19 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P49756 and previous config saved to /var/cache/conftool/dbconfig/20230727-141154-ladsgroup.json
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P49755 and previous config saved to /var/cache/conftool/dbconfig/20230727-135648-ladsgroup.json
  • 13:55 fabfur: done restarting lvs6002 (T335835)
  • 13:55 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6002.drmrs.wmnet
  • 13:52 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6002.drmrs.wmnet
  • 13:49 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 13:49 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T342617)', diff saved to https://phabricator.wikimedia.org/P49754 and previous config saved to /var/cache/conftool/dbconfig/20230727-134141-ladsgroup.json
  • 13:32 fabfur: begin restarting lvs6002 (T335835)
  • 13:18 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 13:17 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T342617)', diff saved to https://phabricator.wikimedia.org/P49752 and previous config saved to /var/cache/conftool/dbconfig/20230727-131733-ladsgroup.json
  • 13:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T342617)', diff saved to https://phabricator.wikimedia.org/P49751 and previous config saved to /var/cache/conftool/dbconfig/20230727-131712-ladsgroup.json
  • 13:15 fabfur: done restarting lvs6001 (T335835)
  • 13:15 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6001.drmrs.wmnet
  • 13:12 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6001.drmrs.wmnet
  • 13:11 samtar@deploy1002: Finished scap: Backport for Re-enable PC writes for parsoid endpoints (T339867) (duration: 07m 02s)
  • 13:05 samtar@deploy1002: samtar and daniel: Backport for Re-enable PC writes for parsoid endpoints (T339867) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:04 samtar@deploy1002: Started scap: Backport for Re-enable PC writes for parsoid endpoints (T339867)
  • 13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P49750 and previous config saved to /var/cache/conftool/dbconfig/20230727-130206-ladsgroup.json
  • 12:54 fabfur: begin restarting lvs6001 (T335835)
  • 12:53 fabfur: done restarting lvs6003 (T335835)
  • 12:48 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6003.drmrs.wmnet
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P49749 and previous config saved to /var/cache/conftool/dbconfig/20230727-124700-ladsgroup.json
  • 12:45 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6003.drmrs.wmnet
  • 12:43 fabfur: begin restarting lvs6003 (T335835)
  • 12:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T342617)', diff saved to https://phabricator.wikimedia.org/P49748 and previous config saved to /var/cache/conftool/dbconfig/20230727-123153-ladsgroup.json
  • 12:08 jynus: systemctl stop mariadb@s1 @ db2097
  • 12:07 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Also add square logo for Vector-2022 (duration: 07m 05s)
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T342617)', diff saved to https://phabricator.wikimedia.org/P49747 and previous config saved to /var/cache/conftool/dbconfig/20230727-120710-ladsgroup.json
  • 12:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 12:01 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Also add square logo for Vector-2022 synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 12:00 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Also add square logo for Vector-2022
  • 12:00 ladsgroup@deploy1002: Finished scap: Backport for rdbms: Avoid making wasteful memcached calls in CP (T314434) (duration: 08m 54s)
  • 11:52 ladsgroup@deploy1002: ladsgroup: Backport for rdbms: Avoid making wasteful memcached calls in CP (T314434) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 11:51 ladsgroup@deploy1002: Started scap: Backport for rdbms: Avoid making wasteful memcached calls in CP (T314434)
  • 11:50 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudnet2005-dev.private.codfw.wikimedia.cloud on all recursors
  • 11:50 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache cloudnet2005-dev.private.codfw.wikimedia.cloud on all recursors
  • 11:50 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudnet2006-dev.private.codfw.wikimedia.cloud on all recursors
  • 11:50 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache cloudnet2006-dev.private.codfw.wikimedia.cloud on all recursors
  • 11:50 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:50 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudnet2005-dev/2006-dev - aborrero@cumin1001"
  • 11:49 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudnet2005-dev/2006-dev - aborrero@cumin1001"
  • 11:48 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Add logo, wordmark (duration: 08m 35s)
  • 11:47 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:42 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudnet1005.private.eqiad.wikimedia.cloud on all recursors
  • 11:42 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache cloudnet1005.private.eqiad.wikimedia.cloud on all recursors
  • 11:41 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Add logo, wordmark synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 11:39 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Add logo, wordmark
  • 11:37 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
  • 11:31 ladsgroup@deploy1002: backport Cancelled
  • 11:30 ladsgroup@deploy1002: Finished scap: Backport for CentralAuthUser: Don't load user information unless needed (duration: 07m 47s)
  • 11:24 ladsgroup@deploy1002: ladsgroup: Backport for CentralAuthUser: Don't load user information unless needed synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 11:22 ladsgroup@deploy1002: Started scap: Backport for CentralAuthUser: Don't load user information unless needed
  • 11:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 11:12 fabfur: done restarting lvs3006 (T335835)
  • 11:03 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3006.esams.wmnet
  • 11:00 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3006.esams.wmnet
  • 10:53 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:53 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudnet1005/1006 - aborrero@cumin1001"
  • 10:52 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudnet1005/1006 - aborrero@cumin1001"
  • 10:50 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 10:50 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudnet1005
  • 10:50 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudnet1005
  • 10:50 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudnet1006
  • 10:50 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudnet1006
  • 10:37 taavi: purge edge caches for "https://wikifunctions.org/"
  • 10:35 fabfur: begin restarting lvs3006 (T335835)
  • 10:34 fabfur: done restarting lvs3005 (T335835)
  • 10:33 kevinbazira@deploy1002: Finished deploy [ores/deploy@c30920f]: T342118 (duration: 09m 04s)
  • 10:24 kevinbazira@deploy1002: Started deploy [ores/deploy@c30920f]: T342118
  • 10:14 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3005.esams.wmnet
  • 10:12 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3005.esams.wmnet
  • 09:56 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
  • 09:54 fabfur: begin restarting lvs3005 (T335835)
  • 09:44 fabfur: done restarting lvs3007 (T335835)
  • 09:42 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3007.esams.wmnet
  • 09:40 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3007.esams.wmnet
  • 09:38 fabfur: begin restarting lvs3007 (T335835)
  • 09:20 urbanecm: Run `mwscript extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php --wiki=frwiki --page="Sensibilité électromagnétique" --force` to debug T342488
  • 09:12 fabfur: done restarting lvs1019 (T335835)
  • 09:11 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1019.eqiad.wmnet
  • 09:07 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1019.eqiad.wmnet
  • 08:42 fabfur: begin restarting lvs1019 (T335835)
  • 08:34 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
  • 08:16 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.19 refs T340247
  • 07:54 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 07:54 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 07:54 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 07:54 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 07:40 XioNoX: reboot lsw1-a1-codfw (test device)
  • 06:53 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
  • 06:39 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 06:38 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 06:36 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 06:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 05:57 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 05:45 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 05:40 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 05:26 oblivian@deploy1002: Started scap: (no justification provided)
  • 05:26 _joe_: scap is not syncing; just rebuilding the image from scratch to verify the reason for a bug.
  • 05:22 oblivian@deploy1002: Started scap: (no justification provided)
  • 03:19 cstone: payments-wiki upgraded from 2a68dfe2 to 1a6ca7ab
  • 03:04 eileen: civicrm upgraded from 5a84b138 to 16c2e58a
  • 00:54 eileen: civicrm upgraded from 68f29b70 to 5a84b138
  • 00:51 eileen: civicrm upgraded from 853c14f3 to 68f29b70
  • 00:20 eileen: rollback because I got an error when I tried to view - so let's see
  • 00:20 eileen: civicrm rolled back from 68f29b70 to 853c14f3 (locked)
  • 00:17 eileen: civicrm upgraded from 853c14f3 to 68f29b70

2023-07-26

  • 23:01 jforrester@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache now that wikifunctions is here (duration: 06m 52s)
  • 21:53 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wcqs2001.codfw.wmnet
  • 21:46 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wcqs2001.codfw.wmnet
  • 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T342617)', diff saved to https://phabricator.wikimedia.org/P49745 and previous config saved to /var/cache/conftool/dbconfig/20230726-212310-ladsgroup.json
  • 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P49744 and previous config saved to /var/cache/conftool/dbconfig/20230726-210804-ladsgroup.json
  • 21:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1013.eqiad.wmnet with OS bullseye
  • 21:04 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye
  • 21:00 taavi: manually attach User:WikiLambda_system to SUL T342811
  • 20:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P49743 and previous config saved to /var/cache/conftool/dbconfig/20230726-205257-ladsgroup.json
  • 20:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T342617)', diff saved to https://phabricator.wikimedia.org/P49742 and previous config saved to /var/cache/conftool/dbconfig/20230726-203751-ladsgroup.json
  • 20:34 taavi@deploy1002: Finished scap: Backport for clienthints: Start collecting client hints data on testwiki (T341110), CheckUser event table migration: Write new on group0 (T330158) (duration: 26m 17s)
  • 20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T342617)', diff saved to https://phabricator.wikimedia.org/P49741 and previous config saved to /var/cache/conftool/dbconfig/20230726-201554-ladsgroup.json
  • 20:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 20:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49740 and previous config saved to /var/cache/conftool/dbconfig/20230726-201533-ladsgroup.json
  • 20:09 taavi@deploy1002: dreamyjazz and taavi: Backport for clienthints: Start collecting client hints data on testwiki (T341110), CheckUser event table migration: Write new on group0 (T330158) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD
  • 20:08 taavi@deploy1002: Started scap: Backport for clienthints: Start collecting client hints data on testwiki (T341110), CheckUser event table migration: Write new on group0 (T330158)
  • 20:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P49739 and previous config saved to /var/cache/conftool/dbconfig/20230726-200026-ladsgroup.json
  • 19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P49738 and previous config saved to /var/cache/conftool/dbconfig/20230726-194520-ladsgroup.json
  • 19:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49737 and previous config saved to /var/cache/conftool/dbconfig/20230726-193014-ladsgroup.json
  • 18:48 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:47 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:45 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:44 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49736 and previous config saved to /var/cache/conftool/dbconfig/20230726-184430-ladsgroup.json
  • 18:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 18:44 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49735 and previous config saved to /var/cache/conftool/dbconfig/20230726-184408-ladsgroup.json
  • 18:43 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:37 jforrester@deploy1002: Synchronized wmf-config/: Last fixes for initial wikifunctions.org, he says (duration: 06m 44s)
  • 18:37 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:36 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:36 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:34 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:34 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:34 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:33 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:33 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P49734 and previous config saved to /var/cache/conftool/dbconfig/20230726-182902-ladsgroup.json
  • 18:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P49732 and previous config saved to /var/cache/conftool/dbconfig/20230726-181356-ladsgroup.json
  • 18:13 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:12 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:12 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:10 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:10 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:09 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49731 and previous config saved to /var/cache/conftool/dbconfig/20230726-175850-ladsgroup.json
  • 17:49 jforrester@deploy1002: Finished scap: Hopefully final update for wikifunctions.org initial config (duration: 07m 30s)
  • 17:41 jforrester@deploy1002: Started scap: Hopefully final update for wikifunctions.org initial config
  • 17:37 jforrester@deploy1002: Finished scap: Backport for wgNoFollowDomainExceptions: Add wikifunctions.org (T275945) (duration: 11m 27s)
  • 17:27 jforrester@deploy1002: jforrester: Backport for wgNoFollowDomainExceptions: Add wikifunctions.org (T275945) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 17:25 jforrester@deploy1002: Started scap: Backport for wgNoFollowDomainExceptions: Add wikifunctions.org (T275945)
  • 17:13 jforrester@deploy1002: Finished scap: Backport for MWMultiVersion: Alert this code to wikifunctions.org existing (T275945) (duration: 08m 40s)
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49730 and previous config saved to /var/cache/conftool/dbconfig/20230726-171244-ladsgroup.json
  • 17:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 17:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T342617)', diff saved to https://phabricator.wikimedia.org/P49729 and previous config saved to /var/cache/conftool/dbconfig/20230726-171223-ladsgroup.json
  • 17:06 jforrester@deploy1002: jforrester: Backport for MWMultiVersion: Alert this code to wikifunctions.org existing (T275945) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 17:04 jforrester@deploy1002: Started scap: Backport for MWMultiVersion: Alert this code to wikifunctions.org existing (T275945)
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P49728 and previous config saved to /var/cache/conftool/dbconfig/20230726-165717-ladsgroup.json
  • 16:53 jforrester@deploy1002: Finished scap: Backport for docroot: Add wikifunctions.org (T275945) (duration: 08m 05s)
  • 16:47 jforrester@deploy1002: jforrester: Backport for docroot: Add wikifunctions.org (T275945) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 16:45 jforrester@deploy1002: Started scap: Backport for docroot: Add wikifunctions.org (T275945)
  • 16:44 fabfur: end reboot of lvs1018 (T335835)
  • 16:43 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1018.eqiad.wmnet
  • 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P49727 and previous config saved to /var/cache/conftool/dbconfig/20230726-164211-ladsgroup.json
  • 16:40 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1018.eqiad.wmnet
  • 16:27 jforrester@deploy1002: Finished scap: Initial deploy of wikifunctionswiki in locked-down mode for T275945 (duration: 07m 49s)
  • 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T342617)', diff saved to https://phabricator.wikimedia.org/P49726 and previous config saved to /var/cache/conftool/dbconfig/20230726-162705-ladsgroup.json
  • 16:20 jforrester@deploy1002: Started scap: Initial deploy of wikifunctionswiki in locked-down mode for T275945
  • 16:18 fabfur: begin reboot of lvs1018 (T335835)
  • 16:15 jforrester@deploy1002: Finished scap: Backport for Add wikifunctions.org to prod wgLocalVirtualHosts (T275945) (duration: 09m 07s)
  • 16:08 jforrester@deploy1002: jforrester: Backport for Add wikifunctions.org to prod wgLocalVirtualHosts (T275945) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 16:07 fabfur: end reboot of lvs1017 (T335835)
  • 16:06 jforrester@deploy1002: Started scap: Backport for Add wikifunctions.org to prod wgLocalVirtualHosts (T275945)
  • 16:03 jnuche@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.19 refs T340247 (duration: 06m 56s)
  • 15:56 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.19 refs T340247
  • 15:55 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1017.eqiad.wmnet
  • 15:52 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1017.eqiad.wmnet
  • 15:47 jforrester@deploy1002: Finished scap: Backport for Load RescoreFunctions from the ExtensionRegistry (T342744) (duration: 09m 39s)
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T342617)', diff saved to https://phabricator.wikimedia.org/P49725 and previous config saved to /var/cache/conftool/dbconfig/20230726-154245-ladsgroup.json
  • 15:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 15:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 15:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 15:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T342617)', diff saved to https://phabricator.wikimedia.org/P49724 and previous config saved to /var/cache/conftool/dbconfig/20230726-154209-ladsgroup.json
  • 15:39 jforrester@deploy1002: dcausse and jforrester: Backport for Load RescoreFunctions from the ExtensionRegistry (T342744) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 15:38 derick@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 15:37 jforrester@deploy1002: Started scap: Backport for Load RescoreFunctions from the ExtensionRegistry (T342744)
  • 15:37 derick@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 15:37 derick@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 15:35 derick@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 15:34 derick@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 15:34 derick@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 15:32 derick@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 15:32 derick@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 15:30 fabfur: begin reboot of lvs1017 (T335835)
  • 15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P49723 and previous config saved to /var/cache/conftool/dbconfig/20230726-152703-ladsgroup.json
  • 15:26 fabfur: end reboot of lvs1020 (T335835)
  • 15:25 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1020.eqiad.wmnet
  • 15:21 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1020.eqiad.wmnet
  • 15:20 fabfur: begin reboot of lvs1020 (T335835)
  • 15:17 fabfur: end reboot of lvs4009 (T335835)
  • 15:13 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4009.ulsfo.wmnet
  • 15:13 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: enable lw on eswikiquotes and eswikibooks (T342115) (duration: 14m 06s)
  • 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P49722 and previous config saved to /var/cache/conftool/dbconfig/20230726-151157-ladsgroup.json
  • 15:10 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4009.ulsfo.wmnet
  • 15:00 ladsgroup@deploy1002: isaranto and ladsgroup: Backport for ores-extension: enable lw on eswikiquotes and eswikibooks (T342115) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:59 ladsgroup@deploy1002: Started scap: Backport for ores-extension: enable lw on eswikiquotes and eswikibooks (T342115)
  • 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T342617)', diff saved to https://phabricator.wikimedia.org/P49721 and previous config saved to /var/cache/conftool/dbconfig/20230726-145651-ladsgroup.json
  • 14:49 fabfur: begin reboot of lvs4009 (T335835)
  • 14:38 jforrester@deploy1002: Finished scap: Backport for Normalize the skin name when it comes from preferences or useskin (T342733) (duration: 08m 24s)
  • 14:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=ats-be,name=cp2037.codfw.wmnet
  • 14:33 hnowlan: enabling puppet on A:cp to deploy r/941440
  • 14:32 jforrester@deploy1002: jforrester: Backport for Normalize the skin name when it comes from preferences or useskin (T342733) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:31 kuncung: test
  • 14:30 jforrester@deploy1002: Started scap: Backport for Normalize the skin name when it comes from preferences or useskin (T342733)
  • 14:28 fabfur: end reboot of lvs4008 (T335835)
  • 14:27 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4008.ulsfo.wmnet
  • 14:24 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4008.ulsfo.wmnet
  • 14:19 hnowlan: disabling puppet on A:cp to deploy r/941440
  • 14:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=ats-be,name=cp2037.codfw.wmnet
  • 14:13 urbanecm@deploy1002: Finished scap: Backport for Revert "specials: Use cross-wiki aware UserIdentityLookup on Special:UserRights" (T255309 T342747) (duration: 12m 33s)
  • 14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T342617)', diff saved to https://phabricator.wikimedia.org/P49720 and previous config saved to /var/cache/conftool/dbconfig/20230726-141228-ladsgroup.json
  • 14:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 14:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 14:02 urbanecm@deploy1002: urbanecm: Backport for Revert "specials: Use cross-wiki aware UserIdentityLookup on Special:UserRights" (T255309 T342747) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:00 urbanecm@deploy1002: Started scap: Backport for Revert "specials: Use cross-wiki aware UserIdentityLookup on Special:UserRights" (T255309 T342747)
  • 14:00 fabfur: begin reboot of lvs4008 (T335835)
  • 13:55 fabfur: end reboot of lvs4010 (T335835)
  • 13:52 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet
  • 13:50 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet
  • 13:46 fabfur: begin reboot of lvs4010 (T335835)
  • 13:34 jforrester@deploy1002: Finished scap: Backport for Add stream config for iOS schema (T341896) (duration: 20m 16s)
  • 13:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 13:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 13:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T342617)', diff saved to https://phabricator.wikimedia.org/P49719 and previous config saved to /var/cache/conftool/dbconfig/20230726-133104-ladsgroup.json
  • 13:23 jforrester@deploy1002: jforrester and tsev: Backport for Add stream config for iOS schema (T341896) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:23 fab@deploy1002: Finished deploy [airflow-dags/research@e7b9253]: (no justification provided) (duration: 00m 07s)
  • 13:22 fab@deploy1002: Started deploy [airflow-dags/research@e7b9253]: (no justification provided)
  • 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P49718 and previous config saved to /var/cache/conftool/dbconfig/20230726-131557-ladsgroup.json
  • 13:14 jforrester@deploy1002: Started scap: Backport for Add stream config for iOS schema (T341896)
  • 13:13 jforrester@deploy1002: sync-world aborted: Backport for Add stream config for iOS schema (T341896) (duration: 11m 00s)
  • 13:05 James_F: Created cu_useragent_clienthints.sql and cu_useragent_clienthints_map.sql on testwiki for T258105
  • 13:02 jforrester@deploy1002: Started scap: Backport for Add stream config for iOS schema (T341896)
  • 13:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P49717 and previous config saved to /var/cache/conftool/dbconfig/20230726-130051-ladsgroup.json
  • 12:52 jforrester@deploy1002: Synchronized php-1.41.0-wmf.19/extensions/WikiLambda/: Update WikiLambda wmf.19 branch to latest ahead of wikifunctions.org roll-out (duration: 07m 10s)
  • 12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T342617)', diff saved to https://phabricator.wikimedia.org/P49716 and previous config saved to /var/cache/conftool/dbconfig/20230726-124545-ladsgroup.json
  • 12:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13335
  • 12:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13335
  • 12:36 jforrester@deploy1002: Finished scap: Backport for ProductionServices: Define the wikifunctions orchestrator access point (T297314) (duration: 07m 39s)
  • 12:30 jforrester@deploy1002: jforrester: Backport for ProductionServices: Define the wikifunctions orchestrator access point (T297314) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 12:28 jforrester@deploy1002: Started scap: Backport for ProductionServices: Define the wikifunctions orchestrator access point (T297314)
  • 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T342617)', diff saved to https://phabricator.wikimedia.org/P49714 and previous config saved to /var/cache/conftool/dbconfig/20230726-115528-ladsgroup.json
  • 11:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 11:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T342617)', diff saved to https://phabricator.wikimedia.org/P49713 and previous config saved to /var/cache/conftool/dbconfig/20230726-115507-ladsgroup.json
  • 11:50 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 11:48 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 11:48 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 11:47 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 11:46 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 11:45 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 11:40 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P49712 and previous config saved to /var/cache/conftool/dbconfig/20230726-114001-ladsgroup.json
  • 11:32 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1016.eqiad.wmnet with OS bullseye
  • 11:27 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bullseye
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P49711 and previous config saved to /var/cache/conftool/dbconfig/20230726-112454-ladsgroup.json
  • 11:19 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.19 refs T340247
  • 11:14 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1016.eqiad.wmnet with OS bullseye
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T342617)', diff saved to https://phabricator.wikimedia.org/P49710 and previous config saved to /var/cache/conftool/dbconfig/20230726-110948-ladsgroup.json
  • 11:05 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts releases1002.eqiad.wmnet
  • 11:05 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:05 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: releases1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
  • 11:03 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: releases1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
  • 11:01 eoghan@cumin1001: START - Cookbook sre.dns.netbox
  • 10:57 eoghan@cumin1001: START - Cookbook sre.hosts.decommission for hosts releases1002.eqiad.wmnet
  • 10:56 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts releases2002.codfw.wmnet
  • 10:56 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:56 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: releases2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
  • 10:56 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bullseye
  • 10:55 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: releases2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
  • 10:53 eoghan@cumin1001: START - Cookbook sre.dns.netbox
  • 10:31 eoghan@cumin1001: START - Cookbook sre.hosts.decommission for hosts releases2002.codfw.wmnet
  • 10:27 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1016.eqiad.wmnet with OS bookworm
  • 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T342617)', diff saved to https://phabricator.wikimedia.org/P49708 and previous config saved to /var/cache/conftool/dbconfig/20230726-102232-ladsgroup.json
  • 10:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 10:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 10:17 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
  • 09:59 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
  • 08:36 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 08:34 jnuche@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.19 refs T340247 (duration: 19m 56s)
  • 08:14 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.19 refs T340247
  • 08:01 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
  • 07:58 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 07:56 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 07:52 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • off: updating bookworm netboot image for point release 12.1 ( https://wikitech.wikimedia.org/wiki/Updating_netboot_image_with_newer_kernel#Updating_production_point_release )
  • 07:46 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 07:37 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 07:37 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 07:36 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bullseye
  • 07:36 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 07:35 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 07:35 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 07:30 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 07:18 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 07:07 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 06:48 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:47 oblivian@cumin1001: START - Cookbook sre.dns.netbox
  • 06:37 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts xhgui2001.codfw.wmnet,xhgui1001.eqiad.wmnet
  • 06:37 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:37 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: xhgui2001.codfw.wmnet,xhgui1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 06:34 marostegui: Stop mariadb on clouddb1021 T334651
  • 06:33 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: xhgui2001.codfw.wmnet,xhgui1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 06:31 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 06:30 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 06:26 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 06:25 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 06:24 oblivian@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 06:21 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts xhgui2001.codfw.wmnet,xhgui1001.eqiad.wmnet
  • 06:18 oblivian@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 06:17 oblivian@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 06:17 oblivian@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 06:15 oblivian@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 01:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye

2023-07-25

  • 22:52 eileen: revision c62433ab -> 8689d10d
  • 21:18 zabe@deploy1002: Finished scap: Backport for Create UserIdentityValue with correct wiki (T342655), Create UserIdentityValue with correct wiki (T342655) (duration: 10m 06s)
  • 21:10 zabe@deploy1002: zabe: Backport for Create UserIdentityValue with correct wiki (T342655), Create UserIdentityValue with correct wiki (T342655) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 21:08 zabe@deploy1002: Started scap: Backport for Create UserIdentityValue with correct wiki (T342655), Create UserIdentityValue with correct wiki (T342655)
  • 21:02 zabe@deploy1002: Finished scap: update interwiki cache, gerrit:941057 (duration: 07m 20s)
  • 20:55 zabe@deploy1002: Started scap: update interwiki cache, gerrit:941057
  • 20:53 zabe@deploy1002: Finished scap: T335216 (duration: 08m 24s)
  • 20:46 zabe@deploy1002: zabe: T335216 synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:44 zabe@deploy1002: Started scap: T335216
  • 20:42 zabe: create Wiktionary Mandailing # T335216
  • 20:33 taavi@deploy1002: Finished scap: Backport for Fix text showing on icon only buttons (duration: 12m 08s)
  • 20:23 taavi@deploy1002: taavi and bwang: Backport for Fix text showing on icon only buttons synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:21 taavi@deploy1002: Started scap: Backport for Fix text showing on icon only buttons
  • 18:24 dwisehaupt@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:24 dwisehaupt@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1001"
  • 18:23 dwisehaupt@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1001"
  • 18:21 dwisehaupt@cumin1001: START - Cookbook sre.dns.netbox
  • 18:21 sukhe: dummy authdns-update returns
  • 18:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4003.wikimedia.org
  • 18:06 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns4003.wikimedia.org
  • 17:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4004.wikimedia.org
  • 17:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns4004.wikimedia.org
  • 16:56 sukhe: dummy authdns-update
  • 16:51 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:51 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 16:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns6002.wikimedia.org
  • 16:45 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns6002.wikimedia.org
  • 16:41 fabfur: end rebooting lvs5005 (T335835)
  • 16:41 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5005.eqsin.wmnet
  • 16:40 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
  • 16:37 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5005.eqsin.wmnet
  • 16:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns6001.wikimedia.org
  • 16:26 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns6001.wikimedia.org
  • 16:19 fabfur: begin rebooting lvs5005 (T335835)
  • 15:57 dancy@deploy1002: Finished deploy [releng/jenkins-deploy@97b4674] (releasing): (no justification provided) (duration: 33m 26s)
  • 15:36 fabfur: lvs5004 restarted and services are reactivating (T335835)
  • 15:28 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5004.eqsin.wmnet
  • 15:26 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 15:25 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5004.eqsin.wmnet
  • 15:25 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host lvs5004.eqsin.wmnet
  • 15:25 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5004.eqsin.wmnet
  • 15:25 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:24 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:23 dancy@deploy1002: Started deploy [releng/jenkins-deploy@97b4674] (releasing): (no justification provided)
  • 15:22 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 15:21 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:17 _joe_: removing all tags for docker image openjdk-8-jre T341115
  • 15:16 zabe@deploy1002: Finished scap: Backport for Add namespace translations for Mandailing (btm) (T335217), Add namespace translations for Mandailing (btm) (T335217) (duration: 07m 51s)
  • 15:14 _joe_: removing all tags for docker image openjdk-8-jdk T341115
  • 15:10 zabe@deploy1002: zabe: Backport for Add namespace translations for Mandailing (btm) (T335217), Add namespace translations for Mandailing (btm) (T335217) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 15:09 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:08 zabe@deploy1002: Started scap: Backport for Add namespace translations for Mandailing (btm) (T335217), Add namespace translations for Mandailing (btm) (T335217)
  • 14:59 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:59 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:59 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:59 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:58 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
  • 14:58 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:58 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:58 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
  • 14:58 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:52 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=ats-be,name=cp2037.codfw.wmnet
  • 14:47 damilare: SmashPig upgraded from 9ee24eef to f40badde
  • 14:46 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 14:45 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 14:45 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 14:45 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 14:43 fabfur: begin rebooting lvs5004 (T335835)
  • 14:35 fabfur: lvs5006 rebooted and services restarted (T335835)
  • 14:33 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5006.eqsin.wmnet
  • 14:30 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=ats-be,name=cp2037.codfw.wmnet
  • 14:30 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5006.eqsin.wmnet
  • 14:30 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host lvs5006.eqsin.wmnet
  • 14:29 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5006.eqsin.wmnet
  • 14:29 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host lvs5006.eqsin.wmnet
  • 14:29 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5006.eqsin.wmnet
  • 14:29 hnowlan: disabling puppet on A:cp for rollout of r/941405
  • 14:28 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host lvs5006.eqsin.wmnet
  • 14:28 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5006.eqsin.wmnet
  • 14:27 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host lvs5006.eqsin.wmnet
  • 14:26 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5006.eqsin.wmnet
  • 14:24 fabfur: start stopping services and rebooting lvs5006 (T335835)
  • 14:12 damilare: SmashPig upgraded from a9156920 to 9ee24eef
  • 14:02 urbanecm@deploy1002: Finished scap: Backport for Enable write new on testwiki for CheckUser event tables migration (T330158) (duration: 22m 27s)
  • 14:00 sukhe: rolling out pdns-recursor update on A:dns-rec
  • 13:42 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15 days, 0:00:00 on parse1002.eqiad.wmnet with reason: T339340 - hw troubleshooting
  • 13:42 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 15 days, 0:00:00 on parse1002.eqiad.wmnet with reason: T339340 - hw troubleshooting
  • 13:41 urbanecm@deploy1002: urbanecm and dreamyjazz: Backport for Enable write new on testwiki for CheckUser event tables migration (T330158) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:41 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "re-run to fix mw1486 - cgoubert@cumin1001"
  • 13:40 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "re-run to fix mw1486 - cgoubert@cumin1001"
  • 13:40 urbanecm@deploy1002: Started scap: Backport for Enable write new on testwiki for CheckUser event tables migration (T330158)
  • 13:38 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1486.eqiad.wmnet with OS buster
  • 13:38 cgoubert@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1001"
  • 13:38 urbanecm@deploy1002: Finished scap: Backport for Add support for writing both new and old to Hooks.php (T341934 T341586), Follow-up: Add support for writing both new and old to Hooks.php (T341586) (duration: 07m 28s)
  • 13:30 urbanecm@deploy1002: Started scap: Backport for Add support for writing both new and old to Hooks.php (T341934 T341586), Follow-up: Add support for writing both new and old to Hooks.php (T341586)
  • 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T342617)', diff saved to https://phabricator.wikimedia.org/P49704 and previous config saved to /var/cache/conftool/dbconfig/20230725-132121-ladsgroup.json
  • 13:20 godog: powercycle parse1002 - T339340
  • 13:17 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P49702 and previous config saved to /var/cache/conftool/dbconfig/20230725-130615-ladsgroup.json
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P49701 and previous config saved to /var/cache/conftool/dbconfig/20230725-125109-ladsgroup.json
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T342617)', diff saved to https://phabricator.wikimedia.org/P49700 and previous config saved to /var/cache/conftool/dbconfig/20230725-123602-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2114 (T342617)', diff saved to https://phabricator.wikimedia.org/P49699 and previous config saved to /var/cache/conftool/dbconfig/20230725-120641-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 11:49 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:49 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:48 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:48 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:48 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:47 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:46 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:45 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:37 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:37 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack - aborrero@cumin1001"
  • 11:36 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack - aborrero@cumin1001"
  • 11:34 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:32 akosiaris: T340087 wikidiff2 rollout done. 1 host is unreachable and will need to be reimaged or upgraded manually to pick this up, parse1002.eqiad.wmnet
  • 11:30 akosiaris: T340087 starting wikidiff2 1.41.1 rollout to eqiad. codfw already done.
  • 11:28 akosiaris: restart php on mw1457
  • 11:25 akosiaris: T340087 keep a copy php-wikidiff2_1.13.0-1_amd64.deb in apt1001:/home/akosiaris/wd/ in case of emergency
  • 11:24 akosiaris: T340087 starting wikidiff2 1.41.1 rollout to codfw
  • 10:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 31 days, 0:00:00 on lvs[1013-1015].eqiad.wmnet with reason: test hosts
  • 10:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 31 days, 0:00:00 on lvs[1013-1015].eqiad.wmnet with reason: test hosts
  • 09:50 elukey: restart kafka on kafka-main1001 to pick up the new changes - T341558
  • 09:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on kafka-main1001.eqiad.wmnet with reason: Apply a new setting to the Kafka broker
  • 09:46 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on kafka-main1001.eqiad.wmnet with reason: Apply a new setting to the Kafka broker
  • 09:06 slyngs: Restart Tomcat / Apereo CAS on idp1002
  • 09:01 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.19 refs T340247
  • 08:59 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 08:59 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 08:51 jnuche@deploy1002: Pruned MediaWiki: 1.41.0-wmf.17 (duration: 02m 11s)
  • 08:49 jnuche@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.19 refs T340247 (duration: 52m 35s)
  • 08:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on kafka-main1001.eqiad.wmnet with reason: Apply a new setting to the Kafka broker
  • 08:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on kafka-main1001.eqiad.wmnet with reason: Apply a new setting to the Kafka broker
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49696 and previous config saved to /var/cache/conftool/dbconfig/20230725-080326-root.json
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49695 and previous config saved to /var/cache/conftool/dbconfig/20230725-080315-root.json
  • 07:57 jnuche@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.19 refs T340247
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49694 and previous config saved to /var/cache/conftool/dbconfig/20230725-074821-root.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49693 and previous config saved to /var/cache/conftool/dbconfig/20230725-074810-root.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49692 and previous config saved to /var/cache/conftool/dbconfig/20230725-073317-root.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49691 and previous config saved to /var/cache/conftool/dbconfig/20230725-073305-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49690 and previous config saved to /var/cache/conftool/dbconfig/20230725-071812-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49689 and previous config saved to /var/cache/conftool/dbconfig/20230725-071801-root.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49688 and previous config saved to /var/cache/conftool/dbconfig/20230725-070307-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49687 and previous config saved to /var/cache/conftool/dbconfig/20230725-070256-root.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49686 and previous config saved to /var/cache/conftool/dbconfig/20230725-064802-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49685 and previous config saved to /var/cache/conftool/dbconfig/20230725-064751-root.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49684 and previous config saved to /var/cache/conftool/dbconfig/20230725-063258-root.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49683 and previous config saved to /var/cache/conftool/dbconfig/20230725-063247-root.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49682 and previous config saved to /var/cache/conftool/dbconfig/20230725-061753-root.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49681 and previous config saved to /var/cache/conftool/dbconfig/20230725-061742-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1213 (s5, s6)', diff saved to https://phabricator.wikimedia.org/P49680 and previous config saved to /var/cache/conftool/dbconfig/20230725-061319-root.json
  • 06:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wdqs[2004-2006].codfw.wmnet
  • 06:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[2004-2006].codfw.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
  • 05:52 ryankemper@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[2004-2006].codfw.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
  • 05:46 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
  • 05:09 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts wdqs[2004-2006].codfw.wmnet
  • 05:08 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts wdqs[2004-2006].codfw.wmnet
  • 04:56 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts wdqs[2004-2006].codfw.wmnet
  • 03:47 eileen: civicrm upgraded from ad642712 to 853c14f3
  • 03:31 eileen: civicrm upgraded from d7c8d77e to ad642712 (back to head as the rollback didn't do anything)
  • 03:11 eileen: civicrm changed from ad642712 to d7c8d77e (locked)
  • 02:59 eileen: civicrm upgraded from 6fd25bf6 to ad642712
  • 01:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1014.eqiad.wmnet with OS bullseye
  • 01:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb1014.eqiad.wmnet with OS bullseye
  • 01:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1013.eqiad.wmnet with OS bullseye
  • 00:50 wfan: civicrm upgraded from d7c8d77e to 6fd25bf6
  • 00:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye

2023-07-24

  • 23:12 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1013.eqiad.wmnet with OS bullseye
  • 22:46 zabe@deploy1002: Finished scap: Backport for client: Avoid dynamically registering hook handlers (T341102), HookContainer: avoid instantiation of handlers when calling register() (T341102 T340113 T339834) (duration: 09m 59s)
  • 22:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye
  • 22:37 zabe@deploy1002: zabe: Backport for client: Avoid dynamically registering hook handlers (T341102), HookContainer: avoid instantiation of handlers when calling register() (T341102 T340113 T339834) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experime
  • 22:36 zabe@deploy1002: Started scap: Backport for client: Avoid dynamically registering hook handlers (T341102), HookContainer: avoid instantiation of handlers when calling register() (T341102 T340113 T339834)
  • 22:16 jgleeson: civiproxy upgraded from 99cecb92 to c000fc1e
  • 21:28 maryum: Deployed patch for T341565
  • 21:14 sbassett: Deployed updated mitigation for T336027
  • 20:04 dancy@deploy1002: Installing scap version "4.56.0" for 605 hosts
  • 19:29 krinkle@deploy1002: Synchronized lib/: Iaa0cb0c75d4 (duration: 06m 21s)
  • 19:21 krinkle@deploy1002: Synchronized src/Profiler.php: Idada376134 (duration: 06m 30s)
  • 18:09 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:08 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:08 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:07 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:07 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:06 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:39 sukhe: restart ATS to pick up CR 940953: T339134
  • 16:00 topranks: Re-enabling disabled transport from knams to esams after fiber cleaning T337997
  • 14:53 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1014.eqiad.wmnet with OS bullseye
  • 14:44 vgutierrez: Repooling cp4052 (upload) running ATS 9.2.1 - T339134
  • 14:37 apine@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:36 apine@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:36 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:36 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:35 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:34 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:30 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:30 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:29 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:28 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:23 samtar@deploy1002: Finished scap: Backport for Revert "Revert "Run a synthetic test for client side preferences"" (T336527 T339268) (duration: 14m 05s)
  • 14:11 samtar@deploy1002: samtar: Backport for Revert "Revert "Run a synthetic test for client side preferences"" (T336527 T339268) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:09 samtar@deploy1002: Started scap: Backport for Revert "Revert "Run a synthetic test for client side preferences"" (T336527 T339268)
  • 14:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1013.eqiad.wmnet with OS bullseye
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49677 and previous config saved to /var/cache/conftool/dbconfig/20230724-140226-root.json
  • 14:02 samtar@deploy1002: Finished scap: Backport for Revert "Run a synthetic test for client side preferences" (duration: 07m 20s)
  • 14:00 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host rdb1014.eqiad.wmnet with OS bullseye
  • 13:56 samtar@deploy1002: samtar: Backport for Revert "Run a synthetic test for client side preferences" synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49676 and previous config saved to /var/cache/conftool/dbconfig/20230724-135604-root.json
  • 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49675 and previous config saved to /var/cache/conftool/dbconfig/20230724-135557-root.json
  • 13:54 samtar@deploy1002: Started scap: Backport for Revert "Run a synthetic test for client side preferences"
  • 13:52 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 13:52 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 13:52 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 13:51 TheresNoTime: close UTC afternoon backport window
  • 13:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye
  • 13:51 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 13:49 TheresNoTime: gerrit:939312 not synced. T336527 T339268
  • 13:48 samtar@deploy1002: Sync cancelled.
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49674 and previous config saved to /var/cache/conftool/dbconfig/20230724-134721-root.json
  • 13:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1014.eqiad.wmnet with OS bullseye
  • 13:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1013.eqiad.wmnet with OS bullseye
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49673 and previous config saved to /var/cache/conftool/dbconfig/20230724-134059-root.json
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49672 and previous config saved to /var/cache/conftool/dbconfig/20230724-134052-root.json
  • 13:38 taavi: run `taavi@mwmaint1002 ~ $ mwscript namespaceDupes.php mywiktionary --fix` after purging null editing page #131577 for T342516
  • 13:34 samtar@deploy1002: samtar and mabualruz: Backport for Run a synthetic test for client side preferences (T336527 T339268) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:33 samtar@deploy1002: Started scap: Backport for Run a synthetic test for client side preferences (T336527 T339268)
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49671 and previous config saved to /var/cache/conftool/dbconfig/20230724-133217-root.json
  • 13:31 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 13:31 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 13:30 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 13:30 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 13:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host rdb1014.eqiad.wmnet with OS bullseye
  • 13:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye
  • 13:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1013.eqiad.wmnet with OS bullseye
  • 13:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1014.eqiad.wmnet with OS bullseye
  • 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49669 and previous config saved to /var/cache/conftool/dbconfig/20230724-132555-root.json
  • 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49668 and previous config saved to /var/cache/conftool/dbconfig/20230724-132548-root.json
  • 13:25 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript namespaceDupes.php mywiktionary --fix` T342516
  • 13:25 samtar@deploy1002: Finished scap: Backport for add citations, concordance, rhymes, reconstruction, therasus, namespaces for mywiktionary (T342516) (duration: 21m 28s)
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49667 and previous config saved to /var/cache/conftool/dbconfig/20230724-131712-root.json
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49665 and previous config saved to /var/cache/conftool/dbconfig/20230724-131050-root.json
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49664 and previous config saved to /var/cache/conftool/dbconfig/20230724-131043-root.json
  • 13:05 samtar@deploy1002: anzx and samtar: Backport for add citations, concordance, rhymes, reconstruction, therasus, namespaces for mywiktionary (T342516) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:03 vgutierrez: depooling cp4052 for some ATS 9.2.1 testing - T339134
  • 13:03 samtar@deploy1002: Started scap: Backport for add citations, concordance, rhymes, reconstruction, therasus, namespaces for mywiktionary (T342516)
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49663 and previous config saved to /var/cache/conftool/dbconfig/20230724-130208-root.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49662 and previous config saved to /var/cache/conftool/dbconfig/20230724-125545-root.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49661 and previous config saved to /var/cache/conftool/dbconfig/20230724-125538-root.json
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49660 and previous config saved to /var/cache/conftool/dbconfig/20230724-124703-root.json
  • 12:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28458
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49659 and previous config saved to /var/cache/conftool/dbconfig/20230724-124040-root.json
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49658 and previous config saved to /var/cache/conftool/dbconfig/20230724-124034-root.json
  • 12:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 28458
  • 12:36 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host rdb1014.eqiad.wmnet with OS bullseye
  • 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49656 and previous config saved to /var/cache/conftool/dbconfig/20230724-123158-root.json
  • 12:31 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49655 and previous config saved to /var/cache/conftool/dbconfig/20230724-122536-root.json
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49654 and previous config saved to /var/cache/conftool/dbconfig/20230724-122529-root.json
  • 12:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['rdb1013.eqiad.wmnet']
  • 12:17 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['rdb1013.eqiad.wmnet']
  • 12:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['rdb1014.eqiad.wmnet']
  • 12:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['rdb1013.eqiad.wmnet']
  • 12:17 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['rdb1014.eqiad.wmnet']
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49653 and previous config saved to /var/cache/conftool/dbconfig/20230724-121653-root.json
  • 12:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['rdb1013.eqiad.wmnet']
  • 12:14 dcausse@deploy1002: Finished deploy [airflow-dags/search@e7b9253]: search: fix table name for wmf_raw.mediawiki_page (duration: 00m 12s)
  • 12:14 dcausse@deploy1002: Started deploy [airflow-dags/search@e7b9253]: search: fix table name for wmf_raw.mediawiki_page
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1187', diff saved to https://phabricator.wikimedia.org/P49652 and previous config saved to /var/cache/conftool/dbconfig/20230724-121329-root.json
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49651 and previous config saved to /var/cache/conftool/dbconfig/20230724-121031-root.json
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49650 and previous config saved to /var/cache/conftool/dbconfig/20230724-121024-root.json
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2169 (s6, s7)', diff saved to https://phabricator.wikimedia.org/P49649 and previous config saved to /var/cache/conftool/dbconfig/20230724-120609-root.json
  • 10:58 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 10:51 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on releases2002.codfw.wmnet,releases1002.eqiad.wmnet with reason: Decommissioning prep
  • 10:51 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on releases2002.codfw.wmnet,releases1002.eqiad.wmnet with reason: Decommissioning prep
  • 10:48 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 10:47 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:47 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:46 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 10:46 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 10:45 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 10:44 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 10:41 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:41 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/940880 (T342211) to eqiad DC, only one left (disable keepalive on port 80 on A:cp)
  • 10:41 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 10:39 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcontrol1005
  • 10:39 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcontrol1005
  • 09:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1124.eqiad.wmnet onto db1133.eqiad.wmnet
  • 09:26 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/940873 (T342211) to drmrs DC (disable keepalive on port 80 on A:cp-drmrs)
  • 09:26 dcausse@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 09:24 dcausse@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 09:22 vgutierrez: rollback to trafficserver 9.1.4 in cp4052 - T339134
  • 09:15 ladsgroup@cumin1001: START - Cookbook sre.mysql.clone of db1124.eqiad.wmnet onto db1133.eqiad.wmnet
  • 09:13 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:12 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 09:08 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:08 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 09:03 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:01 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 09:00 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:59 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:58 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 08:57 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 08:56 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:54 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:45 vgutierrez: testing trafficserver 9.2.1 in cp4052 (upload node) - T339134
  • 08:39 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 08:38 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 08:36 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 08:36 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 08:33 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:33 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:32 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:31 oblivian@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:30 oblivian@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:30 oblivian@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:29 oblivian@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 08:28 oblivian@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 08:22 dcausse@deploy1002: Finished deploy [airflow-dags/search@a47bd0f]: search: Fix partition definition for wmf_raw.mediawiki_page_table (duration: 00m 12s)
  • 08:22 dcausse@deploy1002: Started deploy [airflow-dags/search@a47bd0f]: search: Fix partition definition for wmf_raw.mediawiki_page_table
  • 07:40 urbanecm@deploy1002: Finished scap: Backport for ChangeMentor: Refactor the notification conditions (T336875) (duration: 07m 02s)
  • 07:33 urbanecm@deploy1002: Started scap: Backport for ChangeMentor: Refactor the notification conditions (T336875)
  • 07:32 urbanecm@deploy1002: Finished scap: Backport for Add reassignMentees.php maintenance script (T330071) (duration: 14m 39s)
  • 07:31 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:30 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:25 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 07:23 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 07:23 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:21 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:17 urbanecm@deploy1002: Started scap: Backport for Add reassignMentees.php maintenance script (T330071)
  • 06:32 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 06:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 06:27 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 06:23 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
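
The `(re)pooling @ N%` entries above record each database instance being brought back in steps (1%, 3%, 5%, 10%, 25%, 50%, 75%, 100%), roughly 15 minutes apart. A minimal sketch of how that ramp could be pulled out of the log text follows. The regular expression and the function name `repool_steps` are assumptions made for illustration only; they are not part of dbctl or of any Wikimedia tooling, and the sample lines are shortened copies of entries from this page.

```python
import re

# Hypothetical pattern for entries of the form:
#   12:16 user@host: dbctl commit (dc=all): 'db1187 (re)pooling @ 1%: ...'
# The instance may carry a port suffix, e.g. db2169:3316.
REPOOL_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) \S+: dbctl commit \(dc=all\): "
    r"'(?P<instance>db\d+(?::\d+)?) \(re\)pooling @ (?P<pct>\d+)%"
)


def repool_steps(lines):
    """Group (time, percentage) repool steps by database instance.

    The log lists newest entries first, so each instance's steps are
    sorted by HH:MM afterwards; this is only valid within a single day.
    """
    steps = {}
    for line in lines:
        match = REPOOL_RE.search(line)
        if match:
            entry = (match["time"], int(match["pct"]))
            steps.setdefault(match["instance"], []).append(entry)
    for sequence in steps.values():
        sequence.sort()
    return steps


# Shortened copies of entries from 2023-07-24 (newest first, as on the page).
sample = [
    "12:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 5%: Repooling after maintenance'",
    "12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 3%: Repooling after maintenance'",
    "12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 3%: Repooling after maintenance'",
    "12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 1%: Repooling after maintenance'",
    "12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1187'",
]

for instance, sequence in repool_steps(sample).items():
    print(instance, sequence)
```

Depool entries and unrelated log lines do not match the pattern and are skipped, so the result contains only the ramp-up steps per instance.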

2023-07-23

  • 19:53 sukhe@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 19:53 sukhe@cumin2002: START - Cookbook sre.network.cf
  • 01:15 sukhe@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 01:15 sukhe@cumin2002: START - Cookbook sre.network.cf

2023-07-21

  • 21:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1149.eqiad.wmnet with OS bullseye
  • 21:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1149.eqiad.wmnet with reason: host reimage
  • 20:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1149.eqiad.wmnet with reason: host reimage
  • 20:21 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1149.eqiad.wmnet with OS bullseye
  • 20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1150.eqiad.wmnet with OS bullseye
  • 20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:15 apine@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:14 apine@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:04 apine@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:04 apine@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1150.eqiad.wmnet with reason: host reimage
  • 19:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1150.eqiad.wmnet with reason: host reimage
  • 19:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1150.eqiad.wmnet with OS bullseye
  • 19:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1151.eqiad.wmnet with OS bullseye
  • 19:41 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 19:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 19:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1151.eqiad.wmnet with reason: host reimage
  • 19:21 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1151.eqiad.wmnet with reason: host reimage
  • 19:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1151.eqiad.wmnet with OS bullseye
  • 19:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1152.eqiad.wmnet with OS bullseye
  • 18:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1152.eqiad.wmnet with reason: host reimage
  • 18:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1152.eqiad.wmnet with reason: host reimage
  • 18:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1152.eqiad.wmnet with OS bullseye
  • 18:16 dancy@deploy1002: Finished scap: Backport for Enable the CampaignEvents extension before loading CommonSettings-labs (T342452) (duration: 17m 31s)
  • 18:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:00 dancy@deploy1002: daimona and dancy: Backport for Enable the CampaignEvents extension before loading CommonSettings-labs (T342452) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 17:59 dancy@deploy1002: Started scap: Backport for Enable the CampaignEvents extension before loading CommonSettings-labs (T342452)
  • 17:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1152.eqiad.wmnet with reason: host reimage
  • 17:54 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1152.eqiad.wmnet with reason: host reimage
  • 17:32 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1152.eqiad.wmnet with OS bullseye
  • 15:35 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 90 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Read-only DB
  • 15:35 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 90 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Read-only DB
  • 15:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.sonic-ssh (exit_code=0) for network device lsw1-e8-eqiad
  • 15:11 ayounsi@cumin1001: START - Cookbook sre.network.sonic-ssh for network device lsw1-e8-eqiad
  • 15:10 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:09 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:06 apine@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:05 apine@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:04 apine@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:04 apine@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:58 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host lvs1016.eqiad.wmnet with OS bookworm
  • 14:55 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:55 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:54 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:50 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
  • 14:37 sukhe: sudo ipmitool -I lanplus -H "lvs1016.mgmt.eqiad.wmnet" -U root -E chassis power off
  • 14:17 sukhe: sudo ipmitool -I lanplus -H "lvs1016.mgmt.eqiad.wmnet" -U root -E chassis power cycle
  • 14:14 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1016.eqiad.wmnet with OS bookworm
  • 14:02 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
  • 13:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb1013.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb1014.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host rdb1014.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host rdb1013.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host rdb1014
  • 12:50 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host rdb1014
  • 12:50 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host rdb1013
  • 12:49 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host rdb1013
  • 12:49 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:49 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt rdb101[34] - jclark@cumin1001"
  • 12:48 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt rdb101[34] - jclark@cumin1001"
  • 12:46 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 12:39 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts analytics1075.eqiad.wmnet
  • 12:38 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts analytics1075.eqiad.wmnet
  • 12:38 jbond@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts analytics1075.eqiad.wmnet
  • 12:37 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts analytics1075.eqiad.wmnet
  • 12:35 jbond@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts analytics1075.eqiad.wmnet
  • 12:35 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts analytics1075.eqiad.wmnet
  • 12:14 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1016.eqiad.wmnet with OS bookworm
  • 12:03 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
  • 11:47 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host lvs1016.eqiad.wmnet with OS bookworm
  • 11:30 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1005.eqiad.wmnet with OS bullseye
  • 10:49 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
  • 10:49 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host lvs1016.eqiad.wmnet with OS bookworm
  • 10:34 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
  • 10:33 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1016.eqiad.wmnet with OS bookworm
  • 10:30 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
  • 10:27 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1015.eqiad.wmnet with OS bookworm
  • 10:27 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 10:26 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 10:16 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
  • 10:13 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
  • 10:12 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1015.eqiad.wmnet with reason: host reimage
  • 10:08 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1015.eqiad.wmnet with reason: host reimage
  • 09:58 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1005.eqiad.wmnet with OS bullseye
  • 09:58 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol1005.eqiad.wmnet with OS bullseye
  • 09:58 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1015.eqiad.wmnet with OS bookworm
  • 09:57 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host lvs1015.eqiad.wmnet with OS bookworm
  • 09:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
  • 09:53 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1015.eqiad.wmnet with OS bookworm
  • 09:52 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
  • 09:50 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1014.eqiad.wmnet with OS bookworm
  • 09:50 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 09:47 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 09:36 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1005.eqiad.wmnet with OS bullseye
  • 09:33 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1014.eqiad.wmnet with reason: host reimage
  • 09:30 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1014.eqiad.wmnet with reason: host reimage
  • 09:19 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1014.eqiad.wmnet with OS bookworm
  • 09:19 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host lvs1014.eqiad.wmnet with OS bookworm
  • 09:09 jayme: enable puppet on C:confd - T341669
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49645 and previous config saved to /var/cache/conftool/dbconfig/20230721-090625-root.json
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49644 and previous config saved to /var/cache/conftool/dbconfig/20230721-090003-root.json
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49643 and previous config saved to /var/cache/conftool/dbconfig/20230721-085955-root.json
  • 08:59 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1014.eqiad.wmnet with OS bookworm
  • 08:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1108.eqiad.wmnet with reason: db1108 has been replaced with db1208 - leaving for a few days before decom
  • 08:56 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1108.eqiad.wmnet with reason: db1108 has been replaced with db1208 - leaving for a few days before decom
  • 08:53 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudcontrol1005 - aborrero@cumin1001 - T341495"
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49642 and previous config saved to /var/cache/conftool/dbconfig/20230721-085120-root.json
  • 08:48 jayme: ignore "disabling puppet in C:cumin" - was a typo
  • 08:47 jayme: disabling puppet in C:confd - T341669
  • 08:47 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudcontrol1005 - aborrero@cumin1001 - T341495"
  • 08:47 jayme: disabling puppet in C:cumin - T341669
  • 08:45 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudcontrol1005 - aborrero@cumin1001 - T341495"
  • 08:45 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudcontrol1005 - aborrero@cumin1001 - T341495"
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49641 and previous config saved to /var/cache/conftool/dbconfig/20230721-084459-root.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49640 and previous config saved to /var/cache/conftool/dbconfig/20230721-084450-root.json
  • 08:44 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:43 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49639 and previous config saved to /var/cache/conftool/dbconfig/20230721-083616-root.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49638 and previous config saved to /var/cache/conftool/dbconfig/20230721-082954-root.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49637 and previous config saved to /var/cache/conftool/dbconfig/20230721-082946-root.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49636 and previous config saved to /var/cache/conftool/dbconfig/20230721-082111-root.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49635 and previous config saved to /var/cache/conftool/dbconfig/20230721-081449-root.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49634 and previous config saved to /var/cache/conftool/dbconfig/20230721-081441-root.json
  • 08:10 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol1005.eqiad.wmnet with OS bullseye
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49633 and previous config saved to /var/cache/conftool/dbconfig/20230721-080606-root.json
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49632 and previous config saved to /var/cache/conftool/dbconfig/20230721-075944-root.json
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49631 and previous config saved to /var/cache/conftool/dbconfig/20230721-075936-root.json
  • 07:57 zabe@deploy1002: Finished scap: T342405 (duration: 07m 03s)
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49630 and previous config saved to /var/cache/conftool/dbconfig/20230721-075101-root.json
  • 07:50 zabe@deploy1002: Started scap: T342405
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49629 and previous config saved to /var/cache/conftool/dbconfig/20230721-074440-root.json
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49628 and previous config saved to /var/cache/conftool/dbconfig/20230721-074431-root.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49627 and previous config saved to /var/cache/conftool/dbconfig/20230721-073557-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49626 and previous config saved to /var/cache/conftool/dbconfig/20230721-072935-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49625 and previous config saved to /var/cache/conftool/dbconfig/20230721-072927-root.json
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49624 and previous config saved to /var/cache/conftool/dbconfig/20230721-072052-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1201', diff saved to https://phabricator.wikimedia.org/P49623 and previous config saved to /var/cache/conftool/dbconfig/20230721-071623-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49622 and previous config saved to /var/cache/conftool/dbconfig/20230721-071430-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49621 and previous config saved to /var/cache/conftool/dbconfig/20230721-071422-root.json
  • 07:12 marostegui: Upgrade dbstore1005 to mariadb 10.6 T334652
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2171 (s5 and s6)', diff saved to https://phabricator.wikimedia.org/P49620 and previous config saved to /var/cache/conftool/dbconfig/20230721-070110-root.json
  • 06:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3209
  • 06:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3209
  • 06:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 398203
  • 06:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 398203
  • 06:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 139418
  • 06:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 139418
  • 06:36 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 139148
  • 06:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 139148
  • 04:00 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:01:00 on 10 hosts with reason: trying to remove downtime on these new hosts
  • 04:00 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 0:01:00 on 10 hosts with reason: trying to remove downtime on these new hosts
  • 03:51 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs202([1-2])\.codfw\.wmnet
  • 03:50 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=active; selector: name=wdqs222([0-1])\.codfw\.wmnet
  • 00:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1153.eqiad.wmnet with OS bullseye
  • 00:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 00:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 00:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1153.eqiad.wmnet with reason: host reimage

2023-07-20

  • 23:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1153.eqiad.wmnet with reason: host reimage
  • 23:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1153.eqiad.wmnet with OS bullseye
  • 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1154.eqiad.wmnet with OS bullseye
  • 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1154.eqiad.wmnet with reason: host reimage
  • 23:13 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1154.eqiad.wmnet with reason: host reimage
  • 22:54 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1154.eqiad.wmnet with OS bullseye
  • 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1155.eqiad.wmnet with OS bullseye
  • 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:47 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1155.eqiad.wmnet with reason: host reimage
  • 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1155.eqiad.wmnet with reason: host reimage
  • 22:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1155.eqiad.wmnet with OS bullseye
  • 21:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1156.eqiad.wmnet with OS bullseye
  • 21:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1156.eqiad.wmnet with reason: host reimage
  • 21:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1156.eqiad.wmnet with reason: host reimage
  • 21:18 hashar@deploy1002: Finished deploy [integration/docroot@0e476e5]: Tweak Zuul status page css 🥚 (duration: 00m 07s)
  • 21:18 hashar@deploy1002: Started deploy [integration/docroot@0e476e5]: Tweak Zuul status page css 🥚
  • 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1156.eqiad.wmnet with OS bullseye
  • 20:25 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:25 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:17 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 18:44 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.18 refs T340246
  • 18:44 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2020.codfw.wmnet
  • 18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs20{20}.codfw.wmnet
  • 18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2019.codfw.wmnet
  • 18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2018.codfw.wmnet
  • 18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2017.codfw.wmnet
  • 18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2016.codfw.wmnet
  • 18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2015.codfw.wmnet
  • 18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2014.codfw.wmnet
  • 18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2013.codfw.wmnet
  • 18:41 bking@cumin1001: conftool action : set/pooled=yes,set/weight=10; selector: name=wdqs2013-19.codfw.wmnet
  • 18:38 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:38 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 18:38 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:00 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host flink-zk1002.eqiad.wmnet with OS bookworm
  • 17:41 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bullseye
  • 17:39 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on flink-zk1002.eqiad.wmnet with reason: host reimage
  • 17:36 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on flink-zk1002.eqiad.wmnet with reason: host reimage
  • 17:25 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1002.eqiad.wmnet with OS bookworm
  • 17:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 17:22 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 17:22 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host flink-zk1001.eqiad.wmnet with OS bookworm
  • 17:21 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm
  • 17:21 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 17:12 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 17:09 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 17:00 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on flink-zk1001.eqiad.wmnet with reason: host reimage
  • 16:57 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
  • 16:56 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on flink-zk1001.eqiad.wmnet with reason: host reimage
  • 16:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
  • 16:53 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
  • 16:52 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bullseye
  • 16:51 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
  • 16:49 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts analytics1075.eqiad.wmnet
  • 16:49 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts analytics1075.eqiad.wmnet
  • 16:49 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/940190 (T342211) to codfw DC (disable keepalive on port 80 on A:cp-codfw)
  • 16:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1073.eqiad.wmnet with OS bullseye
  • 16:43 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm
  • 16:42 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1001.eqiad.wmnet with OS bookworm
  • 16:41 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1013.eqiad.wmnet with OS bookworm
  • 16:40 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm
  • 16:38 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 16:37 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1005.eqiad.wmnet with OS bullseye
  • 16:37 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1005.eqiad.wmnet with OS bullseye
  • 16:31 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1005.eqiad.wmnet with OS bullseye
  • 16:22 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcontrol1005
  • 16:21 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcontrol1005
  • 16:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bullseye
  • 16:21 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:21 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1005 - aborrero@cumin1001"
  • 16:20 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1005 - aborrero@cumin1001"
  • 16:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1073.eqiad.wmnet with reason: host reimage
  • 16:18 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 16:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1073.eqiad.wmnet with reason: host reimage
  • 16:15 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1073.eqiad.wmnet with OS bullseye
  • 16:13 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1073.eqiad.wmnet with OS bullseye
  • 16:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 16:03 aborrero@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudcontrol1005
  • 16:03 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcontrol1005
  • 16:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 15:51 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 15:51 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 15:50 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 15:49 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 15:48 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 15:48 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 15:48 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm
  • 15:46 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 15:46 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 15:46 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 15:31 elukey: stop kafka main eqiad maintenance - T341558
  • 15:20 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bullseye
  • 15:16 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 15:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1073.eqiad.wmnet with reason: host reimage
  • 15:08 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1073.eqiad.wmnet with reason: host reimage
  • 15:07 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 15:06 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bullseye
  • 14:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1075.eqiad.wmnet with OS bullseye
  • 14:58 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/940150 (T342211) to ulsfo DC (disable keepalive on port 80 on A:cp-ulsfo)
  • 14:56 apine@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:56 apine@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:55 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on flink-zk1003.eqiad.wmnet with reason: host reimage
  • 14:51 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on flink-zk1003.eqiad.wmnet with reason: host reimage
  • 14:51 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1073.eqiad.wmnet with OS bullseye
  • 14:50 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host analytics1073.eqiad.wmnet with OS bullseye
  • 14:45 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts analytics1073.eqiad.wmnet
  • 14:45 herron: roll restart codfw/eqiad low-traffic pybals to add prometheus-https T326657
  • 14:45 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts analytics1073.eqiad.wmnet
  • 14:41 apine@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:41 apine@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:38 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 14:38 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 14:37 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 14:36 sukhe: run agent on cumin -b1 -s30 'A:dns-rec and not P{dns4004*}'
  • 14:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1075.eqiad.wmnet with reason: host reimage
  • 14:32 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 14:31 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1075.eqiad.wmnet with reason: host reimage
  • 14:30 sukhe: disable puppet on A:dns-rec to slowly roll out CR 937991
  • 14:15 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1075.eqiad.wmnet with OS bullseye
  • 14:14 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 14:13 sukhe: dns1004 upgrade to pdns-rec 4.8.4: T341611
  • 14:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:04 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:04 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:01 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bullseye
  • 13:59 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:57 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 13:57 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 13:55 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 13:55 bking@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:54 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 13:54 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 13:53 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:52 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:51 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:48 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:47 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:45 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1073.eqiad.wmnet with OS bullseye
  • 13:32 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:31 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:22 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:22 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for vlan ints lsw1-f8-eqiad - cmooney@cumin1001"
  • 13:21 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for vlan ints lsw1-f8-eqiad - cmooney@cumin1001"
  • 13:18 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:18 cmooney@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 13:18 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:17 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 13:12 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bullseye
  • 13:10 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:10 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for vlan ints lsw1-f8-eqiad - cmooney@cumin1001"
  • 13:09 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for vlan ints lsw1-f8-eqiad - cmooney@cumin1001"
  • 13:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:00 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 12:56 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bullseye
  • 12:44 topranks: LDAP - adding user ifrahkh to groups wmde & nda
  • 12:43 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 12:43 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bullseye
  • 12:24 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 12:22 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:22 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for assigned switch gw ips. - cmooney@cumin1001"
  • 12:20 zabe@deploy1002: Finished scap: Backport for SpecialUserRights: Check for username to be temporary (T340468 T342322) (duration: 08m 22s)
  • 12:15 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for assigned switch gw ips. - cmooney@cumin1001"
  • 12:13 zabe@deploy1002: zabe and dreamyjazz: Backport for SpecialUserRights: Check for username to be temporary (T340468 T342322) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 12:13 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:12 zabe@deploy1002: Started scap: Backport for SpecialUserRights: Check for username to be temporary (T340468 T342322)
  • 11:53 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:53 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for assigned switch loopbacks. - cmooney@cumin1001"
  • 11:52 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for assigned switch loopbacks. - cmooney@cumin1001"
  • 11:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 11:35 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/940101 (T342211) to eqsin DC (disable keepalive on port 80 on A:cp-eqsin)
  • 10:53 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 10:40 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 10:33 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 09:50 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:50 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove reverse dns for IP allocated in error. - cmooney@cumin1001"
  • 09:26 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=mw1357.eqiad.wmnet
  • 09:25 filippo@cumin1001: conftool action : set/weight=10; selector: name=mw1357.eqiad.wmnet
  • 09:25 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=mw1356.eqiad.wmnet
  • 09:25 filippo@cumin1001: conftool action : set/weight=10; selector: name=mw1356.eqiad.wmnet
  • 09:24 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove reverse dns for IP allocated in error. - cmooney@cumin1001"
  • 09:19 filippo@cumin1001: conftool action : set/pooled=no; selector: name=mw1357.eqiad.wmnet
  • 09:19 filippo@cumin1001: conftool action : set/pooled=no; selector: name=mw1356.eqiad.wmnet
  • 09:17 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:17 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:17 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove reverse dns for IP allocated in error. - cmooney@cumin1001"
  • 09:15 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove reverse dns for IP allocated in error. - cmooney@cumin1001"
  • 09:12 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 08:31 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/940091 (T342211) to esams DC (disable keepalive on port 80)
  • 08:29 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 08:27 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 08:25 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 08:24 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 08:24 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:23 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:21 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 08:21 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 07:56 apergos: UTC morning backport and config training window really complete
  • 07:50 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 07:50 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 07:40 apergos: UTC morning backport and config training window reopened for fix to the last noc patch
  • 07:37 apergos: UTC morning backport and config training window complete
  • 07:36 ariel@deploy1002: Finished scap: Backport for noc/db.php: use the new etcd fetch function (T341859) (duration: 09m 14s)
  • 07:29 ariel@deploy1002: oblivian and ariel: Backport for noc/db.php: use the new etcd fetch function (T341859) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:27 ariel@deploy1002: Started scap: Backport for noc/db.php: use the new etcd fetch function (T341859)
  • 07:25 ariel@deploy1002: Finished scap: Backport for noc: add script to dump etcd db config (T341859) (duration: 09m 35s)
  • 07:17 ariel@deploy1002: oblivian and ariel: Backport for noc: add script to dump etcd db config (T341859) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:16 ariel@deploy1002: Started scap: Backport for noc: add script to dump etcd db config (T341859)
  • 07:12 ariel@deploy1002: Finished scap: Backport for Enable EditInSequence in pawikisource (duration: 09m 52s)
  • 07:04 ariel@deploy1002: ariel and soda: Backport for Enable EditInSequence in pawikisource synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:02 ariel@deploy1002: Started scap: Backport for Enable EditInSequence in pawikisource
  • 06:37 elukey: start kafka main eqiad maintenance (partitions rebalancing) - T341558
  • 04:33 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=parse1002.*
  • 04:28 eileen: civicrm upgraded from 0cde2608 to d7c8d77e
  • 01:46 tstarling@deploy1002: Synchronized php-1.41.0-wmf.18/includes/diff/DifferenceEngine.php: fix prod error T342099, T341961 (duration: 08m 32s)
  • 01:35 tstarling@deploy1002: Synchronized php-1.41.0-wmf.17/includes/diff/DifferenceEngine.php: fix prod error T342099, T341961 (duration: 09m 20s)

2023-07-19

  • 22:36 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 22:36 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 22:08 eileen: civicrm upgraded from 7642b3d9 to 0cde2608
  • 21:41 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 21:38 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 21:37 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 21:37 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1003.eqiad.wmnet on all recursors
  • 21:37 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1003.eqiad.wmnet on all recursors
  • 21:37 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:37 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 21:36 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 21:32 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 21:32 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 21:27 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 01m 05s)
  • 21:26 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 21:21 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 01m 47s)
  • 21:20 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 20:55 bking@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flink-zk1003.eqiad.wmnet
  • 20:55 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:55 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 20:54 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 20:43 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 20:39 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts flink-zk1003.eqiad.wmnet
  • 20:39 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 20:39 bking@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:38 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 20:38 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 20:33 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 20:33 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 20:31 TheresNoTime: backport window closed
  • 20:28 samtar@deploy1002: Finished scap: Backport for Replace underscores with spaces in 4 Arabic sitenames (T337725) (duration: 17m 09s)
  • 20:26 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:26 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse dns for spine linknets eqiad - cmooney@cumin1001"
  • 20:24 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse dns for spine linknets eqiad - cmooney@cumin1001"
  • 20:22 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 20:12 samtar@deploy1002: samtar and hubaishan: Backport for Replace underscores with spaces in 4 Arabic sitenames (T337725) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:11 samtar@deploy1002: Started scap: Backport for Replace underscores with spaces in 4 Arabic sitenames (T337725)
  • 19:45 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-durum (exit_code=0) rolling reboot on A:durum and A:durum
  • 19:42 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 19:41 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 19:40 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 19:40 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1003.eqiad.wmnet on all recursors
  • 19:40 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1003.eqiad.wmnet on all recursors
  • 19:40 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:40 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 19:39 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 19:37 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 19:37 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 19:36 bking@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flink-zk1003.eqiad.wmnet
  • 19:36 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:36 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 19:35 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 19:28 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 19:24 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts flink-zk1003.eqiad.wmnet
  • 19:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2003.codfw.wmnet
  • 18:58 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host pybal-test2003.codfw.wmnet
  • 18:41 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-durum rolling reboot on A:durum and A:durum
  • 18:29 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling reboot on A:wikidough and A:wikidough
  • 18:25 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.18 refs T340246
  • 17:58 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:50 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host lvs1013.eqiad.wmnet with OS bullseye
  • 17:49 Amir1: powercycled db1218 (T342284)
  • 17:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1218.eqiad.wmnet with reason: Maint
  • 17:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1218.eqiad.wmnet with reason: Maint
  • 17:41 robh@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bullseye
  • 17:40 sukhe@cumin2002: dbctl commit (dc=all): 'Depool db1218', diff saved to https://phabricator.wikimedia.org/P49603 and previous config saved to /var/cache/conftool/dbconfig/20230719-174019-sukhe.json
  • 17:40 sukhe: depool db1218
  • 17:32 sukhe: dummy run of authdns-update
  • 17:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host druid1011.eqiad.wmnet
  • 17:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host druid1011.eqiad.wmnet
  • 17:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host druid1010.eqiad.wmnet
  • 17:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5004.wikimedia.org
  • 17:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host druid1010.eqiad.wmnet
  • 17:15 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns5004.wikimedia.org
  • 17:09 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling reboot on A:wikidough and A:wikidough
  • 17:02 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 17:00 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host druid1009.eqiad.wmnet
  • 16:56 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:56 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host druid1009.eqiad.wmnet
  • 16:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 16:47 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:46 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:44 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:43 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:30 joal@deploy1002: Finished deploy [airflow-dags/analytics@4c06501]: Fix bug introduced in cassandra loading jobs (duration: 00m 15s)
  • 16:29 joal@deploy1002: Started deploy [airflow-dags/analytics@4c06501]: Fix bug introduced in cassandra loading jobs
  • 16:26 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:26 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for e4 mgmt entries - cmooney@cumin1001"
  • 16:25 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for e4 mgmt entries - cmooney@cumin1001"
  • 16:21 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:20 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:20 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:20 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:20 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for e4 mgmt entries - cmooney@cumin1001"
  • 16:17 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for e4 mgmt entries - cmooney@cumin1001"
  • 16:14 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:55 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:53 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:46 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 15:44 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bullseye
  • 15:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 15:43 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 15:40 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1073.eqiad.wmnet with OS bullseye
  • 15:35 apergos: dumpsdata1007 is now the fallback host for sql/xml dumps and for misc dumps. dumpsdata1004, the former fallback host, is now a spare.
  • 15:34 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1075.eqiad.wmnet with OS bullseye
  • 15:32 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:30 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:29 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:28 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:26 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:26 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: trying to resolve netbox issues - sukhe@cumin2002"
  • 15:25 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: trying to resolve netbox issues - sukhe@cumin2002"
  • 15:23 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 15:23 sukhe@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:23 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 15:20 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1046.eqiad.wmnet
  • 15:19 ayounsi@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 15:18 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 15:18 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 15:14 robh: mw140[89] downtime for relocation per T308339
  • 15:13 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
  • 15:11 robh: mw141[01] returned to service per T308339
  • 15:11 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 15:11 bking@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:11 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mw1411
  • 15:11 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host mw1411
  • 15:09 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 15:09 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 15:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 15:03 fabfur: disabling keepalive on port 80 for cp5024 https://gerrit.wikimedia.org/r/939707 (T342211)
  • 14:59 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 14:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 14:58 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 14:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 14:54 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 14:54 jclark@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:54 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcontrol1005 - jclark@cumin1001"
  • 14:53 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcontrol1005 - jclark@cumin1001"
  • 14:51 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:49 robh: mw141[23] returned to service per T308339. Ignore the typo of mw1414; it is uninvolved
  • 14:48 robh: mw141[34] returned to service per T308339
  • 14:40 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mw1412
  • 14:39 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host mw1412
  • 14:39 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mw1413
  • 14:38 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host mw1413
  • 14:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5003.wikimedia.org
  • 14:35 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:33 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:33 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns5003.wikimedia.org
  • 14:30 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 14:30 bking@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:28 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 14:28 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 14:21 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 14:20 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1073.eqiad.wmnet with OS bullseye
  • 14:19 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1073.eqiad.wmnet with OS bullseye
  • 14:16 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 14:16 bking@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:14 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 14:14 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1075.eqiad.wmnet with OS bullseye
  • 14:14 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 14:07 robh: mw141[23] downtimed and relocating per T308339
  • 13:54 Lucas_WMDE: pulled tests: Test setting names (T342249) to deploy1002 (no scap sync needed, tests-only change)
  • 13:46 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 13:46 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1003.eqiad.wmnet on all recursors
  • 13:46 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1003.eqiad.wmnet on all recursors
  • 13:46 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:46 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 13:45 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 13:42 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 13:42 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1003.eqiad.wmnet on all recursors
  • 13:42 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1003.eqiad.wmnet on all recursors
  • 13:42 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:42 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 13:42 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 13:39 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 13:39 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 13:31 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 13:29 fabfur: aborted previous operations, no need to disable puppet to apply that CR (https://gerrit.wikimedia.org/r/c/operations/puppet/+/939661) (T342211)
  • 13:27 fabfur: temporarily disable puppet on cp3052 to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/939661 (T342211)
  • 13:26 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:15 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1073.eqiad.wmnet with OS bullseye
  • 13:13 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Fix incorrect use of UseLegacyMediaStyles (missing "wg" prefix) (T318433) (duration: 10m 47s)
  • 13:04 lucaswerkmeister-wmde@deploy1002: ssastry and lucaswerkmeister-wmde: Backport for Fix incorrect use of UseLegacyMediaStyles (missing "wg" prefix) (T318433) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:02 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Fix incorrect use of UseLegacyMediaStyles (missing "wg" prefix) (T318433)
  • 12:43 joal@deploy1002: Finished deploy [airflow-dags/analytics@87be328]: Refactor cassandra loading jobs (duration: 00m 14s)
  • 12:43 joal@deploy1002: Started deploy [airflow-dags/analytics@87be328]: Refactor cassandra loading jobs
  • 12:27 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/services/ipoid: apply
  • 12:27 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/services/ipoid: apply
  • 12:22 jbond: switch puppetboard.wikimedia.org to use puppet7 infrastructure
  • 12:22 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 12:22 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 12:17 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet on all recursors
  • 12:17 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet on all recursors
  • 12:17 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard-next.discovery.wmnet on all recursors
  • 12:17 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard-next.discovery.wmnet on all recursors
  • 11:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbproxy1016.eqiad.wmnet
  • 11:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1016.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 11:47 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1016.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 11:45 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 11:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbproxy1016.eqiad.wmnet
  • 11:13 jebe@deploy1002: Finished deploy [analytics/refinery@eaabff2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@eaabff2] (duration: 01m 43s)
  • 11:12 jebe@deploy1002: Started deploy [analytics/refinery@eaabff2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@eaabff2]
  • 11:11 jebe@deploy1002: Finished deploy [analytics/refinery@eaabff2] (thin): Regular analytics weekly train THIN [analytics/refinery@eaabff2] (duration: 00m 04s)
  • 11:11 jebe@deploy1002: Started deploy [analytics/refinery@eaabff2] (thin): Regular analytics weekly train THIN [analytics/refinery@eaabff2]
  • 11:09 jebe@deploy1002: Finished deploy [analytics/refinery@eaabff2]: Regular analytics weekly train [analytics/refinery@eaabff2] (duration: 10m 24s)
  • 10:59 jebe@deploy1002: Started deploy [analytics/refinery@eaabff2]: Regular analytics weekly train [analytics/refinery@eaabff2]
  • 10:02 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:54 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 09:54 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 09:50 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 09:48 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 09:43 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:14 btullis@deploy1002: Finished deploy [airflow-dags/analytics_test@be05071]: (no justification provided) (duration: 00m 04s)
  • 09:14 btullis@deploy1002: Started deploy [airflow-dags/analytics_test@be05071]: (no justification provided)
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49599 and previous config saved to /var/cache/conftool/dbconfig/20230719-091205-root.json
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49598 and previous config saved to /var/cache/conftool/dbconfig/20230719-090328-root.json
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49597 and previous config saved to /var/cache/conftool/dbconfig/20230719-085700-root.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49596 and previous config saved to /var/cache/conftool/dbconfig/20230719-084823-root.json
  • 08:45 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49595 and previous config saved to /var/cache/conftool/dbconfig/20230719-084156-root.json
  • 08:38 dcausse: closing the UTC morning backport window
  • 08:37 dcausse@deploy1002: Finished scap: Backport for Use the LinksUpdate::isRecursive flag again to route cirrusSearchLinksUpdate (duration: 07m 59s)
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49594 and previous config saved to /var/cache/conftool/dbconfig/20230719-083319-root.json
  • 08:30 dcausse@deploy1002: dcausse: Backport for Use the LinksUpdate::isRecursive flag again to route cirrusSearchLinksUpdate synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:29 dcausse@deploy1002: Started scap: Backport for Use the LinksUpdate::isRecursive flag again to route cirrusSearchLinksUpdate
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49593 and previous config saved to /var/cache/conftool/dbconfig/20230719-082651-root.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49592 and previous config saved to /var/cache/conftool/dbconfig/20230719-081814-root.json
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49591 and previous config saved to /var/cache/conftool/dbconfig/20230719-081146-root.json
  • 08:10 dcausse@deploy1002: Finished scap: Backport for Use the LinksUpdate::isRecursive flag again to route cirrusSearchLinksUpdate (duration: 07m 36s)
  • 08:04 dcausse@deploy1002: dcausse: Backport for Use the LinksUpdate::isRecursive flag again to route cirrusSearchLinksUpdate synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49590 and previous config saved to /var/cache/conftool/dbconfig/20230719-080309-root.json
  • 08:02 dcausse@deploy1002: Started scap: Backport for Use the LinksUpdate::isRecursive flag again to route cirrusSearchLinksUpdate
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49589 and previous config saved to /var/cache/conftool/dbconfig/20230719-075642-root.json
  • 07:54 _joe_: ran scap pull, pool on parse1002 after powercycling
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49588 and previous config saved to /var/cache/conftool/dbconfig/20230719-074804-root.json
  • 07:47 _joe_: powercycling parse1002: console blank, unreachable over the network
  • 07:46 dcausse@deploy1002: Backport cancelled.
  • 07:45 oblivian@cumin1001: conftool action : set/pooled=inactive; selector: name=parse1002.eqiad.wmnet
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49587 and previous config saved to /var/cache/conftool/dbconfig/20230719-074137-root.json
  • 07:36 dcausse@deploy1002: Finished scap: Backport for Add channel for TtmServerMessageUpdate of Translate extension (duration: 17m 44s)
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49586 and previous config saved to /var/cache/conftool/dbconfig/20230719-073300-root.json
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49585 and previous config saved to /var/cache/conftool/dbconfig/20230719-072632-root.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180', diff saved to https://phabricator.wikimedia.org/P49584 and previous config saved to /var/cache/conftool/dbconfig/20230719-072207-root.json
  • 07:20 dcausse@deploy1002: dcausse and abi: Backport for Add channel for TtmServerMessageUpdate of Translate extension synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:18 dcausse@deploy1002: Started scap: Backport for Add channel for TtmServerMessageUpdate of Translate extension
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49583 and previous config saved to /var/cache/conftool/dbconfig/20230719-071755-root.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2158', diff saved to https://phabricator.wikimedia.org/P49582 and previous config saved to /var/cache/conftool/dbconfig/20230719-071204-root.json
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49581 and previous config saved to /var/cache/conftool/dbconfig/20230719-062313-root.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49580 and previous config saved to /var/cache/conftool/dbconfig/20230719-060809-root.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49579 and previous config saved to /var/cache/conftool/dbconfig/20230719-055304-root.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49578 and previous config saved to /var/cache/conftool/dbconfig/20230719-053759-root.json
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49577 and previous config saved to /var/cache/conftool/dbconfig/20230719-052254-root.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49576 and previous config saved to /var/cache/conftool/dbconfig/20230719-050750-root.json
  • 04:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49575 and previous config saved to /var/cache/conftool/dbconfig/20230719-045245-root.json
  • 04:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49574 and previous config saved to /var/cache/conftool/dbconfig/20230719-043740-root.json
  • 00:16 eileen: civicrm upgraded from 67c526e7 to 7642b3d9

2023-07-18

  • 22:51 brett@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling reboot on P{doh5002*} and A:wikidough
  • 22:44 brett@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling reboot on P{doh5002*} and A:wikidough
  • 22:34 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 22:34 bking@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 22:32 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 22:32 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 22:24 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 22:24 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1003.eqiad.wmnet on all recursors
  • 22:24 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1003.eqiad.wmnet on all recursors
  • 22:24 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:24 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 22:23 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 22:18 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 22:18 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1003.eqiad.wmnet on all recursors
  • 22:18 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1003.eqiad.wmnet on all recursors
  • 22:18 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:18 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 22:16 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 22:12 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 22:12 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 22:06 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 22:06 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 21:28 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host analytics1073.eqiad.wmnet
  • 21:22 urbanecm@deploy1002: Finished scap: Backport for Don't log for documentElement (nodeType 9) (T340081) (duration: 07m 42s)
  • 21:15 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Don't log for documentElement (nodeType 9) (T340081) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 21:15 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 21:15 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 21:14 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 21:14 urbanecm@deploy1002: Started scap: Backport for Don't log for documentElement (nodeType 9) (T340081)
  • 21:14 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1003.eqiad.wmnet on all recursors
  • 21:14 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1003.eqiad.wmnet on all recursors
  • 21:14 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:14 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 21:13 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 21:10 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 21:10 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 21:03 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1002.eqiad.wmnet
  • 21:03 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1002.eqiad.wmnet on all recursors
  • 21:03 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1002.eqiad.wmnet on all recursors
  • 21:03 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:03 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
  • 21:02 urbanecm@deploy1002: Finished scap: Backport for Don't log for documentElement (nodeType 9) (T340081) (duration: 10m 01s)
  • 21:02 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
  • 20:54 urbanecm@deploy1002: urbanecm and jdlrobson: Backport for Don't log for documentElement (nodeType 9) (T340081) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:52 urbanecm@deploy1002: Started scap: Backport for Don't log for documentElement (nodeType 9) (T340081)
  • 20:48 btullis@cumin1001: START - Cookbook sre.hosts.dhcp for host analytics1073.eqiad.wmnet
  • 20:43 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 20:43 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1002.eqiad.wmnet on all recursors
  • 20:43 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1002.eqiad.wmnet on all recursors
  • 20:43 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:43 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
  • 20:41 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
  • 20:28 urbanecm@deploy1002: Finished scap: Backport for Add additional debugging closest bug (T340081), Add additional debugging closest bug (T340081), Fixes: Mobile login watermark large and uncentered (T341812) (duration: 10m 28s)
  • 20:26 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 20:26 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1002.eqiad.wmnet
  • 20:19 urbanecm@deploy1002: urbanecm and jdlrobson: Backport for Add additional debugging closest bug (T340081), Add additional debugging closest bug (T340081), Fixes: Mobile login watermark large and uncentered (T341812) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes de
  • 20:17 urbanecm@deploy1002: Started scap: Backport for Add additional debugging closest bug (T340081), Add additional debugging closest bug (T340081), Fixes: Mobile login watermark large and uncentered (T341812)
  • 20:16 urbanecm@deploy1002: Finished scap: Backport for Deploy new logos (T341260 T341243 T341912) (duration: 09m 50s)
  • 20:07 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Deploy new logos (T341260 T341243 T341912) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:06 urbanecm@deploy1002: Started scap: Backport for Deploy new logos (T341260 T341243 T341912)
  • 19:53 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['analytics1075.eqiad.wmnet']
  • 19:53 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1075.eqiad.wmnet']
  • 19:53 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
  • 19:52 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
  • 19:50 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
  • 19:49 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
  • 19:49 btullis@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['analytics1073.mgmt.eqiad.wmnet']
  • 19:49 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1073.mgmt.eqiad.wmnet']
  • 18:57 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 18s)
  • 18:57 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 18:54 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 05s)
  • 18:54 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 18:51 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 17s)
  • 18:51 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 18:16 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1002.eqiad.wmnet
  • 18:16 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk1002.eqiad.wmnet with OS bookworm
  • 18:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.18 refs T340246
  • 17:46 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
  • 17:45 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1075.eqiad.wmnet with OS bullseye
  • 17:30 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1016
  • 17:30 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host lvs1016
  • 17:29 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:29 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1016 relocation - robh@cumin1001"
  • 17:29 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1016 relocation - robh@cumin1001"
  • 17:27 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 17:25 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1002.eqiad.wmnet with OS bookworm
  • 17:21 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
  • 17:20 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
  • 17:19 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1002.eqiad.wmnet on all recursors
  • 17:19 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1002.eqiad.wmnet on all recursors
  • 17:19 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:19 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
  • 17:19 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling reboot on A:wikidough-drmrs and A:wikidough
  • 17:19 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
  • 17:16 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 17:16 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1002.eqiad.wmnet
  • 17:07 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
  • 17:04 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling reboot on A:wikidough-drmrs and A:wikidough
  • 17:02 dancy@deploy1002: Installation of scap version "4.55.0" completed for 605 hosts
  • 17:01 dancy@deploy1002: Installing scap version "4.55.0" for 605 hosts
  • 16:33 dancy@deploy1002: Pruned MediaWiki: 1.41.0-wmf.16 (duration: 02m 11s)
  • 16:30 dancy@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.18 refs T340246 (duration: 46m 15s)
  • 16:28 elukey: maintenance finished for kafka main-codfw
  • 16:24 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1075.eqiad.wmnet with OS bullseye
  • 16:03 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1015
  • 16:03 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host lvs1015
  • 16:03 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1014
  • 16:03 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:03 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs10145 relocation - robh@cumin1001"
  • 16:03 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host lvs1014
  • 16:02 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs10145 relocation - robh@cumin1001"
  • 16:01 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bullseye
  • 16:00 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 15:44 dancy@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.18 refs T340246
  • 15:31 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs1016.eqiad.wmnet
  • 15:31 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:31 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1016.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 15:31 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1016.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 15:28 fabfur@cumin1001: START - Cookbook sre.dns.netbox
  • 15:23 fabfur@cumin1001: START - Cookbook sre.hosts.decommission for hosts lvs1016.eqiad.wmnet
  • 15:22 robh@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bullseye
  • 15:21 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1013
  • 15:21 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host lvs1013
  • 15:20 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bullseye
  • 15:18 robh@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bullseye
  • 15:08 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host lvs1013.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:02 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs1015.eqiad.wmnet
  • 15:02 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:01 fabfur@cumin1001: START - Cookbook sre.dns.netbox
  • 15:00 robh@cumin1001: START - Cookbook sre.hosts.provision for host lvs1013.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:57 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:57 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1013 relocation - robh@cumin1001"
  • 14:56 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1013 relocation - robh@cumin1001"
  • 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1198', diff saved to https://phabricator.wikimedia.org/P49571 and previous config saved to /var/cache/conftool/dbconfig/20230718-145529-root.json
  • 14:54 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 14:54 fabfur@cumin1001: START - Cookbook sre.hosts.decommission for hosts lvs1015.eqiad.wmnet
  • 14:45 sukhe: dns2004 upgrade to pdns-rec 4.8.4: T341611
  • 14:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.3 - ayounsi@cumin1001
  • 14:37 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs1014.eqiad.wmnet
  • 14:37 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:37 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 14:36 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 14:36 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.3 - ayounsi@cumin1001
  • 14:35 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 14:35 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 14:34 fabfur@cumin1001: START - Cookbook sre.dns.netbox
  • 14:34 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 14:33 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 14:33 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 14:30 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 14:29 fabfur@cumin1001: START - Cookbook sre.hosts.decommission for hosts lvs1014.eqiad.wmnet
  • 14:22 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs1013.eqiad.wmnet
  • 14:22 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:22 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 14:22 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 14:19 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1001.eqiad.wmnet
  • 14:19 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1001.eqiad.wmnet on all recursors
  • 14:19 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1001.eqiad.wmnet on all recursors
  • 14:19 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:19 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
  • 14:18 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
  • 14:16 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 14:16 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1001.eqiad.wmnet on all recursors
  • 14:16 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1001.eqiad.wmnet on all recursors
  • 14:16 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:16 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
  • 14:12 fabfur@cumin1001: START - Cookbook sre.dns.netbox
  • 14:10 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
  • 14:05 XioNoX: asw2-esams# set interfaces xe-4/0/4 disable - T342121
  • 14:04 jforrester@deploy1002: Finished scap: Backport for private/readme.php: Add $wgCampaignEventsProgramsAndEventsDashboardAPISecret (T320260) (duration: 08m 04s)
  • 14:04 fabfur@cumin1001: START - Cookbook sre.hosts.decommission for hosts lvs1013.eqiad.wmnet
  • 13:58 jforrester@deploy1002: jforrester and daimona: Backport for private/readme.php: Add $wgCampaignEventsProgramsAndEventsDashboardAPISecret (T320260) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:56 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 13:56 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1001.eqiad.wmnet
  • 13:56 jforrester@deploy1002: Started scap: Backport for private/readme.php: Add $wgCampaignEventsProgramsAndEventsDashboardAPISecret (T320260)
  • 13:55 jforrester@deploy1002: Finished scap: Backport for prod: Enable wgCampaignEventsProgramsAndEventsDashboardInstance (T320260) (duration: 21m 19s)
  • 13:43 jbond: upload python3-conftool_2.2.2-1+deb12u1
  • 13:43 jbond: upload python3-conftool_2.2.2-1
  • 13:43 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:42 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:42 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:41 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:39 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:35 jforrester@deploy1002: daimona and jforrester: Backport for prod: Enable wgCampaignEventsProgramsAndEventsDashboardInstance (T320260) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:35 stevemunene@deploy1002: Finished deploy [airflow-dags/analytics_test@be05071]: (no justification provided) (duration: 00m 03s)
  • 13:35 stevemunene@deploy1002: Started deploy [airflow-dags/analytics_test@be05071]: (no justification provided)
  • 13:34 jforrester@deploy1002: Started scap: Backport for prod: Enable wgCampaignEventsProgramsAndEventsDashboardInstance (T320260)
  • 13:26 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host analytics1073.eqiad.wmnet
  • 13:26 jforrester@deploy1002: Finished scap: Backport for Add wikifunctions.org to wgCentralNoticeContentSecurityPolicy (T275945) (duration: 07m 52s)
  • 13:20 stevemunene@deploy1002: Finished deploy [airflow-dags/analytics_test@be05071]: (no justification provided) (duration: 00m 03s)
  • 13:20 stevemunene@deploy1002: Started deploy [airflow-dags/analytics_test@be05071]: (no justification provided)
  • 13:20 jforrester@deploy1002: jforrester: Backport for Add wikifunctions.org to wgCentralNoticeContentSecurityPolicy (T275945) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:18 jforrester@deploy1002: Started scap: Backport for Add wikifunctions.org to wgCentralNoticeContentSecurityPolicy (T275945)
  • 13:17 jforrester@deploy1002: sync-world aborted: Backport for Follow-up ca3aa70754: Drop 30x30px Notifications icons, unused for 7 years (T147219) (duration: 00m 06s)
  • 13:17 jforrester@deploy1002: Started scap: Backport for Follow-up ca3aa70754: Drop 30x30px Notifications icons, unused for 7 years (T147219)
  • 13:16 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
  • 13:15 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
  • 13:13 jforrester@deploy1002: Finished scap: Backport for Follow-up ca3aa70754: Drop 30x30px Notifications icons, unused for 7 years (T147219) (duration: 08m 40s)
  • 13:12 derick@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 13:10 derick@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 13:09 derick@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 13:08 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['analytics1073.mgmt.eqiad.wmnet']
  • 13:08 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1073.mgmt.eqiad.wmnet']
  • 13:07 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
  • 13:07 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
  • 13:07 btullis@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
  • 13:07 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
  • 13:07 derick@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 13:06 derick@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 13:06 jforrester@deploy1002: jforrester: Backport for Follow-up ca3aa70754: Drop 30x30px Notifications icons, unused for 7 years (T147219) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:04 derick@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 13:04 jforrester@deploy1002: Started scap: Backport for Follow-up ca3aa70754: Drop 30x30px Notifications icons, unused for 7 years (T147219)
  • 13:00 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 12:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1074.eqiad.wmnet with OS bullseye
  • 12:42 btullis@cumin1001: START - Cookbook sre.hosts.dhcp for host analytics1073.eqiad.wmnet
  • 12:40 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1073.eqiad.wmnet with OS bullseye
  • 12:37 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 12:37 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 12:37 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 12:28 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 12:23 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 12:23 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 12:09 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 12:09 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 12:08 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 12:07 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1074.eqiad.wmnet with reason: host reimage
  • 12:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbproxy1015.eqiad.wmnet
  • 12:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 12:04 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1074.eqiad.wmnet with reason: host reimage
  • 12:04 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 12:01 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 11:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbproxy1015.eqiad.wmnet
  • 11:51 jbond@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device lsw1-f8-eqiad
  • 11:50 jbond@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f8-eqiad
  • 11:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f8-eqiad
  • 11:39 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f8-eqiad
  • 11:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f8-eqiad
  • 11:39 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f8-eqiad
  • 11:27 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device lsw1-f8-eqiad
  • 11:27 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f8-eqiad
  • 11:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e8-eqiad
  • 11:27 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e8-eqiad
  • 11:25 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device lsw1-f8-eqiad
  • 11:25 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f8-eqiad
  • 11:24 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device lsw1-f8-eqiad
  • 11:24 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f8-eqiad
  • 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2004.codfw.wmnet
  • 11:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e8-eqiad
  • 11:22 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e8-eqiad
  • 11:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e8-eqiad
  • 11:22 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e8-eqiad
  • 11:20 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1074.eqiad.wmnet with OS bullseye
  • 11:20 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1073.eqiad.wmnet with OS bullseye
  • 11:15 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet
  • 11:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet
  • 11:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1071.eqiad.wmnet with OS bullseye
  • 11:06 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
  • 11:03 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2002.codfw.wmnet
  • 10:57 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:56 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:55 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:55 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 10:55 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:55 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 10:54 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet
  • 10:48 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
  • 10:42 topranks: repool esams after successful move of cr3-knams to new rack T337997
  • 10:41 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:40 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:39 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:39 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 10:38 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:38 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 10:38 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
  • 10:32 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1004.eqiad.wmnet
  • 10:24 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1004.eqiad.wmnet
  • 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1003.eqiad.wmnet
  • 10:24 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cr3-knams,cr3-knams IPv6
  • 10:24 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for cr3-knams,cr3-knams IPv6
  • 10:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1071.eqiad.wmnet with reason: host reimage
  • 10:17 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1071.eqiad.wmnet with reason: host reimage
  • 10:11 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
  • 10:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1002.eqiad.wmnet
  • 10:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dborch1001.wikimedia.org
  • 10:02 fabfur: fix last entry: correct CR is https://gerrit.wikimedia.org/r/939242
  • 10:02 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1002.eqiad.wmnet
  • 10:02 fabfur: enable puppet on A:cp-esams for https://gerrit.wikimedia.org/r/939235 (T340983) (hosts will run puppet with the usual schedule)
  • 10:02 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host dborch1001.wikimedia.org
  • 10:00 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1071.eqiad.wmnet with OS bullseye
  • 09:52 fabfur: disable puppet on A:cp-esams to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/939242 (T340983)
  • 09:51 arturo: deploying https://gerrit.wikimedia.org/r/c/operations/homer/public/+/938819 via homer to cr-eqiad & cr-codfw
  • 09:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1075.eqiad.wmnet
  • 09:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:28 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1001.eqiad.wmnet
  • 09:28 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:27 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:24 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1075.eqiad.wmnet
  • 09:24 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1074.eqiad.wmnet
  • 09:24 XioNoX: remove asw-b1-codfw from asw-b-codfw VC - T342076
  • 09:21 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:21 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:20 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1001.eqiad.wmnet
  • 09:17 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:16 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:16 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1074.eqiad.wmnet
  • 09:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2073.codfw.wmnet
  • 09:09 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1073.eqiad.wmnet
  • 09:08 ladsgroup@deploy1002: Finished scap: Backport for ores: use envoy proxy for Lift Wing (T319170) (duration: 14m 56s)
  • 09:07 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:02 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1073.eqiad.wmnet
  • 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2073.codfw.wmnet
  • 08:58 fabfur: enable puppet on A:cp-eqiad for https://gerrit.wikimedia.org/r/939235 (T340983) (hosts will run puppet with the usual schedule)
  • 08:57 ladsgroup@deploy1002: isaranto and ladsgroup: Backport for ores: use envoy proxy for Lift Wing (T319170) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:56 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2072.codfw.wmnet
  • 08:56 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1072.eqiad.wmnet
  • 08:55 fabfur: disable puppet on A:cp-eqiad to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/939235 (T340983)
  • 08:53 ladsgroup@deploy1002: Started scap: Backport for ores: use envoy proxy for Lift Wing (T319170)
  • 08:48 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1072.eqiad.wmnet
  • 08:48 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2072.codfw.wmnet
  • 08:47 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2071.codfw.wmnet
  • 08:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1071.eqiad.wmnet
  • 08:37 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2071.codfw.wmnet
  • 08:37 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1071.eqiad.wmnet
  • 08:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2070.codfw.wmnet
  • 08:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1070.eqiad.wmnet
  • 08:28 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2070.codfw.wmnet
  • 08:27 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2069.codfw.wmnet
  • 08:25 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1070.eqiad.wmnet
  • 08:25 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1069.eqiad.wmnet
  • 08:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2069.codfw.wmnet
  • 08:17 fabfur: enable puppet on A:cp-drmrs for https://gerrit.wikimedia.org/r/c/operations/puppet/+/938902/ (T340983) (hosts will run puppet with the usual schedule)
  • 08:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2068.codfw.wmnet
  • 08:13 fabfur: disable puppet on A:cp-drmrs to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/938902/ (T340983)
  • 08:09 topranks: cr3-knams going offline for move
  • 08:08 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1069.eqiad.wmnet
  • 08:08 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2068.codfw.wmnet
  • 07:16 elukey: restart kafka main-codfw rebalances (long maintenance) - T341558
  • 06:48 XioNoX: disable asw-b-codfw:ae0 (to cloudsw1-b1-codfw) - T342076
  • 06:36 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cr3-knams,cr3-knams IPv6 with reason: Downtime cr3-knams ahead of remote hands moving router
  • 06:36 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cr3-knams,cr3-knams IPv6 with reason: Downtime cr3-knams ahead of remote hands moving router

2023-07-17

  • 21:57 btullis@deploy1002: Finished deploy [analytics/aqs/deploy@91f8d92] (aqs-next): Deploying new AQS endpoint (duration: 02m 10s)
  • 21:55 btullis@deploy1002: Started deploy [analytics/aqs/deploy@91f8d92] (aqs-next): Deploying new AQS endpoint
  • 21:55 btullis@deploy1002: Finished deploy [analytics/aqs/deploy@91f8d92] (aqs-next): Deploying new AQS endpoint (duration: 136m 46s)
  • 21:53 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1001.eqiad.wmnet
  • 21:52 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk1001.eqiad.wmnet with OS bookworm
  • 21:37 eileen: civicrm upgraded from 2c60d58d to 67c526e7
  • 21:19 jgleeson: payments-wiki upgraded from d76b9085 to c9e298c9
  • 21:16 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 47s)
  • 21:15 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 21:01 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1001.eqiad.wmnet with OS bookworm
  • 21:00 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
  • 21:00 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
  • 20:59 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1001.eqiad.wmnet on all recursors
  • 20:59 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1001.eqiad.wmnet on all recursors
  • 20:59 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:59 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
  • 20:51 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
  • 20:43 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:19 taavi: taavi@mwmaint1002 ~ $ echo "https://en.wikipedia.org/static/images/mobile/copyright/wikiquote-wordmark-bn.svg" | mwscript purgeList.php --wiki enwiki
  • 20:13 taavi@deploy1002: Finished scap: Backport for bnwikiquote: Update wordmark (T341910) (duration: 08m 34s)
  • 20:06 taavi@deploy1002: taavi and stang: Backport for bnwikiquote: Update wordmark (T341910) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:05 taavi@deploy1002: Started scap: Backport for bnwikiquote: Update wordmark (T341910)
  • 20:03 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 20:03 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1001.eqiad.wmnet
  • 19:38 btullis@deploy1002: Started deploy [analytics/aqs/deploy@91f8d92] (aqs-next): Deploying new AQS endpoint
  • 18:58 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:58 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 18:48 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:34 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 02m 41s)
  • 17:31 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 17:19 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 01m 50s)
  • 17:18 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 16:42 urbanecm@deploy1002: Finished scap: Backport for Fix UserDatabaseHelper::hasMainspaceEdits (T341994) (duration: 08m 43s)
  • 16:34 urbanecm@deploy1002: urbanecm: Backport for Fix UserDatabaseHelper::hasMainspaceEdits (T341994) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 16:33 urbanecm@deploy1002: Started scap: Backport for Fix UserDatabaseHelper::hasMainspaceEdits (T341994)
  • 16:29 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 16:29 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 16:28 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 16:12 elukey: stop kafka-main codfw maintenance - T341558
  • 16:08 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 16:08 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 16:07 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 16:05 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 16:05 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 16:04 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 16:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2067.codfw.wmnet
  • 16:02 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1068.eqiad.wmnet
  • 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 15:56 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
  • 15:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
  • 15:50 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 15:49 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1068.eqiad.wmnet
  • 15:49 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
  • 15:49 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 15:49 fabfur@cumin1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 15:48 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 15:39 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2066.codfw.wmnet
  • 15:37 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
  • 15:33 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
  • 15:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
  • 15:29 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
  • 15:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2064.codfw.wmnet
  • 15:22 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
  • 15:19 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
  • 15:14 dancy@deploy1002: Installing scap version "4.54.0" for 605 hosts
  • 15:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
  • 15:09 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
  • 15:09 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
  • 15:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
  • 15:04 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS bullseye
  • 15:02 sukhe: dns5003 upgrade to pdns-rec 4.8.4: T341611
  • 14:59 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
  • 14:57 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:57 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1072.eqiad.wmnet with OS bullseye
  • 14:41 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
  • 14:39 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2062.codfw.wmnet
  • 14:38 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
  • 14:36 elukey: restart rsyslog on centrallog1002 ("peer did not provide a certificate, not permitted to talk to it")
  • 14:30 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
  • 14:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1063.eqiad.wmnet
  • 14:24 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ores2003.codfw.wmnet
  • 14:21 klausman@puppetmaster1001: conftool action : set/pooled=no; selector: name=ores2003.codfw.wmnet
  • 14:20 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1063.eqiad.wmnet
  • 14:20 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1062.eqiad.wmnet
  • 14:20 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
  • 14:20 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
  • 14:17 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
  • 14:16 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
  • 14:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1072.eqiad.wmnet with reason: host reimage
  • 14:12 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 14:12 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1062.eqiad.wmnet
  • 14:12 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
  • 14:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
  • 14:11 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1061.eqiad.wmnet
  • 14:10 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1072.eqiad.wmnet with reason: host reimage
  • 14:10 elukey: start kafka partitions rebalance for main-codfw (long running maintenance, see https://phabricator.wikimedia.org/T341558)
  • 14:09 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:08 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1003.eqiad.wmnet
  • 14:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1003.eqiad.wmnet
  • 14:03 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
  • 14:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
  • 13:54 lucaswerkmeister-wmde: Deployed security patch for T340217
  • 13:54 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
  • 13:53 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1072.eqiad.wmnet with OS bullseye
  • 13:52 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
  • 13:50 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host analytics1072.eqiad.wmnet with OS bullseye
  • 13:48 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
  • 13:47 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1061.eqiad.wmnet
  • 13:47 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1060.eqiad.wmnet
  • 13:46 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 13:45 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 13:43 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
  • 13:43 akosiaris: deploy removal of nutcracker from thumbor. T318695
  • 13:42 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
  • 13:42 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 13:42 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 13:40 fabfur: reimaging cp4037 as preparatory test for knams migration
  • 13:38 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1060.eqiad.wmnet
  • 13:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1059.eqiad.wmnet
  • 13:37 taavi@deploy1002: Finished scap: Backport for NewImpact: fix undefined log function (T341865) (duration: 10m 19s)
  • 13:33 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1072.eqiad.wmnet with OS bullseye
  • 13:32 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1059.eqiad.wmnet
  • 13:28 taavi@deploy1002: taavi and urbanecm: Backport for NewImpact: fix undefined log function (T341865) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:27 taavi: taavi@mwmaint1002 ~ $ mwscript namespaceDupes.php --wiki huwiktionary --fix # T341926
  • 13:27 taavi@deploy1002: Started scap: Backport for NewImpact: fix undefined log function (T341865)
  • 13:26 taavi@deploy1002: Finished scap: Backport for change wgExtraNamespaces , wgNamespaceAliases for mnwwiktionary (T341940), Add appendix namespace aliases on huwiktionary (T341926), robots.txt: Disable indexing draft-related pages on knwiki (T341958) (duration: 19m 48s)
  • 13:25 taavi: taavi@deploy1002 ~ $ mwscript namespaceDupes.php --wiki mnwwiktionary --fix # T341940
  • 13:21 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1002.eqiad.wmnet
  • 13:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1002.eqiad.wmnet
  • 13:16 taavi@deploy1002: taavi and anzx: Backport for change wgExtraNamespaces , wgNamespaceAliases for mnwwiktionary (T341940), Add appendix namespace aliases on huwiktionary (T341926), robots.txt: Disable indexing draft-related pages on knwiki (T341958) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqia
  • 13:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1058.eqiad.wmnet
  • 13:13 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ores2003.codfw.wmnet with reason: DCops working on it
  • 13:12 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ores2003.codfw.wmnet with reason: DCops working on it
  • 13:09 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1058.eqiad.wmnet
  • 13:08 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
  • 13:07 fabfur: enabled puppet on A:cp-codfw to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/938840 (T340983) (hosts will run puppet with the usual schedule)
  • 13:06 taavi@deploy1002: Started scap: Backport for change wgExtraNamespaces , wgNamespaceAliases for mnwwiktionary (T341940), Add appendix namespace aliases on huwiktionary (T341926), robots.txt: Disable indexing draft-related pages on knwiki (T341958)
  • 13:04 fabfur: run puppet on cp2027 to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/938840 (T340983)
  • 13:03 sukhe: run authdns-update to depool esams
  • 12:58 fabfur: disabled puppet on A:cp-codfw to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/938840 (T340983)
  • 12:54 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 12:54 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 12:53 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 12:42 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
  • 12:01 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol1005.wikimedia.org
  • 12:01 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:01 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1005.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin1001"
  • 11:53 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1057.eqiad.wmnet
  • 11:46 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1057.eqiad.wmnet
  • 11:46 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
  • 11:45 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2055.codfw.wmnet
  • 11:45 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1056.eqiad.wmnet
  • 11:39 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2055.codfw.wmnet
  • 11:38 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1056.eqiad.wmnet
  • 11:36 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1055.eqiad.wmnet
  • 11:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
  • 11:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1070.eqiad.wmnet with OS bullseye
  • 11:30 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1055.eqiad.wmnet
  • 11:30 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
  • 11:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
  • 11:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1054.eqiad.wmnet
  • 11:26 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1005.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin1001"
  • 11:23 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
  • 11:18 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
  • 11:15 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
  • 11:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1053.eqiad.wmnet
  • 11:12 aborrero@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol1005.wikimedia.org
  • 11:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1070.eqiad.wmnet with reason: host reimage
  • 11:10 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
  • 11:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
  • 11:08 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1053.eqiad.wmnet
  • 11:08 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1052.eqiad.wmnet
  • 11:07 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1070.eqiad.wmnet with reason: host reimage
  • 10:54 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1052.eqiad.wmnet
  • 10:54 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
  • 10:52 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
  • 10:51 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1051.eqiad.wmnet
  • 10:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1001.eqiad.wmnet
  • 10:45 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1001.eqiad.wmnet
  • 10:45 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
  • 10:44 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1070.eqiad.wmnet with OS bullseye
  • 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2049.codfw.wmnet
  • 10:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1010.eqiad.wmnet
  • 10:33 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1051.eqiad.wmnet
  • 10:33 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2049.codfw.wmnet
  • 10:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1010.eqiad.wmnet
  • 10:31 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2048.codfw.wmnet
  • 10:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch2002.codfw.wmnet
  • 10:30 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1050.eqiad.wmnet
  • 10:24 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1050.eqiad.wmnet
  • 10:24 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2048.codfw.wmnet
  • 10:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-stretch2002.codfw.wmnet
  • 10:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
  • 10:21 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1049.eqiad.wmnet
  • 10:15 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1049.eqiad.wmnet
  • 10:14 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1048.eqiad.wmnet
  • 10:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch2001.codfw.wmnet
  • 10:08 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1048.eqiad.wmnet
  • 10:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-stretch2001.codfw.wmnet
  • 09:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch1002.eqiad.wmnet
  • 09:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-stretch1002.eqiad.wmnet
  • 09:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch1001.eqiad.wmnet
  • 09:48 fabfur: enabled puppet on A:cp hosts in ulsfo to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/938807 (T340983) (hosts will run puppet with the usual schedule)
  • 09:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1047.eqiad.wmnet
  • 09:44 fabfur: disabled puppet on A:cp hosts in ulsfo to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/938807 (T340983)
  • 09:43 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-stretch1001.eqiad.wmnet
  • 09:42 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-stretch1001.eqiad.wmnet
  • 09:42 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-stretch1001.eqiad.wmnet
  • 09:39 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1047.eqiad.wmnet
  • 09:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1046.eqiad.wmnet
  • 09:38 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
  • 09:35 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2046.codfw.wmnet
  • 09:30 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
  • 09:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1045.eqiad.wmnet
  • 09:27 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2046.codfw.wmnet
  • 09:26 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2045.codfw.wmnet
  • 09:22 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1045.eqiad.wmnet
  • 09:19 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1044.eqiad.wmnet
  • 09:18 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 09:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2045.codfw.wmnet
  • 09:18 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 09:17 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:17 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2044.codfw.wmnet
  • 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2044.codfw.wmnet
  • 09:01 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1044.eqiad.wmnet
  • 08:51 fabfur: enable puppet on A:cp-eqsin to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/938002 (T340983)
  • 08:37 fabfur: enable puppet on cp5024 and cp5032 to deploy 938002
  • 08:30 fabfur: disable puppet on all cp* hosts in eqsin to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/938002 (T340983)
  • 04:33 hashar@deploy1002: Finished deploy [gerrit/gerrit@1153a16]: wm-checks-api: check undefined real_author (2) - T328484 (duration: 00m 08s)
  • 04:33 hashar@deploy1002: Started deploy [gerrit/gerrit@1153a16]: wm-checks-api: check undefined real_author (2) - T328484
  • 04:09 hashar@deploy1002: Finished deploy [gerrit/gerrit@cad3002]: wm-checks-api: check undefined real_author - T328484 (duration: 00m 08s)
  • 04:08 hashar@deploy1002: Started deploy [gerrit/gerrit@cad3002]: wm-checks-api: check undefined real_author - T328484

2023-07-16

  • 23:23 eileen: civicrm upgraded from 562224c1 to 2c60d58d
  • 17:20 apergos: starting rsync of sql/xml dumps files, pulling from dumpsdata1004, running in ariel screen session on dumpsdata1007, bw limited to 1G

2023-07-14

  • 19:57 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:56 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:55 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:55 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host an-worker1153.eqiad.wmnet with OS bullseye
  • 19:19 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@37d3ad6]: Run page_content_change_to_wikitext_raw DAG serially. T335860 (duration: 00m 14s)
  • 19:19 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@37d3ad6]: Run page_content_change_to_wikitext_raw DAG serially. T335860
  • 18:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1153.eqiad.wmnet with OS bullseye
  • 16:05 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:04 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:25 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:22 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:43 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 10:42 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 10:41 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 10:40 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 10:38 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 10:38 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 09:02 klausman@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=ores2003.codfw.wmnet
  • 09:02 klausman: Setting ores2003 to pooled=inactive while we attempt repairs/decide on decom
  • 08:51 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 08:48 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 08:48 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 08:47 _joe_: deploying to mw on k8s for T341825
  • 08:47 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 07:16 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 07:13 hashar@deploy1002: Finished deploy [integration/docroot@56b5745]: Add mwbot-rs to doc.wikimedia.org - T341543 (duration: 00m 08s)
  • 07:13 hashar@deploy1002: Started deploy [integration/docroot@56b5745]: Add mwbot-rs to doc.wikimedia.org - T341543
  • 07:12 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 07:07 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
  • 07:06 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
  • 07:06 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 07:06 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 07:04 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 07:04 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 06:26 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 06:24 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 06:16 oblivian@deploy1002: Started scap: (no justification provided)

2023-07-13

  • 20:59 inflatador: bking@cumin1001 'disable puppet on hosts using zookeeper class T341792'
  • 20:37 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@889e13f]: (no justification provided) (duration: 00m 23s)
  • 20:37 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@889e13f]: (no justification provided)
  • 20:29 taavi@deploy1002: Finished scap: Backport for Avoid calling wfMessage in the hook handler constructor (T339272) (duration: 07m 38s)
  • 20:23 taavi@deploy1002: func and taavi: Backport for Avoid calling wfMessage in the hook handler constructor (T339272) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:22 taavi@deploy1002: Started scap: Backport for Avoid calling wfMessage in the hook handler constructor (T339272)
  • 20:19 taavi@deploy1002: func and taavi: Backport for Avoid calling wfMessage in the hook handler constructor (T339272) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:17 taavi@deploy1002: Started scap: Backport for Avoid calling wfMessage in the hook handler constructor (T339272)
  • 20:16 taavi@deploy1002: Finished scap: Backport for Set default for UseLegacyMediaStyles and disable on officewiki (T318433) (duration: 07m 47s)
  • 20:10 taavi@deploy1002: taavi and arlolra: Backport for Set default for UseLegacyMediaStyles and disable on officewiki (T318433) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:08 taavi@deploy1002: Started scap: Backport for Set default for UseLegacyMediaStyles and disable on officewiki (T318433)
  • 18:11 milimetric@deploy1002: Finished deploy [analytics/aqs/deploy@91f8d92] (aqs-next): Deploying new AQS endpoint (duration: 03m 39s)
  • 18:08 milimetric@deploy1002: Started deploy [analytics/aqs/deploy@91f8d92] (aqs-next): Deploying new AQS endpoint
  • 18:06 milimetric@deploy1002: Finished deploy [analytics/aqs/deploy@91f8d92]: Deploying new AQS endpoint (duration: 00m 05s)
  • 18:06 milimetric@deploy1002: Started deploy [analytics/aqs/deploy@91f8d92]: Deploying new AQS endpoint
  • 16:40 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 16:40 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 16:14 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 16:13 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 15:56 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 15:56 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 15:54 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 15:54 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts sretest2002
  • 15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest2002 decommissioned, removing all IPs except the asset tag one - pt1979@cumin2002"
  • 15:32 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest2002 decommissioned, removing all IPs except the asset tag one - pt1979@cumin2002"
  • 15:30 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:25 pt1979@cumin2002: START - Cookbook sre.hosts.decommission for hosts sretest2002
  • 15:18 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:17 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:45 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 14:44 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 14:43 elukey: depool ores2003 to allow DCops maintenance work
  • 14:43 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ores2003.codfw.wmnet with reason: DCops working on it
  • 14:43 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ores2003.codfw.wmnet with reason: DCops working on it
  • 14:32 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:32 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:29 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 14:28 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:28 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 14:26 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:19 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 14:16 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 13:21 urbanecm: UTC afternoon B&C window done
  • 13:19 urbanecm: Run `mwscript namespaceDupes.php --wiki=mnwwiktionary --fix` (T330689)
  • 13:17 urbanecm@deploy1002: Finished scap: Backport for Create Reconstruction and Rhymes namespaces in mnwwiktionary (T330689) (duration: 09m 46s)
  • 13:09 urbanecm@deploy1002: anzx and urbanecm: Backport for Create Reconstruction and Rhymes namespaces in mnwwiktionary (T330689) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:08 urbanecm@deploy1002: Started scap: Backport for Create Reconstruction and Rhymes namespaces in mnwwiktionary (T330689)
  • 12:47 Lucas_WMDE: Start `mwscript DiscussionTools:persistRevisionThreadItems ruwiki --current --all --start '["10086120"]'; touch ~/T315510-ruwiki-exited-$?` in tmux on mwmaint1002 (T315510)
  • 11:38 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:38 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
  • 11:32 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
  • 11:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbproxy1014.eqiad.wmnet
  • 11:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:30 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 11:29 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbproxy1014.eqiad.wmnet
  • 10:57 vgutierrez: restarting pybal on lvs1020
  • 10:35 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 10:34 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 10:19 fabfur: puppet enabled on cp3052 and cp5017 and new configuration applied (https://gerrit.wikimedia.org/r/c/operations/puppet/+/936701)
  • 10:15 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 10:15 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 10:13 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 10:12 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 10:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 10:11 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 10:07 fabfur: disable puppet on cp3052 and cp5017 to safely monitor https://gerrit.wikimedia.org/r/c/operations/puppet/+/936701
  • 10:04 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 10:04 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 10:03 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 10:03 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 09:11 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 09:11 elukey: increased kafka partitions for mediawiki.job.cirrusSearchLinksUpdate and mediawiki.job.cirrusSearchLinksUpdate (eqiad/codfw) - T341558
  • 09:10 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 09:09 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 09:09 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 09:04 XioNoX: update NAT on pfw3-eqiad - T340252
  • 08:14 hashar: Restarting CI Jenkins for plugin installation
  • 07:55 apergos: UTC morning backport and config deployment window done
  • 07:53 ariel@deploy1002: Finished scap: Backport for [idwikiquote] Change the logo and add a wordmark (T341177) (duration: 08m 20s)
  • 07:47 ariel@deploy1002: ariel and superpes: Backport for [idwikiquote] Change the logo and add a wordmark (T341177) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 07:45 ariel@deploy1002: Started scap: Backport for [idwikiquote] Change the logo and add a wordmark (T341177)
  • 07:40 ariel@deploy1002: Finished scap: Backport for [idwikiquote] Change the sitename and the project namespace (T341177) (duration: 09m 43s)
  • 07:32 ariel@deploy1002: ariel and superpes: Backport for [idwikiquote] Change the sitename and the project namespace (T341177) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:31 ariel@deploy1002: Started scap: Backport for [idwikiquote] Change the sitename and the project namespace (T341177)
  • 07:27 ariel@deploy1002: Finished scap: Backport for [mywiki] Create 'autopatrolled' and 'patroller' usergroups (T341026) (duration: 08m 39s)
  • 07:20 ariel@deploy1002: ariel and superpes: Backport for [mywiki] Create 'autopatrolled' and 'patroller' usergroups (T341026) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:18 ariel@deploy1002: Started scap: Backport for [mywiki] Create 'autopatrolled' and 'patroller' usergroups (T341026)
  • 07:14 ariel@deploy1002: Finished scap: Backport for [knwiki] Reverting the temporary logo and updating logo/wordmark/tagline (T338136) (duration: 08m 35s)
  • 07:07 ariel@deploy1002: superpes and ariel: Backport for [knwiki] Reverting the temporary logo and updating logo/wordmark/tagline (T338136) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 07:05 ariel@deploy1002: Started scap: Backport for [knwiki] Reverting the temporary logo and updating logo/wordmark/tagline (T338136)
  • 04:22 eileen: civicrm upgraded from 4521c00a to 562224c1
  • 03:05 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[25,26,27,30,33].eqiad.wmnet: Applying JVM update - eevans@cumin1001
  • 02:52 eileen: civicrm upgraded from 882e2310 to 4521c00a
  • 02:33 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[25,26,27,30,33].eqiad.wmnet: Applying JVM update - eevans@cumin1001
  • 01:59 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase10[18,25,26,27,30,33].eqiad.wmnet: Applying JVM update - eevans@cumin1001
  • 01:50 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[18,25,26,27,30,33].eqiad.wmnet: Applying JVM update - eevans@cumin1001
  • 01:31 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[17,22,23,24,29,32].eqiad.wmnet: Applying JVM update - eevans@cumin1001
  • 00:53 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[17,22,23,24,29,32].eqiad.wmnet: Applying JVM update - eevans@cumin1001
  • 00:52 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase10[17,22,23,24,29,32].eqiad.wmnet: Applying JVM update - eevans@cumin1001
  • 00:50 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[17,22,23,24,29,32].eqiad.wmnet: Applying JVM update - eevans@cumin1001
  • 00:47 eileen: config revision changed from ccc33b1e to e3e5a11d - re-enabled jobs
  • 00:33 eileen: civicrm upgraded from 1bfc3b17 to 882e2310
  • 00:17 eileen: drush @wmff civicrm-upgrade-db
  • 00:08 eileen: config revision changed from 6ac88ea8 to ccc33b1e (I pushed the upgrade code out)

2023-07-12

  • 23:59 eileen: config revision changed from c543419d to ccc33b1e
  • 23:57 eileen: config revision changed from 6ac88ea8 to c543419d
  • 23:47 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[16,19,20,21,28,31].eqiad.wmnet: Applying JVM update - eevans@cumin1001
  • 23:27 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:07 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[16,19,20,21,28,31].eqiad.wmnet: Applying JVM update - eevans@cumin1001
  • 22:18 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[12,17,18,23,26,27].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 21:49 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:43 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[12,17,18,23,26,27].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 21:33 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[16,20,22,25].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 21:32 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
  • 21:32 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
  • 21:09 TheresNoTime: close UTC late backport window
  • 21:09 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[16,20,22,25].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 21:08 samtar@deploy1002: Finished scap: Backport for ruwikibooks: Add NS104 to wgNamespacesToBeSearchedDefault (T341708) (duration: 08m 10s)
  • 21:02 samtar@deploy1002: stang and samtar: Backport for ruwikibooks: Add NS104 to wgNamespacesToBeSearchedDefault (T341708) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 21:00 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
  • 21:00 samtar@deploy1002: Started scap: Backport for ruwikibooks: Add NS104 to wgNamespacesToBeSearchedDefault (T341708)
  • 21:00 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
  • 20:59 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase20[15-16,20,22,25].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 20:59 samtar@deploy1002: Finished scap: Backport for Fix mediawiki.special_diff_interactions configuration (duration: 08m 47s)
  • 20:52 samtar@deploy1002: samtar and urbanecm: Backport for Fix mediawiki.special_diff_interactions configuration synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:50 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[15-16,20,22,25].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 20:50 samtar@deploy1002: Started scap: Backport for Fix mediawiki.special_diff_interactions configuration
  • 20:49 samtar@deploy1002: Finished scap: Backport for log additional events on Special:Diff|MobileDiff (T326212) (duration: 26m 41s)
  • 20:48 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[14,21,24].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 20:29 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[14,21,24].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 20:29 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[13,19].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 20:24 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 20:24 samtar@deploy1002: jsn and samtar: Backport for log additional events on Special:Diff|MobileDiff (T326212) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:23 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 20:22 samtar@deploy1002: Started scap: Backport for log additional events on Special:Diff|MobileDiff (T326212)
  • 20:21 samtar@deploy1002: Finished scap: Backport for Fix Error: Module "./ext.pageTriage.defaultTagsOptions.js" is not loaded (T340112) (duration: 09m 27s)
  • 20:20 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 20:20 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 20:20 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 20:20 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 20:17 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[13,19].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 20:13 samtar@deploy1002: samtar: Backport for Fix Error: Module "./ext.pageTriage.defaultTagsOptions.js" is not loaded (T340112) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:11 samtar@deploy1002: Started scap: Backport for Fix Error: Module "./ext.pageTriage.defaultTagsOptions.js" is not loaded (T340112)
  • 20:10 samtar@deploy1002: Finished scap: Backport for [ruwiki] Add permissions to 'editor' usergroup (T341707) (duration: 08m 04s)
  • 20:08 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2013.codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 20:04 samtar@deploy1002: samtar: Backport for [ruwiki] Add permissions to 'editor' usergroup (T341707) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:02 samtar@deploy1002: Started scap: Backport for [ruwiki] Add permissions to 'editor' usergroup (T341707)
  • 19:57 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2013.codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 18:39 dduvall@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.17 refs T340245 (duration: 06m 16s)
  • 18:33 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.17 refs T340245
  • 18:24 dduvall@deploy1002: Finished scap: Backport for QueryMessageGroupActionApi: Apply sorting to groups only (T341627) (duration: 08m 22s)
  • 18:17 dduvall@deploy1002: abi and dduvall: Backport for QueryMessageGroupActionApi: Apply sorting to groups only (T341627) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 18:16 dduvall@deploy1002: Started scap: Backport for QueryMessageGroupActionApi: Apply sorting to groups only (T341627)
  • 17:10 sukhe: restart pybal on lvs1018
  • 17:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6001.drmrs.wmnet with OS bullseye
  • 16:59 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
  • 16:59 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
  • 16:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 16:47 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 16:42 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: service=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
  • 16:42 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
  • 16:41 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: service=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
  • 16:41 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
  • 16:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbproxy1013.eqiad.wmnet
  • 16:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 16:38 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 16:37 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: service=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
  • 16:37 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
  • 16:35 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 16:32 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@a0e00cb] (releasing): (no justification provided) (duration: 00m 58s)
  • 16:31 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@a0e00cb] (releasing): (no justification provided)
  • 16:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbproxy1013.eqiad.wmnet
  • 16:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6001.drmrs.wmnet with OS bullseye
  • 16:21 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host durum6001.drmrs.wmnet with OS bookworm
  • 16:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 15:57 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 15:43 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:43 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 15:42 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:42 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:35 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:34 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:11 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
  • 15:08 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6001.drmrs.wmnet with OS bookworm
  • 15:07 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:05 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
  • 15:03 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:49 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Temporarily allow OAuth on non-API entry points again (T341656) (duration: 08m 03s)
  • 14:48 sukhe: upgrade dns2004 to gdnsd 3.99.0~alpha2
  • 14:42 lucaswerkmeister-wmde@deploy1002: tgr and lucaswerkmeister-wmde: Backport for Temporarily allow OAuth on non-API entry points again (T341656) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:41 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Temporarily allow OAuth on non-API entry points again (T341656)
  • 14:17 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 14:11 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 14:07 sukhe: dns4003: upgrade to pdns-rec 4.8.4: T341611
  • 13:59 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 13:57 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 13:56 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 13:56 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 13:46 sukhe: doh6001: upgrade to pdns-rec 4.8.4: T341611
  • 13:44 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 13:43 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 13:42 sukhe: reprepro -C main include bullseye-wikimedia pdns-recursor_4.8.4-1+wmf11u1_amd64.changes: T341611
  • 13:40 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:38 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Add new campaign_events.event_answers_status column (T341142) (duration: 07m 59s)
  • 13:34 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 13:31 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 13:31 lucaswerkmeister-wmde@deploy1002: daimona and lucaswerkmeister-wmde: Backport for Add new campaign_events.event_answers_status column (T341142) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:30 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Add new campaign_events.event_answers_status column (T341142)
  • 13:29 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Html: Support more attr types in getTextInputAttributes() (T341566) (duration: 07m 40s)
  • 13:28 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 47s)
  • 13:27 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 13:27 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 02m 30s)
  • 13:24 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 13:24 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 02m 09s)
  • 13:23 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Html: Support more attr types in getTextInputAttributes() (T341566) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:22 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 13:21 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Html: Support more attr types in getTextInputAttributes() (T341566)
  • 13:06 moritzm: installing node-tough-cookie security updates
  • 12:54 moritzm: rebalance ganeti codfw/D following reboots
  • 12:52 moritzm: imported wikidiff2 1.14.1-0+wmf1+buster1+icu67u1 to component/icu67 T340087 T329491
  • 12:44 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 12:30 akosiaris: upgrade wikidiff2 1.13.0-1+wmf1+buster1 -> 1.14.1-0+wmf1+buster1 on mw-canary hosts T340087
  • 12:11 mvernon@cumin2002: conftool action : set/pooled=no; selector: name=thanos-fe1002.eqiad.wmnet,service=thanos-web
  • 11:50 moritzm: installing apache2 security updates on Bullseye
  • 11:48 moritzm: installing wireshark security updates
  • 11:44 moritzm: rebalance ganeti codfw/C following reboots
  • 11:43 hnowlan: rebuilding fluent-bit image to include wmf-certificates
  • 11:33 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host htmldumper1001.eqiad.wmnet
  • 11:26 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host htmldumper1001.eqiad.wmnet
  • 11:00 ladsgroup@deploy1002: Finished scap: Backport for fix: add request headers properly (T319170) (duration: 10m 20s)
  • 10:51 ladsgroup@deploy1002: ladsgroup: Backport for fix: add request headers properly (T319170) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 10:50 ladsgroup@deploy1002: Started scap: Backport for fix: add request headers properly (T319170)
  • 10:49 ladsgroup@deploy1002: Finished scap: Backport for Externallinks: Keep domain wildcard if path is not specified (T326251) (duration: 08m 09s)
  • 10:42 ladsgroup@deploy1002: ladsgroup: Backport for Externallinks: Keep domain wildcard if path is not specified (T326251) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 10:40 ladsgroup@deploy1002: Started scap: Backport for Externallinks: Keep domain wildcard if path is not specified (T326251)
  • 09:35 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:35 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:34 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:33 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:55 moritzm: move secondary instances away from ganeti2014 T341546
  • 07:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0)
  • 07:29 ayounsi@cumin1001: START - Cookbook sre.network.tls
  • 07:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0)
  • 07:26 ayounsi@cumin1001: START - Cookbook sre.network.tls
  • 06:55 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99)
  • 06:55 ayounsi@cumin1001: START - Cookbook sre.network.tls
  • 06:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0)
  • 06:47 ayounsi@cumin1001: START - Cookbook sre.network.tls
  • 06:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0)
  • 06:45 ayounsi@cumin1001: START - Cookbook sre.network.tls
  • 06:37 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99)
  • 06:37 ayounsi@cumin1001: START - Cookbook sre.network.tls
  • 06:18 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99)
  • 06:18 ayounsi@cumin1001: START - Cookbook sre.network.tls
  • 06:16 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99)
  • 06:16 ayounsi@cumin1001: START - Cookbook sre.network.tls
  • 06:11 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99)
  • 06:11 ayounsi@cumin1001: START - Cookbook sre.network.tls
  • 06:06 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99)
  • 06:06 ayounsi@cumin1001: START - Cookbook sre.network.tls
  • 04:36 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 04:35 tchin@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 03:41 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 03:41 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 03:39 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 03:29 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 03:29 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 03:16 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 03:15 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 03:14 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 03:14 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 01:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 19:00:00 on wdqs[2013,2022].codfw.wmnet with reason: new host
  • 01:46 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 6 days, 19:00:00 on wdqs[2013,2022].codfw.wmnet with reason: new host

2023-07-11

  • 21:51 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:45 urbanecm@deploy1002: Finished scap: Backport for Logos: Fixes grantswiki and idwiktionary, Drop idwiktionary wordmark, Always return the class as string from Html::getTextInputAttributes (T341566) (duration: 11m 10s)
  • 20:35 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Logos: Fixes grantswiki and idwiktionary, Drop idwiktionary wordmark, Always return the class as string from Html::getTextInputAttributes (T341566) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:33 urbanecm@deploy1002: Started scap: Backport for Logos: Fixes grantswiki and idwiktionary, Drop idwiktionary wordmark, Always return the class as string from Html::getTextInputAttributes (T341566)
  • 20:32 urbanecm@deploy1002: Sync cancelled.
  • 20:26 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Logos: Fixes grantswiki and idwiktionary synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:25 urbanecm@deploy1002: Started scap: Backport for Logos: Fixes grantswiki and idwiktionary
  • 20:16 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:14 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 19:49 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:48 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 18:57 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.17 refs T340245
  • 18:46 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:53 dduvall@deploy1002: Pruned MediaWiki: 1.41.0-wmf.15 (duration: 02m 16s)
  • 17:50 dduvall@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.17 refs T340245 (duration: 45m 50s)
  • 17:05 dduvall@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.17 refs T340245
  • 16:52 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 16:28 vgutierrez: reenabling puppet in cp6002
  • 16:24 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 16:08 sukhe: upgrade dns1004 to gdnsd 3.99.0~alpha2
  • 16:04 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase20[13-27].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 16:03 Lucas_WMDE: previous backport also included Remove oversampling for Navigation Timing extension. (T337858)
  • 16:02 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Add option for html label in Menu template (T340217) (duration: 09m 15s)
  • 15:54 lucaswerkmeister-wmde@deploy1002: jdlrobson and lucaswerkmeister-wmde: Backport for Add option for html label in Menu template (T340217) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 15:54 Krinkle: Deployed https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/930712 ("Remove oversampling for Navigation Timing extension.")
  • 15:53 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Add option for html label in Menu template (T340217)
  • 15:48 krinkle@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: pending security problem, see mediawiki_security IRC (duration: 17m 03s)
  • 15:31 krinkle@deploy1002: Locking from deployment [ALL REPOSITORIES]: pending security problem, see mediawiki_security IRC
  • 15:26 krinkle@deploy1002: Sync cancelled.
  • 15:24 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[13-27].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 15:22 krinkle@deploy1002: phedenskog and krinkle: Backport for Remove oversampling for Navigation Timing extension. (T337858) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 15:21 krinkle@deploy1002: Started scap: Backport for Remove oversampling for Navigation Timing extension. (T337858)
  • 15:17 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching A:restbase-codfw: Applying JVM update - eevans@cumin1001
  • 15:09 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Applying JVM update - eevans@cumin1001
  • 14:49 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 14:21 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 14:19 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 14:17 moritzm: restarting apache on mw canaries
  • 14:17 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 14:15 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 14:12 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 14:02 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:00 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 13:59 moritzm: installing yajl security updates
  • 13:59 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:57 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
  • 13:49 moritzm: rebalance ganeti group eqiad/d after reboots
  • 13:42 jgiannelos@deploy1002: Finished deploy [restbase/deploy@930f075]: (no justification provided) (duration: 19m 50s)
  • 13:33 urbanecm@deploy1002: Finished scap: Backport for Enable tabs for non loggedin mobile users on knwikisource (T340276) (duration: 11m 33s)
  • 13:23 urbanecm@deploy1002: urbanecm and anzx: Backport for Enable tabs for non loggedin mobile users on knwikisource (T340276) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:22 jgiannelos@deploy1002: Started deploy [restbase/deploy@930f075]: (no justification provided)
  • 13:21 urbanecm@deploy1002: Started scap: Backport for Enable tabs for non loggedin mobile users on knwikisource (T340276)
  • 13:21 urbanecm@deploy1002: Finished scap: Backport for Growth: Increase mentorship percentage to 25% on enwiki (T341399) (duration: 07m 15s)
  • 13:14 urbanecm@deploy1002: Started scap: Backport for Growth: Increase mentorship percentage to 25% on enwiki (T341399)
  • 13:13 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: Enable backend of link recommendation 10, 11, 12th round wikis (T308135 T308136 T308137) (duration: 09m 45s)
  • 13:05 urbanecm@deploy1002: sgimeno and urbanecm: Backport for GrowthExperiments: Enable backend of link recommendation 10, 11, 12th round wikis (T308135 T308136 T308137) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:03 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: Enable backend of link recommendation 10, 11, 12th round wikis (T308135 T308136 T308137)
  • 13:00 jbond@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 12:59 jbond@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 12:59 jbond@cumin1001: END (ERROR) - Cookbook sre.postgresql.postgres-init (exit_code=97)
  • 12:53 jbond@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 12:00 XioNoX: decom datahop in knams - T340049
  • 11:42 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 11:38 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 11:37 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 11:27 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 11:17 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 11:06 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 10:46 moritzm: installing libx11 security updates
  • 10:44 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:44 ladsgroup@deploy1002: Sync cancelled.
  • 10:39 ladsgroup@deploy1002: ladsgroup: Backport for ExternalLinks: Make oneWildcard avoid adding wildcard to domain (T326251) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 10:38 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:37 ladsgroup@deploy1002: Started scap: Backport for ExternalLinks: Make oneWildcard avoid adding wildcard to domain (T326251)
  • 10:19 moritzm: rebalance ganeti group codfw/C after reboots
  • 10:03 ladsgroup@deploy1002: Finished scap: Backport for Override liftwing hostname (T319170) (duration: 14m 34s)
  • 09:52 ladsgroup@deploy1002: ladsgroup: Backport for Override liftwing hostname (T319170) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 09:52 jbond: disable puppet fleet wide to deploy 936273
  • 09:49 ladsgroup@deploy1002: Started scap: Backport for Override liftwing hostname (T319170)
  • 09:47 jbond: re-enable puppet
  • 09:43 hashar: Updating Zuul configuration, which was stalled at a version from March 29th after the switchover from contint2001 to contint2002 | T324659 T341556
  • 09:36 jbond: deploy gerrit:936273 enable submitting data to puppetdb7
  • 09:30 jbond: disable puppet fleet wide to deploy 936273
  • 09:08 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host kafkamon1003.eqiad.wmnet
  • 09:06 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 09:06 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 09:06 jayme: enabled puppet on 'P{R:Package = envoyproxy}'
  • 09:01 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 09:01 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 08:59 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
  • 08:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon1003.eqiad.wmnet
  • 08:59 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
  • 08:43 volans: previous downtiming completed
  • 08:40 volans: downtiming service 'Check no envoy runtime configuration is left persistent' on envoy hosts
  • 08:39 jayme: disabled puppet on 'P{R:Package = envoyproxy}'
  • 08:19 godog: upgrade prometheus to 2.24.1+ds-1+wmf2 on cloudmetrics*
  • 08:03 hashar: Stopping Jenkins and Zuul for server switch over
  • 08:01 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on contint2002.wikimedia.org with reason: Switch contint hosts for hardware replacement
  • 08:01 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on contint2002.wikimedia.org with reason: Switch contint hosts for hardware replacement
  • 08:01 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on contint2001.wikimedia.org with reason: Switch contint hosts for hardware replacement
  • 08:01 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on contint2001.wikimedia.org with reason: Switch contint hosts for hardware replacement
  • 07:55 kart_: Updated MinT to 2023-07-10-051738-production (T341335, T333969)
  • 07:54 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 07:49 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 07:47 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 07:42 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 07:38 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 07:36 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 07:36 moritzm: failover broken ganeti2014 node
  • 07:28 moritzm: powercycle ganeti2014
  • 07:22 moritzm: installing libxpm security updates
  • 07:08 moritzm: rebalance ganeti in drmrs after reboots
  • 06:59 elukey: restart kube-apiserver on ml-serve-ctrl1* as an attempt to resolve spikes in latencies
  • 06:36 moritzm: rebalance ganeti group eqiad/B after reboots
  • 05:24 rzl: imported otelcol-contrib 0.81.0 to buster-wikimedia and bullseye-wikimedia in component thirdparty/otelcol-contrib
  • 04:34 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 02:05 mutante: LDAP - added urbanecm to wmf group, removed from nda group (conversion from volunteer to staff) T341443

2023-07-10

  • 23:11 Krinkle: krinkle@xhgui1001$ Define new `xhgui.watches` table via xhguiadmin@m2-master.eqiad.wmnet database, ref T341499
  • 22:12 maryum: Deployed security patch for T340200
  • 21:42 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 21:39 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 21:37 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 52s)
  • 21:36 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 20:46 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 20:43 TheresNoTime: close UTC late backport window
  • 20:42 samtar@deploy1002: Finished scap: Backport for Revert "log additional events on Special:Diff|MobileDiff" (duration: 07m 27s)
  • 20:42 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 20:36 samtar@deploy1002: samtar: Backport for Revert "log additional events on Special:Diff|MobileDiff" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 20:34 samtar@deploy1002: Started scap: Backport for Revert "log additional events on Special:Diff|MobileDiff"
  • 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P49544 and previous config saved to /var/cache/conftool/dbconfig/20230710-202536-ladsgroup.json
  • 20:23 samtar@deploy1002: Finished scap: Backport for log additional events on Special:Diff|MobileDiff (T326212) (duration: 21m 42s)
  • 20:23 inflatador: bking@wdqs1006 Restart wdqs-blazegraph to hopefully clear the free allocators alerts
  • 20:19 TheresNoTime: syncing https://gerrit.wikimedia.org/r/c/936748 untested (T326212) for test after sync
  • 20:14 mutante: miscweb1003/miscweb2003 - rm -rf /srv/org/wikimedia/static-tendril
  • 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P49541 and previous config saved to /var/cache/conftool/dbconfig/20230710-201031-ladsgroup.json
  • 20:07 eileen: civicrm upgraded from 0ddd1a51 to 7caf5274
  • 20:03 samtar@deploy1002: samtar and jsn: Backport for log additional events on Special:Diff|MobileDiff (T326212) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:02 samtar@deploy1002: Started scap: Backport for log additional events on Special:Diff|MobileDiff (T326212)
  • 20:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1124.eqiad.wmnet with reason: Reboot
  • 19:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1124.eqiad.wmnet with reason: Reboot
  • 19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P49540 and previous config saved to /var/cache/conftool/dbconfig/20230710-195527-ladsgroup.json
  • 19:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 19:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P49538 and previous config saved to /var/cache/conftool/dbconfig/20230710-194022-ladsgroup.json
  • 19:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 19:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 19:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 19:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2112 T341511', diff saved to https://phabricator.wikimedia.org/P49537 and previous config saved to /var/cache/conftool/dbconfig/20230710-191511-ladsgroup.json
  • 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2103 to s1 primary T341511', diff saved to https://phabricator.wikimedia.org/P49536 and previous config saved to /var/cache/conftool/dbconfig/20230710-191259-ladsgroup.json
  • 19:12 Amir1: Starting s1 codfw failover from db2112 to db2103 - T341511
  • 18:59 sukhe: running authdns-update
  • 18:57 ladsgroup@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbproxy1012.eqiad.wmnet
  • 18:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:56 sukhe: finished commissioning new DNS hosts in eqiad: dns100[4-6]. decommissioned dns100[1-3].
  • 18:55 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 18:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbproxy1012.eqiad.wmnet
  • 18:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns[1002-1003].wikimedia.org
  • 18:50 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:50 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns[1002-1003].wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:49 ladsgroup@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbproxy1012.eqiad.wmnet
  • 18:49 ladsgroup@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:49 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns[1002-1003].wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:46 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2103 with weight 0 T341511', diff saved to https://phabricator.wikimedia.org/P49535 and previous config saved to /var/cache/conftool/dbconfig/20230710-184521-ladsgroup.json
  • 18:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 37 hosts with reason: Primary switchover s1 T341511
  • 18:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 37 hosts with reason: Primary switchover s1 T341511
  • 18:43 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 18:38 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns[1002-1003].wikimedia.org
  • 18:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbproxy1012.eqiad.wmnet
  • 18:32 dzahn@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 18:31 dzahn@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 18:29 dzahn@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 18:26 dzahn@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 18:03 dzahn@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 17:55 dzahn@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 17:51 sukhe: homer "mr*" commit "update ntp_servers (remove dns100[2-3], add dns100[5-6])"
  • 17:26 dzahn@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 17:24 dzahn@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:52 sukhe: rolling restart of ntp.service on A:dns-rec
  • 16:44 sukhe: homer "cr*-eqiad*" commit "Gerrit: 936757 remove DNS hosts dns1002 and dns1003"
  • 16:26 sukhe: ns0: set routing-options static route 208.80.154.238/32 next-hop [ 208.80.154.6 208.80.154.153 208.80.154.77 ]
  • 16:26 ebernhardson@deploy1002: Finished deploy [airflow-dags/search@8fa416b]: T328276: Change articletopic source to the outlink model (duration: 00m 20s)
  • 16:25 ebernhardson@deploy1002: Started deploy [airflow-dags/search@8fa416b]: T328276: Change articletopic source to the outlink model
  • 16:07 taavi@deploy1002: Finished scap: Backport for wikitech: Update codfw1dev LDAP server hostname, Disable UrlShortener on wikitech (T341470) (duration: 07m 47s)
  • 16:00 taavi@deploy1002: taavi: Backport for wikitech: Update codfw1dev LDAP server hostname, Disable UrlShortener on wikitech (T341470) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 15:59 taavi@deploy1002: Started scap: Backport for wikitech: Update codfw1dev LDAP server hostname, Disable UrlShortener on wikitech (T341470)
  • 15:53 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 30s)
  • 15:46 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 31s)
  • 15:30 sukhe: homer "cr*-eqiad*" commit "Gerrit: 936720 add new DNS host dns1006"
  • 15:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6001.drmrs.wmnet
  • 15:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
  • 15:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1006.wikimedia.org with OS bullseye
  • 15:25 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 15:23 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 15:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
  • 15:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6001.drmrs.wmnet
  • 15:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6002.drmrs.wmnet
  • 15:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
  • 15:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
  • 15:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1006.wikimedia.org with reason: host reimage
  • 15:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1006.wikimedia.org with reason: host reimage
  • 15:00 moritzm: rebalance ganeti group eqiad/A after reboots
  • 14:57 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:57 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6002.drmrs.wmnet
  • 14:51 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:51 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:49 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1006.wikimedia.org with OS bullseye
  • 14:46 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:46 tchin@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:33 fabfur: add new dns host dns1005
  • 14:28 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:28 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:27 sukhe@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "running manually for dns1005 - sukhe@cumin1001"
  • 14:26 sukhe@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "running manually for dns1005 - sukhe@cumin1001"
  • 14:23 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1005.wikimedia.org with OS bullseye
  • 14:23 sukhe@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1001"
  • 14:22 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:22 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:22 sukhe@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1001"
  • 14:19 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:19 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:15 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:15 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:10 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:10 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:04 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint2002.wikimedia.org
  • 14:02 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1005.wikimedia.org with reason: host reimage
  • 14:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1028.eqiad.wmnet
  • 14:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
  • 14:01 ladsgroup@deploy1002: Finished scap: Backport for Set commons to READ_NEW for externallinks migration (T335343) (duration: 09m 22s)
  • 13:59 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1005.wikimedia.org with reason: host reimage
  • 13:58 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host contint2002.wikimedia.org
  • 13:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
  • 13:53 ladsgroup@deploy1002: ladsgroup: Backport for Set commons to READ_NEW for externallinks migration (T335343) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:52 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host karapace1002.eqiad.wmnet
  • 13:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host karapace1002.eqiad.wmnet with OS bullseye
  • 13:51 ladsgroup@deploy1002: Started scap: Backport for Set commons to READ_NEW for externallinks migration (T335343)
  • 13:47 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host dns1005.wikimedia.org with OS bullseye
  • 13:46 ladsgroup@deploy1002: Finished scap: Backport for ExternalLinks: Make order by and continue only rely on el_id in READ NEW (T341000 T47237) (duration: 11m 03s)
  • 13:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
  • 13:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on karapace1002.eqiad.wmnet with reason: host reimage
  • 13:36 ladsgroup@deploy1002: ladsgroup: Backport for ExternalLinks: Make order by and continue only rely on el_id in READ NEW (T341000 T47237) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 13:36 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on karapace1002.eqiad.wmnet with reason: host reimage
  • 13:35 ladsgroup@deploy1002: Started scap: Backport for ExternalLinks: Make order by and continue only rely on el_id in READ NEW (T341000 T47237)
  • 13:28 bking@cumin1001: conftool action : set/pooled=yes; selector: name=wdqs2020.codfw.wmnet
  • 13:28 ladsgroup@deploy1002: Finished scap: Backport for ores extension: deploy LiftWing usage on testwiki (T319170) (duration: 09m 02s)
  • 13:27 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 13:27 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 13:27 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host karapace1002.eqiad.wmnet with OS bullseye
  • 13:22 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM karapace1002.eqiad.wmnet - btullis@cumin1001"
  • 13:21 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM karapace1002.eqiad.wmnet - btullis@cumin1001"
  • 13:21 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) karapace1002.eqiad.wmnet on all recursors
  • 13:21 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache karapace1002.eqiad.wmnet on all recursors
  • 13:21 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:21 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM karapace1002.eqiad.wmnet - btullis@cumin1001"
  • 13:20 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM karapace1002.eqiad.wmnet - btullis@cumin1001"
  • 13:20 ladsgroup@deploy1002: isaranto and ladsgroup: Backport for ores extension: deploy LiftWing usage on testwiki (T319170) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:19 ladsgroup@deploy1002: Started scap: Backport for ores extension: deploy LiftWing usage on testwiki (T319170)
  • 13:16 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Disable wgParserEnableLegacyMediaDOM on group2 wikis (T314318) (duration: 10m 26s)
  • 13:16 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 13:16 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host karapace1002.eqiad.wmnet
  • 13:07 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and arlolra: Backport for Disable wgParserEnableLegacyMediaDOM on group2 wikis (T314318) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:06 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Disable wgParserEnableLegacyMediaDOM on group2 wikis (T314318)
  • 12:34 claime: Running puppet on cp-text hosts - T341463
  • 12:33 claime: Sending 1% of global traffic to mw-on-k8s - T341463
  • 12:04 moritzm: failover ganeti masters in drmrs
  • 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6004.drmrs.wmnet
  • 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
  • 11:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
  • 11:55 moritzm: installing avahi security updates
  • 11:52 vgutierrez: repool cp2037 (debugging finished) - T320967
  • 11:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6004.drmrs.wmnet
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6003.drmrs.wmnet
  • 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
  • 11:34 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
  • 11:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
  • 11:28 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:23 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 28 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
  • 11:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 28 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
  • 11:14 moritzm: remove unused VM netflow6002 T330884
  • 11:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti6003.drmrs.wmnet
  • 11:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
  • 10:55 moritzm: failover ganeti master in eqiad to ganeti1029
  • 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1034.eqiad.wmnet
  • 10:50 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 10:50 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 10:49 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb1002.eqiad.wmnet with OS bullseye
  • 10:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1034.eqiad.wmnet
  • 10:45 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:44 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
  • 10:44 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
  • 10:43 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp2037.codfw.wmnet
  • 10:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp2037.codfw.wmnet
  • 10:34 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:28 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb1002.eqiad.wmnet with reason: host reimage
  • 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
  • 10:25 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb1002.eqiad.wmnet with reason: host reimage
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1033.eqiad.wmnet
  • 10:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1033.eqiad.wmnet
  • 10:13 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb1002.eqiad.wmnet with OS bullseye
  • 10:12 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=parsoid,name=parse1012.*
  • 10:12 claime: repooling parse1012.eqiad.wmnet
  • 10:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
  • 10:11 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb1002.eqiad.wmnet with OS bullseye
  • 10:05 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb1002.eqiad.wmnet with OS bullseye
  • 10:03 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb1001.eqiad.wmnet with OS bullseye
  • 10:03 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin1001"
  • 10:02 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin1001"
  • 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1033.eqiad.wmnet
  • 09:53 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
  • 09:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1033.eqiad.wmnet
  • 09:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2037.codfw.wmnet with reason: vgutierrez debugging
  • 09:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2037.codfw.wmnet with reason: vgutierrez debugging
  • 09:44 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudlb1002.private.eqiad.wikimedia.cloud on all recursors
  • 09:44 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache cloudlb1002.private.eqiad.wikimedia.cloud on all recursors
  • 09:39 moritzm: rebalance ganeti group codfw/B after reboots
  • 09:38 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb1001.eqiad.wmnet with reason: host reimage
  • 09:38 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
  • 09:35 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb1001.eqiad.wmnet with reason: host reimage
  • 09:35 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:35 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb1002 - aborrero@cumin1001"
  • 09:33 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb1002 - aborrero@cumin1001"
  • 09:31 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
  • 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
  • 09:29 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:29 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb1001 - aborrero@cumin1001"
  • 09:28 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb1001 - aborrero@cumin1001"
  • 09:25 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 09:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
  • 09:23 moritzm: rebalance ganeti group codfw/A after reboots
  • 09:14 vgutierrez: depool cp2037 (debugging ATS cacheability issues) - T320967
  • 09:12 moritzm: restarting mw canaries to pick up libxpm security update
  • 09:08 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-eqiad
  • 09:07 moritzm: installing cups security updates (libs only)
  • 09:06 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb1001.eqiad.wmnet with OS bullseye
  • 09:04 moritzm: rebalance ganeti clusters in esams/ulsfo/eqsin following reboots
  • 09:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1031.eqiad.wmnet
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1031.eqiad.wmnet
  • 08:58 lucaswerkmeister-wmde: Deployed security patch for T340220
  • 08:57 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-eqiad
  • 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-codfw
  • 08:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1031.eqiad.wmnet
  • 08:48 moritzm: installing libxpm security updates
  • 08:47 kart_: Updated cxserver to 2023-07-10-065135-production (T337719, T340989)
  • 08:46 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 08:45 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 08:44 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-codfw
  • 08:41 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 08:40 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 08:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
  • 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1030.eqiad.wmnet
  • 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1030.eqiad.wmnet
  • 08:27 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 08:27 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 08:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet
  • 08:24 claime: Running puppet on cp-text hosts - T337489
  • 08:11 hashar: UTC morning backport window completed.
  • 08:11 hashar@deploy1002: Finished scap: Backport for Deploy action blocks on bnwiki (T340904) (duration: 08m 15s)
  • 08:04 moritzm: installing c-ares security updates on buster
  • 08:04 hashar@deploy1002: hashar and mdsshakil: Backport for Deploy action blocks on bnwiki (T340904) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:03 hashar@deploy1002: Started scap: Backport for Deploy action blocks on bnwiki (T340904)
  • 08:02 hashar@deploy1002: Finished scap: Backport for thwiki: Update logos from commons (T341407) (duration: 25m 32s)
  • 08:00 moritzm: installing flask security updates on bullseye
  • 07:58 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1030.eqiad.wmnet
  • 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
  • 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
  • 07:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
  • 07:45 hashar@deploy1002: func and hashar: Backport for thwiki: Update logos from commons (T341407) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 07:36 hashar@deploy1002: Started scap: Backport for thwiki: Update logos from commons (T341407)
  • 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
  • 07:30 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 07:30 moritzm: installing libgstreamer-plugins-base1.0-0 security updates
  • 07:29 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 07:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1027.eqiad.wmnet
  • 07:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet
  • 07:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet
  • 07:22 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 07:21 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 07:21 hashar: deploy1002: removed empty untracked directory from MediaWiki staging area: `rmdir /srv/mediawiki-staging/wmf-config/scap/log/ && rmdir /srv/mediawiki-staging/wmf-config/scap/` | T341292
  • 07:20 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 07:20 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
  • 07:02 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1027.eqiad.wmnet
  • 07:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1026.eqiad.wmnet
  • 07:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
  • 06:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
  • 06:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1026.eqiad.wmnet
  • 06:43 godog: add 100G to prometheus/k8s in codfw
  • 01:06 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 01:06 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply

2023-07-09

  • 14:51 apergos: swapped dumpsdata1003 in as the new nfs share for misc dumps; dumpsdata1002 is now a spare, to be decommissioned. 1003 is running bullseye.
  • 04:04 apergos: rsync misc dumps output files from dumpsdata1002 to 1003, in ariel screen session on 1003, bwlimit to 1G

2023-07-08

  • 03:21 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply

2023-07-07

  • 22:55 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 22:55 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 22:41 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 22:21 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 22:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1156.eqiad.wmnet with OS bullseye
  • 21:59 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 21:24 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 57s)
  • 21:23 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 21:23 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:53 dwisehaupt@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:53 dwisehaupt@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1001"
  • 20:52 dwisehaupt@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1001"
  • 20:50 dwisehaupt@cumin1001: START - Cookbook sre.dns.netbox
  • 20:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1156.eqiad.wmnet with OS bullseye
  • 19:33 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:33 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 19:32 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:12 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 18:11 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 18:08 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 17:58 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 17:57 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb1001.eqiad.wmnet with OS bullseye
  • 17:56 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 17:40 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 16:44 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb1001.eqiad.wmnet with OS bullseye
  • 16:38 bking@cumin1001: conftool action : set/pooled=yes; selector: name=wdqs2020.codfw.wmnet
  • 16:23 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 16:20 hashar: Restarting CI Jenkins due to a confusion in the next build number leading to intermittent 404 when browsing console links | T341348
  • 16:00 bking@cumin1001: conftool action : set/pooled=no; selector: name=wdqs2020.codfw.wmnet
  • 15:53 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:51 bking@cumin1001: conftool action : set/pooled=yes; selector: name=wdqs2020.codfw.wmnet
  • 15:50 bking@cumin1001: conftool action : set/weight=10; selector: name=wdqs2020.codfw.wmnet
  • 15:49 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:47 aborrero@cumin1001: START - Cookbook sre.hosts.provision for host cloudlb1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:46 bking@cumin1001: conftool action : set/pooled=yes; selector: service=(wdqs|wdqs-ssl|wdqs-heavy-queries),name=wdqs2020.codfw.wmnet
  • 15:45 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:43 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb1001.eqiad.wmnet with OS bullseye
  • 15:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:05 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 50s)
  • 15:04 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 14:58 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 49s)
  • 14:57 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb1001.eqiad.wmnet with OS bullseye
  • 14:57 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 14:50 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 05s)
  • 14:50 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 14:49 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 05s)
  • 14:49 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 14:47 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:26 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 13:59 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 07s)
  • 13:59 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 13:58 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 05s)
  • 13:58 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 12:50 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 12:17 hashar: Re-enabled zuul-merger on contint2001 and removed the Icinga maintenance window
  • 12:02 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:02 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikimediacloud - aborrero@cumin1001"
  • 12:01 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikimediacloud - aborrero@cumin1001"
  • 11:58 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:48 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:48 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikimediacloud - aborrero@cumin1001"
  • 11:47 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikimediacloud - aborrero@cumin1001"
  • 11:45 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:42 hashar: Enabled zuul-merger contint1002, disabled it on contint2001 and marked that host as under maintenance in Icinga for the next two hours
  • 11:27 hashar: Stopped zuul-merger contint1002
  • 11:17 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:05 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:05 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikimediacloud - aborrero@cumin1001"
  • 11:04 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikimediacloud - aborrero@cumin1001"
  • 11:02 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 10:13 moritzm: rebooting puppetdb1003
  • 10:09 moritzm: rebooting puppetserver1001
  • 10:06 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host puppetdb2003.codfw.wmnet
  • 10:05 moritzm: rebooting puppetserver2001
  • 10:05 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 10:03 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
  • 09:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetdb2003.codfw.wmnet
  • 09:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
  • 09:52 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host debmonitor2003.codfw.wmnet
  • 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2003.codfw.wmnet
  • 09:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2003.codfw.wmnet
  • 09:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor2003.codfw.wmnet
  • 09:45 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 09:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1003.eqiad.wmnet
  • 09:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
  • 09:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor1003.eqiad.wmnet
  • 09:34 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host lists1003.wikimedia.org
  • 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
  • 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
  • 09:29 stevemunene@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 09:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
  • 09:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet
  • 09:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists1003.wikimedia.org
  • 09:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1004.eqiad.wmnet
  • 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1004.eqiad.wmnet
  • 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet
  • 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
  • 09:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2003.codfw.wmnet
  • 09:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2003.codfw.wmnet
  • 09:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
  • 08:53 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 08:50 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 08:48 moritzm: installing bookworm kernel updates
  • 08:47 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: xhgui2002.codfw.wmnet
  • 08:47 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: xhgui2002.codfw.wmnet
  • 08:46 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: xhgui1002.eqiad.wmnet
  • 08:46 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: xhgui1002.eqiad.wmnet
  • 08:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on kafka-test[1006-1010].eqiad.wmnet with reason: resetting cluster
  • 08:05 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on kafka-test[1006-1010].eqiad.wmnet with reason: resetting cluster
  • 01:55 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 00:28 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer

2023-07-06

  • 23:14 mutante: mx1001 - rm /usr/local/bin/otrs_aliases ; rm /lib/systemd/system/generate_otrs_aliases.* after deploying gerrit:932316 which renamed script and timer without absenting them
  • 23:08 mutante: mx2001 - rm /usr/local/bin/otrs_aliases ; rm /lib/systemd/system/generate_otrs_aliases.* after deploying gerrit:932316 which renamed script and timer without absenting them
  • 21:12 thcipriani@deploy1002: Finished scap: Clean up font directory gerrit:723652 (duration: 06m 33s)
  • 21:10 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 14m 56s)
  • 21:06 thcipriani@deploy1002: Started scap: Clean up font directory gerrit:723652
  • 21:04 thcipriani@deploy1002: Finished scap: Backport for pawikibooks: Install Quiz extension (T340613) (duration: 12m 19s)
  • 20:55 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 20:54 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 05s)
  • 20:54 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 20:53 thcipriani@deploy1002: stang and thcipriani: Backport for pawikibooks: Install Quiz extension (T340613) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:51 thcipriani@deploy1002: Started scap: Backport for pawikibooks: Install Quiz extension (T340613)
  • 20:48 thcipriani@deploy1002: Finished scap: Backport for Update more logos with available SVGs (T338162) (duration: 12m 41s)
  • 20:37 thcipriani@deploy1002: jdlrobson and thcipriani: Backport for Update more logos with available SVGs (T338162) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:35 thcipriani@deploy1002: Started scap: Backport for Update more logos with available SVGs (T338162)
  • 20:16 thcipriani@deploy1002: Finished scap: Backport for Disable purging of old client hint data by default (T340959 T341076) (duration: 10m 08s)
  • 20:07 thcipriani@deploy1002: thcipriani and dreamyjazz: Backport for Disable purging of old client hint data by default (T340959 T341076) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:06 thcipriani@deploy1002: Started scap: Backport for Disable purging of old client hint data by default (T340959 T341076)
  • 19:24 urbanecm@deploy1002: Finished scap: Backport for PageView: Fix base URL when using service proxy (T341191) (duration: 07m 16s)
  • 19:17 urbanecm@deploy1002: Started scap: Backport for PageView: Fix base URL when using service proxy (T341191)
  • 19:06 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:03 urbanecm@deploy1002: Finished scap: Backport for PageView: Route requests through restbase service proxy (T341191) (duration: 07m 27s)
  • 18:57 urbanecm@deploy1002: urbanecm: Backport for PageView: Route requests through restbase service proxy (T341191) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 18:56 urbanecm@deploy1002: Started scap: Backport for PageView: Route requests through restbase service proxy (T341191)
  • 17:33 cstone: tools upgraded from 2ca83336 to 10972e59
  • 17:24 sukhe: sudo cumin -b1 -s300 'A:dns-rec' 'systemctl restart ntp.service'
  • 17:17 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:15 sukhe: homer "mr*" commit "update ntp_servers (add dns1004, remove dns1001)"
  • 17:07 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 17:06 cstone: SmashPig upgraded from db23b998 to 95181a1b
  • 17:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns1001.wikimedia.org
  • 17:04 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:04 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 17:02 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 17:00 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 16:58 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 16:58 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 16:54 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns1001.wikimedia.org
  • 16:49 sukhe: sudo cumin A:netbox 'run-puppet-agent': removing dns1001 before decomm cookbook
  • 16:44 sukhe: homer "cr*-eqiad*" commit "decommission DNS host dns1001 (replaced by dns1004)"
  • 16:31 sukhe: ns0: set routing-options static route 208.80.154.238/32 next-hop [ 208.80.154.6 208.80.155.108 208.80.154.134 ]
  • 16:30 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 16:30 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 16:16 sukhe: homer "cr*-eqiad*" commit "Gerrit: 933917 add new DNS host dns1004"
  • 16:12 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 16:11 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 15:54 elukey: changeprop's kafka linger.ms set to 20s - T338357 (was 5ms, now changeprop waits a bit more to batch messages to send to kafka in one go)
  • 15:53 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 15:53 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 15:47 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 15:47 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 15:45 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 15:45 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 15:36 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 15:36 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 15:35 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:29 sukhe: restart ntp.service on A:dns-rec
  • 15:28 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:25 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 15:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1004.wikimedia.org with OS bullseye
  • 15:21 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 15:20 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 15:16 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 15:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1004.wikimedia.org with reason: host reimage
  • 14:58 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1004.wikimedia.org with reason: host reimage
  • 14:55 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:54 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 14:51 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: host reimage
  • 14:47 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:46 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1069.eqiad.wmnet
  • 14:46 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:46 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1069.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 14:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: host reimage
  • 14:45 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1004.wikimedia.org with OS bullseye
  • 14:45 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns1004.wikimedia.org with OS bullseye
  • 14:42 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1069.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 14:37 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 14:36 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=ats-be,name=cp2037.codfw.wmnet
  • 14:35 hnowlan: reenabling puppet on A:cp
  • 14:31 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1069.eqiad.wmnet
  • 14:29 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1068.eqiad.wmnet
  • 14:29 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:29 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1068.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 14:28 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 14:27 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1068.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 14:27 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 14:25 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 14:25 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:22 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 14:22 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 14:20 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1068.eqiad.wmnet
  • 14:19 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb1001
  • 14:19 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb1001
  • 14:18 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1067.eqiad.wmnet
  • 14:18 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:18 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1067.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 14:16 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 14:15 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 14:14 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1067.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 14:14 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1004.wikimedia.org with OS bullseye
  • 14:13 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns1004.wikimedia.org with OS bullseye
  • 14:13 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:12 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 14:09 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:06 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1067.eqiad.wmnet
  • 14:05 hnowlan: disabling puppet on A:cp-text to test 935464
  • 14:05 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=ats-be,name=cp2037.codfw.wmnet
  • 14:02 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1004.wikimedia.org with OS bullseye
  • 14:02 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns1004.wikimedia.org with OS bullseye
  • 14:01 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 13:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1004.wikimedia.org with OS bullseye
  • 13:55 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 13:54 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 13:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudlb1001.eqiad.wmnet
  • 13:42 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:38 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudlb1001.eqiad.wmnet
  • 13:34 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 13:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host zookeeper-test1002.eqiad.wmnet with OS bookworm
  • 13:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host an-test-worker1003.eqiad.wmnet
  • 13:30 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1066.eqiad.wmnet
  • 13:30 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:30 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1066.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 13:29 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:29 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host an-test-worker1003.eqiad.wmnet
  • 13:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host an-test-worker1003.eqiad.wmnet
  • 13:24 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1066.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 13:22 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 13:18 urbanecm@deploy1002: Finished scap: Backport for Enable global abuse filters on almost all projects (T341159) (duration: 10m 07s)
  • 13:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker1095.eqiad.wmnet with reason: Replacing RAID controller battery
  • 13:17 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-worker1095.eqiad.wmnet with reason: Replacing RAID controller battery
  • 13:14 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1066.eqiad.wmnet
  • 13:12 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1065.eqiad.wmnet
  • 13:12 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:12 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1065.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 13:10 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
  • 13:10 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1065.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 13:10 urbanecm@deploy1002: urbanecm: Backport for Enable global abuse filters on almost all projects (T341159) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 13:08 urbanecm@deploy1002: Started scap: Backport for Enable global abuse filters on almost all projects (T341159)
  • 13:08 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 13:02 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1065.eqiad.wmnet
  • 13:00 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb1001.eqiad.wmnet with OS bullseye
  • 12:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zookeeper-test1002.eqiad.wmnet with reason: host reimage
  • 12:58 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1064.eqiad.wmnet
  • 12:58 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:58 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1064.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 12:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on zookeeper-test1002.eqiad.wmnet with reason: host reimage
  • 12:43 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1064.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 12:42 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host zookeeper-test1002.eqiad.wmnet with OS bookworm
  • 12:40 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 12:35 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1064.eqiad.wmnet
  • 12:32 samtar@deploy1002: Finished scap: Backport for Revert "Add tag when reference added to the page" (T341202) (duration: 24m 04s)
  • 12:21 samtar@deploy1002: matmarex and samtar: Backport for Revert "Add tag when reference added to the page" (T341202) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 12:15 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host zookeeper-test1002.eqiad.wmnet with OS bookworm
  • 12:15 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host zookeeper-test1002.eqiad.wmnet with OS bookworm
  • 12:08 samtar@deploy1002: Started scap: Backport for Revert "Add tag when reference added to the page" (T341202)
  • 11:56 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb1001.eqiad.wmnet with OS bullseye
  • 11:56 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:56 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts analytics1063.eqiad.wmnet
  • 11:56 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:56 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb1002
  • 11:56 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb1002
  • 11:56 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:56 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
  • 11:55 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
  • 11:55 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 11:54 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudlb1001.eqiad.wmnet on all recursors
  • 11:54 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudlb1001.eqiad.wmnet on all recursors
  • 11:53 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:53 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:53 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
  • 11:52 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
  • 11:52 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:50 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:50 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1063.eqiad.wmnet
  • 11:50 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudlb1001.eqiad.wmnet on all recursors
  • 11:50 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudlb1001.eqiad.wmnet on all recursors
  • 11:49 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 11:49 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 11:49 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 11:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 11:48 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb1001
  • 11:48 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 11:48 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 11:48 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb1001
  • 11:47 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts analytics1063.eqiad.wmnet
  • 11:47 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:46 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 11:43 aborrero@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudlb1001
  • 11:42 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb1001
  • 11:41 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1063.eqiad.wmnet
  • 11:41 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts analytics1063.eqiad.wmnet
  • 11:41 stevemunene@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 11:39 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:39 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
  • 11:38 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
  • 11:35 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:34 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Beta-Wikidata: Always show mul on desktop Termbox (T339104) (duration: 07m 37s)
  • 11:30 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudswift1002.eqiad.wmnet
  • 11:30 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:30 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudswift1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - aborrero@cumin1001"
  • 11:29 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudswift1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - aborrero@cumin1001"
  • 11:27 lucaswerkmeister-wmde@deploy1002: migr and lucaswerkmeister-wmde: Backport for Beta-Wikidata: Always show mul on desktop Termbox (T339104) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 11:27 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
  • 11:27 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:26 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Beta-Wikidata: Always show mul on desktop Termbox (T339104)
  • 11:25 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for foundationwiki: Enable WikibaseClient (T321967) (duration: 08m 58s)
  • 11:24 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 11:24 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudswift1001.eqiad.wmnet
  • 11:24 aborrero@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 11:23 aborrero@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudswift1002.eqiad.wmnet
  • 11:22 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:19 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1063.eqiad.wmnet
  • 11:18 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 11:17 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 11:17 lucaswerkmeister-wmde@deploy1002: varnent and lucaswerkmeister-wmde: Backport for foundationwiki: Enable WikibaseClient (T321967) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 11:16 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for foundationwiki: Enable WikibaseClient (T321967)
  • 11:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 11:14 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 11:14 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for outreachwiki: Set wmgWikibaseSiteGroup (duration: 07m 35s)
  • 11:12 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 11:12 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 11:10 aborrero@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudswift1001.eqiad.wmnet
  • 11:07 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for outreachwiki: Set wmgWikibaseSiteGroup synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 11:06 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for outreachwiki: Set wmgWikibaseSiteGroup
  • 11:05 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1062.eqiad.wmnet
  • 11:05 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:05 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1062.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 11:04 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:03 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling reboot on A:swift-fe
  • 10:58 taavi@deploy1002: Finished scap: Backport for extdist: REL1_40 is stable, REL1_38 is EOL (duration: 08m 21s)
  • 10:54 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1062.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 10:53 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:51 taavi@deploy1002: taavi: Backport for extdist: REL1_40 is stable, REL1_38 is EOL synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 10:49 taavi@deploy1002: Started scap: Backport for extdist: REL1_40 is stable, REL1_38 is EOL
  • 10:47 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 10:41 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1062.eqiad.wmnet
  • 10:10 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1061.eqiad.wmnet
  • 10:10 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:10 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1061.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 10:08 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1061.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 10:05 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 09:58 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1061.eqiad.wmnet
  • 09:35 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling reboot on A:swift-fe
  • 09:28 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 09:13 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:11 elukey: restart kube-apiserver on ml-serve-ctrl2* as an attempt to fix LIST-related latency issues
  • 09:10 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.16 refs T340244
  • 08:55 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 08:55 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 08:51 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 08:50 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 08:49 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 08:49 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 08:45 fabfur: reenabled puppet on cp1075.eqiad.wmnet, cp2027.codfw.wmnet, cp3050.esams.wmnet
  • 08:39 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling reboot on A:thanos-fe
  • 08:17 fabfur: temporarily disabling puppet on cp1075.eqiad.wmnet, cp2027.codfw.wmnet, cp3050.esams.wmnet to apply 935760 (T340983)
  • 08:03 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: GitLab minor version upgrade
  • 07:31 kart_: Updated MinT to 2023-07-06-051402-production
  • 07:29 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling reboot on A:thanos-fe
  • 07:29 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 07:25 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 07:23 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 07:17 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 07:12 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 07:09 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 07:04 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 9 hosts with reason: Stopping puppet and hadoop-hdfs-datanode services then decommissioning the hosts
  • 07:04 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 9 hosts with reason: Stopping puppet and hadoop-hdfs-datanode services then decommissioning the hosts
  • 06:54 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: GitLab minor version upgrade
  • 02:17 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
  • 02:16 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
  • 02:06 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
  • 02:05 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
  • 02:05 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
  • 02:05 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
  • 00:22 eileen: civicrm upgraded from 4ca2008d to 0ddd1a51
  • 00:03 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 00:02 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply

2023-07-05

  • 22:52 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 22:38 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host an-test-worker1003.eqiad.wmnet
  • 22:36 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts xhgui2002
  • 22:36 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:36 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: xhgui2002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 22:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host an-test-worker1003.eqiad.wmnet
  • 22:35 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: xhgui2002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 22:33 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 22:28 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts xhgui2002
  • 22:28 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts xhgui1002
  • 22:28 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:28 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: xhgui1002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 22:27 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: xhgui1002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 22:24 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 22:23 mutante: registry1003 - sudo systemctl start build-hompage
  • 22:17 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts xhgui1002
  • 21:23 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:04 urbanecm@deploy1002: Finished scap: Backport for Optimize SVG wordmarks, enable Wikimania wordmark, fix techconduct (T338162) (duration: 08m 22s)
  • 20:57 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Optimize SVG wordmarks, enable Wikimania wordmark, fix techconduct (T338162) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:55 urbanecm@deploy1002: Started scap: Backport for Optimize SVG wordmarks, enable Wikimania wordmark, fix techconduct (T338162)
  • 20:55 urbanecm@deploy1002: Finished scap: Backport for Update various logos where SVGs are available (T338162) (duration: 11m 10s)
  • 20:45 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Update various logos where SVGs are available (T338162) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:44 urbanecm@deploy1002: Started scap: Backport for Update various logos where SVGs are available (T338162)
  • 20:31 urbanecm@deploy1002: Finished scap: Backport for Add language button at the top of the Main page of Italian Wikivoyage (T337666) (duration: 12m 38s)
  • 20:20 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Add language button at the top of the Main page of Italian Wikivoyage (T337666) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:19 urbanecm@deploy1002: Started scap: Backport for Add language button at the top of the Main page of Italian Wikivoyage (T337666)
  • 20:17 urbanecm@deploy1002: Finished scap: Backport for Disable the Nearby feature on some sister projects (T341133) (duration: 13m 12s)
  • 20:05 urbanecm@deploy1002: urbanecm and jdlrobson: Backport for Disable the Nearby feature on some sister projects (T341133) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:04 urbanecm@deploy1002: Started scap: Backport for Disable the Nearby feature on some sister projects (T341133)
  • 18:36 sukhe: re-enable puppet in A:dns-rec to finish merging CR 933497 and run-agent: T340479
  • 18:25 denisse: disable puppet on webperf1003 to test PHP memory changes for XHGui
  • 18:25 denisse: disable puppet on webperf1003
  • 18:20 sukhe: disable puppet on A:dns-rec to merge CR 933497
  • 17:40 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: GitLab minor version upgrade
  • 17:07 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 17:07 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 17:03 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 17:02 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 16:33 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: GitLab minor version upgrade
  • 16:25 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host an-test-worker1003.eqiad.wmnet
  • 16:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host an-test-worker1003.eqiad.wmnet
  • 16:06 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: GitLab minor version upgrade
  • 15:55 fabfur: re-enabled puppet in all cp- hosts (done @2023-07-05 14:22:57 UTC)
  • 15:38 mlitn@deploy1002: Finished deploy [airflow-dags/platform_eng@a97da10]: (no justification provided) (duration: 00m 25s)
  • 15:38 mlitn@deploy1002: Started deploy [airflow-dags/platform_eng@a97da10]: (no justification provided)
  • 15:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet
  • 15:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
  • 15:26 sukhe: reprepro -C component/dnsdist include bullseye-wikimedia dnsdist_1.8.0-1+wmf11u1_amd64.changes
  • 15:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
  • 15:23 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
  • 15:09 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1024.eqiad.wmnet
  • 15:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1024.eqiad.wmnet
  • 15:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1023.eqiad.wmnet
  • 15:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
  • 15:05 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host an-test-worker1003.eqiad.wmnet
  • 15:00 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: GitLab minor version upgrade
  • 15:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
  • 14:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1023.eqiad.wmnet
  • 14:48 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 14:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet
  • 14:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet
  • 14:42 sukhe: re-enable puppet and start pybal on lvs2013
  • 14:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1022.eqiad.wmnet
  • 14:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1022.eqiad.wmnet
  • 14:38 vgutierrez: pool cdn service in cp2027.codfw.wmnet,cp1075.eqiad.wmnet,cp3050.esams.wmnet
  • 14:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1022.eqiad.wmnet
  • 14:29 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:27 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:27 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:24 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:24 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:22 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:22 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:19 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:18 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 14:18 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:18 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:18 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 14:18 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:17 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 14:17 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:16 vgutierrez: depool cdn service in cp2027.codfw.wmnet,cp1075.eqiad.wmnet,cp3050.esams.wmnet
  • 14:16 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:16 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 14:15 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:14 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:11 fabfur: disabling puppet in all cp- hosts for error in configuration
  • 14:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1022.eqiad.wmnet
  • 14:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
  • 14:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1021.eqiad.wmnet
  • 14:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1021.eqiad.wmnet
  • 13:57 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
  • 13:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1021.eqiad.wmnet
  • 13:55 elukey: expand kafka topic partitions from 1 to 5 for {codfw,eqiad}.mediawiki.job.RecordLintJob and {eqiad,codfw}.mediawiki.job.refreshLinks on kafka-main eqiad/codfw - T338357
  • 13:53 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable Extension:RelatedArticles for desktop on frwikinews (T341105) (duration: 08m 54s)
  • 13:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: mgmt interface issues
  • 13:51 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: mgmt interface issues
  • 13:45 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and anzx: Backport for Enable Extension:RelatedArticles for desktop on frwikinews (T341105) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 13:44 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable Extension:RelatedArticles for desktop on frwikinews (T341105)
  • 13:41 sukhe: disable puppet and stop pybal on lvs2013: T340960
  • 13:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
  • 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet
  • 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1020.eqiad.wmnet
  • 13:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1020.eqiad.wmnet
  • 13:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet
  • 13:02 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 12:54 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1019.eqiad.wmnet
  • 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1019.eqiad.wmnet
  • 12:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1019.eqiad.wmnet
  • 12:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1019.eqiad.wmnet
  • 12:34 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudsw-b1.private.codfw.wikimedia.cloud on all recursors
  • 12:34 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudsw-b1.private.codfw.wikimedia.cloud on all recursors
  • 12:31 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:31 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw-b1 codfw - aborrero@cumin2002"
  • 12:30 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw-b1 codfw - aborrero@cumin2002"
  • 12:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1018.eqiad.wmnet
  • 12:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1018.eqiad.wmnet
  • 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
  • 12:28 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 12:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
  • 12:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1018.eqiad.wmnet
  • 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts urldownloader1001.wikimedia.org
  • 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:58 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:55 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:51 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts urldownloader1001.wikimedia.org
  • 11:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts urldownloader1002.wikimedia.org
  • 11:50 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:47 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:47 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1018.eqiad.wmnet
  • 11:45 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:45 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:39 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 11:37 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts urldownloader1002.wikimedia.org
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts urldownloader2002.wikimedia.org
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:32 btullis@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 11:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1017.eqiad.wmnet
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1017.eqiad.wmnet
  • 11:25 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts urldownloader2002.wikimedia.org
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts urldownloader2001.wikimedia.org
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1017.eqiad.wmnet
  • 11:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:06 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:01 btullis@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 11:00 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe
  • 11:00 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts urldownloader2001.wikimedia.org
  • 10:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1017.eqiad.wmnet
  • 10:52 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe
  • 10:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1016.eqiad.wmnet
  • 10:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1016.eqiad.wmnet
  • 10:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1016.eqiad.wmnet
  • 10:41 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 10:40 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 10:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1016.eqiad.wmnet
  • 10:22 godog: restore US business hours escalation - T340763
  • 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1015.eqiad.wmnet
  • 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
  • 10:14 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast2003.wikimedia.org
  • 10:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1015.eqiad.wmnet
  • 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1015.eqiad.wmnet
  • 10:05 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host bast2003.wikimedia.org
  • 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
  • 09:48 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:48 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw - aborrero@cumin1001"
  • 09:47 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw - aborrero@cumin1001"
  • 09:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1015.eqiad.wmnet
  • 09:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
  • 09:43 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 09:41 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:41 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw - aborrero@cumin1001"
  • 09:41 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:41 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:40 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw - aborrero@cumin1001"
  • 09:39 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:39 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
  • 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
  • 09:38 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 09:37 claime: running puppet on 'A:cp-text and P:trafficserver::backend' - T341078
  • 09:36 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:36 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:35 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:35 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:31 claime: Sending 0.5% of global traffic to mw-on-k8s - T341078
  • 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
  • 09:29 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
  • 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow1002.eqiad.wmnet to drbd
  • 09:27 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:27 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:26 cgoubert@deploy1002: Finished scap: (no justification provided) (duration: 02m 19s)
  • 09:25 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:24 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:24 claime: redeploy mw-on-k8s following quota update - T341114
  • 09:24 cgoubert@deploy1002: Started scap: (no justification provided)
  • 09:23 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:22 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:22 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:22 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:21 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:21 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:21 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:19 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:19 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:18 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:18 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:17 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow1002.eqiad.wmnet to drbd
  • 09:15 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet
  • 09:11 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:10 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:10 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:10 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:09 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:09 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:08 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:08 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:08 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:08 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:07 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 09:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
  • 09:07 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:07 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:07 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 09:06 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:06 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:04 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:04 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:03 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 09:02 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:02 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet
  • 09:01 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
  • 09:01 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
  • 08:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet
  • 08:53 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:52 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 08:52 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:45 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:45 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:44 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.16 refs T340244
  • 08:40 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:40 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:39 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:34 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 08:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet
  • 08:26 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet
  • 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
  • 08:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
  • 08:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 08:12 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 08:10 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 08:10 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 08:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1012.eqiad.wmnet
  • 00:09 zabe@deploy1002: Finished scap: update interwiki cache (duration: 06m 51s)
  • 00:02 zabe@deploy1002: Started scap: update interwiki cache

2023-07-04

  • 23:58 zabe@deploy1002: Finished scap: T335969 (duration: 07m 40s)
  • 23:52 zabe@deploy1002: zabe: T335969 synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 23:50 zabe@deploy1002: Started scap: T335969
  • 23:50 zabe: create Wikipedia Ghanaian Pidgin # T335969
  • 22:57 zabe@deploy1002: Finished scap: Backport for Remove migrateStewards.php reference (duration: 07m 23s)
  • 22:52 zabe@deploy1002: taavi and zabe: Backport for Remove migrateStewards.php reference synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 22:50 zabe@deploy1002: Started scap: Backport for Remove migrateStewards.php reference
  • 22:46 zabe@deploy1002: Finished scap: Backport for Stop setting $wgCommentTempTableSchemaMigrationStage (T299954) (duration: 07m 56s)
  • 22:39 zabe@deploy1002: zabe: Backport for Stop setting $wgCommentTempTableSchemaMigrationStage (T299954) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:38 zabe@deploy1002: Started scap: Backport for Stop setting $wgCommentTempTableSchemaMigrationStage (T299954)
  • 19:38 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:38 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:37 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:37 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:29 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:29 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P49512 and previous config saved to /var/cache/conftool/dbconfig/20230704-192646-ladsgroup.json
  • 19:23 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:23 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:21 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 19:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P49511 and previous config saved to /var/cache/conftool/dbconfig/20230704-191142-ladsgroup.json
  • 19:09 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 19:07 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:07 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:01 jgleeson: payments-wiki upgraded from cbc0b454 to d76b9085
  • 18:59 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P49510 and previous config saved to /var/cache/conftool/dbconfig/20230704-185637-ladsgroup.json
  • 18:56 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P49509 and previous config saved to /var/cache/conftool/dbconfig/20230704-184132-ladsgroup.json
  • 18:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 18:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 18:38 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2165 T339223', diff saved to https://phabricator.wikimedia.org/P49508 and previous config saved to /var/cache/conftool/dbconfig/20230704-183748-ladsgroup.json
  • 18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2161 to s8 primary T339223', diff saved to https://phabricator.wikimedia.org/P49507 and previous config saved to /var/cache/conftool/dbconfig/20230704-183434-ladsgroup.json
  • 18:32 Amir1: Starting s8 codfw failover from db2165 to db2161 - T339223
  • 18:31 sukhe: finished running homer for adding fabfur [pushed to all 55 devices successfully]
  • 18:25 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 18:06 sukhe: enable puppet on A:wikidough to roll out CR 863295
  • 18:01 sukhe: disable puppet on A:wikidough to roll out CR 863295
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2161 with weight 0 T339223', diff saved to https://phabricator.wikimedia.org/P49506 and previous config saved to /var/cache/conftool/dbconfig/20230704-175604-ladsgroup.json
  • 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s8 T339223
  • 17:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s8 T339223
  • 17:36 sukhe: [correction] homer "*" commit "Gerrit: 935479 add fabfur"
  • 17:36 sukhe: homer "*" commit "Gerrit: 935479 add fabur"
  • 16:14 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard-next.discovery.wmnet on all recursors
  • 16:14 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard-next.discovery.wmnet on all recursors
  • 16:14 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet on all recursors
  • 16:14 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet on all recursors
  • 16:03 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet on all recursors
  • 16:03 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet on all recursors
  • 16:03 jbond@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=puppetboard,name=codfw
  • 16:02 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet on all recursors
  • 16:02 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet on all recursors
  • 15:57 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.wikimedia.org on all recursors
  • 15:57 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.wikimedia.org on all recursors
  • 15:56 jbond@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=puppetboard,name=codfw
  • 15:46 Emperor: delete swift container global-data-elastic-backups in AUTH_search account T341081
  • 15:27 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 15:26 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 15:26 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 15:25 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 15:25 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 15:24 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 15:19 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 15:12 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 15:08 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard-next.wikimedia.org on all recursors
  • 15:08 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard-next.wikimedia.org on all recursors
  • 15:04 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:04 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw-c8.private.eqiad.wikimedia.cloud - aborrero@cumin1001"
  • 15:03 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw-c8.private.eqiad.wikimedia.cloud - aborrero@cumin1001"
  • 15:01 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 15:01 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:01 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw-c8.eqiad.codfw.wikimedia.cloud - aborrero@cumin1001"
  • 15:00 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw-c8.eqiad.codfw.wikimedia.cloud - aborrero@cumin1001"
  • 14:58 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:58 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:58 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 14:56 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:56 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:53 claime: Deploying encrypted rsync to deployment servers - T289857
  • 14:52 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:52 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:50 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:50 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:49 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:49 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:46 jbond@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=puppetboard-next,name=codfw
  • 14:43 cgoubert@deploy1002: Finished scap: (no justification provided) (duration: 02m 12s)
  • 14:42 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:42 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:41 cgoubert@deploy1002: Started scap: (no justification provided)
  • 14:41 claime: redeploying mw-on-k8s
  • 14:40 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:40 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:40 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:40 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:39 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:38 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:38 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:37 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:37 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:36 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:36 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:35 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:33 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:33 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:33 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:32 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 14:31 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 14:29 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:29 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:29 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:29 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:27 cgoubert@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:27 cgoubert@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:23 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:22 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:20 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:20 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:18 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:18 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:16 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:16 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:16 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:16 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for DeleteAction: Avoid displaying the form unconditionally (T341002) (duration: 18m 41s)
  • 14:03 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:03 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:03 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 14:02 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:02 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 14:02 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 14:02 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:01 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 14:01 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 14:01 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 14:00 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 14:00 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 14:00 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 14:00 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 13:59 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and urbanecm: Backport for DeleteAction: Avoid displaying the form unconditionally (T341002) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:58 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 13:58 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 13:57 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for DeleteAction: Avoid displaying the form unconditionally (T341002)
  • 13:55 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 13:55 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: sync
  • 13:51 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:50 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:50 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:50 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/shellbox: apply
  • 13:50 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:49 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 13:45 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/eventstreams: apply
  • 13:44 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for DeleteAction: Avoid displaying the form unconditionally (T341002) (duration: 08m 25s)
  • 13:40 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/zotero: apply
  • 13:39 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:38 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/wikifeeds: apply
  • 13:37 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/toolhub: apply
  • 13:37 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/termbox: apply
  • 13:37 lucaswerkmeister-wmde@deploy1002: urbanecm and lucaswerkmeister-wmde: Backport for DeleteAction: Avoid displaying the form unconditionally (T341002) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:36 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/tegola-vector-tiles: apply
  • 13:36 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/similar-users: apply
  • 13:35 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for DeleteAction: Avoid displaying the form unconditionally (T341002)
  • 13:35 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/shellbox-timeline: apply
  • 13:34 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/shellbox-syntaxhighlight: apply
  • 13:34 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/shellbox-media: apply
  • 13:34 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/shellbox-constraints: apply
  • 13:33 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/recommendation-api: apply
  • 13:32 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 13:32 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/rdf-streaming-updater: apply
  • 13:32 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 13:31 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 13:31 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 13:28 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 13:28 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 13:27 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/push-notifications: apply
  • 13:26 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/proton: apply
  • 13:22 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/mobileapps: apply
  • 13:18 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/miscweb: apply
  • 13:17 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/machinetranslation: apply
  • 13:11 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/linkrecommendation: apply
  • 13:10 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/image-suggestion: apply
  • 13:09 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/eventstreams-internal: apply
  • 13:08 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/eventgate-main: apply
  • 13:05 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/eventgate-logging-external: apply
  • 13:03 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/eventgate-analytics-external: apply
  • 13:02 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/eventgate-analytics: apply
  • 13:01 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/zotero: apply
  • 13:01 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/wikifeeds: apply
  • 13:01 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/echostore: apply
  • 13:00 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/device-analytics: apply
  • 12:59 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/developer-portal: apply
  • 12:57 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/cxserver: apply
  • 12:56 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/citoid: apply
  • 12:55 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/toolhub: apply
  • 12:55 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/blubberoid: apply
  • 12:54 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/termbox: apply
  • 12:53 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/tegola-vector-tiles: apply
  • 12:51 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 12:51 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 12:50 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 12:50 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/similar-users: apply
  • 12:49 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 12:49 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/shellbox-timeline: apply
  • 12:48 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 12:48 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 12:48 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 12:48 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:48 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 12:48 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/shellbox-media: apply
  • 12:47 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/shellbox-constraints: apply
  • 12:47 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/shellbox: apply
  • 12:47 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/recommendation-api: apply
  • 12:46 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/rdf-streaming-updater: apply
  • 12:45 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/push-notifications: apply
  • 12:44 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/proton: apply
  • 12:42 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/mobileapps: apply
  • 12:41 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/miscweb: apply
  • 12:40 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/machinetranslation: apply
  • 12:34 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/linkrecommendation: apply
  • 12:31 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 12:31 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 12:30 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/image-suggestion: apply
  • 12:29 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/eventstreams-internal: apply
  • 12:29 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/eventstreams: apply
  • 12:28 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/eventgate-main: apply
  • 12:27 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/eventgate-logging-external: apply
  • 12:27 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 12:26 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 12:25 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/eventgate-analytics-external: apply
  • 12:24 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/eventgate-analytics: apply
  • 12:23 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/echostore: apply
  • 12:22 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/device-analytics: apply
  • 12:21 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/developer-portal: apply
  • 12:20 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/cxserver: apply
  • 12:20 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/citoid: apply
  • 12:19 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/blubberoid: apply
  • 12:18 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/apertium: apply
  • 12:13 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 12:12 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 11:55 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/similar-users: apply
  • 11:53 jayme@deploy1002: helmfile [staging] FAIL (1) helmfile.d/services/miscweb: apply
  • 11:48 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/toolhub: apply
  • 11:48 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/shellbox: apply
  • 11:47 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/eventstreams-internal: apply
  • 11:47 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/eventgate-main: apply
  • 11:46 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:46 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/eventgate-analytics: apply
  • 11:45 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/linkrecommendation: apply
  • 11:45 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/mobileapps: apply
  • 11:43 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/device-analytics: apply
  • 11:43 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/citoid: apply
  • 11:42 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/shellbox-media: apply
  • 11:41 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/zotero: apply
  • 11:41 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/cxserver: apply
  • 11:40 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/eventgate-analytics-external: apply
  • 11:33 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/apertium: apply
  • 11:32 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/tegola-vector-tiles: apply
  • 11:31 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/shellbox-timeline: apply
  • 11:29 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/shellbox-constraints: apply
  • 11:28 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/api-gateway: apply
  • 11:28 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/wikifeeds: apply
  • 11:27 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/termbox: apply
  • 11:26 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/blubberoid: apply
  • 11:26 jayme@deploy1002: helmfile [staging] FAIL (1) helmfile.d/services/similar-users: apply
  • 11:17 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:17 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:12 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/push-notifications: apply
  • 11:10 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/rdf-streaming-updater: apply
  • 11:08 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:08 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:04 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:04 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:03 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:03 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 10:53 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/eventstreams: -i apply
  • 10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/echostore: -i apply
  • 10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/image-suggestion: -i apply
  • 10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/recommendation-api: -i apply
  • 10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/machinetranslation: -i apply
  • 10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/sessionstore: -i apply
  • 10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/developer-portal: -i apply
  • 10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/eventgate-logging-external: -i apply
  • 10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/rest-gateway: -i apply
  • 10:20 jayme@deploy1002: helmfile [staging] FAIL (3) helmfile.d/services/mw-api-int: -i apply
  • 10:20 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/rest-gateway: -i apply
  • 10:05 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 10:05 jayme@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:05 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:05 jayme@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:04 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.16 refs T340244
  • 09:55 jnuche@deploy1002: Pruned MediaWiki: 1.41.0-wmf.13 (duration: 02m 11s)
  • 09:52 jnuche@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.16 refs T340244 (duration: 50m 51s)
  • 09:48 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 09:47 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 09:47 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 09:46 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 09:46 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 09:46 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 09:43 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:42 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 09:42 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:42 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 09:42 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:41 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 09:38 jayme: updated envoyproxy to 1.23.10 on all nodes - T300324
  • 09:37 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:37 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 09:36 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:36 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 09:07 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-workers (exit_code=99) restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 09:02 jnuche@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.16 refs T340244
  • 08:56 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.

2023-07-03

  • 22:00 eileen: civicrm upgraded from 9e04c92d to 4ca2008d
  • 20:18 jiji@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 20:15 effie: restarting swift proxies
  • 20:14 jiji@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 19:11 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 19:11 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 19:10 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 19:09 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 19:09 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 19:09 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:10 effie: restarting pybal on lvs2013
  • 16:04 effie: restarting pybal on lvs2014
  • 15:57 effie: restarting pybal on lvs2014
  • 15:52 effie: restarting pybal on lvs1019
  • 15:49 effie: restarting pybal on lvs1020
  • 15:34 jiji@cumin1001: conftool action : set/weight=10; selector: name=kubestagemaster1002.eqiad.wmnet
  • 15:34 jiji@cumin1001: conftool action : set/weight=10; selector: name=kubestagemaster2002.codfw.wmnet
  • 15:34 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=kubestagemaster1002.eqiad.wmnet
  • 15:34 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=kubestagemaster2002.codfw.wmnet
  • 15:14 jiji@cumin1001: conftool action : set/weight=10; selector: dc=codfw,cluster=kubernetes-staging,service=kubemaster
  • 15:12 jiji@cumin1001: conftool action : set/weight=10; selector: name=kubestagemaster1002.eqiad.wmnet
  • 15:12 jiji@cumin1001: conftool action : set/weight=10; selector: name=kubestagemaster1001.eqiad.wmnet
  • 15:09 moritzm: installing Java 8 security updates on Hadoop systems
  • 14:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2002.codfw.wmnet with OS bullseye
  • 14:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2002.codfw.wmnet with reason: host reimage
  • 14:01 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2002.codfw.wmnet with reason: host reimage
  • 13:50 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubestagemaster2002.codfw.wmnet with OS bullseye
  • 13:37 moritzm: installing openjdk-8 security updates
  • 13:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet
  • 13:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1011.eqiad.wmnet
  • 13:28 urbanecm: UTC afternoon B&C window done
  • 13:27 urbanecm@deploy1002: Finished scap: Backport for SpecialLog: Fix issues related to IP users (T338042 T340929) (duration: 08m 32s)
  • 13:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1011.eqiad.wmnet
  • 13:22 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 13:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet
  • 13:22 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 13:22 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1011.eqiad.wmnet
  • 13:21 urbanecm: Run `wikiadmin2023@10.64.16.184(idwiki)> DELETE FROM `category` WHERE cat_title = ; ` (T336780)
  • 13:20 urbanecm@deploy1002: func and urbanecm: Backport for SpecialLog: Fix issues related to IP users (T338042 T340929) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:19 urbanecm@deploy1002: Started scap: Backport for SpecialLog: Fix issues related to IP users (T338042 T340929)
  • 13:18 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet
  • 13:18 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1011.eqiad.wmnet
  • 13:10 urbanecm@deploy1002: Finished scap: Backport for Set wgCollectionDisableSidebarLink for nowiki (T340981) (duration: 08m 09s)
  • 13:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet
  • 13:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet
  • 13:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1010.eqiad.wmnet
  • 13:04 urbanecm@deploy1002: jhsoby and urbanecm: Backport for Set wgCollectionDisableSidebarLink for nowiki (T340981) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 13:02 urbanecm@deploy1002: Started scap: Backport for Set wgCollectionDisableSidebarLink for nowiki (T340981)
  • 13:00 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 13:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1010.eqiad.wmnet
  • 12:59 kart_: Updated MinT to 2023-06-29-061037-production (T340709 + Fixed repetition with Santali)
  • 12:57 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 12:51 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 12:50 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 12:46 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 12:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet
  • 12:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 12:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet
  • 12:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1009.eqiad.wmnet
  • 12:33 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 12:31 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 12:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1009.eqiad.wmnet
  • 12:23 kart_: Updated cxserver to 2023-07-03-045311-production (T285217)
  • 12:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet
  • 12:18 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 12:17 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 12:12 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Awjrichards out of all services on: 1271 hosts
  • 12:12 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 12:12 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Awjrichards out of all services on: 1271 hosts
  • 12:12 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Awjrichards out of all services on: 20 hosts
  • 12:12 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Awjrichards out of all services on: 20 hosts
  • 12:12 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 12:11 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Awjrichards out of all services on: 760 hosts
  • 12:11 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Awjrichards out of all services on: 760 hosts
  • 12:09 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:08 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging JMinor out of all services on: 760 hosts
  • 12:04 jmm@cumin2002: START - Cookbook sre.idm.logout Logging JMinor out of all services on: 760 hosts
  • 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging JMinor out of all services on: 1271 hosts
  • 12:03 jmm@cumin2002: START - Cookbook sre.idm.logout Logging JMinor out of all services on: 1271 hosts
  • 12:03 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging JMinor out of all services on: 20 hosts
  • 12:03 jmm@cumin2002: START - Cookbook sre.idm.logout Logging JMinor out of all services on: 20 hosts
  • 11:38 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:35 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:35 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 11:33 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 11:30 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 11:30 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jameel Kaisar out of all services on: 20 hosts
  • 11:26 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Jameel Kaisar out of all services on: 20 hosts
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jameel Kaisar out of all services on: 1271 hosts
  • 11:24 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Jameel Kaisar out of all services on: 1271 hosts
  • 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jameel Kaisar out of all services on: 760 hosts
  • 11:24 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Jameel Kaisar out of all services on: 760 hosts
  • 11:16 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:16 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add VIP for kubestagemaster - jiji@cumin1001"
  • 11:15 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add VIP for kubestagemaster - jiji@cumin1001"
  • 11:11 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 760 hosts
  • 11:10 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 760 hosts
  • 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 1271 hosts
  • 11:09 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 11:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 1271 hosts
  • 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 20 hosts
  • 11:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 20 hosts
  • 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Appledora out of all services on: 20 hosts
  • 11:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Appledora out of all services on: 20 hosts
  • 10:53 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:53 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:53 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 10:53 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 10:53 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:53 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:52 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 10:52 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 10:52 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 10:51 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 10:51 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 10:51 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 10:51 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 10:50 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 10:50 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 10:49 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 10:49 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 10:48 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 10:48 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 10:48 topranks: Re-activating Vodafone DE peering at AMS-IX T340670
  • 10:47 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:47 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:47 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:46 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:46 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:42 jayme: imported envoyproxy 1.23.10 to buster-wikimedia, bullseye-wikimedia, bookworm-wikimedia - T300324
  • 10:19 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 10:18 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:18 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:17 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:03 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 09:58 urbanecm@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 09:58 urbanecm@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 09:58 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 09:58 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 09:57 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 09:57 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 09:53 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 09:45 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Appledora out of all services on: 1271 hosts
  • 09:44 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Appledora out of all services on: 1271 hosts
  • 09:37 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Appledora out of all services on: 760 hosts
  • 09:37 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Appledora out of all services on: 760 hosts
  • 09:36 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 09:35 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 09:35 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 09:34 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 09:34 ladsgroup@deploy1002: Finished scap: Backport for Set externallinks migration to read new everywhere except commons (T335343) (duration: 10m 46s)
  • 09:34 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP host
  • 09:34 volans@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP host
  • 09:25 ladsgroup@deploy1002: ladsgroup: Backport for Set externallinks migration to read new everywhere except commons (T335343) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 09:24 ladsgroup@deploy1002: Started scap: Backport for Set externallinks migration to read new everywhere except commons (T335343)
  • 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bruno Scarone out of all services on: 760 hosts
  • 09:19 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Bruno Scarone out of all services on: 760 hosts
  • 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bruno Scarone out of all services on: 1271 hosts
  • 09:18 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Bruno Scarone out of all services on: 1271 hosts
  • 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bruno Scarone out of all services on: 20 hosts
  • 09:17 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Bruno Scarone out of all services on: 20 hosts
  • 09:13 lucaswerkmeister-wmde: Deployed security patch for T339016
  • 09:05 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Barakat Ajadi out of all services on: 4 hosts
  • 09:04 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Barakat Ajadi out of all services on: 4 hosts
  • 08:58 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 08:56 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Michael.hay out of all services on: 20 hosts
  • 08:56 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Michael.hay out of all services on: 20 hosts
  • 08:56 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Michael.hay out of all services on: 1271 hosts
  • 08:55 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Michael.hay out of all services on: 1271 hosts
  • 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Michael.hay out of all services on: 760 hosts
  • 08:55 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Michael.hay out of all services on: 760 hosts
  • 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Skye Berghel out of all services on: 760 hosts
  • 08:54 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Skye Berghel out of all services on: 760 hosts
  • 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Skye Berghel out of all services on: 1271 hosts
  • 08:53 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Skye Berghel out of all services on: 1271 hosts
  • 08:53 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Skye Berghel out of all services on: 20 hosts
  • 08:53 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Skye Berghel out of all services on: 20 hosts
  • 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Tom Magerlein out of all services on: 20 hosts
  • 08:52 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Tom Magerlein out of all services on: 20 hosts
  • 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Tom Magerlein out of all services on: 1271 hosts
  • 08:52 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Tom Magerlein out of all services on: 1271 hosts
  • 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Tom Magerlein out of all services on: 760 hosts
  • 08:51 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Tom Magerlein out of all services on: 760 hosts
  • 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Damiendf out of all services on: 760 hosts
  • 08:51 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Damiendf out of all services on: 760 hosts
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Damiendf out of all services on: 1271 hosts
  • 08:50 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Damiendf out of all services on: 1271 hosts
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Damiendf out of all services on: 20 hosts
  • 08:49 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Damiendf out of all services on: 20 hosts
  • 08:49 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Dasm out of all services on: 20 hosts
  • 08:49 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Dasm out of all services on: 20 hosts
  • 08:49 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging David.pujol out of all services on: 1271 hosts
  • 08:48 jmm@cumin2002: START - Cookbook sre.idm.logout Logging David.pujol out of all services on: 1271 hosts
  • 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging David.pujol out of all services on: 760 hosts
  • 08:48 jmm@cumin2002: START - Cookbook sre.idm.logout Logging David.pujol out of all services on: 760 hosts
  • 08:48 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 08:47 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Dasm out of all services on: 760 hosts
  • 08:47 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Dasm out of all services on: 760 hosts
  • 08:47 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Dasm out of all services on: 1271 hosts
  • 08:46 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Dasm out of all services on: 1271 hosts
  • 08:46 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Dasm out of all services on: 20 hosts
  • 08:46 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Dasm out of all services on: 20 hosts
  • 08:45 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Aranyap out of all services on: 20 hosts
  • 08:45 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Aranyap out of all services on: 20 hosts
  • 08:45 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Aranyap out of all services on: 1271 hosts
  • 08:45 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Aranyap out of all services on: 1271 hosts
  • 08:44 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Aranyap out of all services on: 760 hosts
  • 08:44 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Aranyap out of all services on: 760 hosts
  • 08:44 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 08:33 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 07:54 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: sync
  • 07:54 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: sync
  • 07:32 taavi@deploy1002: Finished scap: Backport for Update plwiki autopromote per consensus (T340397) (duration: 07m 48s)
  • 07:25 taavi@deploy1002: msz2001 and taavi: Backport for Update plwiki autopromote per consensus (T340397) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 07:24 taavi@deploy1002: Started scap: Backport for Update plwiki autopromote per consensus (T340397)
  • 07:22 taavi@deploy1002: Finished scap: Backport for Enable edit-in-sequence in Italian Wikisource (T340847) (duration: 18m 21s)
  • 07:13 taavi@deploy1002: soda and taavi: Backport for Enable edit-in-sequence in Italian Wikisource (T340847) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:04 taavi@deploy1002: Started scap: Backport for Enable edit-in-sequence in Italian Wikisource (T340847)

2023-07-01
