Server Admin Log/Archive 61

From Wikitech

2022-12-31

2022-12-30

  • 21:36 dcausse: restarting blazegraph on wdqs1006 and wdqs1013 (BlazegraphFreeAllocatorsDecreasingRapidly)

2022-12-29

  • 23:26 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 23:25 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 23:24 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 23:22 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 09:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on an-worker1084.eqiad.wmnet with reason: Avoid IRC spam
  • 09:19 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on an-worker1084.eqiad.wmnet with reason: Avoid IRC spam

2022-12-22

  • 18:27 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1015.eqiad.wmnet with OS bullseye
  • 18:27 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 18:16 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 18:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1015.eqiad.wmnet with reason: host reimage
  • 18:00 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1015.eqiad.wmnet with reason: host reimage
  • 17:25 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:25 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: atlas ulsfo decom - robh@cumin2002"
  • 17:24 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: atlas ulsfo decom - robh@cumin2002"
  • 17:22 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 16:56 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bullseye
  • 16:51 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1015.eqiad.wmnet with OS bullseye
  • 16:51 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:50 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:23 elukey@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool inference in codfw: maintenance
  • 16:18 elukey@cumin1001: START - Cookbook sre.discovery.service-route pool inference in codfw: maintenance
  • 16:17 elukey@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99) depool inference in eqiad: maintenance
  • 16:17 elukey@cumin1001: START - Cookbook sre.discovery.service-route depool inference in eqiad: maintenance
  • 16:15 elukey@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool inference in codfw: maintenance
  • 16:10 elukey@cumin1001: START - Cookbook sre.discovery.service-route depool inference in codfw: maintenance
  • 16:09 elukey@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check inference: maintenance
  • 16:09 elukey@cumin1001: START - Cookbook sre.discovery.service-route check inference: maintenance
  • 16:00 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:40 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1015.eqiad.wmnet with OS bullseye
  • 14:47 akosiaris: truncate daemon.log.1 on maps1009 to free up disk space
  • 14:43 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1015.eqiad.wmnet with OS bullseye
  • 14:42 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1015.eqiad.wmnet with OS bullseye
  • 13:46 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1015.eqiad.wmnet with OS bullseye
  • 13:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1014.eqiad.wmnet with OS bullseye
  • 13:18 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 12:49 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 12:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1014.eqiad.wmnet with reason: host reimage
  • 12:34 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1014.eqiad.wmnet with reason: host reimage
  • 11:32 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1014.eqiad.wmnet with OS bullseye
  • 11:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1013.eqiad.wmnet with OS bullseye
  • 11:30 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 11:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 17806
  • 11:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 17806
  • 11:09 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 10:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1013.eqiad.wmnet with reason: host reimage
  • 10:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1013.eqiad.wmnet with reason: host reimage
  • 09:38 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1013.eqiad.wmnet with OS bullseye
  • 08:11 moritzm: installing libksba security updates
  • 07:44 vgutierrez: restarting varnish on cp4052 to clear VarnishChildRestarted alert - T325797
  • 07:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 56286
  • 07:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 56286
  • 07:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 17806
  • 07:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 17806
  • 07:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 20485
  • 07:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 20485
  • 07:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28398
  • 07:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 28398
  • 01:54 mbsantos: Update verbiage of fundraising banner (T325690)
  • 01:53 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 01:52 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 01:52 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 01:51 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 01:50 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 01:50 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply

2022-12-21

  • 23:41 ejegg: civicrm upgraded from d80f9550 to e3405a4e
  • 20:34 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:33 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 13:39 moritzm: installing nano bugfix updates from Bullseye point release
  • 13:32 moritzm: installing node-minimatch security updates
  • 12:02 cgoubert@cumin1001: conftool action : set/pooled=yes:weight=1; selector: dc=eqiad,cluster=parsoid,name=parse1003.eqiad.wmnet,service=canary
  • 11:50 moritzm: installing libde265 security updates
  • 11:45 moritzm: installing libbluray bugfix update for buster
  • 11:40 moritzm: installing joblib security updates
  • 11:11 moritzm: installing php7.3 security updates on buster
  • 09:55 jayme: scaling shellbox-syntaxhighlight back to 12 replicas
  • 09:09 jayme: correction: increasing replicas of shellbox-syntaxhighlight from 12 to 40
  • 09:08 jayme: increasing replicas of shellbox-syntaxhighlight from 12 to 50
  • 08:32 ryankemper: Downtiming wdqs 20[09-12] until 2023-01-02 (these are new hosts not yet properly brought into service)
  • 00:10 eileen: + fc05361 Check for correct fin type name in hasEndowment

2022-12-20

  • 23:22 ryankemper: [WDQS] Powercycling `wdqs1005`
  • 23:20 eileen: revision changed from 6d8c98a7 to 52a513cb
  • 20:39 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 20:16 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 20:10 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 20:09 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 20:01 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 20:01 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 20:00 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 19:58 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 18:07 mutante: otrs1001 - upgraded clamav daemon package, manually removed run dir and pid file, stopped, started clamav daemon
  • 16:50 dancy@deploy1002: scap failed: NameError name 'SyslogFormatter' is not defined (duration: 00m 00s)
  • 16:49 dancy@deploy1002: scap failed: NameError name 'SyslogFormatter' is not defined (duration: 00m 00s)
  • 16:46 dancy@deploy1002: scap failed: NameError name 'logging' is not defined (duration: 00m 00s)
  • 16:19 dancy@deploy1002: backport aborted: (duration: 00m 01s)
  • 16:17 dancy@deploy1002: backport aborted: (duration: 00m 04s)
  • 16:00 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1012.eqiad.wmnet with OS bullseye
  • 16:00 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 15:59 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 15:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1012.eqiad.wmnet with reason: host reimage
  • 15:42 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1012.eqiad.wmnet with reason: host reimage
  • 15:25 moritzm: installing jupyter-core security updates
  • 14:38 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1012.eqiad.wmnet with OS bullseye
  • 14:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1011.eqiad.wmnet with OS bullseye
  • 14:31 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 14:29 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 14:19 moritzm: installing libpgjava security updates
  • 14:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1011.eqiad.wmnet with reason: host reimage
  • 14:16 moritzm: installing jackson-databind security updates
  • 14:14 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1011.eqiad.wmnet with reason: host reimage
  • 14:10 moritzm: installing ruby-rails-html-sanitizer security updates
  • 13:04 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 12:58 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 12:13 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1011.eqiad.wmnet with OS bullseye
  • 11:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists1001.wikimedia.org
  • 11:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists1001.wikimedia.org
  • 11:16 moritzm: installing apache2 security updates on Buster
  • 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host archiva1002.wikimedia.org
  • 10:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host archiva1002.wikimedia.org
  • 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host matomo1002.eqiad.wmnet
  • 10:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host matomo1002.eqiad.wmnet
  • 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-web1001.eqiad.wmnet
  • 10:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-web1001.eqiad.wmnet
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1009.eqiad.wmnet
  • 10:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1009.eqiad.wmnet
  • 10:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1008.eqiad.wmnet
  • 10:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1008.eqiad.wmnet
  • 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1010.eqiad.wmnet
  • 10:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1010.eqiad.wmnet
  • 10:16 moritzm: rebalance ganeti cluster in ulsfo after adding new node and decom of the old hardware T317247
  • 10:06 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 10:06 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 10:05 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 10:05 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 09:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1
  • 09:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1
  • 08:45 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1
  • 08:45 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1
  • 08:40 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1
  • 08:40 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1
  • 08:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
  • 08:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
  • 04:10 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 03:56 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 02:02 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 01:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 00:40 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 00:38 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 00:27 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)

2022-12-19

  • 23:50 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 23:32 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2009.*
  • 23:32 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2010.*
  • 23:32 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2011.*
  • 23:32 ryankemper: [WDQS] Temporarily removing wdqs20[09-12] from pybal; these are new hosts that aren't ready for service until data reload has completed (long-running process). In meantime, remove these so they don't factor into pybal's depool threshold
  • 23:30 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2012.*
  • 23:30 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 23:07 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 23:05 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2003.codfw.wmnet with OS bullseye
  • 23:01 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 23:01 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 22:59 ryankemper: [WDQS] Continuing with reboot of WDQS hosts. Doing 1 host each of `[eqiad, codfw]` X `[internal, public]`, so 4 total hosts at once
  • 22:58 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 22:58 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 22:58 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 22:58 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 22:58 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 22:58 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 22:58 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 22:57 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 22:56 ryankemper: [WDQS] Pooled `wdqs2005`
  • 22:43 ryankemper: [WDQS] Pooled `wdqs2007` (was depooled, we may have forgotten to re-pool it in the last week or so)
  • 22:38 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 22:34 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 22:24 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 22:23 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 22:10 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase-dev[1004-1006].eqiad.wmnet
  • 22:10 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:10 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev[1004-1006].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 22:09 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev[1004-1006].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 22:06 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 21:58 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase-dev[1004-1006].eqiad.wmnet
  • 21:25 jhuneidi@deploy1002: Installation of scap version "4.30.3" completed for 563 hosts
  • 21:25 jhuneidi@deploy1002: Installing scap version "4.30.3" for 563 hosts
  • 20:42 eevans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
  • 20:42 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/echostore: apply
  • 20:39 eevans@deploy1002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 20:38 eevans@deploy1002: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 20:32 eileen: civicrm upgraded from 98b48b9a to 3978bd6c
  • 19:33 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2003.codfw.wmnet with reason: host reimage
  • 19:30 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2003.codfw.wmnet with reason: host reimage
  • 19:03 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2003.codfw.wmnet with OS bullseye
  • 18:29 sukhe: running authdns-update for Gerrit: 869277: T188561
  • 16:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2002.codfw.wmnet
  • 16:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2002.codfw.wmnet
  • 16:44 moritzm: installing node-tar security updates
  • 16:40 elukey@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
  • 16:40 elukey@cumin1001: START - Cookbook sre.discovery.service-route
  • 16:38 moritzm: installing node-json-schema security updates
  • 16:29 moritzm: installing virglrenderer security updates
  • 16:21 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas
  • 16:19 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas
  • 15:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2001.codfw.wmnet
  • 15:54 elukey@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 15:54 elukey@cumin1001: START - Cookbook sre.discovery.service-route
  • 15:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2001.codfw.wmnet
  • 15:12 thcipriani@deploy1002: Finished scap: Backport for Release new DiscussionTools reply button enhancement to Arabic (T323537) (duration: 09m 31s)
  • 15:04 thcipriani@deploy1002: thcipriani and kemayo: Backport for Release new DiscussionTools reply button enhancement to Arabic (T323537) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 15:02 thcipriani@deploy1002: Started scap: Backport for Release new DiscussionTools reply button enhancement to Arabic (T323537)
  • 15:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be1002.eqiad.wmnet
  • 14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be1002.eqiad.wmnet
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease db2129 main traffic weight', diff saved to https://phabricator.wikimedia.org/P42725 and previous config saved to /var/cache/conftool/dbconfig/20221219-145357-marostegui.json
  • 14:37 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:37 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:34 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:34 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:33 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:33 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:33 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:33 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:32 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:32 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe1002.eqiad.wmnet
  • 14:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for an-presto[1011-1015].eqiad.wnet
  • 14:20 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for an-presto[1011-1015].eqiad.wnet
  • 14:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe1002.eqiad.wmnet
  • 14:08 oblivian@deploy1002: Synchronized README: Null sync to force a redeployment of the php-fpm base image (duration: 13m 04s)
  • 14:06 moritzm: installing giflib security updates
  • 13:58 moritzm: installing glibc security updates
  • 13:42 _joe_: purge old docker images from deploy1002 by hand
  • 13:27 moritzm: installing PHP 7.3 security updates on buster
  • 13:19 phedenskog@deploy1002: Finished deploy [performance/navtiming@6aedc70]: (no justification provided) (duration: 00m 08s)
  • 13:19 phedenskog@deploy1002: Started deploy [performance/navtiming@6aedc70]: (no justification provided)
  • 12:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for an-presto[1006-1010].eqiad.wnet
  • 12:13 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for an-presto[1006-1010].eqiad.wnet
  • 12:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-presto[1001-1005].eqiad.wmnet with reason: Trying five of the new presto servers instead of the original five
  • 12:09 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-presto[1001-1005].eqiad.wmnet with reason: Trying five of the new presto servers instead of the original five
  • 11:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on 10 hosts with reason: Reverting presto cluster size from 15 to 5 as a test
  • 11:43 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on 10 hosts with reason: Reverting presto cluster size from 15 to 5 as a test
  • 11:29 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) check 1 in ml-serve-codfw: maintenance
  • 11:29 elukey@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 11:29 elukey@cumin1001: START - Cookbook sre.discovery.service-route
  • 11:29 elukey@cumin1001: START - Cookbook sre.k8s.pool-depool-cluster check 1 in ml-serve-codfw: maintenance
  • 11:27 btullis@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 10:51 btullis@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 10:36 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@b4d31fb]: incoming_link: relax sensor timeout to default 7d (duration: 02m 28s)
  • 10:33 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@b4d31fb]: incoming_link: relax sensor timeout to default 7d
  • 10:28 taavi@deploy1002: Finished scap: Backport for Only preload getPageData if there's thread data for the page (T325477) (duration: 07m 58s)
  • 10:23 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0)
  • 10:23 elukey@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 10:23 elukey@cumin1001: START - Cookbook sre.discovery.service-route
  • 10:23 elukey@cumin1001: START - Cookbook sre.k8s.pool-depool-cluster
  • 10:21 taavi@deploy1002: taavi and taavi: Backport for Only preload getPageData if there's thread data for the page (T325477) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 10:20 taavi@deploy1002: Started scap: Backport for Only preload getPageData if there's thread data for the page (T325477)
  • 09:59 moritzm: update bullseye netboot image for Bullseye 11.6 point release T325186
  • 09:49 aqu@deploy1002: Finished deploy [airflow-dags/analytics@6ac3269]: Fix bug fix in HDFS usage pipeline [airflow-dags@6ac3269] (duration: 00m 13s)
  • 09:48 aqu@deploy1002: Started deploy [airflow-dags/analytics@6ac3269]: Fix bug fix in HDFS usage pipeline [airflow-dags@6ac3269]
  • 09:47 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@6ac3269]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@6ac3269] (duration: 00m 11s)
  • 09:47 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@6ac3269]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@6ac3269]
  • 09:30 aqu@deploy1002: Finished deploy [analytics/refinery@2d53aff] (thin): Fix bug fix in HDFS usage pipeline THIN [analytics/refinery@2d53aff] (duration: 00m 08s)
  • 09:29 aqu@deploy1002: Started deploy [analytics/refinery@2d53aff] (thin): Fix bug fix in HDFS usage pipeline THIN [analytics/refinery@2d53aff]
  • 09:29 aqu@deploy1002: Finished deploy [analytics/refinery@2d53aff]: Fix bug fix in HDFS usage pipeline [analytics/refinery@2d53aff] (duration: 08m 02s)
  • 09:21 aqu@deploy1002: Started deploy [analytics/refinery@2d53aff]: Fix bug fix in HDFS usage pipeline [analytics/refinery@2d53aff]
  • 09:20 aqu@deploy1002: Finished deploy [analytics/refinery@2d53aff] (hadoop-test): Fix bug fix in HDFS usage pipeline TEST [analytics/refinery@2d53aff] (duration: 01m 14s)
  • 09:19 aqu@deploy1002: Started deploy [analytics/refinery@2d53aff] (hadoop-test): Fix bug fix in HDFS usage pipeline TEST [analytics/refinery@2d53aff]
  • 09:17 aqu: About to deploy analytics/refinery (bug fix in HDFS usage pipeline)
  • 09:15 ladsgroup@deploy1002: Finished scap: Backport for Emergency: discussiontoolspageinfo return empty response in non-talk ns (T325477) (duration: 09m 24s)
  • 09:14 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:11 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:07 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Emergency: discussiontoolspageinfo return empty response in non-talk ns (T325477) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 09:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:05 ladsgroup@deploy1002: Started scap: Backport for Emergency: discussiontoolspageinfo return empty response in non-talk ns (T325477)
  • 09:04 dcausse: restarting blazegraph on wdqs1015 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 09:03 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:02 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:01 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:59 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 08:56 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:37 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 08:34 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:33 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 29535
  • 08:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 29535
  • 08:02 moritzm: installing openexr security updates
  • 07:22 phedenskog@deploy1002: Finished deploy [performance/navtiming@5770d46]: (no justification provided) (duration: 00m 08s)
  • 07:22 phedenskog@deploy1002: Started deploy [performance/navtiming@5770d46]: (no justification provided)
  • 07:05 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 28398
  • 07:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 28398
  • 07:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 11686
  • 06:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 11686
  • 03:08 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: disable wgDiscussionToolsABTest T325477 T321961 (duration: 15m 23s)

2022-12-18

  • 19:40 sukhe: ran sudo cumin -b 1 -s 30 'A:mw-api and A:eqiad' 'restart-php7.4-fpm' [at 18:55 UTC]: T325477
  • 19:31 sukhe: [FINISHED] running sudo cumin -b 1 -s 30 'A:mw-api and A:eqiad' 'restart-php7.4-fpm'
  • 18:55 sukhe: running sudo cumin -b 1 -s 30 'A:mw-api and A:eqiad' 'restart-php7.4-fpm'
  • 18:28 cgoubert@cumin1001: END (FAIL) - Cookbook sre.mediawiki.restart-appservers (exit_code=99)
  • 18:28 cgoubert@cumin1001: START - Cookbook sre.mediawiki.restart-appservers
  • 18:20 cgoubert@cumin1001: END (FAIL) - Cookbook sre.mediawiki.restart-appservers (exit_code=99)
  • 18:20 cgoubert@cumin1001: START - Cookbook sre.mediawiki.restart-appservers

2022-12-17

  • 14:36 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 14:30 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .

2022-12-16

  • 19:55 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4007.ulsfo.wmnet with OS bullseye
  • 19:55 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin2002"
  • 19:00 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin2002"
  • 18:44 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4007.ulsfo.wmnet with reason: host reimage
  • 18:41 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4007.ulsfo.wmnet with reason: host reimage
  • 18:19 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4007.ulsfo.wmnet with OS bullseye
  • 18:08 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4007']
  • 17:56 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4007']
  • 17:53 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4007']
  • 17:52 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4007']
  • 17:28 mutante: etherpad1003 - testing new: sudo systemctl start wmf_auto_restart_envoyproxy
  • 16:46 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4007.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 16:24 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4007.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 16:24 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:23 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: gaenti4007 - robh@cumin2002"
  • 16:22 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: gaenti4007 - robh@cumin2002"
  • 16:20 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 16:19 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti4007
  • 16:19 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti4007
  • 15:34 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@44d4e81]: Fix subtle bug on image_suggestions when resolving varprop. (duration: 00m 09s)
  • 15:34 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@44d4e81]: Fix subtle bug on image_suggestions when resolving varprop.
  • 13:48 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw2001-dev.codfw.wmnet with OS bullseye
  • 12:32 jbond: deploy update to webproxy https://gerrit.wikimedia.org/r/c/operations/puppet/+/868372
  • 12:09 volans: run homer on cr[1-2]-{eqiad,codfw} to allow SSH from cloudcumin hosts to cloud hosts
  • 12:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28398
  • 12:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 28398
  • 11:19 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: host reimage
  • 11:16 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: host reimage
  • 10:51 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudgw2001-dev.codfw.wmnet with OS bullseye
  • 09:44 moritzm: import terraform 1.3.6 to thirdparty/terraform for buster/bullseye T322344
  • 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:18 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:13 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti5003.eqsin.wmnet
  • 09:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5003.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 09:11 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5003.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:53 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:45 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 08:45 jayme@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 08:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti5003.eqsin.wmnet
  • 08:43 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 08:42 jayme@deploy1002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 08:41 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 08:41 jayme@deploy1002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 08:35 moritzm: power down ganeti5003 manually (mgmt/IPMI broken) for pending decom T322048
  • 08:25 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast2002.wikimedia.org
  • 08:18 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host bast2002.wikimedia.org
  • 08:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast1003.wikimedia.org
  • 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast1003.wikimedia.org
  • 00:51 mutante: puppetmasters - merged gerrit:868481 to "revert" gerrit:866644, ran puppet and 'systemctl reset-failed' via cumin on 10 masters, resolved monitoring alerts
  • 00:22 mutante: aphlict1001 - systemctl start man-db - T325246
  • 00:21 mutante: aphlict1001 - systemctl start logrotate - T325246
  • 00:21 mutante: aphlict1001 - :/var/log/aphlict# truncate aphlict.log --size 100M - T325246

2022-12-15

  • 23:10 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netmon1002.wikimedia.org
  • 23:10 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:10 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netmon1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 23:09 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netmon1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 23:05 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 23:00 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts netmon1002.wikimedia.org
  • 22:49 TheresNoTime: `[samtar@mwmaint1002 imports]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Coffeeandcrumbs /home/samtar/imports` T325330
  • 22:48 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts netmon2001.wikimedia.org
  • 22:48 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:47 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 22:42 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts netmon2001.wikimedia.org
  • 22:11 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts netmon2001.wikimedia.org
  • 22:11 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:10 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 22:02 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts netmon2001.wikimedia.org
  • 21:59 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 21:59 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 21:52 TheresNoTime: done UTC late backport and config training
  • 21:47 samtar@deploy1002: Finished scap: Backport for NewImpact: Use "View all edits" in footer (T325216) (duration: 24m 38s)
  • 21:42 TheresNoTime: logging to note that this deploy is unusually slow, P42715
  • 21:36 samtar@deploy1002: samtar and tgr: Backport for NewImpact: Use "View all edits" in footer (T325216) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:25 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 21:24 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 21:23 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 21:23 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 21:22 samtar@deploy1002: Started scap: Backport for NewImpact: Use "View all edits" in footer (T325216)
  • 21:17 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 21:16 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 21:15 cwhite@cumin2002: conftool action : set/pooled=no; selector: name=logstash2024.codfw.wmnet,service=kibana7
  • 21:11 samtar@deploy1002: Finished scap: Backport for Update reference to CommandLineInc (T184782) (duration: 08m 18s)
  • 21:04 samtar@deploy1002: samtar and zabe: Backport for Update reference to CommandLineInc (T184782) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:03 samtar@deploy1002: Started scap: Backport for Update reference to CommandLineInc (T184782)
  • 20:22 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 20:21 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 20:20 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 20:19 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 20:02 denisse@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka logging-eqiad cluster: Reboot kafka nodes
  • 19:35 mutante: short downtime for misc websites, iegreview, racktables, transparency.wm, annual.wm, design.wm, sitemaps.wm, research.wm, bienvenida.wm, wikiworkshop.org
  • 19:08 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@d23127b]: Deploying Spark3 upgrade of image_suggestions job to the platform_eng Airflow instance. (duration: 00m 10s)
  • 19:07 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@d23127b]: Deploying Spark3 upgrade of image_suggestions job to the platform_eng Airflow instance.
  • 18:50 robh: starting msw2-ulsfo swap for rack .23, mgmt will flap with no expected user impact
  • 18:48 denisse@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka logging-eqiad cluster: Reboot kafka nodes
  • 18:37 denisse@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka logging-codfw cluster: Reboot kafka nodes
  • 18:08 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1010.eqiad.wmnet with OS bullseye
  • 18:08 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 18:08 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:07 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:05 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:05 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:05 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 18:04 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:03 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:51 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2002.codfw.wmnet with OS bullseye
  • 17:51 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1010.eqiad.wmnet with reason: host reimage
  • 17:48 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1010.eqiad.wmnet with reason: host reimage
  • 17:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool db2129', diff saved to https://phabricator.wikimedia.org/P42714 and previous config saved to /var/cache/conftool/dbconfig/20221215-174537-ladsgroup.json
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2129', diff saved to https://phabricator.wikimedia.org/P42713 and previous config saved to /var/cache/conftool/dbconfig/20221215-173713-ladsgroup.json
  • 17:25 denisse@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka logging-codfw cluster: Reboot kafka nodes
  • 17:24 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2002.codfw.wmnet with reason: host reimage
  • 17:24 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on miscweb2002.codfw.wmnet with reason: reboot
  • 17:24 mutante: miscweb2002 - passive host, rebooting
  • 17:24 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on miscweb2002.codfw.wmnet with reason: reboot
  • 17:21 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2002.codfw.wmnet with reason: host reimage
  • 17:17 volans: manually removed /etc/spicerack/redis_cluster/sessions.yaml from the cumin hosts
  • 17:06 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on phab2002.codfw.wmnet with reason: reboot
  • 17:06 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on phab2002.codfw.wmnet with reason: reboot
  • 17:04 mutante: phab2002 (non active phabricator server) - rebooting
  • 16:58 mutante: aphlict1001 - aphlict service did not come back after rebooting machine. fix was to manually 'rm /var/run/aphlict/aphlict.pid' and 'systemctl start aphlict'
  • 16:56 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus3001.esams.wmnet
  • 16:54 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2002.codfw.wmnet with OS bullseye
  • 16:52 mutante: aphlict1001 - rebooting - this could mean for a minute there are dropped notifications about Phabricator tickets
  • 16:50 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus3001.esams.wmnet
  • 16:45 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1010.eqiad.wmnet with OS bullseye
  • 16:43 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@15e6aa7] (codfw): Revert "codfw: Disable traffic mirroring" (duration: 01m 44s)
  • 16:42 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@15e6aa7] (codfw): Revert "codfw: Disable traffic mirroring"
  • 16:36 volans: upgrading spicerack to v6.0.0 on cumin1001
  • 16:34 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin2002.codfw.wmnet with reason: test spicerack v6.0.0
  • 16:34 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin2002.codfw.wmnet with reason: test spicerack v6.0.0
  • 16:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1003.wikimedia.org
  • 16:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb2002-dev.wikimedia.org
  • 16:24 volans: upgrading spicerack to v6.0.0 on cumin2002
  • 16:23 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudweb1003.wikimedia.org
  • 16:22 sukhe@cumin2002: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM durum1001.eqiad.wmnet
  • 16:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1004.wikimedia.org
  • 16:17 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host wdqs2009.codfw.wmnet
  • 16:02 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch2001.codfw.wmnet with OS bullseye
  • 16:02 moritzm: installing glibc security updates on bullseye
  • 16:02 nemo-yiannis: switching maps/kartotherian back to codfw
  • 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-stretch2001.codfw.wmnet with reason: host reimage
  • 15:57 moritzm: installing openexr security updates
  • 15:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-stretch2001.codfw.wmnet with reason: host reimage
  • 15:41 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2009.codfw.wmnet
  • 15:37 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 15:37 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 15:34 volans: temporarily disabling puppet on A:cumin-all to deploy spicerack v6.0.0
  • 15:31 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:30 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:24 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:24 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum1001.eqiad.wmnet
  • 15:22 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:15 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:15 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:15 reedy@deploy1002: Synchronized php-1.40.0-wmf.14/extensions/GrowthExperiments/: Two backports (duration: 06m 57s)
  • 15:12 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:12 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:06 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 15:06 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 15:05 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:05 moritzm: imported prometheus-jmx-exporter for bookworm-wikimedia T321783
  • 14:59 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch2001.codfw.wmnet with OS bullseye
  • 14:45 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-stretch1002.eqiad.wmnet with OS bullseye
  • 14:44 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudweb1004.wikimedia.org
  • 14:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup1004.eqiad.wmnet
  • 14:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-stretch1002.eqiad.wmnet with reason: host reimage
  • 14:30 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup1004.eqiad.wmnet
  • 14:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup1003.eqiad.wmnet
  • 14:27 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Disable wgParserEnableLegacyMediaDOM on cawiki (T297984 T314318) (duration: 09m 18s)
  • 14:27 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:26 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-stretch1002.eqiad.wmnet with reason: host reimage
  • 14:26 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:25 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudweb2002-dev.wikimedia.org
  • 14:23 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup1003.eqiad.wmnet
  • 14:20 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and arlolra: Backport for Disable wgParserEnableLegacyMediaDOM on cawiki (T297984 T314318) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:18 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Disable wgParserEnableLegacyMediaDOM on cawiki (T297984 T314318)
  • 14:17 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Remove obsolete setting $wgAutoloadAttemptLowercase (T231412) (duration: 10m 57s)
  • 14:07 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and ki: Backport for Remove obsolete setting $wgAutoloadAttemptLowercase (T231412) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:06 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Remove obsolete setting $wgAutoloadAttemptLowercase (T231412)
  • 13:51 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch1002.eqiad.wmnet with OS bullseye
  • 13:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 13:34 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 13:06 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 12:51 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-stretch1001.eqiad.wmnet with reason: host reimage
  • 12:48 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-stretch1001.eqiad.wmnet with reason: host reimage
  • 12:36 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 12:20 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 12:19 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=kartotherian,name=maps1010.eqiad.wmnet
  • 12:19 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=kartotherian-ssl,name=maps1010.eqiad.wmnet
  • 12:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon1002.wikimedia.org
  • 12:08 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@00c9a16] (eqiad): codfw: Disable traffic mirroring (duration: 01m 00s)
  • 12:07 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@00c9a16] (eqiad): codfw: Disable traffic mirroring
  • 12:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon1002.wikimedia.org
  • 11:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host seaborgium.wikimedia.org
  • 11:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host seaborgium.wikimedia.org
  • 11:43 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 11:42 effie: switching maps/kartotherian from codfw to eqiad
  • 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
  • 11:39 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@00c9a16] (codfw): codfw: Disable traffic mirroring (duration: 01m 43s)
  • 11:37 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@00c9a16] (codfw): codfw: Disable traffic mirroring
  • 11:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flowspec1001.eqiad.wmnet
  • 11:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host flowspec1001.eqiad.wmnet
  • 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping3002.esams.wmnet
  • 11:15 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka test-eqiad cluster: Reboot kafka nodes
  • 11:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping3002.esams.wmnet
  • 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2002.codfw.wmnet
  • 11:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping2002.codfw.wmnet
  • 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1002.eqiad.wmnet
  • 10:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping1002.eqiad.wmnet
  • 10:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2001.codfw.wmnet
  • 10:47 XioNoX: disable ping offload in eqiad
  • 10:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2001.codfw.wmnet
  • 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2001.wikimedia.org
  • 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2001.wikimedia.org
  • 10:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1002.wikimedia.org
  • 10:34 jayme: restarted istiod pods in aux-k8s because of T303184
  • 10:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1002.wikimedia.org
  • 09:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1001.eqiad.wmnet
  • 09:54 effie: stopping and masking nutcracker on mw servers - T277183
  • 09:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief1001.eqiad.wmnet
  • 09:51 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host acmechief2001.codfw.wmnet
  • 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt1001.wikimedia.org
  • 09:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief2001.codfw.wmnet
  • 09:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test2001.codfw.wmnet
  • 09:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt1001.wikimedia.org
  • 09:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test2001.codfw.wmnet
  • 09:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test1001.eqiad.wmnet
  • 09:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test1001.eqiad.wmnet
  • 09:30 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host acmechief-test1001.eqiad.wmnet
  • 09:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test1001.eqiad.wmnet
  • 09:27 elukey@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka test-eqiad cluster: Reboot kafka nodes
  • 09:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2002.wikimedia.org
  • 09:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2002.wikimedia.org
  • 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1001.wikimedia.org
  • 09:12 akosiaris: reboot rdb2007 for kernel upgrades
  • 09:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1001.wikimedia.org
  • 09:08 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.14 refs T320519
  • 08:53 akosiaris: reboot rdb2009 for kernel upgrades
  • 08:52 akosiaris: correction: reboot rdb1011 for kernel upgrades
  • 08:51 akosiaris: reboot rdb1007 for kernel upgrades
  • 08:51 akosiaris: nothing noticed with rdb1007 reboot for mw, jobqueue, api-gateway. changeprop had a minor backlog increase, but everything appears fine now.
  • 08:28 akosiaris: reboot rdb1009 for kernel upgrades. possibly (but probably not) affected applications: changeprop, cpjobqueue, api-gateway, redisLockManager
  • 08:14 kartik@deploy1002: Finished scap: Backport for Enable Section Translation on 6 WPs (T319177) (duration: 10m 55s)
  • 08:08 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host puppetdb2003.codfw.wmnet
  • 08:04 kartik@deploy1002: kartik and kartik: Backport for Enable Section Translation on 6 WPs (T319177) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 08:03 kartik@deploy1002: Started scap: Backport for Enable Section Translation on 6 WPs (T319177)
  • 07:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetdb2003.codfw.wmnet
  • 01:46 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2026.codfw.wmnet with OS bullseye
  • 00:58 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2026.codfw.wmnet with reason: host reimage
  • 00:55 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2026.codfw.wmnet with reason: host reimage
  • 00:32 mutante: releases1002 - rebooting
  • 00:30 mutante: releases2002 - rebooting
  • 00:19 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2026.codfw.wmnet with OS bullseye
  • 00:19 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash2026.codfw.wmnet with OS bullseye
  • 00:15 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2026.codfw.wmnet with OS bullseye
  • 00:14 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash2026.codfw.wmnet with OS bullseye
  • 00:05 tgr: EU late backports done
  • 00:05 tgr@deploy1002: Synchronized php-1.40.0-wmf.14/extensions/GrowthExperiments/: Backport: User impact: read edit count from primary db in save complete hook (T324930) (duration: 07m 03s)

2022-12-14

  • 23:50 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2026.codfw.wmnet with OS bullseye
  • 23:48 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2026']
  • 23:44 ejegg: civicrm upgraded from a1c2630a to 98b48b9a
  • 23:41 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2026']
  • 23:40 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2026']
  • 23:33 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2026']
  • 23:29 ryankemper: [WDQS] Downtimed wdqs20[09-12] for the next 7 days
  • 23:28 ryankemper: T301167 wdqs2011/2012 were not visible in pybal (oversight from when I added the other hosts with conftool last week). Fixed that, so now all of the new hosts are showing up properly.
  • 23:27 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=no; selector: name=wdqs2012.*
  • 23:27 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=no; selector: name=wdqs2011.*
  • 23:14 bd808: Toolhub: rebuilding search indices following app update
  • 23:12 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 23:10 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on alert2001.wikimedia.org with reason: kernel update
  • 23:10 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 23:10 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on alert2001.wikimedia.org with reason: kernel update
  • 23:04 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 23:03 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert2001.wikimedia.org
  • 23:03 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host alert2001.wikimedia.org
  • 23:03 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 23:01 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 22:59 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 22:56 tgr: doing the last backport by hand due to T325252
  • 22:49 tgr@deploy1002: Finished scap: Backport for NewImpact: Add log event for clicking suggested edits button (T325041), UserEditTracker: Allow querying primary DB for edit timestamp (duration: 11m 37s)
  • 22:46 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host alert1001.wikimedia.org
  • 22:39 tgr@deploy1002: tgr and kharlan and tgr: Backport for NewImpact: Add log event for clicking suggested edits button (T325041), UserEditTracker: Allow querying primary DB for edit timestamp synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 22:37 tgr@deploy1002: Started scap: Backport for NewImpact: Add log event for clicking suggested edits button (T325041), UserEditTracker: Allow querying primary DB for edit timestamp
  • 22:36 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on wdqs2009.codfw.wmnet with reason: NFS troubleshooting
  • 22:36 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2009.codfw.wmnet with reason: NFS troubleshooting
  • 22:32 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host alert1001.wikimedia.org
  • 22:12 samtar@deploy1002: Finished scap: Backport for Deployment of DiscussionTools reply visual enhancements for more wikis (T323537) (duration: 08m 12s)
  • 22:06 samtar@deploy1002: samtar and kemayo: Backport for Deployment of DiscussionTools reply visual enhancements for more wikis (T323537) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:04 samtar@deploy1002: Started scap: Backport for Deployment of DiscussionTools reply visual enhancements for more wikis (T323537)
  • 22:03 samtar@deploy1002: Finished scap: Backport for VisualEnhancements: in some languages put an arrow by the reply button (T323537), VisualEnhancements: in some languages put an arrow by the reply button (T323537) (duration: 08m 55s)
  • 21:56 samtar@deploy1002: samtar and kemayo: Backport for VisualEnhancements: in some languages put an arrow by the reply button (T323537), VisualEnhancements: in some languages put an arrow by the reply button (T323537) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:55 eileen: civicrm upgraded from a65bc00a to a1c2630a
  • 21:54 samtar@deploy1002: Started scap: Backport for VisualEnhancements: in some languages put an arrow by the reply button (T323537), VisualEnhancements: in some languages put an arrow by the reply button (T323537)
  • 21:53 samtar@deploy1002: Finished scap: Backport for Parsoid: don't bypass ParserCache when using Title (duration: 11m 13s)
  • 21:44 samtar@deploy1002: samtar and daniel: Backport for Parsoid: don't bypass ParserCache when using Title synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 21:42 samtar@deploy1002: Started scap: Backport for Parsoid: don't bypass ParserCache when using Title
  • 21:39 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1055.eqiad.wmnet with OS bullseye
  • 21:35 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1054.eqiad.wmnet with OS bullseye
  • 21:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1060.eqiad.wmnet with OS bullseye
  • 21:24 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1059.eqiad.wmnet with OS bullseye
  • 21:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1058.eqiad.wmnet with OS bullseye
  • 21:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1061.eqiad.wmnet with OS bullseye
  • 21:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host cloudvirt1057.eqiad.wmnet with OS bullseye
  • 21:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host cloudvirt1056.eqiad.wmnet with OS bullseye
  • 21:17 samtar@deploy1002: backport aborted: (duration: 15m 35s)
  • 21:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 21:12 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 21:09 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 21:09 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 21:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
  • 20:59 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
  • 20:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
  • 20:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
  • 20:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1054.eqiad.wmnet with OS bullseye
  • 20:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1055.eqiad.wmnet with OS bullseye
  • 20:54 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
  • 20:53 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1055.eqiad.wmnet with OS bullseye
  • 20:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
  • 20:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1054.eqiad.wmnet with OS bullseye
  • 20:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
  • 20:51 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
  • 20:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
  • 20:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
  • 20:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
  • 20:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 20:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
  • 20:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 20:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 20:42 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 20:38 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1061.eqiad.wmnet with OS bullseye
  • 20:38 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1060.eqiad.wmnet with OS bullseye
  • 20:38 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1059.eqiad.wmnet with OS bullseye
  • 20:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1058.eqiad.wmnet with OS bullseye
  • 20:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1057.eqiad.wmnet with OS bullseye
  • 20:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1056.eqiad.wmnet with OS bullseye
  • 20:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1055.eqiad.wmnet with OS bullseye
  • 20:29 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1054.eqiad.wmnet with OS bullseye
  • 20:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1054.eqiad.wmnet with OS bullseye
  • 20:14 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1054.eqiad.wmnet with OS bullseye
  • 20:13 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:11 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host logstash1036.eqiad.wmnet with OS buster
  • 19:30 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2001.codfw.wmnet with OS bullseye
  • 19:27 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host logstash1036.eqiad.wmnet with OS buster
  • 19:21 mutante: aphlict1001 - :/var/log/aphlict# gzip aphlict.log.1
  • 19:19 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host logstash1037.eqiad.wmnet with OS buster
  • 19:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host logstash1037.eqiad.wmnet with OS buster
  • 19:08 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2027.codfw.wmnet with OS bullseye
  • 18:45 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2027.codfw.wmnet with reason: host reimage
  • 18:42 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2027.codfw.wmnet with reason: host reimage
  • 18:40 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2001.codfw.wmnet with reason: host reimage
  • 18:37 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2001.codfw.wmnet with reason: host reimage
  • 18:23 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2027.codfw.wmnet with OS bullseye
  • 18:22 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2027']
  • 18:16 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2027']
  • 18:12 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2027']
  • 18:09 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2001.codfw.wmnet with OS bullseye
  • 18:07 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
  • 18:06 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2027']
  • 18:04 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 18:03 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2029.codfw.wmnet with OS bullseye
  • 18:01 volans: uploaded spicerack_6.0.0 to apt.wikimedia.org bullseye-wikimedia
  • 17:45 effie: disable puppet on all P:mediawiki::nutcracker hosts (killing nutcracker on mw)
  • 17:41 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2029.codfw.wmnet with reason: host reimage
  • 17:38 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2029.codfw.wmnet with reason: host reimage
  • 17:34 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2028.codfw.wmnet with OS bullseye
  • 17:33 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2004.codfw.wmnet
  • 17:28 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2004.codfw.wmnet
  • 17:27 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2005.codfw.wmnet
  • 17:25 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus6001.drmrs.wmnet
  • 17:22 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 17:22 eevans@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 17:22 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2005.codfw.wmnet
  • 17:21 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2006.codfw.wmnet
  • 17:19 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2029.codfw.wmnet with OS bullseye
  • 17:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=kubernetes1011.eqiad.wmnet
  • 17:18 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus6001.drmrs.wmnet
  • 17:18 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2029']
  • 17:16 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2006.codfw.wmnet
  • 17:15 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus5001.eqsin.wmnet
  • 17:13 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2007.codfw.wmnet
  • 17:13 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes1011.eqiad.wmnet
  • 17:13 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2004.codfw.wmnet
  • 17:12 mutante: https://doc.wikimedia.org - maybe a few seconds of downtime
  • 17:11 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:11 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add aux-k8s-ingress VIP - cgoubert@cumin1001"
  • 17:11 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2028.codfw.wmnet with reason: host reimage
  • 17:11 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2029']
  • 17:11 mutante: doc2001 - rebooting
  • 17:10 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add aux-k8s-ingress VIP - cgoubert@cumin1001"
  • 17:10 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter2004.codfw.wmnet
  • 17:09 mutante: planet1002 - rebooting
  • 17:09 mutante: planet2002 - rebooting
  • 17:09 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2029']
  • 17:08 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus5001.eqsin.wmnet
  • 17:08 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2028.codfw.wmnet with reason: host reimage
  • 17:08 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 17:07 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2007.codfw.wmnet
  • 17:07 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2008.codfw.wmnet
  • 17:06 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2004.codfw.wmnet
  • 17:06 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter2004.codfw.wmnet
  • 17:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2003.codfw.wmnet
  • 17:03 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus4001.ulsfo.wmnet
  • 17:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter2003.codfw.wmnet
  • 17:01 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2029']
  • 16:59 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter2003.codfw.wmnet
  • 16:58 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2003.codfw.wmnet
  • 16:58 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1036.eqiad.wmnet with OS buster
  • 16:58 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 16:57 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus4001.ulsfo.wmnet
  • 16:56 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog2002.codfw.wmnet
  • 16:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter1004.eqiad.wmnet
  • 16:55 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2002.codfw.wmnet
  • 16:53 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2008.codfw.wmnet
  • 16:52 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 16:52 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2009.codfw.wmnet
  • 16:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter1004.eqiad.wmnet
  • 16:50 hashar@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.14 refs T320519 (duration: 07m 06s)
  • 16:49 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwlog2002.codfw.wmnet
  • 16:48 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog1002.eqiad.wmnet
  • 16:48 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2002.codfw.wmnet
  • 16:48 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2028.codfw.wmnet with OS bullseye
  • 16:48 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2001.codfw.wmnet
  • 16:47 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 16:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter1003.eqiad.wmnet
  • 16:46 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2009.codfw.wmnet
  • 16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 16:43 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash2028']
  • 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter1003.eqiad.wmnet
  • 16:43 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host orespoolcounter1003.eqiad.wmnet
  • 16:43 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.14 refs T320519
  • 16:42 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter1003.eqiad.wmnet
  • 16:41 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2001.codfw.wmnet
  • 16:41 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwlog1002.eqiad.wmnet
  • 16:40 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dispatch-be1001.eqiad.wmnet
  • 16:36 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2028']
  • 16:36 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host dispatch-be1001.eqiad.wmnet
  • 16:35 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2028']
  • 16:33 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1001.eqiad.wmnet
  • 16:32 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 16:27 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2028']
  • 16:25 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 16:25 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog1001.eqiad.wmnet
  • 16:24 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon2002.codfw.wmnet
  • 16:22 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 16:19 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon2002.codfw.wmnet
  • 16:17 eevans@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 16:17 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 16:15 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon1002.eqiad.wmnet
  • 16:10 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon1002.eqiad.wmnet
  • 16:06 eevans@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 16:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host logstash1036.eqiad.wmnet with OS buster
  • 16:00 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:51 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 15:50 eevans@deploy1002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 15:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:49 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:45 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:44 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 15:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2001.wikimedia.org
  • 15:34 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 15:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt2001.wikimedia.org
  • 15:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1009.eqiad.wmnet
  • 15:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2003.codfw.wmnet
  • 15:25 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 15:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2003.codfw.wmnet
  • 15:24 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1009.eqiad.wmnet
  • 15:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2002.codfw.wmnet
  • 15:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1008.eqiad.wmnet
  • 15:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2002.codfw.wmnet
  • 15:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2001.codfw.wmnet
  • 15:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:17 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1008.eqiad.wmnet
  • 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1007.eqiad.wmnet
  • 15:15 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 15:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2001.codfw.wmnet
  • 15:08 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1007.eqiad.wmnet
  • 15:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1006.eqiad.wmnet
  • 15:07 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 15:00 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1005.eqiad.wmnet
  • 15:00 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1006.eqiad.wmnet
  • 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host xhgui1001.eqiad.wmnet
  • 14:57 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 14:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1004.eqiad.wmnet
  • 14:55 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1005.eqiad.wmnet
  • 14:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host xhgui1001.eqiad.wmnet
  • 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host xhgui2001.codfw.wmnet
  • 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host xhgui2001.codfw.wmnet
  • 14:30 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host puppetdb1003.eqiad.wmnet
  • 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P42707 and previous config saved to /var/cache/conftool/dbconfig/20221214-142958-root.json
  • 14:20 ladsgroup@deploy1002: Finished scap: Backport for Parsoid: Default parsoid version to "0.0.0" for unsupported models (T325137) (duration: 08m 12s)
  • 14:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: After testing unix_socket plugin', diff saved to https://phabricator.wikimedia.org/P42706 and previous config saved to /var/cache/conftool/dbconfig/20221214-141644-root.json
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P42705 and previous config saved to /var/cache/conftool/dbconfig/20221214-141453-root.json
  • 14:13 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Parsoid: Default parsoid version to "0.0.0" for unsupported models (T325137) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:13 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1004.eqiad.wmnet
  • 14:11 ladsgroup@deploy1002: Started scap: Backport for Parsoid: Default parsoid version to "0.0.0" for unsupported models (T325137)
  • 14:11 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wcqs1003.eqiad.wmnet
  • 14:11 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for wcqs1003.eqiad.wmnet
  • 14:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: After testing unix_socket plugin', diff saved to https://phabricator.wikimedia.org/P42704 and previous config saved to /var/cache/conftool/dbconfig/20221214-140139-root.json
  • 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P42703 and previous config saved to /var/cache/conftool/dbconfig/20221214-135948-root.json
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: After testing unix_socket plugin', diff saved to https://phabricator.wikimedia.org/P42702 and previous config saved to /var/cache/conftool/dbconfig/20221214-134634-root.json
  • 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P42701 and previous config saved to /var/cache/conftool/dbconfig/20221214-134443-root.json
  • 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After testing unix_socket plugin', diff saved to https://phabricator.wikimedia.org/P42700 and previous config saved to /var/cache/conftool/dbconfig/20221214-133129-root.json
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P42699 and previous config saved to /var/cache/conftool/dbconfig/20221214-132938-root.json
  • 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After testing unix_socket plugin', diff saved to https://phabricator.wikimedia.org/P42697 and previous config saved to /var/cache/conftool/dbconfig/20221214-131624-root.json
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P42696 and previous config saved to /var/cache/conftool/dbconfig/20221214-131433-root.json
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 5%: After testing unix_socket plugin', diff saved to https://phabricator.wikimedia.org/P42695 and previous config saved to /var/cache/conftool/dbconfig/20221214-130119-root.json
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 T325154', diff saved to https://phabricator.wikimedia.org/P42694 and previous config saved to /var/cache/conftool/dbconfig/20221214-125950-marostegui.json
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P42693 and previous config saved to /var/cache/conftool/dbconfig/20221214-125928-root.json
  • 12:59 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 12:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetdb1003.eqiad.wmnet
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 T325154', diff saved to https://phabricator.wikimedia.org/P42692 and previous config saved to /var/cache/conftool/dbconfig/20221214-125544-marostegui.json
  • 12:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 12:44 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 12:42 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 12:38 ladsgroup@deploy1002: Finished scap: Backport for Externallinks: Set Persian Wikiquote to WRITE BOTH (T321662) (duration: 29m 18s)
  • 12:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 12:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetdb2002.codfw.wmnet
  • 12:29 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki-root1001.eqiad.wmnet
  • 12:28 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Externallinks: Set Persian Wikiquote to WRITE BOTH (T321662) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 12:27 jbond@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host pki2001.codfw.wmnet
  • 12:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetdb1002.eqiad.wmnet
  • 12:20 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetdb2002.codfw.wmnet
  • 12:19 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki2001.codfw.wmnet
  • 12:18 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki-root1001.eqiad.wmnet
  • 12:14 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetdb1002.eqiad.wmnet
  • 12:11 jbond: disable puppet fleet-wide to perform server reboots
  • 12:09 ladsgroup@deploy1002: Started scap: Backport for Externallinks: Set Persian Wikiquote to WRITE BOTH (T321662)
  • 11:59 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
  • 11:58 hnowlan@puppetmaster1001: conftool action : set/weight=2:pooled=yes; selector: service=thumbor,name=kubernetes101[123].eqiad.wmnet
  • 11:55 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
  • 11:49 hnowlan@puppetmaster1001: conftool action : set/weight=2:pooled=yes; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet
  • 11:46 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
  • 11:42 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
  • 11:42 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
  • 11:38 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
  • 11:34 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2003.codfw.wmnet
  • 11:26 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache2003.codfw.wmnet
  • 11:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1003.eqiad.wmnet
  • 11:19 klausman@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ml-cache2002.codfw.wmnet
  • 11:18 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1003.eqiad.wmnet
  • 11:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1002.eqiad.wmnet
  • 11:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1002.eqiad.wmnet
  • 11:09 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache2002.codfw.wmnet
  • 11:09 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2001.codfw.wmnet
  • 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install1003.wikimedia.org
  • 11:03 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache2001.codfw.wmnet
  • 11:00 moritzm: installing dpkg bugfix updates from Bullseye point release
  • 11:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install1003.wikimedia.org
  • 11:00 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2001.codfw.wmnet
  • 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install2003.wikimedia.org
  • 10:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1001.eqiad.wmnet
  • 10:54 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache2001.codfw.wmnet
  • 10:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install2003.wikimedia.org
  • 10:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3001.wikimedia.org
  • 10:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1001.eqiad.wmnet
  • 10:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3001.wikimedia.org
  • 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install4001.wikimedia.org
  • 10:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install4001.wikimedia.org
  • 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install5001.wikimedia.org
  • 10:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1003.eqiad.wmnet
  • 10:35 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1003.eqiad.wmnet
  • 10:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install5001.wikimedia.org
  • 10:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1002.eqiad.wmnet
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
  • 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6001.wikimedia.org
  • 10:31 hnowlan@puppetmaster1001: conftool action : set/weight=2:pooled=no; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet
  • 10:31 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1002.eqiad.wmnet
  • 10:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1001.eqiad.wmnet
  • 10:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
  • 10:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6001.wikimedia.org
  • 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1002.eqiad.wmnet
  • 10:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1001.eqiad.wmnet
  • 10:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging-etcd2003.codfw.wmnet
  • 10:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor1002.eqiad.wmnet
  • 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
  • 10:24 hnowlan@puppetmaster1001: conftool action : set/weight=2:pooled=yes; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2002.codfw.wmnet
  • 10:19 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging-etcd2003.codfw.wmnet
  • 10:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
  • 10:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor2002.codfw.wmnet
  • 10:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging-etcd2002.codfw.wmnet
  • 10:12 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging-etcd2002.codfw.wmnet
  • 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3005.wikimedia.org
  • 10:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast3005.wikimedia.org
  • 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host serpens.wikimedia.org
  • 09:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging-etcd2001.codfw.wmnet
  • 09:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host serpens.wikimedia.org
  • 09:55 hnowlan@puppetmaster1001: conftool action : set/weight=2:pooled=no; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet
  • 09:55 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging-etcd2001.codfw.wmnet
  • 09:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1003.eqiad.wmnet
  • 09:53 hnowlan@puppetmaster1001: conftool action : set/weight=2:pooled=yes; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet
  • 09:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1003.eqiad.wmnet
  • 09:46 ladsgroup@deploy1002: Synchronized php-1.40.0-wmf.14/includes/search/SearchResultThumbnailProvider.php: Backport: search: Avoid setting height in search thumbnails (T322621) (duration: 08m 07s)
  • 09:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4003.wikimedia.org
  • 09:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1002.eqiad.wmnet
  • 09:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-corp1001.wikimedia.org
  • 09:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4003.wikimedia.org
  • 09:39 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1002.eqiad.wmnet
  • 09:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5002.wikimedia.org
  • 09:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-corp1001.wikimedia.org
  • 09:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-corp2001.wikimedia.org
  • 09:34 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1001.eqiad.wmnet
  • 09:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-corp2001.wikimedia.org
  • 09:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5002.wikimedia.org
  • 09:28 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:27 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 09:27 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:27 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6001.wikimedia.org
  • 09:26 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1001.eqiad.wmnet
  • 09:25 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6001.wikimedia.org
  • 09:20 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:41 hashar: Restarted Gerrit for a plugin update
  • 08:39 hashar@deploy1002: Finished deploy [gerrit/gerrit@c0b0a70]: Add support for PipelineBot to the Checks API plugin - T214068 (duration: 00m 09s)
  • 08:39 hashar@deploy1002: Started deploy [gerrit/gerrit@c0b0a70]: Add support for PipelineBot to the Checks API plugin - T214068
  • 08:39 aqu@deploy1002: Finished deploy [airflow-dags/analytics@353573b]: HDFS usage dataset pipeline deployment without superuser [airflow-dags@353573b] (duration: 00m 13s)
  • 08:39 aqu@deploy1002: Started deploy [airflow-dags/analytics@353573b]: HDFS usage dataset pipeline deployment without superuser [airflow-dags@353573b]
  • 08:37 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@353573b]: HDFS usage dataset pipeline deployment without superuser TEST [airflow-dags@353573b] (duration: 00m 10s)
  • 08:37 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@353573b]: HDFS usage dataset pipeline deployment without superuser TEST [airflow-dags@353573b]
  • 08:26 hashar@deploy1002: Finished deploy [gerrit/gerrit@c0b0a70]: Add support for PipelineBot to the Checks API plugin - T214068 (duration: 00m 11s)
  • 08:25 hashar@deploy1002: Started deploy [gerrit/gerrit@c0b0a70]: Add support for PipelineBot to the Checks API plugin - T214068
  • 07:28 phedenskog@deploy1002: Finished deploy [performance/navtiming@7ba179f]: (no justification provided) (duration: 00m 08s)
  • 07:28 phedenskog@deploy1002: Started deploy [performance/navtiming@7ba179f]: (no justification provided)
  • 01:22 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2037.codfw.wmnet with OS bullseye
  • 01:18 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2036.codfw.wmnet with OS bullseye
  • 01:06 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2037.codfw.wmnet with reason: host reimage
  • 01:04 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2036.codfw.wmnet with reason: host reimage
  • 01:02 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2037.codfw.wmnet with reason: host reimage
  • 01:01 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2036.codfw.wmnet with reason: host reimage
  • 00:46 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2037.codfw.wmnet with OS bullseye
  • 00:45 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2036.codfw.wmnet with OS bullseye

2022-12-13

  • 22:57 jdrewniak@deploy1002: Finished scap: Backport for Account for syntax errors in closest selector (T325113) (duration: 08m 29s)
  • 22:50 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for Account for syntax errors in closest selector (T325113) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 22:48 jdrewniak@deploy1002: Started scap: Backport for Account for syntax errors in closest selector (T325113)
  • 22:42 jdrewniak@deploy1002: Finished scap: Backport for Account for syntax errors in closest selector (T325113) (duration: 09m 20s)
  • 22:40 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "parse1002: failed -> active - dzahn@cumin2002"
  • 22:40 mutante: netbox: set parse1002 status: failed -> active in web UI; ran cookbook 'sre.puppet.sync-netbox-hiera' to get data in sync - T324949
  • 22:38 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "parse1002: failed -> active - dzahn@cumin2002"
  • 22:35 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:34 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for Account for syntax errors in closest selector (T325113) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:34 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 22:33 jdrewniak@deploy1002: Started scap: Backport for Account for syntax errors in closest selector (T325113)
  • 22:23 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2035.codfw.wmnet with OS bullseye
  • 22:14 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2033.codfw.wmnet with OS bullseye
  • 22:12 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2034.codfw.wmnet with OS bullseye
  • 22:10 aqu@deploy1002: Finished deploy [analytics/refinery@66736e1] (hadoop-test): HDFS FSImage conversion to XML script TEST [analytics/refinery@66736e1] (duration: 01m 11s)
  • 22:09 aqu@deploy1002: Started deploy [analytics/refinery@66736e1] (hadoop-test): HDFS FSImage conversion to XML script TEST [analytics/refinery@66736e1]
  • 22:08 aqu@deploy1002: Finished deploy [analytics/refinery@66736e1] (thin): HDFS FSImage conversion to XML script THIN [analytics/refinery@66736e1] (duration: 00m 07s)
  • 22:08 aqu@deploy1002: Started deploy [analytics/refinery@66736e1] (thin): HDFS FSImage conversion to XML script THIN [analytics/refinery@66736e1]
  • 22:06 aqu@deploy1002: Finished deploy [analytics/refinery@66736e1]: HDFS FSImage conversion to XML script [analytics/refinery@66736e1] (duration: 26m 32s)
  • 21:53 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2033.codfw.wmnet with reason: host reimage
  • 21:52 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: extension distributor updates (duration: 06m 50s)
  • 21:50 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2034.codfw.wmnet with reason: host reimage
  • 21:48 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2033.codfw.wmnet with reason: host reimage
  • 21:47 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2034.codfw.wmnet with reason: host reimage
  • 21:43 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2035.codfw.wmnet with reason: host reimage
  • 21:41 kindrobot: Finishing UTC late backport window
  • 21:40 kindrobot@deploy1002: Finished scap: Backport for Start writing to cul_actor everywhere (T233004) (duration: 18m 47s)
  • 21:40 aqu@deploy1002: Started deploy [analytics/refinery@66736e1]: HDFS FSImage conversion to XML script [analytics/refinery@66736e1]
  • 21:39 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2035.codfw.wmnet with reason: host reimage
  • 21:35 aqu: Deploying analytics/refinery (HDFS FSImage conversion to XML script)
  • 21:32 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2033.codfw.wmnet with OS bullseye
  • 21:30 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2034.codfw.wmnet with OS bullseye
  • 21:23 kindrobot@deploy1002: kindrobot and zabe: Backport for Start writing to cul_actor everywhere (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 21:23 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2035.codfw.wmnet with OS bullseye
  • 21:21 kindrobot@deploy1002: Started scap: Backport for Start writing to cul_actor everywhere (T233004)
  • 21:17 samtar@deploy1002: Finished scap: Backport for Child elements also trigger previews (T325007) (duration: 09m 38s)
  • 21:09 samtar@deploy1002: samtar and jdlrobson: Backport for Child elements also trigger previews (T325007) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 21:08 samtar@deploy1002: Started scap: Backport for Child elements also trigger previews (T325007)
  • 20:20 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 20:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 20:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on relforge[1003-1004].eqiad.wmnet with reason: Rolling restart
  • 20:16 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on relforge[1003-1004].eqiad.wmnet with reason: Rolling restart
  • 20:10 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 20:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 19:18 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts contint1001.wikimedia.org
  • 19:18 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:18 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: contint1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
  • 19:00 ladsgroup@deploy1002: Finished scap: Backport for ParserCache: fix metrics keys, Don't write to parser cache from maintenance script, Fix brittle test (duration: 07m 53s)
  • 18:59 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: contint1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
  • 18:57 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 18:54 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for ParserCache: fix metrics keys, Don't write to parser cache from maintenance script, Fix brittle test synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 18:52 ladsgroup@deploy1002: Started scap: Backport for ParserCache: fix metrics keys, Don't write to parser cache from maintenance script, Fix brittle test
  • 18:51 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts contint1001.wikimedia.org
  • 18:51 mutante: decom'ing contint1001 (formerly prod CI) server, replaced by contint1002 T324698
  • 18:47 ladsgroup@deploy1002: Finished scap: Backport for Don't write to parser cache from maintenance script, ParserCache: fix metrics keys (duration: 09m 25s)
  • 18:40 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Don't write to parser cache from maintenance script, ParserCache: fix metrics keys synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 18:38 ladsgroup@deploy1002: Started scap: Backport for Don't write to parser cache from maintenance script, ParserCache: fix metrics keys
  • 18:21 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 17:27 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 17:26 btullis: edited automation/proxies/ttyS1-115200.conf to remove `include "/etc/dhcp/automation/ttyS1-115200/kafka-stretch1001.conf";` and restarted isc-dhcp-server
  • 17:22 btullis: btullis@install1003:/etc/dhcp/automation/ttyS1-115200$ sudo systemctl restart isc-dhcp-server.service T314156
  • 16:35 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 16:35 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 16:17 hnowlan@puppetmaster1001: conftool action : set/weight=10:pooled=no; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet
  • 16:15 hnowlan@puppetmaster1001: conftool action : set/weight=10:pooled=yes; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet
  • 16:12 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch2002.codfw.wmnet with OS bullseye
  • 16:05 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=kubernetes101[01234].eqiad.wmnet
  • 16:03 moritzm: installing ruby-tzinfo security updates
  • 16:01 hnowlan@puppetmaster1001: conftool action : set/weight=4:pooled=yes; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
  • 15:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 20804
  • 15:54 hnowlan@puppetmaster1001: conftool action : set/weight=2:pooled=yes; selector: service=thumbor,name=kubernetes101[234].eqiad.wmnet
  • 15:51 hnowlan@puppetmaster1001: conftool action : set/weight=2:pooled=yes; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet
  • 15:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 20804
  • 15:27 derick@deploy1002: Finished scap: Backport for RangeChronologicalPager: Restore the compatibility with derived classes (T228431 T325034) (duration: 08m 59s)
  • 15:20 derick@deploy1002: derick and func: Backport for RangeChronologicalPager: Restore the compatibility with derived classes (T228431 T325034) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 15:19 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 15:19 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 15:18 derick@deploy1002: Started scap: Backport for RangeChronologicalPager: Restore the compatibility with derived classes (T228431 T325034)
  • 15:02 derick@deploy1002: Finished scap: Backport for Log linter data while parsing full pages (T246403) (duration: 10m 28s)
  • 14:53 derick@deploy1002: derick and arlolra: Backport for Log linter data while parsing full pages (T246403) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:52 derick@deploy1002: Started scap: Backport for Log linter data while parsing full pages (T246403)
  • 14:50 derick@deploy1002: backport aborted: (duration: 07m 09s)
  • 14:41 derick@deploy1002: Finished scap: Backport for hewiki: set VisualEditor to direct mode (T320529) (duration: 14m 34s)
  • 14:32 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch2002.codfw.wmnet with OS bullseye
  • 14:31 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on wcqs1003.eqiad.wmnet with reason: hardware diagnostics
  • 14:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on wcqs1003.eqiad.wmnet with reason: hardware diagnostics
  • 14:28 derick@deploy1002: derick and daniel: Backport for hewiki: set VisualEditor to direct mode (T320529) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:26 derick@deploy1002: Started scap: Backport for hewiki: set VisualEditor to direct mode (T320529)
  • 14:22 moritzm: added smunene to pwstore
  • 14:17 derick@deploy1002: Backport cancelled.
  • 14:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-stretch2001.codfw.wmnet with reason: Accessing BIOS on kafka-stretch2001
  • 14:03 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-stretch2001.codfw.wmnet with reason: Accessing BIOS on kafka-stretch2001
  • 13:59 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 13:59 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 13:57 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 13:57 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 13:57 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 13:49 jayme@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 13:49 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 13:48 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 13:28 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-stretch2002.codfw.wmnet with reason: Accessing BIOS on kafka-stretch2002
  • 13:28 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-stretch2002.codfw.wmnet with reason: Accessing BIOS on kafka-stretch2002
  • 12:31 claime: sessionstore outage being monitored
  • 12:23 claime: sessionstore outage, login functions severely impacted
  • 12:07 hashar: Gerrit now has CI job results represented in the Checks tab which should be a little nicer. The old HTML result table is gone and replaced by little bubbles representing the state of the builds for the latest patchset. Ref: https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/thread/3ULF5NPVC4MSVABZBSXAMDODLZUKFXHS/
  • 12:00 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: restart
  • 12:00 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: restart
  • 11:57 hashar: Restarted Gerrit on gerrit1001
  • 11:55 hashar@deploy1002: Finished deploy [gerrit/gerrit@9ef1a16]: Replace CI result table by Checks API plugin - T214068 (duration: 00m 09s)
  • 11:55 hashar@deploy1002: Started deploy [gerrit/gerrit@9ef1a16]: Replace CI result table by Checks API plugin - T214068
  • 11:54 hashar: Restarted Gerrit on gerrit2002 (replica)
  • 11:52 hashar@deploy1002: Finished deploy [gerrit/gerrit@9ef1a16]: Replace CI result table by Checks API plugin - T214068 (duration: 00m 11s)
  • 11:52 hashar@deploy1002: Started deploy [gerrit/gerrit@9ef1a16]: Replace CI result table by Checks API plugin - T214068
  • 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on idp-test1002.wikimedia.org with reason: Various tests which may cause temporary breakage on idp-test.w.o
  • 11:42 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on idp-test1002.wikimedia.org with reason: Various tests which may cause temporary breakage on idp-test.w.o
  • 11:22 moritzm: installing paramiko security updates
  • 11:17 claime: Puppet re-enabled on cp::text nodes - T290536
  • 10:58 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@27ac6d3] (codfw): Increase codfw mirrored traffic to 100% (duration: 01m 40s)
  • 10:57 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@27ac6d3] (codfw): Increase codfw mirrored traffic to 100%
  • 10:54 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@e988b5e]: Relax sla for the weekly es transfer and subgraph_and_query_metrics (duration: 02m 25s)
  • 10:51 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@e988b5e]: Relax sla for the weekly es transfer and subgraph_and_query_metrics
  • 10:36 vgutierrez: clean up stale prometheus target files in prometheus5001
  • 10:22 claime: puppet run on cp4037 - T290536
  • 10:21 claime: puppet disabled on cp hosts for T290536
  • 10:01 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 10:00 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 09:54 moritzm: installing libhttp-daemon-perl security updates
  • 09:17 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.14 refs T320519
  • 09:07 claime: Repooled parse1002.eqiad.wmnet in parsoid service - T324949
  • 09:05 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1002.eqiad.wmnet
  • 09:05 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1002.eqiad.wmnet
  • 09:01 cgoubert@cumin1001: conftool action : set/pooled=no; selector: name=parse1002.eqiad.wmnet
  • 08:58 moritzm: installing libpgjava security updates
  • 08:55 moritzm: installing xen security updates
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42683 and previous config saved to /var/cache/conftool/dbconfig/20221213-083019-root.json
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42682 and previous config saved to /var/cache/conftool/dbconfig/20221213-081514-root.json
  • 08:13 kartik@deploy1002: Finished scap: Backport for Enable Section Translation in Chuvash Wikipedia (T319176) (duration: 10m 01s)
  • 08:05 kartik@deploy1002: kartik and kartik: Backport for Enable Section Translation in Chuvash Wikipedia (T319176) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 08:03 kartik@deploy1002: Started scap: Backport for Enable Section Translation in Chuvash Wikipedia (T319176)
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42681 and previous config saved to /var/cache/conftool/dbconfig/20221213-080009-root.json
  • 07:52 ladsgroup@deploy1002: Finished scap: Backport for Reduce PC writes from parsoid API to 1% (duration: 09m 35s)
  • 07:45 ladsgroup@deploy1002: ladsgroup and daniel: Backport for Reduce PC writes from parsoid API to 1% synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42680 and previous config saved to /var/cache/conftool/dbconfig/20221213-074504-root.json
  • 07:43 ladsgroup@deploy1002: Started scap: Backport for Reduce PC writes from parsoid API to 1%
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42679 and previous config saved to /var/cache/conftool/dbconfig/20221213-072959-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42678 and previous config saved to /var/cache/conftool/dbconfig/20221213-071454-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 1%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42677 and previous config saved to /var/cache/conftool/dbconfig/20221213-065949-root.json
  • 06:59 kart_: Updated cxserver to 2022-12-06-121330-production (T321781, T324534)
  • 06:57 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:54 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:54 marostegui: Reboot db1206 to test RAID controller
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P42676 and previous config saved to /var/cache/conftool/dbconfig/20221213-065402-marostegui.json
  • 06:53 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:51 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:50 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:23 dwisehaupt: payments now using the new digicert certificate in eqiad
  • 04:56 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.12 (duration: 02m 15s)
  • 04:54 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.14 refs T320519 (duration: 52m 11s)
  • 04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.14 refs T320519

2022-12-12

  • 23:35 tzatziki: removing 3 files for legal compliance
  • 23:12 tzatziki: removing 2 files for legal compliance
  • 22:04 ryankemper@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: search_codfw elasticsearch and plugin upgrade - ryankemper@cumin2002
  • 21:57 tzatziki: removing 1 file for legal compliance
  • 21:13 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: search_codfw elasticsearch and plugin upgrade - ryankemper@cumin2002
  • 21:08 cstone: civicrm upgraded from 3ae68ab4 to 09925be0
  • 20:31 ryankemper: [WDQS] `ryankemper@cumin2002:~$ sudo -E cumin -b 4 wdqs2* 'systemctl restart wdqs-blazegraph'`
  • 17:59 ladsgroup@deploy1002: Finished scap: Backport for Disable writing parsoid html to PC on commons and wikidata. (duration: 07m 45s)
  • 17:53 ladsgroup@deploy1002: ladsgroup and daniel: Backport for Disable writing parsoid html to PC on commons and wikidata. synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 17:52 ladsgroup@deploy1002: Started scap: Backport for Disable writing parsoid html to PC on commons and wikidata.
  • 16:18 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be1004.eqiad.wmnet with OS bullseye
  • 16:16 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 16:15 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 16:12 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 16:08 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 16:04 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts netmon2001.wikimedia.org
  • 16:04 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:02 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 16:02 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 16:01 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 16:01 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 15:59 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 15:58 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 15:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 15:57 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts netmon2001.wikimedia.org
  • 15:57 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 15:47 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 15:46 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 15:43 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 15:36 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 15:34 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:33 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 15:24 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 15:09 krinkle@deploy1002: Finished scap: Backport for Add Largest Contentful Paint (LCP) (T319329) (duration: 08m 51s)
  • 15:06 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:02 krinkle@deploy1002: krinkle and krinkle: Backport for Add Largest Contentful Paint (LCP) (T319329) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 15:00 krinkle@deploy1002: Started scap: Backport for Add Largest Contentful Paint (LCP) (T319329)
  • 14:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 21804
  • 14:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 21804
  • 14:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 21320
  • 14:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 21320
  • 14:27 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:27 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Wikidata: don't show Vector search thumbnails (T316093) (duration: 09m 47s)
  • 14:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 209275
  • 14:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 209275
  • 14:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38757
  • 14:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38757
  • 14:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 20804
  • 14:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 20804
  • 14:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21804
  • 14:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21804
  • 14:19 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and migr: Backport for Wikidata: don't show Vector search thumbnails (T316093) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62887
  • 14:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62887
  • 14:18 moritzm: rebalance Ganeti cluster in eqsin after adding ganeti5005-5007 T324610
  • 14:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11686
  • 14:17 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Wikidata: don't show Vector search thumbnails (T316093)
  • 14:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 11686
  • 14:14 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Add event stream config for ios.talk_page_interaction (T324340) (duration: 08m 42s)
  • 14:14 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@a2ebe75] (codfw): Increase codfw mirrored traffic to 75% (duration: 01m 41s)
  • 14:12 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@a2ebe75] (codfw): Increase codfw mirrored traffic to 75%
  • 14:07 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and tsev: Backport for Add event stream config for ios.talk_page_interaction (T324340) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:05 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Add event stream config for ios.talk_page_interaction (T324340)
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti5003.eqsin.wmnet with reason: Remove for eventual decom
  • 13:48 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti5003.eqsin.wmnet with reason: Remove for eventual decom
  • 13:47 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1004.eqiad.wmnet with reason: host reimage
  • 13:45 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1004.eqiad.wmnet with reason: host reimage
  • 13:31 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be1004.eqiad.wmnet with OS bullseye
  • 12:49 moritzm: installing jackson-databind security updates
  • 12:48 moritzm: installing Django security updates
  • 12:32 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be1003.eqiad.wmnet with OS bullseye
  • 12:19 moritzm: installing jqueryui security updates
  • 12:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1003.eqiad.wmnet with reason: host reimage
  • 12:13 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1003.eqiad.wmnet with reason: host reimage
  • 12:04 moritzm: installing twisted security updates
  • 11:59 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be1003.eqiad.wmnet with OS bullseye
  • 11:49 moritzm: drain ganeti5003 for eventual decom T322048
  • 11:43 moritzm: failover Ganeti master in eqsin to ganeti5004 (5003 will be decommissioned) T322048
  • 11:40 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on parse1002.eqiad.wmnet with reason: Bad CPU
  • 11:40 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on parse1002.eqiad.wmnet with reason: Bad CPU
  • 11:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5007.eqsin.wmnet to cluster eqsin and group 1
  • 11:10 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5007.eqsin.wmnet to cluster eqsin and group 1
  • 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
  • 10:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be1002.eqiad.wmnet with OS bullseye
  • 10:52 claime: Switched parse1002 to parse1003 in parsoid-canary - T324949
  • 10:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
  • 10:49 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti5007.eqsin.wmnet
  • 10:47 ladsgroup@deploy1002: Finished scap: Backport for Bump portals to HEAD (duration: 09m 48s)
  • 10:42 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1002.eqiad.wmnet with reason: host reimage
  • 10:39 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Bump portals to HEAD synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 10:39 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1002.eqiad.wmnet with reason: host reimage
  • 10:37 ladsgroup@deploy1002: Started scap: Backport for Bump portals to HEAD
  • 10:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
  • 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
  • 10:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
  • 10:24 slyngs: update add-ldap-group tool, https://gerrit.wikimedia.org/r/c/operations/puppet/+/860568
  • 10:18 slyngs: Update modify-mfa tools, https://gerrit.wikimedia.org/r/c/operations/puppet/+/861385
  • 10:17 claime: depooled parse1002.eqiad.wmnet for hw failure - T324949
  • 10:02 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@bdc19a3] (codfw): Increase codfw mirrored traffic to 50% (duration: 01m 42s)
  • 10:00 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@bdc19a3] (codfw): Increase codfw mirrored traffic to 50%
  • 09:58 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=parse1002.eqiad.wmnet
  • 09:06 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6af0d2d] (codfw): Increase codfw mirrored traffic to 25% (duration: 02m 15s)
  • 09:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6af0d2d] (codfw): Increase codfw mirrored traffic to 25%
  • 08:55 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be1002.eqiad.wmnet with OS bullseye
  • 08:25 matthiasmullie: UTC morning backports done
  • 08:24 mlitn@deploy1002: Finished scap: Backport for Add mediawiki.searchpreview schema (T321069) (duration: 18m 21s)
  • 08:16 mlitn@deploy1002: mlitn and mlitn: Backport for Add mediawiki.searchpreview schema (T321069) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:08 XioNoX: remove bast5001 from management routers ACLs (replaced by bast5002)
  • 08:06 mlitn@deploy1002: Started scap: Backport for Add mediawiki.searchpreview schema (T321069)
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42672 and previous config saved to /var/cache/conftool/dbconfig/20221212-074700-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42671 and previous config saved to /var/cache/conftool/dbconfig/20221212-073155-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42670 and previous config saved to /var/cache/conftool/dbconfig/20221212-071650-root.json
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42669 and previous config saved to /var/cache/conftool/dbconfig/20221212-070145-root.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42668 and previous config saved to /var/cache/conftool/dbconfig/20221212-064640-root.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42667 and previous config saved to /var/cache/conftool/dbconfig/20221212-063135-root.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 1%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42666 and previous config saved to /var/cache/conftool/dbconfig/20221212-061630-root.json
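
The db1206 entries above step the host back into rotation at 1%, 5%, 10%, 25%, 50%, 75% and 100%, roughly 15 minutes apart. A minimal sketch of that schedule follows. The function name, the `interval` parameter and the printed text are illustrative assumptions, not part of dbctl or any other tooling named in this log. The logged steps drift by a few seconds each, so the real final step landed at 07:47 rather than at an exact multiple of 15 minutes.

```python
from datetime import datetime, timedelta

# Pooling percentages as they appear in the db1206 log entries.
POOL_STEPS = [1, 5, 10, 25, 50, 75, 100]


def repool_schedule(start, interval=timedelta(minutes=15), steps=POOL_STEPS):
    """Return (time, percentage) pairs for a staged repool.

    Hypothetical helper for illustration only: it computes when each
    step would be due, it does not change any pooling state.
    """
    return [(start + i * interval, pct) for i, pct in enumerate(steps)]


if __name__ == "__main__":
    # Start time taken from the first 1% entry on 2022-12-12.
    for when, pct in repool_schedule(datetime(2022, 12, 12, 6, 16)):
        print(f"{when:%H:%M} db1206 pooled at {pct}%")
```

Keeping the percentages in a plain list makes it easy to compare a planned ramp against the timestamps actually logged.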

2022-12-10

  • 03:46 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: search_codfw elasticsearch and plugin upgrade - ryankemper@cumin2002
  • 02:00 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 49 hosts with reason: Plugin upgrade for T322776
  • 02:00 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 49 hosts with reason: Plugin upgrade for T322776
  • 01:59 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: search_codfw elasticsearch and plugin upgrade - ryankemper@cumin2002
  • 01:21 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: search_eqiad elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
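
Cookbook runs are logged as a START entry followed later by an END entry, so the elapsed time of a run can be read off the two timestamps. The sketch below parses one entry and computes the gap in minutes. The regular expression and both helper names are assumptions made for illustration, based only on the entry layout visible in this log (`HH:MM who: message`), and the calculation assumes both entries fall on the same day.

```python
import re
from datetime import datetime

# Assumed entry layout: optional bullet, "HH:MM", author, colon, message.
ENTRY_RE = re.compile(
    r"^\s*(?:•\s*)?(?P<time>\d{2}:\d{2}) (?P<who>[^:]+): (?P<msg>.*)$"
)


def parse_entry(line):
    """Split a log entry into (time, who, message), or None if no match."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    return match.group("time"), match.group("who"), match.group("msg")


def minutes_between(start, end):
    """Whole minutes from start to end (both "HH:MM", same day assumed)."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


if __name__ == "__main__":
    # Timestamps from the search_codfw rolling-operation entries on 2022-12-10.
    print(minutes_between("01:59", "03:46"))
```

Date headings such as `2022-12-10` do not match the pattern, so `parse_entry` returns `None` for them and they can be used to track the current day separately.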

2022-12-09

  • 23:59 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 23:39 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: search_eqiad elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 22:27 cstone: payments-wiki upgraded from 35555b67 to 77297c12
  • 19:50 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: Moar Disk!
  • 19:49 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: Moar Disk!
  • 19:34 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: Moar Disk!
  • 19:34 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: Moar Disk!
  • 18:22 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 18:16 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: Moar Disk 2!
  • 18:16 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: Moar Disk 2!
  • 18:04 jnuche@deploy1002: Installation of scap version "4.30.2" completed for 562 hosts
  • 18:04 jnuche@deploy1002: Installing scap version "4.30.2" for 562 hosts
  • 17:59 jnuche@deploy1002: Installing scap version "4.30.2" for 563 hosts
  • 17:44 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: Moar Disk
  • 17:44 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: Moar Disk
  • 17:03 claime: eventgate-analytics bumped to 30 replicas to absorb increased load - T320518
  • 17:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-stretch1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-stretch1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-stretch1001 - cmjohnson@cumin1001"
  • 16:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-stretch1002.eqiad.wmnet with OS bullseye
  • 16:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmjohnson@cumin1001"
  • 16:58 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-stretch1001 - cmjohnson@cumin1001"
  • 16:57 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmjohnson@cumin1001"
  • 16:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:43 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-stretch1002.eqiad.wmnet with reason: host reimage
  • 16:42 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 16:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-stretch1002.eqiad.wmnet with reason: host reimage
  • 16:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-stretch1001 - cmjohnson@cumin1001"
  • 16:38 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-stretch1001 - cmjohnson@cumin1001"
  • 16:36 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:35 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 16:34 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:34 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:33 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 16:27 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch1002.eqiad.wmnet with OS bullseye
  • 16:21 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 16:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 15:08 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:07 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:47 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P42662 and previous config saved to /var/cache/conftool/dbconfig/20221209-134806-marostegui.json
  • 13:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be1001.eqiad.wmnet with OS bullseye
  • 13:13 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.13 refs T320518
  • 13:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1001.eqiad.wmnet with reason: host reimage
  • 13:08 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1001.eqiad.wmnet with reason: host reimage
  • 12:36 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be1001.eqiad.wmnet with OS bullseye
  • 12:14 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6b70e03] (codfw): Reduce mirrored traffic to 5% (duration: 01m 39s)
  • 12:12 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6b70e03] (codfw): Reduce mirrored traffic to 5%
  • 11:58 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 11:58 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
  • 11:41 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@d1bd7dc] (codfw): Enable geopoints on production (duration: 01m 00s)
  • 11:40 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@d1bd7dc] (codfw): Enable geopoints on production
  • 11:35 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2004.codfw.wmnet with OS bullseye
  • 11:20 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2004.codfw.wmnet with reason: host reimage
  • 11:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2004.codfw.wmnet with reason: host reimage
  • 11:10 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:10 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Decomissioning netmon2001 - cgoubert@cumin1001"
  • 11:09 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@17b9319] (codfw): codfw: Enable mirroring for 25% of the traffic (duration: 05m 08s)
  • 11:06 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Decomissioning netmon2001 - cgoubert@cumin1001"
  • 11:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@17b9319] (codfw): codfw: Enable mirroring for 25% of the traffic
  • 11:02 ladsgroup@deploy1002: Finished scap: Backport for Followup to 5cb38845: Don't drop revid info (T324801) (duration: 12m 59s)
  • 11:01 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 11:00 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2004.codfw.wmnet with OS bullseye
  • 10:51 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Followup to 5cb38845: Don't drop revid info (T324801) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 10:49 ladsgroup@deploy1002: Started scap: Backport for Followup to 5cb38845: Don't drop revid info (T324801)
  • 10:36 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5006.eqsin.wmnet to cluster eqsin and group 1
  • 10:34 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5006.eqsin.wmnet to cluster eqsin and group 1
  • 10:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
  • 10:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
  • 10:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2003.codfw.wmnet with OS bullseye
  • 09:53 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2003.codfw.wmnet with reason: host reimage
  • 09:51 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2003.codfw.wmnet with reason: host reimage
  • 09:34 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2003.codfw.wmnet with OS bullseye
  • 08:39 marostegui: dbmaint schema change on s8@eqiad T324797
  • 08:39 marostegui: dbmaint schema change on s7@eqiad T324797
  • 08:38 marostegui: dbmaint schema change on s6@eqiad T324797
  • 08:38 marostegui: dbmaint schema change on s5@eqiad T324797
  • 08:38 marostegui: dbmaint schema change on s4@eqiad T324797
  • 08:38 marostegui: dbmaint schema change on s2@eqiad T324797
  • 08:38 marostegui: dbmaint schema change on s1@eqiad T324797
  • 08:35 marostegui: dbmaint schema change on s3@eqiad T324797
  • 08:02 marostegui: dbmaint schema change on s3 T324797
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42661 and previous config saved to /var/cache/conftool/dbconfig/20221209-075057-root.json
  • 07:36 marostegui: dbmaint schema change on s5 T324797
  • 07:36 marostegui: dbmaint schema change on s1 T324797
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42660 and previous config saved to /var/cache/conftool/dbconfig/20221209-073552-root.json
  • 07:29 marostegui: dbmaint schema change on s6 T324797
  • 07:29 marostegui: dbmaint schema change on s8 T324797
  • 07:29 marostegui: dbmaint schema change on s7 T324797
  • 07:29 marostegui: dbmaint schema change on s4 T324797
  • 07:29 marostegui: dbmaint schema change on s2 T324797
  • 07:28 marostegui: Deploy schema change on s2 T324797
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42659 and previous config saved to /var/cache/conftool/dbconfig/20221209-072047-root.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42658 and previous config saved to /var/cache/conftool/dbconfig/20221209-070542-root.json
  • 07:00 marostegui: Deploy schema change on s4 T324797
  • 06:58 marostegui: Deploy schema change on s7 T324797
  • 06:57 marostegui: Deploy schema change on s8 T324797
  • 06:55 marostegui: Deploy schema change on s6 T324797
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42657 and previous config saved to /var/cache/conftool/dbconfig/20221209-065037-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42656 and previous config saved to /var/cache/conftool/dbconfig/20221209-063532-root.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 1%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42655 and previous config saved to /var/cache/conftool/dbconfig/20221209-062027-root.json
  • 05:28 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: search_eqiad elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 05:13 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: search_eqiad elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 05:10 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: search_eqiad elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 05:03 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: search_eqiad elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 04:09 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: search_eqiad elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 03:52 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 50 hosts with reason: Rolling restart in progress
  • 03:52 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 50 hosts with reason: Rolling restart in progress
  • 03:51 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: search_eqiad elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 02:39 cwhite@deploy1002: rebuilt and synchronized wikiversions files: Revert "group2 wikis to 1.40.0-wmf.13"
  • 02:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2003.codfw.wmnet with OS buster
  • 02:27 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
  • 02:23 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
  • 02:11 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2003.codfw.wmnet with reason: host reimage
  • 02:08 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2003.codfw.wmnet with reason: host reimage
  • 01:49 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host cassandra-dev2003.codfw.wmnet with OS buster
  • 01:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2002.codfw.wmnet with OS buster
  • 01:48 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
  • 01:47 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
  • 01:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2002.codfw.wmnet with reason: host reimage
  • 01:30 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2002.codfw.wmnet with reason: host reimage
  • 01:11 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host cassandra-dev2002.codfw.wmnet with OS buster
  • 01:07 eevans@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cassandra-dev2003
  • 01:06 eevans@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cassandra-dev2003
  • 01:06 eevans@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cassandra-dev2002
  • 01:05 eevans@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cassandra-dev2002
  • 01:05 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:05 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename restbase-dev200x hosts to cassandra-dev200x - eevans@cumin1001"
  • 01:04 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename restbase-dev200x hosts to cassandra-dev200x - eevans@cumin1001"
  • 01:01 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 00:47 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase-dev2003.codfw.wmnet
  • 00:46 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:46 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 00:45 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 00:43 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 00:39 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase-dev2003.codfw.wmnet
  • 00:38 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase-dev2002.codfw.wmnet
  • 00:38 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:38 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 00:37 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 00:34 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 00:30 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase-dev2002.codfw.wmnet

2022-12-08

  • 23:32 ladsgroup@deploy1002: Finished scap: Backport for Make wikibase.client.init module target mobile (T235712) (duration: 08m 42s)
  • 23:25 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Make wikibase.client.init module target mobile (T235712) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 23:23 ladsgroup@deploy1002: Started scap: Backport for Make wikibase.client.init module target mobile (T235712)
  • 23:14 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2001.codfw.wmnet with OS buster
  • 23:14 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
  • 23:13 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
  • 23:11 ladsgroup@deploy1002: Finished scap: Backport for File pages: Add mobile targets to modules that are silently being removed (T324723 T320518) (duration: 09m 12s)
  • 23:04 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for File pages: Add mobile targets to modules that are silently being removed (T324723 T320518) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 23:02 ladsgroup@deploy1002: Started scap: Backport for File pages: Add mobile targets to modules that are silently being removed (T324723 T320518)
  • 22:58 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
  • 22:55 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
  • 22:37 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host cassandra-dev2001.codfw.wmnet with OS buster
  • 22:29 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 22:28 TheresNoTime: close UTC late backport and config training (+28m)
  • 22:27 samtar@deploy1002: Finished scap: Backport for Start mobile DiscussionTools A/B test (T321961) (duration: 09m 57s)
  • 22:19 samtar@deploy1002: samtar and matmarex: Backport for Start mobile DiscussionTools A/B test (T321961) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 22:17 samtar@deploy1002: Started scap: Backport for Start mobile DiscussionTools A/B test (T321961)
  • 22:16 samtar@deploy1002: Finished scap: Backport for Deemphasize "Learn more about this page" link (T324702), Reinitialize edit links after page content is reloaded (T324686) (duration: 10m 06s)
  • 22:08 samtar@deploy1002: samtar and matmarex: Backport for Deemphasize "Learn more about this page" link (T324702), Reinitialize edit links after page content is reloaded (T324686) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 22:06 samtar@deploy1002: Started scap: Backport for Deemphasize "Learn more about this page" link (T324702), Reinitialize edit links after page content is reloaded (T324686)
  • 22:05 samtar@deploy1002: Finished scap: Backport for frwikiversity: Set wgRestrictDisplayTitle to false (T324277), extwiki: Add new logo (T318766) (duration: 11m 04s)
  • 22:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
  • 21:56 samtar@deploy1002: samtar and stang: Backport for frwikiversity: Set wgRestrictDisplayTitle to false (T324277), extwiki: Add new logo (T318766) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:54 samtar@deploy1002: Started scap: Backport for frwikiversity: Set wgRestrictDisplayTitle to false (T324277), extwiki: Add new logo (T318766)
  • 21:53 samtar@deploy1002: Finished scap: Backport for specieswiki: Install GeoData extension (T324348) (duration: 09m 16s)
  • 21:46 samtar@deploy1002: samtar and stang: Backport for specieswiki: Install GeoData extension (T324348) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 21:44 samtar@deploy1002: Started scap: Backport for specieswiki: Install GeoData extension (T324348)
  • 21:39 TheresNoTime: T324348 : `[samtar@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php specieswiki geodata`
  • 21:37 samtar@deploy1002: Finished scap: Backport for createExtensionTables: Add extension GeoData (T324348) (duration: 08m 01s)
  • 21:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2002.codfw.wmnet with reason: host reimage
  • 21:34 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2002.codfw.wmnet with reason: host reimage
  • 21:32 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
  • 21:31 samtar@deploy1002: samtar and stang: Backport for createExtensionTables: Add extension GeoData (T324348) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:29 samtar@deploy1002: Started scap: Backport for createExtensionTables: Add extension GeoData (T324348)
  • 21:27 jdrewniak@deploy1002: Finished scap: Backport for Add elwiki and arwiki to desktop-improvements group (T322391) (duration: 08m 31s)
  • 21:24 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 21:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 21:21 jdrewniak@deploy1002: jdrewniak and jdrewniak: Backport for Add elwiki and arwiki to desktop-improvements group (T322391) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:19 jdrewniak@deploy1002: Started scap: Backport for Add elwiki and arwiki to desktop-improvements group (T322391)
  • 21:17 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 55s)
  • 21:15 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 21:10 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 07s)
  • 20:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
  • 20:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 20:34 ryankemper: [Cloudelastic] Cleaned up stale (not running but files not removed) elasticsearch 6 units which broke the previous rolling upgrade run on cloudelastic1005
  • 20:31 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 20:27 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 20:27 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 20:22 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 20:22 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 6 hosts with reason: Plugin upgrade for T322776
  • 20:21 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 6 hosts with reason: Plugin upgrade for T322776
  • 20:17 ryankemper: T323064 Merged https://gerrit.wikimedia.org/r/c/operations/grafana-grizzly/+/862178 and deployed new dashboard, visible here: https://grafana.wikimedia.org/d/slo-wdqs-tmpl/wdqs-slos-grizzly-template?orgId=1
  • 20:12 demon@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.13 refs T320518
  • 20:09 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 19:59 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 19:59 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - ryankemper@cumin1001 - T322776
  • 19:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
  • 16:14 eevans@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cassandra-dev2001
  • 16:14 eevans@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cassandra-dev2001
  • 16:13 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:13 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename restbase-dev2001 to cassandra-dev2001 - eevans@cumin1001"
  • 16:12 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename restbase-dev2001 to cassandra-dev2001 - eevans@cumin1001"
  • 16:10 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 16:08 eevans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 16:08 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 16:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2002.codfw.wmnet with OS bullseye
  • 15:48 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 365 days, 0:00:00 on contint1001.wikimedia.org with reason: awaiting decom
  • 15:48 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 365 days, 0:00:00 on contint1001.wikimedia.org with reason: awaiting decom
  • 15:45 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2002.codfw.wmnet with reason: host reimage
  • 15:42 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2002.codfw.wmnet with reason: host reimage
  • 15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T322618)', diff saved to https://phabricator.wikimedia.org/P42654 and previous config saved to /var/cache/conftool/dbconfig/20221208-153123-ladsgroup.json
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti5002.eqsin.wmnet
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 15:27 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 15:26 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 15:25 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2002.codfw.wmnet with OS bullseye
  • 15:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 15:21 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P42653 and previous config saved to /var/cache/conftool/dbconfig/20221208-151616-ladsgroup.json
  • 15:15 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase-dev2001.codfw.wmnet
  • 15:15 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:15 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 15:13 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase-dev2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 15:12 hashar: Restarted Gerrit TWICE on gerrit1001.wikimedia.org to apply `-Dh2.maxCompactTime` and get it to trigger compaction # T323754
  • 15:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti5002.eqsin.wmnet
  • 15:10 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 15:09 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 15:08 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 15:08 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 15:08 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 15:08 jiji@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 15:08 jiji@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 15:08 jiji@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 15:08 jiji@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 15:08 jiji@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 15:08 jiji@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 15:08 jiji@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 15:08 jiji@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 15:08 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 15:07 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 15:07 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 15:05 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:05 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 15:05 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 15:05 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 15:05 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 15:05 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 15:05 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase-dev2001.codfw.wmnet
  • 15:05 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 15:05 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 15:05 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P42650 and previous config saved to /var/cache/conftool/dbconfig/20221208-150109-ladsgroup.json
  • 14:59 hashar: Restarting Gerrit replica TWICE on gerrit2002.wikimedia.org to apply `-Dh2.maxCompactTime` and get it to trigger compaction # T323754
  • 14:54 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:53 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:53 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:53 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:52 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:52 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:52 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:52 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:52 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:50 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:50 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:47 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:47 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:47 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:47 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 14:47 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T322618)', diff saved to https://phabricator.wikimedia.org/P42649 and previous config saved to /var/cache/conftool/dbconfig/20221208-144602-ladsgroup.json
  • 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T322618)', diff saved to https://phabricator.wikimedia.org/P42648 and previous config saved to /var/cache/conftool/dbconfig/20221208-144152-ladsgroup.json
  • 14:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 14:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T322618)', diff saved to https://phabricator.wikimedia.org/P42647 and previous config saved to /var/cache/conftool/dbconfig/20221208-144131-ladsgroup.json
  • 14:40 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:40 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P42646 and previous config saved to /var/cache/conftool/dbconfig/20221208-142625-ladsgroup.json
  • 14:21 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:20 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:20 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:19 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 14:19 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 14:18 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 14:16 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 14:13 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P42645 and previous config saved to /var/cache/conftool/dbconfig/20221208-141118-ladsgroup.json
  • 14:10 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 14:09 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 14:08 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 14:07 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T322618)', diff saved to https://phabricator.wikimedia.org/P42644 and previous config saved to /var/cache/conftool/dbconfig/20221208-135611-ladsgroup.json
  • 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T322618)', diff saved to https://phabricator.wikimedia.org/P42643 and previous config saved to /var/cache/conftool/dbconfig/20221208-135402-ladsgroup.json
  • 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T322618)', diff saved to https://phabricator.wikimedia.org/P42642 and previous config saved to /var/cache/conftool/dbconfig/20221208-135341-ladsgroup.json
  • 13:43 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 13:43 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P42641 and previous config saved to /var/cache/conftool/dbconfig/20221208-133835-ladsgroup.json
  • 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P42640 and previous config saved to /var/cache/conftool/dbconfig/20221208-132329-ladsgroup.json
  • 13:20 aqu@deploy1002: Finished deploy [airflow-dags/analytics@455d142]: Hotfix on HDFS usage (Remove the specific unicode char in comment) - analytics [airflow-dags@455d142] (duration: 00m 15s)
  • 13:20 aqu@deploy1002: Started deploy [airflow-dags/analytics@455d142]: Hotfix on HDFS usage (Remove the specific unicode char in comment) - analytics [airflow-dags@455d142]
  • 13:19 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@455d142]: Hotfix on HDFS usage (Unicode in comment) - analytics_test [airflow-dags@455d142] (duration: 00m 09s)
  • 13:19 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@455d142]: Hotfix on HDFS usage (Unicode in comment) - analytics_test [airflow-dags@455d142]
  • 13:09 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 13:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T322618)', diff saved to https://phabricator.wikimedia.org/P42639 and previous config saved to /var/cache/conftool/dbconfig/20221208-130822-ladsgroup.json
  • 13:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T322618)', diff saved to https://phabricator.wikimedia.org/P42638 and previous config saved to /var/cache/conftool/dbconfig/20221208-130612-ladsgroup.json
  • 13:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T322618)', diff saved to https://phabricator.wikimedia.org/P42637 and previous config saved to /var/cache/conftool/dbconfig/20221208-130551-ladsgroup.json
  • 12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P42635 and previous config saved to /var/cache/conftool/dbconfig/20221208-125045-ladsgroup.json
  • 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti5002.eqsin.wmnet with reason: Remove for eventual decom
  • 12:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti5002.eqsin.wmnet with reason: Remove for eventual decom
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T322618)', diff saved to https://phabricator.wikimedia.org/P42634 and previous config saved to /var/cache/conftool/dbconfig/20221208-124435-ladsgroup.json
  • 12:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P42633 and previous config saved to /var/cache/conftool/dbconfig/20221208-123538-ladsgroup.json
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P42632 and previous config saved to /var/cache/conftool/dbconfig/20221208-122928-ladsgroup.json
  • 12:25 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 12:22 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T322618)', diff saved to https://phabricator.wikimedia.org/P42631 and previous config saved to /var/cache/conftool/dbconfig/20221208-122032-ladsgroup.json
  • 12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T322618)', diff saved to https://phabricator.wikimedia.org/P42630 and previous config saved to /var/cache/conftool/dbconfig/20221208-121823-ladsgroup.json
  • 12:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T322618)', diff saved to https://phabricator.wikimedia.org/P42629 and previous config saved to /var/cache/conftool/dbconfig/20221208-121801-ladsgroup.json
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P42628 and previous config saved to /var/cache/conftool/dbconfig/20221208-121422-ladsgroup.json
  • 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P42627 and previous config saved to /var/cache/conftool/dbconfig/20221208-120255-ladsgroup.json
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T322618)', diff saved to https://phabricator.wikimedia.org/P42626 and previous config saved to /var/cache/conftool/dbconfig/20221208-115915-ladsgroup.json
  • 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T322618)', diff saved to https://phabricator.wikimedia.org/P42625 and previous config saved to /var/cache/conftool/dbconfig/20221208-115659-ladsgroup.json
  • 11:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 11:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 11:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42624 and previous config saved to /var/cache/conftool/dbconfig/20221208-115627-ladsgroup.json
  • 11:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P42623 and previous config saved to /var/cache/conftool/dbconfig/20221208-114748-ladsgroup.json
  • 11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P42622 and previous config saved to /var/cache/conftool/dbconfig/20221208-114120-ladsgroup.json
  • 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T322618)', diff saved to https://phabricator.wikimedia.org/P42621 and previous config saved to /var/cache/conftool/dbconfig/20221208-113240-ladsgroup.json
  • 11:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T322618)', diff saved to https://phabricator.wikimedia.org/P42620 and previous config saved to /var/cache/conftool/dbconfig/20221208-113030-ladsgroup.json
  • 11:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42619 and previous config saved to /var/cache/conftool/dbconfig/20221208-112951-ladsgroup.json
  • 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P42618 and previous config saved to /var/cache/conftool/dbconfig/20221208-112612-ladsgroup.json
  • 11:23 aqu@deploy1002: Finished deploy [airflow-dags/analytics@73d1267]: Create dag generating weekly snapshot of HDFS usage - analytics [airflow-dags@73d1267] (duration: 00m 18s)
  • 11:22 aqu@deploy1002: Started deploy [airflow-dags/analytics@73d1267]: Create dag generating weekly snapshot of HDFS usage - analytics [airflow-dags@73d1267]
  • 11:21 moritzm: drain ganeti5002 for eventual decom T324610
  • 11:20 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@73d1267]: Create dag generating weekly snapshot of HDFS usage - analytics_test [airflow-dags@73d1267] (duration: 00m 09s)
  • 11:20 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@73d1267]: Create dag generating weekly snapshot of HDFS usage - analytics_test [airflow-dags@73d1267]
  • 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P42617 and previous config saved to /var/cache/conftool/dbconfig/20221208-111444-ladsgroup.json
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42616 and previous config saved to /var/cache/conftool/dbconfig/20221208-111105-ladsgroup.json
  • 11:10 steve_munene: batch restarting varnishkafka-webrequest.service in batches of 3 30 seconds in between T323771
  • 11:09 steve_munene: batch restarting varnishkafka-webrequest.service in batches of 3 30 seconds in between T323771
  • 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42615 and previous config saved to /var/cache/conftool/dbconfig/20221208-110849-ladsgroup.json
  • 11:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 11:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42614 and previous config saved to /var/cache/conftool/dbconfig/20221208-110828-ladsgroup.json
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P42613 and previous config saved to /var/cache/conftool/dbconfig/20221208-105938-ladsgroup.json
  • 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5005.eqsin.wmnet to cluster eqsin and group 1
  • 10:56 steve_munene: batch restarting varnishkafka-statsv.service in batches of 3 30 seconds in between T323771
  • 10:56 steve_munene: batch restarting varnishkafka-statsv.service in batches of 3 30 seconds in between T323771
  • 10:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5005.eqsin.wmnet to cluster eqsin and group 1
  • 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P42612 and previous config saved to /var/cache/conftool/dbconfig/20221208-105321-ladsgroup.json
  • 10:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5005.eqsin.wmnet to cluster eqsin and group 1
  • 10:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5005.eqsin.wmnet to cluster eqsin and group 1
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42611 and previous config saved to /var/cache/conftool/dbconfig/20221208-104432-ladsgroup.json
  • 10:43 steve_munene: batch restarting varnishkafka-eventlogging.service in batches of 3 30 seconds in between T323771
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42610 and previous config saved to /var/cache/conftool/dbconfig/20221208-104322-ladsgroup.json
  • 10:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T322618)', diff saved to https://phabricator.wikimedia.org/P42609 and previous config saved to /var/cache/conftool/dbconfig/20221208-104300-ladsgroup.json
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P42608 and previous config saved to /var/cache/conftool/dbconfig/20221208-103815-ladsgroup.json
  • 10:36 ladsgroup@deploy1002: Finished scap: Backport for Set externallinks migration to WRITE_BOTH in testwiki (T321662) (duration: 09m 17s)
  • 10:35 steve_munene: batch restarting varnishkafka-eventlogging.service in batches of 3 30 seconds in between
  • 10:35 steve_munene: batch restarting varnishkafka-eventlogging.service in batches of 3 30 seconds in between
  • 10:28 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Set externallinks migration to WRITE_BOTH in testwiki (T321662) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P42606 and previous config saved to /var/cache/conftool/dbconfig/20221208-102754-ladsgroup.json
  • 10:26 ladsgroup@deploy1002: Started scap: Backport for Set externallinks migration to WRITE_BOTH in testwiki (T321662)
  • 10:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42605 and previous config saved to /var/cache/conftool/dbconfig/20221208-102308-ladsgroup.json
  • 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42604 and previous config saved to /var/cache/conftool/dbconfig/20221208-102052-ladsgroup.json
  • 10:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 10:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T322618)', diff saved to https://phabricator.wikimedia.org/P42603 and previous config saved to /var/cache/conftool/dbconfig/20221208-102030-ladsgroup.json
  • 10:18 hashar: contint1002: activated Icinga monitoring, all services are up and running # T313832
  • 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P42602 and previous config saved to /var/cache/conftool/dbconfig/20221208-101247-ladsgroup.json
  • 10:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
  • 10:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P42600 and previous config saved to /var/cache/conftool/dbconfig/20221208-100524-ladsgroup.json
  • 10:01 claime: Deploying puppet enforcement of zuul-merger on contint1002
  • 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T322618)', diff saved to https://phabricator.wikimedia.org/P42599 and previous config saved to /var/cache/conftool/dbconfig/20221208-095741-ladsgroup.json
  • 09:57 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host test-reimage2001.codfw.wmnet
  • 09:56 steve_munene: restarting varnishkafka-webrequest.service on host cp1075 T323771
  • 09:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P42598 and previous config saved to /var/cache/conftool/dbconfig/20221208-095017-ladsgroup.json
  • 09:50 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) test-reimage2001.codfw.wmnet on all recursors
  • 09:50 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache test-reimage2001.codfw.wmnet on all recursors
  • 09:50 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:50 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM test-reimage2001.codfw.wmnet - slyngshede@cumin1001"
  • 09:49 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM test-reimage2001.codfw.wmnet - slyngshede@cumin1001"
  • 09:46 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 09:46 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host test-reimage2001.codfw.wmnet
  • 09:43 hashar: contint1002: stopped puppet and manually started zuul-merger. I am monitoring it because the last time we brought up a new one it had some issues here and there # T313832
  • 09:38 hashar: contint1001: manually stopped and masked zuul-merger. It is under maintenance mode in Icinga # T313832
  • 09:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T322618)', diff saved to https://phabricator.wikimedia.org/P42597 and previous config saved to /var/cache/conftool/dbconfig/20221208-093511-ladsgroup.json
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T322618)', diff saved to https://phabricator.wikimedia.org/P42596 and previous config saved to /var/cache/conftool/dbconfig/20221208-093255-ladsgroup.json
  • 09:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 09:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 09:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 09:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T322618)', diff saved to https://phabricator.wikimedia.org/P42595 and previous config saved to /var/cache/conftool/dbconfig/20221208-093218-ladsgroup.json
  • 09:25 hashar@deploy1002: Finished deploy [zuul/deploy@4c6859c]: Install Zuul virtualenv on contint1002 # T313832 (duration: 00m 07s)
  • 09:24 hashar@deploy1002: Started deploy [zuul/deploy@4c6859c]: Install Zuul virtualenv on contint1002 # T313832
  • 09:17 hashar@deploy1002: Finished deploy [integration/docroot@2e0d44b]: Warm up contint1002 and test php-fpm restart # T313832 (duration: 00m 03s)
  • 09:17 hashar@deploy1002: Started deploy [integration/docroot@2e0d44b]: Warm up contint1002 and test php-fpm restart # T313832
  • 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P42594 and previous config saved to /var/cache/conftool/dbconfig/20221208-091712-ladsgroup.json
  • 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P42593 and previous config saved to /var/cache/conftool/dbconfig/20221208-090205-ladsgroup.json
  • 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T322618)', diff saved to https://phabricator.wikimedia.org/P42592 and previous config saved to /var/cache/conftool/dbconfig/20221208-085724-ladsgroup.json
  • 08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T322618)', diff saved to https://phabricator.wikimedia.org/P42591 and previous config saved to /var/cache/conftool/dbconfig/20221208-085657-ladsgroup.json
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T322618)', diff saved to https://phabricator.wikimedia.org/P42590 and previous config saved to /var/cache/conftool/dbconfig/20221208-084659-ladsgroup.json
  • 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T322618)', diff saved to https://phabricator.wikimedia.org/P42589 and previous config saved to /var/cache/conftool/dbconfig/20221208-084442-ladsgroup.json
  • 08:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 08:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T322618)', diff saved to https://phabricator.wikimedia.org/P42588 and previous config saved to /var/cache/conftool/dbconfig/20221208-084421-ladsgroup.json
  • 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P42587 and previous config saved to /var/cache/conftool/dbconfig/20221208-084151-ladsgroup.json
  • 08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P42586 and previous config saved to /var/cache/conftool/dbconfig/20221208-082914-ladsgroup.json
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P42585 and previous config saved to /var/cache/conftool/dbconfig/20221208-082644-ladsgroup.json
  • 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P42584 and previous config saved to /var/cache/conftool/dbconfig/20221208-081408-ladsgroup.json
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T322618)', diff saved to https://phabricator.wikimedia.org/P42583 and previous config saved to /var/cache/conftool/dbconfig/20221208-081138-ladsgroup.json
  • 07:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T322618)', diff saved to https://phabricator.wikimedia.org/P42582 and previous config saved to /var/cache/conftool/dbconfig/20221208-075901-ladsgroup.json
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T322618)', diff saved to https://phabricator.wikimedia.org/P42581 and previous config saved to /var/cache/conftool/dbconfig/20221208-075645-ladsgroup.json
  • 07:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 07:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T322618)', diff saved to https://phabricator.wikimedia.org/P42580 and previous config saved to /var/cache/conftool/dbconfig/20221208-075624-ladsgroup.json
  • 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P42579 and previous config saved to /var/cache/conftool/dbconfig/20221208-074117-ladsgroup.json
  • 07:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T322618)', diff saved to https://phabricator.wikimedia.org/P42578 and previous config saved to /var/cache/conftool/dbconfig/20221208-073122-ladsgroup.json
  • 07:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42577 and previous config saved to /var/cache/conftool/dbconfig/20221208-073101-ladsgroup.json
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P42576 and previous config saved to /var/cache/conftool/dbconfig/20221208-072611-ladsgroup.json
  • 07:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P42575 and previous config saved to /var/cache/conftool/dbconfig/20221208-071554-ladsgroup.json
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T322618)', diff saved to https://phabricator.wikimedia.org/P42574 and previous config saved to /var/cache/conftool/dbconfig/20221208-071104-ladsgroup.json
  • 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T322618)', diff saved to https://phabricator.wikimedia.org/P42573 and previous config saved to /var/cache/conftool/dbconfig/20221208-070847-ladsgroup.json
  • 07:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 07:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T322618)', diff saved to https://phabricator.wikimedia.org/P42572 and previous config saved to /var/cache/conftool/dbconfig/20221208-070825-ladsgroup.json
  • 07:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P42571 and previous config saved to /var/cache/conftool/dbconfig/20221208-070048-ladsgroup.json
  • 06:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 31800
  • 06:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 31800
  • 06:55 bblack: lvs1017: restarting pybal to take back text traffic (med reverted to normal, underlying problem w/ ipv6 addressed)
  • 06:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P42570 and previous config saved to /var/cache/conftool/dbconfig/20221208-065319-ladsgroup.json
  • 06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42569 and previous config saved to /var/cache/conftool/dbconfig/20221208-064541-ladsgroup.json
  • 06:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P42568 and previous config saved to /var/cache/conftool/dbconfig/20221208-063813-ladsgroup.json
  • 06:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T322618)', diff saved to https://phabricator.wikimedia.org/P42567 and previous config saved to /var/cache/conftool/dbconfig/20221208-062306-ladsgroup.json
  • 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T322618)', diff saved to https://phabricator.wikimedia.org/P42566 and previous config saved to /var/cache/conftool/dbconfig/20221208-062050-ladsgroup.json
  • 06:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 06:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T322618)', diff saved to https://phabricator.wikimedia.org/P42565 and previous config saved to /var/cache/conftool/dbconfig/20221208-062028-ladsgroup.json
  • 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42564 and previous config saved to /var/cache/conftool/dbconfig/20221208-062028-ladsgroup.json
  • 06:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 06:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42563 and previous config saved to /var/cache/conftool/dbconfig/20221208-062006-ladsgroup.json
  • 06:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T322618)', diff saved to https://phabricator.wikimedia.org/P42562 and previous config saved to /var/cache/conftool/dbconfig/20221208-061436-ladsgroup.json
  • 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P42561 and previous config saved to /var/cache/conftool/dbconfig/20221208-060551-ladsgroup.json
  • 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P42560 and previous config saved to /var/cache/conftool/dbconfig/20221208-060522-ladsgroup.json
  • 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P42559 and previous config saved to /var/cache/conftool/dbconfig/20221208-060500-ladsgroup.json
  • 05:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P42558 and previous config saved to /var/cache/conftool/dbconfig/20221208-055930-ladsgroup.json
  • 05:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P42557 and previous config saved to /var/cache/conftool/dbconfig/20221208-055046-ladsgroup.json
  • 05:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P42556 and previous config saved to /var/cache/conftool/dbconfig/20221208-055015-ladsgroup.json
  • 05:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P42555 and previous config saved to /var/cache/conftool/dbconfig/20221208-054953-ladsgroup.json
  • 05:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P42554 and previous config saved to /var/cache/conftool/dbconfig/20221208-054423-ladsgroup.json
  • 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P42553 and previous config saved to /var/cache/conftool/dbconfig/20221208-053541-ladsgroup.json
  • 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T322618)', diff saved to https://phabricator.wikimedia.org/P42552 and previous config saved to /var/cache/conftool/dbconfig/20221208-053509-ladsgroup.json
  • 05:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42551 and previous config saved to /var/cache/conftool/dbconfig/20221208-053447-ladsgroup.json
  • 05:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T322618)', diff saved to https://phabricator.wikimedia.org/P42550 and previous config saved to /var/cache/conftool/dbconfig/20221208-053253-ladsgroup.json
  • 05:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 05:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P42549 and previous config saved to /var/cache/conftool/dbconfig/20221208-053236-ladsgroup.json
  • 05:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 05:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 05:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 05:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 05:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 05:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 05:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 05:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T322618)', diff saved to https://phabricator.wikimedia.org/P42548 and previous config saved to /var/cache/conftool/dbconfig/20221208-052917-ladsgroup.json
  • 05:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T322618)', diff saved to https://phabricator.wikimedia.org/P42547 and previous config saved to /var/cache/conftool/dbconfig/20221208-052705-ladsgroup.json
  • 05:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 05:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P42546 and previous config saved to /var/cache/conftool/dbconfig/20221208-052036-ladsgroup.json
  • 05:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 05:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 05:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 05:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 05:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 05:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 05:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 05:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 03:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 03:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 02:24 bblack: lvs1017 - restart pybal manually again, back on bgp_med=101 (traffic goes back to lvs1020)
  • 02:21 bblack: restarting pybal on lvs1017 manually again with bgp_med=0 (should take traffic, may or may not do so very usefully!)
  • 02:05 bblack: sretest1001 - puppet disabled, manipulating routing on this host to conduct tests...
  • 01:56 bblack: lvs1017 - manually setting BGP MED to 101 and starting pybal (should come back and speak BGP, but not steal traffic from lvs1020)
  • 01:29 bblack: lvs1017 - disable puppet and stop pybal to fix ipv6 for now
  • 01:27 bblack: lvs1017: restart pybal, attempt to fix text-ipv6 service
  • 01:05 bblack: lvsNNNN: restart pybal to apply etcd key changes on all "high-traffic1" lvs at all sites - T324336
  • 01:00 bblack: lvsNNNN: restart pybal to apply etcd key changes on all "high-traffic2" lvs at all sites - T324336
  • 00:47 bblack: lvsNNNN: restart pybal to apply etcd key changes on all "secondary" lvs at all sites - T324336 (5 hosts, ulsfo completed previously)
  • 00:45 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1012.eqiad.wmnet with OS bullseye
  • 00:29 bblack: lvs4010: restart pybal to test etcd key changes - T324336
  • 00:16 bblack: disabling puppet on all cp and lvs hosts for conftool key changes. Please coordinate if any lvs/pybal/cpNNNN depooling/work is needed during this transition!
  • 00:12 bblack@cumin1001: conftool action : set/pooled=yes; selector: service=cdn
  • 00:12 bblack@cumin1001: conftool action : set/weight=1; selector: service=cdn
  • 00:07 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1012.eqiad.wmnet with reason: host reimage
  • 00:04 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1012.eqiad.wmnet with reason: host reimage

2022-12-07

  • 23:38 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1012.eqiad.wmnet with OS bullseye
  • 23:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 23:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 23:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T322618)', diff saved to https://phabricator.wikimedia.org/P42545 and previous config saved to /var/cache/conftool/dbconfig/20221207-233130-ladsgroup.json
  • 23:24 mutante: mx1001 about to run out of disk again - apt-get clean, gzip /var/log/exim4/mainlog.1, find -mtime +31 -delete in /var/log/exim4 - deleting old logs to prevent mail server running out of disk - it was alerting in Icinga but same as conf* - monitoring works, alerting does not - T305567
  • 23:23 mutante: mx1001 - apt-get clean, gzip /var/log/exim4/mainlog.1, find -mtime +31 -delete in /var/log/exim4 - deleting old logs to prevent mail server running out of disk - it was alerting in Icinga but same as conf* - monitoring works, alerting does not
  • 23:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42544 and previous config saved to /var/cache/conftool/dbconfig/20221207-231623-ladsgroup.json
  • 23:14 samtar@deploy1002: Finished scap: Backport for Make parsoid accept all content models. (T324711) (duration: 13m 57s)
  • 23:02 samtar@deploy1002: samtar and samtar: Backport for Make parsoid accept all content models. (T324711) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 23:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42543 and previous config saved to /var/cache/conftool/dbconfig/20221207-230116-ladsgroup.json
  • 23:00 samtar@deploy1002: Started scap: Backport for Make parsoid accept all content models. (T324711)
  • 22:51 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:51 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:51 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:50 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 22:49 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:49 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 22:49 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:49 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 22:48 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 22:48 TheresNoTime: Going to backport gerrit:865749 to wmf/1.40.0-wmf.13 for T324711
  • 22:47 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T322618)', diff saved to https://phabricator.wikimedia.org/P42542 and previous config saved to /var/cache/conftool/dbconfig/20221207-224610-ladsgroup.json
  • 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T322618)', diff saved to https://phabricator.wikimedia.org/P42541 and previous config saved to /var/cache/conftool/dbconfig/20221207-224502-ladsgroup.json
  • 22:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 22:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 22:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T322618)', diff saved to https://phabricator.wikimedia.org/P42540 and previous config saved to /var/cache/conftool/dbconfig/20221207-224440-ladsgroup.json
  • 22:41 ryankemper: T301167 Downtimed `wdqs20[09-12]` for 7 days
  • 22:37 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:36 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=no; selector: name=wdqs2009.*
  • 22:36 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=no; selector: name=wdqs2010.*
  • 22:35 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:32 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:30 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:29 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:29 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42539 and previous config saved to /var/cache/conftool/dbconfig/20221207-222934-ladsgroup.json
  • 22:29 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:28 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:26 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 22:25 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 22:25 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 22:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 22:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42538 and previous config saved to /var/cache/conftool/dbconfig/20221207-221427-ladsgroup.json
  • 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T322618)', diff saved to https://phabricator.wikimedia.org/P42537 and previous config saved to /var/cache/conftool/dbconfig/20221207-220110-ladsgroup.json
  • 21:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T322618)', diff saved to https://phabricator.wikimedia.org/P42536 and previous config saved to /var/cache/conftool/dbconfig/20221207-215921-ladsgroup.json
  • 21:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T322618)', diff saved to https://phabricator.wikimedia.org/P42535 and previous config saved to /var/cache/conftool/dbconfig/20221207-215712-ladsgroup.json
  • 21:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 21:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 21:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T322618)', diff saved to https://phabricator.wikimedia.org/P42534 and previous config saved to /var/cache/conftool/dbconfig/20221207-215651-ladsgroup.json
  • 21:56 TheresNoTime: UTC late backport window done
  • 21:51 samtar@deploy1002: backport aborted: (duration: 00m 15s)
  • 21:49 samtar@deploy1002: Sync cancelled.
  • 21:47 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 865773"
  • 21:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42533 and previous config saved to /var/cache/conftool/dbconfig/20221207-214603-ladsgroup.json
  • 21:44 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs5003.eqsin.wmnet
  • 21:44 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:44 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5003.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 21:43 samtar@deploy1002: samtar and stang: Backport for specieswiki: Install GeoData extension (T324348) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:43 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5003.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 21:41 samtar@deploy1002: Started scap: Backport for specieswiki: Install GeoData extension (T324348)
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42532 and previous config saved to /var/cache/conftool/dbconfig/20221207-214145-ladsgroup.json
  • 21:41 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 21:40 samtar@deploy1002: Finished scap: Backport for Remove Research Incentive survey from frwiki (T321930) (duration: 09m 04s)
  • 21:36 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs5003.eqsin.wmnet
  • 21:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs5003.eqsin.wmnet with reason: downtimed, in the process of decom
  • 21:36 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs5003.eqsin.wmnet with reason: downtimed, in the process of decom
  • 21:34 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 865742"
  • 21:32 samtar@deploy1002: samtar and dani: Backport for Remove Research Incentive survey from frwiki (T321930) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 21:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs5006.eqsin.wmnet with OS buster
  • 21:32 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 21:31 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42530 and previous config saved to /var/cache/conftool/dbconfig/20221207-213057-ladsgroup.json
  • 21:30 samtar@deploy1002: Started scap: Backport for Remove Research Incentive survey from frwiki (T321930)
  • 21:28 samtar@deploy1002: Finished scap: Backport for hewiki: enable parser cache writes for parsoid's page/html endpoint. (T322672 T320534 T320529), Page 5% of calls to parsoid's page/html endpoint write to PC (T322672) (duration: 20m 35s)
  • 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42529 and previous config saved to /var/cache/conftool/dbconfig/20221207-212638-ladsgroup.json
  • 21:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T322618)', diff saved to https://phabricator.wikimedia.org/P42528 and previous config saved to /var/cache/conftool/dbconfig/20221207-211551-ladsgroup.json
  • 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T322618)', diff saved to https://phabricator.wikimedia.org/P42527 and previous config saved to /var/cache/conftool/dbconfig/20221207-211338-ladsgroup.json
  • 21:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 21:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42526 and previous config saved to /var/cache/conftool/dbconfig/20221207-211317-ladsgroup.json
  • 21:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T322618)', diff saved to https://phabricator.wikimedia.org/P42525 and previous config saved to /var/cache/conftool/dbconfig/20221207-211132-ladsgroup.json
  • 21:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5006.eqsin.wmnet with reason: host reimage
  • 21:10 samtar@deploy1002: samtar and daniel: Backport for hewiki: enable parser cache writes for parsoid's page/html endpoint. (T322672 T320534 T320529), Page 5% of calls to parsoid's page/html endpoint write to PC (T322672) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 21:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T322618)', diff saved to https://phabricator.wikimedia.org/P42524 and previous config saved to /var/cache/conftool/dbconfig/20221207-210923-ladsgroup.json
  • 21:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 21:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 21:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T322618)', diff saved to https://phabricator.wikimedia.org/P42523 and previous config saved to /var/cache/conftool/dbconfig/20221207-210902-ladsgroup.json
  • 21:08 samtar@deploy1002: Started scap: Backport for hewiki: enable parser cache writes for parsoid's page/html endpoint. (T322672 T320534 T320529), Page 5% of calls to parsoid's page/html endpoint write to PC (T322672)
  • 21:07 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5006.eqsin.wmnet with reason: host reimage
  • 21:02 mutante: contint1002 a2dismod mpm_event - https://phabricator.wikimedia.org/T208108 Bug: T313832
  • 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42522 and previous config saved to /var/cache/conftool/dbconfig/20221207-205811-ladsgroup.json
  • 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42521 and previous config saved to /var/cache/conftool/dbconfig/20221207-205356-ladsgroup.json
  • 20:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs5006.eqsin.wmnet with OS buster
  • 20:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42520 and previous config saved to /var/cache/conftool/dbconfig/20221207-204304-ladsgroup.json
  • 20:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42519 and previous config saved to /var/cache/conftool/dbconfig/20221207-203849-ladsgroup.json
  • 20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42517 and previous config saved to /var/cache/conftool/dbconfig/20221207-202758-ladsgroup.json
  • 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42516 and previous config saved to /var/cache/conftool/dbconfig/20221207-202545-ladsgroup.json
  • 20:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 20:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42515 and previous config saved to /var/cache/conftool/dbconfig/20221207-202524-ladsgroup.json
  • 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T322618)', diff saved to https://phabricator.wikimedia.org/P42514 and previous config saved to /var/cache/conftool/dbconfig/20221207-202343-ladsgroup.json
  • 20:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T322618)', diff saved to https://phabricator.wikimedia.org/P42513 and previous config saved to /var/cache/conftool/dbconfig/20221207-202134-ladsgroup.json
  • 20:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 20:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 20:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T322618)', diff saved to https://phabricator.wikimedia.org/P42512 and previous config saved to /var/cache/conftool/dbconfig/20221207-202113-ladsgroup.json
  • 20:20 demon@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.13 refs T320518 (duration: 07m 03s)
  • 20:13 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.13 refs T320518
  • 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42511 and previous config saved to /var/cache/conftool/dbconfig/20221207-201016-ladsgroup.json
  • 20:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on contint1002.wikimedia.org with reason: new setup
  • 20:09 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on contint1002.wikimedia.org with reason: new setup
  • 20:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42510 and previous config saved to /var/cache/conftool/dbconfig/20221207-200606-ladsgroup.json
  • 20:00 mutante: contint* - deploying firewall changes to add contint1002 - T313832
  • 19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42509 and previous config saved to /var/cache/conftool/dbconfig/20221207-195510-ladsgroup.json
  • 19:53 mutante: registry* (docker registry HA) - adding contint1002 to allowed hosts gerrit:865680 T313832
  • 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42508 and previous config saved to /var/cache/conftool/dbconfig/20221207-195100-ladsgroup.json
  • 19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42507 and previous config saved to /var/cache/conftool/dbconfig/20221207-194003-ladsgroup.json
  • 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42506 and previous config saved to /var/cache/conftool/dbconfig/20221207-193751-ladsgroup.json
  • 19:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 19:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T322618)', diff saved to https://phabricator.wikimedia.org/P42505 and previous config saved to /var/cache/conftool/dbconfig/20221207-193730-ladsgroup.json
  • 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T322618)', diff saved to https://phabricator.wikimedia.org/P42504 and previous config saved to /var/cache/conftool/dbconfig/20221207-193553-ladsgroup.json
  • 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T322618)', diff saved to https://phabricator.wikimedia.org/P42503 and previous config saved to /var/cache/conftool/dbconfig/20221207-193445-ladsgroup.json
  • 19:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 19:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 19:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T322618)', diff saved to https://phabricator.wikimedia.org/P42502 and previous config saved to /var/cache/conftool/dbconfig/20221207-193350-ladsgroup.json
  • 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42501 and previous config saved to /var/cache/conftool/dbconfig/20221207-192223-ladsgroup.json
  • 19:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42500 and previous config saved to /var/cache/conftool/dbconfig/20221207-191843-ladsgroup.json
  • 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T322618)', diff saved to https://phabricator.wikimedia.org/P42499 and previous config saved to /var/cache/conftool/dbconfig/20221207-191328-ladsgroup.json
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42498 and previous config saved to /var/cache/conftool/dbconfig/20221207-190717-ladsgroup.json
  • 19:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42497 and previous config saved to /var/cache/conftool/dbconfig/20221207-190337-ladsgroup.json
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P42496 and previous config saved to /var/cache/conftool/dbconfig/20221207-185821-ladsgroup.json
  • 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T322618)', diff saved to https://phabricator.wikimedia.org/P42495 and previous config saved to /var/cache/conftool/dbconfig/20221207-185210-ladsgroup.json
  • 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T322618)', diff saved to https://phabricator.wikimedia.org/P42494 and previous config saved to /var/cache/conftool/dbconfig/20221207-184958-ladsgroup.json
  • 18:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 18:49 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1026.eqiad.wmnet with OS bullseye
  • 18:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 18:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T322618)', diff saved to https://phabricator.wikimedia.org/P42493 and previous config saved to /var/cache/conftool/dbconfig/20221207-184851-ladsgroup.json
  • 18:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T322618)', diff saved to https://phabricator.wikimedia.org/P42492 and previous config saved to /var/cache/conftool/dbconfig/20221207-184830-ladsgroup.json
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T322618)', diff saved to https://phabricator.wikimedia.org/P42491 and previous config saved to /var/cache/conftool/dbconfig/20221207-184722-ladsgroup.json
  • 18:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 18:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42490 and previous config saved to /var/cache/conftool/dbconfig/20221207-184700-ladsgroup.json
  • 18:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1023.eqiad.wmnet with OS bullseye
  • 18:45 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmjohnson@cumin1001"
  • 18:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1024.eqiad.wmnet with OS bullseye
  • 18:45 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmjohnson@cumin1001"
  • 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P42489 and previous config saved to /var/cache/conftool/dbconfig/20221207-184315-ladsgroup.json
  • 18:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P42488 and previous config saved to /var/cache/conftool/dbconfig/20221207-183344-ladsgroup.json
  • 18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P42487 and previous config saved to /var/cache/conftool/dbconfig/20221207-183154-ladsgroup.json
  • 18:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T322618)', diff saved to https://phabricator.wikimedia.org/P42486 and previous config saved to /var/cache/conftool/dbconfig/20221207-182808-ladsgroup.json
  • 18:27 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1026.eqiad.wmnet with reason: host reimage
  • 18:23 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1026.eqiad.wmnet with reason: host reimage
  • 18:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P42485 and previous config saved to /var/cache/conftool/dbconfig/20221207-181838-ladsgroup.json
  • 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P42484 and previous config saved to /var/cache/conftool/dbconfig/20221207-181647-ladsgroup.json
  • 18:06 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1026.eqiad.wmnet with OS bullseye
  • 18:04 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1026']
  • 18:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T322618)', diff saved to https://phabricator.wikimedia.org/P42483 and previous config saved to /var/cache/conftool/dbconfig/20221207-180331-ladsgroup.json
  • 18:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2002.codfw.wmnet with OS bullseye
  • 18:03 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42482 and previous config saved to /var/cache/conftool/dbconfig/20221207-180140-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42481 and previous config saved to /var/cache/conftool/dbconfig/20221207-180132-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T322618)', diff saved to https://phabricator.wikimedia.org/P42480 and previous config saved to /var/cache/conftool/dbconfig/20221207-180119-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 18:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42479 and previous config saved to /var/cache/conftool/dbconfig/20221207-180110-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T322618)', diff saved to https://phabricator.wikimedia.org/P42478 and previous config saved to /var/cache/conftool/dbconfig/20221207-180058-ladsgroup.json
  • 18:00 sukhe: restart pybal on lvs5003 to pick up bgp-med change
  • 17:58 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:56 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1026']
  • 17:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 17:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 17:53 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash1026']
  • 17:50 aqu@deploy1002: Finished deploy [analytics/refinery@349e1cc] (hadoop-test): Deploy HDFS usage dataset generation scripts TEST [analytics/refinery@349e1cc] (duration: 01m 15s)
  • 17:49 aqu@deploy1002: Started deploy [analytics/refinery@349e1cc] (hadoop-test): Deploy HDFS usage dataset generation scripts TEST [analytics/refinery@349e1cc]
  • 17:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 17:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 17:48 aqu@deploy1002: Finished deploy [analytics/refinery@349e1cc] (thin): Deploy HDFS usage dataset generation scripts THIN [analytics/refinery@349e1cc] (duration: 00m 07s)
  • 17:48 aqu@deploy1002: Started deploy [analytics/refinery@349e1cc] (thin): Deploy HDFS usage dataset generation scripts THIN [analytics/refinery@349e1cc]
  • 17:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2112 T324692', diff saved to https://phabricator.wikimedia.org/P42477 and previous config saved to /var/cache/conftool/dbconfig/20221207-174811-ladsgroup.json
  • 17:46 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1026']
  • 17:46 aqu@deploy1002: Finished deploy [analytics/refinery@349e1cc]: Deploy HDFS usage dataset generation scripts [analytics/refinery@349e1cc] (duration: 79m 12s)
  • 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P42476 and previous config saved to /var/cache/conftool/dbconfig/20221207-174604-ladsgroup.json
  • 17:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P42475 and previous config saved to /var/cache/conftool/dbconfig/20221207-174551-ladsgroup.json
  • 17:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2103 to s1 primary T324692', diff saved to https://phabricator.wikimedia.org/P42474 and previous config saved to /var/cache/conftool/dbconfig/20221207-174540-ladsgroup.json
  • 17:45 Amir1: Starting s1 codfw failover from db2112 to db2103 - T324692
  • 17:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2002.codfw.wmnet with reason: host reimage
  • 17:42 sukhe: restart pybal on lvs5005 to pick up bgp-med
  • 17:41 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2002.codfw.wmnet with reason: host reimage
  • 17:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
  • 17:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
  • 17:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
  • 17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T322618)', diff saved to https://phabricator.wikimedia.org/P42473 and previous config saved to /var/cache/conftool/dbconfig/20221207-173350-ladsgroup.json
  • 17:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 17:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T322618)', diff saved to https://phabricator.wikimedia.org/P42472 and previous config saved to /var/cache/conftool/dbconfig/20221207-173329-ladsgroup.json
  • 17:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P42471 and previous config saved to /var/cache/conftool/dbconfig/20221207-173057-ladsgroup.json
  • 17:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P42470 and previous config saved to /var/cache/conftool/dbconfig/20221207-173045-ladsgroup.json
  • 17:27 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmjohnson@cumin1001"
  • 17:26 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmjohnson@cumin1001"
  • 17:26 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1026.eqiad.wmnet with OS bullseye
  • 17:25 sukhe: running homer for Gerrit: 865712
  • 17:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P42469 and previous config saved to /var/cache/conftool/dbconfig/20221207-171822-ladsgroup.json
  • 17:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T322618)', diff saved to https://phabricator.wikimedia.org/P42468 and previous config saved to /var/cache/conftool/dbconfig/20221207-171803-ladsgroup.json
  • 17:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs5002.eqsin.wmnet
  • 17:17 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:17 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 17:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42467 and previous config saved to /var/cache/conftool/dbconfig/20221207-171551-ladsgroup.json
  • 17:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T322618)', diff saved to https://phabricator.wikimedia.org/P42466 and previous config saved to /var/cache/conftool/dbconfig/20221207-171538-ladsgroup.json
  • 17:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1023.eqiad.wmnet with reason: host reimage
  • 17:14 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 17:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2103 with weight 0 T324692', diff saved to https://phabricator.wikimedia.org/P42465 and previous config saved to /var/cache/conftool/dbconfig/20221207-171416-ladsgroup.json
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42464 and previous config saved to /var/cache/conftool/dbconfig/20221207-171342-ladsgroup.json
  • 17:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T322618)', diff saved to https://phabricator.wikimedia.org/P42463 and previous config saved to /var/cache/conftool/dbconfig/20221207-171326-ladsgroup.json
  • 17:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42462 and previous config saved to /var/cache/conftool/dbconfig/20221207-171321-ladsgroup.json
  • 17:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 17:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T322618)', diff saved to https://phabricator.wikimedia.org/P42461 and previous config saved to /var/cache/conftool/dbconfig/20221207-171305-ladsgroup.json
  • 17:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1024.eqiad.wmnet with reason: host reimage
  • 17:12 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 17:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 38 hosts with reason: Primary switchover s1 T324692
  • 17:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 38 hosts with reason: Primary switchover s1 T324692
  • 17:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1023.eqiad.wmnet with reason: host reimage
  • 17:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1024.eqiad.wmnet with reason: host reimage
  • 17:08 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs5002.eqsin.wmnet
  • 17:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P42460 and previous config saved to /var/cache/conftool/dbconfig/20221207-170316-ladsgroup.json
  • 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P42459 and previous config saved to /var/cache/conftool/dbconfig/20221207-170256-ladsgroup.json
  • 17:01 jiji@deploy1002: Finished scap: Backport for ProductionServices: Use redis_misc servers for LockManager (6/6) (T267581) (duration: 14m 46s)
  • 16:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1023.eqiad.wmnet with OS bullseye
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P42458 and previous config saved to /var/cache/conftool/dbconfig/20221207-165815-ladsgroup.json
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P42457 and previous config saved to /var/cache/conftool/dbconfig/20221207-165758-ladsgroup.json
  • 16:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1024.eqiad.wmnet with OS bullseye
  • 16:56 eevans@deploy1002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 16:55 eevans@deploy1002: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 16:55 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 16:55 eevans@deploy1002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 16:48 jiji@deploy1002: jiji and jiji: Backport for ProductionServices: Use redis_misc servers for LockManager (6/6) (T267581) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T322618)', diff saved to https://phabricator.wikimedia.org/P42456 and previous config saved to /var/cache/conftool/dbconfig/20221207-164809-ladsgroup.json
  • 16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P42455 and previous config saved to /var/cache/conftool/dbconfig/20221207-164748-ladsgroup.json
  • 16:46 jiji@deploy1002: Started scap: Backport for ProductionServices: Use redis_misc servers for LockManager (6/6) (T267581)
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P42454 and previous config saved to /var/cache/conftool/dbconfig/20221207-164308-ladsgroup.json
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T322618)', diff saved to https://phabricator.wikimedia.org/P42453 and previous config saved to /var/cache/conftool/dbconfig/20221207-164258-ladsgroup.json
  • 16:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P42452 and previous config saved to /var/cache/conftool/dbconfig/20221207-164252-ladsgroup.json
  • 16:42 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs5002.eqsin.wmnet with reason: downtimed, in the process of decom
  • 16:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 16:42 sukhe: restart pybal on lvs5002
  • 16:42 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs5002.eqsin.wmnet with reason: downtimed, in the process of decom
  • 16:38 sukhe: cr[23]-eqsin*: set routing-options static route 103.102.166.240/28 next-hop 10.132.0.6: T322048
  • 16:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T322618)', diff saved to https://phabricator.wikimedia.org/P42451 and previous config saved to /var/cache/conftool/dbconfig/20221207-163242-ladsgroup.json
  • 16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T322618)', diff saved to https://phabricator.wikimedia.org/P42450 and previous config saved to /var/cache/conftool/dbconfig/20221207-163031-ladsgroup.json
  • 16:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 16:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 16:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 16:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 16:29 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1026.eqiad.wmnet with OS bullseye
  • 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42449 and previous config saved to /var/cache/conftool/dbconfig/20221207-162802-ladsgroup.json
  • 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T322618)', diff saved to https://phabricator.wikimedia.org/P42448 and previous config saved to /var/cache/conftool/dbconfig/20221207-162745-ladsgroup.json
  • 16:27 aqu@deploy1002: Started deploy [analytics/refinery@349e1cc]: Deploy HDFS usage dataset generation scripts [analytics/refinery@349e1cc]
  • 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T322618)', diff saved to https://phabricator.wikimedia.org/P42447 and previous config saved to /var/cache/conftool/dbconfig/20221207-162553-ladsgroup.json
  • 16:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T322618)', diff saved to https://phabricator.wikimedia.org/P42446 and previous config saved to /var/cache/conftool/dbconfig/20221207-162533-ladsgroup.json
  • 16:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 16:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 16:25 aqu: Deploying analytics/refinery (HDFS usage scripts)
  • 16:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 16:24 sukhe: run homer in cr*-eqsin for Gerrit: 865615
  • 16:09 denisse: sync rancid in netmon2001 and netmon2002
  • 16:08 denisse: sync LibreNMS RRD in netmon2002
  • 16:08 jiji@deploy1002: Finished scap: Backport for ProductionServices: Use redis_misc servers for LockManager (5/6) (T267581) (duration: 10m 59s)
  • 16:06 sukhe: run homer in cr*-eqsin for Gerrit: 865660
  • 16:02 denisse: Sync LibreNMS RRD in netmon2001
  • 15:59 jiji@deploy1002: jiji and jiji: Backport for ProductionServices: Use redis_misc servers for LockManager (5/6) (T267581) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 15:57 jiji@deploy1002: Started scap: Backport for ProductionServices: Use redis_misc servers for LockManager (5/6) (T267581)
  • 15:46 jiji@deploy1002: Finished scap: Backport for ProductionServices: Use redis_misc servers for LockManager (4/6) (T267581) (duration: 08m 29s)
  • 15:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs5005.eqsin.wmnet with OS buster
  • 15:41 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 15:40 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 15:39 jiji@deploy1002: jiji and jiji: Backport for ProductionServices: Use redis_misc servers for LockManager (4/6) (T267581) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 15:37 jiji@deploy1002: Started scap: Backport for ProductionServices: Use redis_misc servers for LockManager (4/6) (T267581)
  • 15:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5003.wikimedia.org with OS buster
  • 15:36 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 15:34 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 15:24 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 15:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5005.eqsin.wmnet with reason: host reimage
  • 15:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5005.eqsin.wmnet with reason: host reimage
  • 15:11 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 15:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5003.wikimedia.org with reason: host reimage
  • 15:03 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5003.wikimedia.org with reason: host reimage
  • 14:56 krinkle@deploy1002: Finished deploy [performance/navtiming@6caa033]: (no justification provided) (duration: 00m 07s)
  • 14:56 krinkle@deploy1002: Started deploy [performance/navtiming@6caa033]: (no justification provided)
  • 14:44 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs5005.eqsin.wmnet with OS buster
  • 14:38 XioNoX: draining Arelion eqiad-codfw circuit for optic replacement
  • 14:32 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5003.wikimedia.org with OS buster
  • 14:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns5002.wikimedia.org
  • 14:28 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:28 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 14:27 moritzm: restarting ntpd
  • 14:26 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 14:24 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 14:20 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns5002.wikimedia.org
  • 14:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dns5002.wikimedia.org with reason: downtimed, to be depooled
  • 14:16 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dns5002.wikimedia.org with reason: downtimed, to be depooled
  • 14:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on an-tool1005.eqiad.wmnet with reason: redeploying an-tool1005 as bullseye
  • 14:02 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on an-tool1005.eqiad.wmnet with reason: redeploying an-tool1005 as bullseye
  • 13:42 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.13 refs T320518 (duration: 07m 45s)
  • 13:34 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.13 refs T320518
  • 13:25 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Created cloudcumin instances - volans@cumin1001"
  • 13:22 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Created cloudcumin instances - volans@cumin1001"
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P42443 and previous config saved to /var/cache/conftool/dbconfig/20221207-131858-marostegui.json
  • 13:09 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudcumin1001.eqiad.wmnet with reason: First installation
  • 13:09 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudcumin1001.eqiad.wmnet with reason: First installation
  • 12:37 moritzm: upgrading mwmaint servers to PHP 7.4.33
  • 12:33 moritzm: upgrading deployment servers to PHP 7.4.33
  • 12:32 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudcumin2001.codfw.wmnet with reason: First installation
  • 12:32 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudcumin2001.codfw.wmnet with reason: First installation
  • 12:30 moritzm: upgrading cloudweb to PHP 7.4.33
  • 12:28 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cloudcumin2001.codfw.wmnet with reason: First installation
  • 12:28 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudcumin2001.codfw.wmnet with reason: First installation
  • 12:19 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 12:18 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 12:17 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 12:16 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 12:16 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 12:15 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 12:15 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:15 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:15 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:14 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cloudcumin2001.codfw.wmnet with reason: First installation
  • 12:14 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:14 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:13 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:12 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:12 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:11 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 12:11 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 12:10 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudcumin2001.codfw.wmnet with reason: First installation
  • 12:10 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 12:10 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 12:09 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 12:09 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 11:59 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 11:59 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 11:59 moritzm: imported rsyslog 8.2102.0-2+deb11u1~buster1 to component/rsyslog-openssl T324623
  • 11:57 moritzm: imported librelp 1.10.0-1~buster1 to component/rsyslog-openssl T324623
  • 11:55 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 11:55 sukhe: running authdns-update for Gerrit: 865605
  • 11:55 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 11:55 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 11:54 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 11:51 hashar@deploy1002: Finished deploy [integration/docroot@2e0d44b]: Spelling, coobooks -> cookbooks (duration: 00m 14s)
  • 11:50 hashar@deploy1002: Started deploy [integration/docroot@2e0d44b]: Spelling, coobooks -> cookbooks
  • 11:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main'.
  • 11:33 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main'.
  • 11:30 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 11:29 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 11:11 volans@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host cloudcumin2001.codfw.wmnet
  • 11:06 volans@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcumin2001.codfw.wmnet on all recursors
  • 11:06 volans@cumin2002: START - Cookbook sre.dns.wipe-cache cloudcumin2001.codfw.wmnet on all recursors
  • 11:06 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:06 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cloudcumin2001.codfw.wmnet - volans@cumin2002"
  • 11:05 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cloudcumin2001.codfw.wmnet - volans@cumin2002"
  • 11:03 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:03 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:01 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 11:01 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host cloudcumin2001.codfw.wmnet
  • 10:58 volans@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host cloudcumin1001.eqiad.wmnet
  • 10:53 volans@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcumin1001.eqiad.wmnet on all recursors
  • 10:53 volans@cumin1001: START - Cookbook sre.dns.wipe-cache cloudcumin1001.eqiad.wmnet on all recursors
  • 10:53 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:53 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cloudcumin1001.eqiad.wmnet - volans@cumin1001"
  • 10:52 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cloudcumin1001.eqiad.wmnet - volans@cumin1001"
  • 10:50 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 10:50 volans@cumin1001: START - Cookbook sre.ganeti.makevm for new host cloudcumin1001.eqiad.wmnet
  • 10:35 jiji@deploy1002: Finished scap: Backport for ProductionServices: Use redis_misc servers for LockManager (3/6) (T267581) (duration: 10m 29s)
  • 10:27 jiji@deploy1002: jiji and jiji: Backport for ProductionServices: Use redis_misc servers for LockManager (3/6) (T267581) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 10:26 claime: rebooted contin1001.eqiad.wmnet
  • 10:25 jiji@deploy1002: Started scap: Backport for ProductionServices: Use redis_misc servers for LockManager (3/6) (T267581)
  • 10:17 jiji@deploy1002: Finished scap: Backport for ProductionServices: Use redis_misc servers for LockManager (2/6) (T267581) (duration: 10m 48s)
  • 10:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40217
  • 10:17 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 40217
  • 10:17 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 714
  • 10:12 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 714
  • 10:12 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 16276
  • 10:09 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:09 jiji@deploy1002: jiji and jiji: Backport for ProductionServices: Use redis_misc servers for LockManager (2/6) (T267581) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 10:07 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 10:07 jiji@deploy1002: Started scap: Backport for ProductionServices: Use redis_misc servers for LockManager (2/6) (T267581)
  • 10:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16276
  • 10:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35320
  • 10:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35320
  • 10:00 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 8932
  • 09:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8932
  • 09:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13150
  • 09:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13150
  • 09:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 138064
  • 09:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 138064
  • 09:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16276
  • 09:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16276
  • 09:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32098
  • 09:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32098
  • 09:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 31800
  • 09:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 31800
  • 09:44 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 31800
  • 09:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 31800
  • 09:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45430
  • 09:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45430
  • 09:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7568
  • 09:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7568
  • 09:37 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:36 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:36 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:36 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:36 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:35 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:35 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:35 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:34 jiji@deploy1002: Finished scap: Backport for ProductionServices: Use redis_misc servers for LockManager (1/6) (T267581) (duration: 09m 08s)
  • 09:27 jiji@deploy1002: jiji and jiji: Backport for ProductionServices: Use redis_misc servers for LockManager (1/6) (T267581) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 09:25 jiji@deploy1002: Started scap: Backport for ProductionServices: Use redis_misc servers for LockManager (1/6) (T267581)
  • 09:23 jiji@deploy1002: backport aborted: (duration: 00m 18s)
  • 09:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 395570
  • 09:17 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 395570
  • 09:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 397715
  • 09:13 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 397715
  • 09:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 42
  • 09:11 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-worker1108.eqiad.wmnet
  • 09:10 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1108.eqiad.wmnet
  • 09:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 42
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42439 and previous config saved to /var/cache/conftool/dbconfig/20221207-071831-root.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42438 and previous config saved to /var/cache/conftool/dbconfig/20221207-070326-root.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42437 and previous config saved to /var/cache/conftool/dbconfig/20221207-064821-root.json
  • 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42436 and previous config saved to /var/cache/conftool/dbconfig/20221207-063316-root.json
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42435 and previous config saved to /var/cache/conftool/dbconfig/20221207-061810-root.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42434 and previous config saved to /var/cache/conftool/dbconfig/20221207-060305-root.json
  • 05:58 marostegui: Drop phab1001 grants from m3 databases T323418
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 1%: Testing new RAID controller', diff saved to https://phabricator.wikimedia.org/P42433 and previous config saved to /var/cache/conftool/dbconfig/20221207-054759-root.json
  • 03:59 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1011.eqiad.wmnet with OS bullseye
  • 03:18 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1011.eqiad.wmnet with reason: host reimage
  • 03:15 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1011.eqiad.wmnet with reason: host reimage
  • 02:49 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1011.eqiad.wmnet with OS bullseye
  • 02:39 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1011']
  • 02:33 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1011']
  • 00:21 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1024.eqiad.wmnet with OS bullseye
  • 00:18 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1023.eqiad.wmnet with OS bullseye

2022-12-06

  • 23:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1024.eqiad.wmnet with OS bullseye
  • 23:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1023.eqiad.wmnet with OS bullseye
  • 23:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1024.eqiad.wmnet with OS bullseye
  • 22:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1023.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1023.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubernetes1023 - cmjohnson@cumin1001"
  • 22:37 tgr_: UTC late backports done
  • 22:36 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubernetes1023 - cmjohnson@cumin1001"
  • 22:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 22:26 tgr@deploy1002: Finished scap: Backport for Fix UserDatabaseHelper::hasMainspaceEdits() (T324285), Fix UserDatabaseHelper::hasMainspaceEdits() (T324285) (duration: 18m 58s)
  • 22:09 tgr@deploy1002: tgr and tgr: Backport for Fix UserDatabaseHelper::hasMainspaceEdits() (T324285), Fix UserDatabaseHelper::hasMainspaceEdits() (T324285) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 22:07 tgr@deploy1002: Started scap: Backport for Fix UserDatabaseHelper::hasMainspaceEdits() (T324285), Fix UserDatabaseHelper::hasMainspaceEdits() (T324285)
  • 21:51 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1024.eqiad.wmnet with OS bullseye
  • 21:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:07 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1029.eqiad.wmnet with OS bullseye
  • 20:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cephosd1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:45 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1029.eqiad.wmnet with reason: host reimage
  • 20:42 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1029.eqiad.wmnet with reason: host reimage
  • 20:25 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1029.eqiad.wmnet with OS bullseye
  • 20:25 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1029.eqiad.wmnet with OS bullseye
  • 20:24 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1029.eqiad.wmnet with OS bullseye
  • 20:24 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1028.eqiad.wmnet with OS bullseye
  • 20:22 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1029']
  • 20:16 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1029']
  • 20:13 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash1029']
  • 20:06 eileen: civicrm upgraded from c9761fee to 3ae68ab4
  • 20:06 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1029']
  • 20:01 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1028.eqiad.wmnet with reason: host reimage
  • 20:00 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5005.eqsin.wmnet with OS bullseye
  • 19:57 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1028.eqiad.wmnet with reason: host reimage
  • 19:56 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5007.eqsin.wmnet with OS bullseye
  • 19:55 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5006.eqsin.wmnet with OS bullseye
  • 19:45 ejegg: payments-wiki upgraded from a875f2b9 to 1914b6c7
  • 19:40 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1028.eqiad.wmnet with OS bullseye
  • 19:39 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5005.eqsin.wmnet with reason: host reimage
  • 19:38 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1027.eqiad.wmnet with OS bullseye
  • 19:37 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5007.eqsin.wmnet with reason: host reimage
  • 19:35 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5006.eqsin.wmnet with reason: host reimage
  • 19:32 ladsgroup@deploy1002: Finished scap: Backport for Avoid syntax error on hover in grade C browsers (T324514) (duration: 12m 43s)
  • 19:32 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5007.eqsin.wmnet with reason: host reimage
  • 19:32 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5005.eqsin.wmnet with reason: host reimage
  • 19:32 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5006.eqsin.wmnet with reason: host reimage
  • 19:32 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1028']
  • 19:22 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1028']
  • 19:21 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash1028']
  • 19:21 ladsgroup@deploy1002: ladsgroup and jdlrobson: Backport for Avoid syntax error on hover in grade C browsers (T324514) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 19:19 ladsgroup@deploy1002: Started scap: Backport for Avoid syntax error on hover in grade C browsers (T324514)
  • 19:19 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.13 refs T320518
  • 19:16 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1027.eqiad.wmnet with reason: host reimage
  • 19:14 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1028']
  • 19:12 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1027.eqiad.wmnet with reason: host reimage
  • 19:03 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5007.eqsin.wmnet with OS bullseye
  • 19:03 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5006.eqsin.wmnet with OS bullseye
  • 19:03 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5005.eqsin.wmnet with OS bullseye
  • 18:57 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti5006']
  • 18:57 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti5007']
  • 18:57 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti5005']
  • 18:56 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1027.eqiad.wmnet with OS bullseye
  • 18:55 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1027']
  • 18:45 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1027']
  • 18:45 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti5007']
  • 18:45 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti5006']
  • 18:44 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti5005']
  • 18:43 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1027']
  • 18:42 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1027']
  • 18:42 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns5003']
  • 18:42 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs5006']
  • 18:42 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs5005']
  • 18:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash1027']
  • 18:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
  • 18:33 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1027']
  • 18:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1027']
  • 18:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1027']
  • 18:31 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns5003']
  • 18:31 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs5006']
  • 18:31 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs5005']
  • 18:28 robh@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['logstash1028']
  • 18:28 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1028']
  • 18:27 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1028']
  • 18:27 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1028']
  • 18:26 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs5005.mgmt.eqsin.wmnet with reboot policy FORCED
  • 18:18 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1028.eqiad.wmnet with OS bullseye
  • 18:08 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs5005.mgmt.eqsin.wmnet with reboot policy FORCED
  • 18:07 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs5006
  • 18:07 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs5006
  • 18:07 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:06 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@4925134]: Revert Deploying image_suggestions 0.5.0 on platform_eng Airflow instance (duration: 00m 09s)
  • 18:06 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 18:05 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@4925134]: Revert Deploying image_suggestions 0.5.0 on platform_eng Airflow instance
  • 18:02 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti5007.mgmt.eqsin.wmnet with reboot policy FORCED
  • 18:02 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti5006.mgmt.eqsin.wmnet with reboot policy FORCED
  • 18:02 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1028']
  • 18:02 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1028']
  • 17:56 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1027']
  • 17:56 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1027']
  • 17:47 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti5007.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:47 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti5006.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:47 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns5003.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:47 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs5006.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:47 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti5005.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:29 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti5005.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:29 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs5005.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:29 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1027.eqiad.wmnet with OS bullseye
  • 17:28 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns5003.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:27 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs5006.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:27 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs5005.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:25 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns5003
  • 17:25 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns5003
  • 17:25 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti5007
  • 17:25 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti5007
  • 17:24 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti5006
  • 17:24 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
  • 17:24 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti5006
  • 17:24 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti5005
  • 17:23 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti5005
  • 17:22 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1028.eqiad.wmnet with OS bullseye
  • 17:21 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs5006
  • 17:21 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs5006
  • 17:17 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1029.eqiad.wmnet with OS bullseye
  • 17:13 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 17:12 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1028.eqiad.wmnet with OS bullseye
  • 17:03 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 17:01 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 16:51 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 16:50 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 16:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 16:48 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 16:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 16:47 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 16:46 kostajh: UTC afternoon backports done
  • 16:44 kharlan@deploy1002: Finished scap: Backport for GrowthExperiments: Start oldimpact experiment (T323526) (duration: 10m 54s)
  • 16:35 kharlan@deploy1002: kharlan and kharlan: Backport for GrowthExperiments: Start oldimpact experiment (T323526) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 16:33 kharlan@deploy1002: Started scap: Backport for GrowthExperiments: Start oldimpact experiment (T323526)
  • 16:32 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1027.eqiad.wmnet with OS bullseye
  • 16:30 kharlan@deploy1002: Finished scap: Backport for GrowthExperiments: Enable new impact module on pilot wikis (T323686) (duration: 10m 14s)
  • 16:23 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1028.eqiad.wmnet with OS bullseye
  • 16:21 kharlan@deploy1002: kharlan and kharlan: Backport for GrowthExperiments: Enable new impact module on pilot wikis (T323686) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 16:21 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs5005
  • 16:21 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs5005
  • 16:21 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1029.eqiad.wmnet with OS bullseye
  • 16:21 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:21 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: eqsin new hosts - robh@cumin2002"
  • 16:19 kharlan@deploy1002: Started scap: Backport for GrowthExperiments: Enable new impact module on pilot wikis (T323686)
  • 16:18 kharlan@deploy1002: backport aborted: (duration: 02m 53s)
  • 16:16 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: eqsin new hosts - robh@cumin2002"
  • 16:15 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 16:13 kharlan@deploy1002: Finished scap: Backport for User impact: Do not show impact module if user has no mainspace edits (T324285), Localisation updates from https://translatewiki.net., NewImpact: Show "999+" when we could not count edits/thanks (T324286) (duration: 29m 43s)
  • 16:12 robh@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:10 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 16:02 kharlan@deploy1002: kharlan and urbanecm and kharlan: Backport for User impact: Do not show impact module if user has no mainspace edits (T324285), Localisation updates from https://translatewiki.net., NewImpact: Show "999+" when we could not count edits/thanks (T324286) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.
  • 15:45 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@6377d4c]: Deploying image_suggestions 0.5.0 on platform_eng Airflow instance (duration: 00m 17s)
  • 15:44 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@6377d4c]: Deploying image_suggestions 0.5.0 on platform_eng Airflow instance
  • 15:43 kharlan@deploy1002: Started scap: Backport for User impact: Do not show impact module if user has no mainspace edits (T324285), Localisation updates from https://translatewiki.net., NewImpact: Show "999+" when we could not count edits/thanks (T324286)
  • 15:41 reedy@deploy1002: Synchronized php-1.40.0-wmf.13/extensions/SecurePoll/includes/Pages/ListPager.php: T324556 (duration: 07m 01s)
  • 15:33 reedy@deploy1002: Synchronized php-1.40.0-wmf.12/extensions/SecurePoll/includes/Pages/ListPager.php: T324556 (duration: 07m 13s)
  • 15:20 kharlan@deploy1002: Finished scap: Backport for Localisation updates from https://translatewiki.net. (duration: 10m 48s)
  • 15:13 kharlan@deploy1002: kharlan and urbanecm: Backport for Localisation updates from https://translatewiki.net. synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 15:10 kharlan@deploy1002: Started scap: Backport for Localisation updates from https://translatewiki.net.
  • 14:52 kharlan@deploy1002: Finished scap: Backport for Instrumentation: Monitor navigation duration, transferSize, first paint (T324198) (duration: 10m 07s)
  • 14:44 kharlan@deploy1002: kharlan and kharlan: Backport for Instrumentation: Monitor navigation duration, transferSize, first paint (T324198) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:42 kharlan@deploy1002: Started scap: Backport for Instrumentation: Monitor navigation duration, transferSize, first paint (T324198)
  • 14:34 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:34 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 14:34 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:34 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 14:33 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:32 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:32 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:31 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 14:31 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:31 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 14:30 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 14:30 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 14:28 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:27 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:25 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:24 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 14:23 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 14:23 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 14:21 kharlan@deploy1002: Finished scap: Backport for NewImpact: Adjust hasMainspaceEditsCache check (T324285) (duration: 09m 04s)
  • 14:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1004.eqiad.wmnet
  • 14:13 kharlan@deploy1002: kharlan and kharlan: Backport for NewImpact: Adjust hasMainspaceEditsCache check (T324285) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 14:12 kharlan@deploy1002: Started scap: Backport for NewImpact: Adjust hasMainspaceEditsCache check (T324285)
  • 14:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1004.eqiad.wmnet
  • 13:18 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 13:17 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 13:17 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 13:16 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 13:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 13:03 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 13:02 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
  • 13:00 kharlan@deploy1002: Finished scap: Backport for Revert "resourceloader: Modern ES6 code should be forced to target mobile" (T323542) (duration: 07m 57s)
  • 12:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 12:54 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 12:54 kharlan@deploy1002: kharlan and kharlan: Backport for Revert "resourceloader: Modern ES6 code should be forced to target mobile" (T323542) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 12:52 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
  • 12:52 kharlan@deploy1002: Started scap: Backport for Revert "resourceloader: Modern ES6 code should be forced to target mobile" (T323542)
  • 12:49 moritzm: installing glibc security updates on buster
  • 12:43 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 12:29 jnuche@deploy1002: Pruned MediaWiki: 1.40.0-wmf.10 (duration: 02m 09s)
  • 12:27 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
  • 12:27 jmm@cumin2002: END (FAIL) - Cookbook sre.wdqs.restart-nginx (exit_code=1) rolling restart_daemons on A:wcqs-public
  • 12:27 jnuche@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.13 refs T320518 (duration: 05m 52s)
  • 12:21 jnuche@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.13 refs T320518
  • 12:14 jnuche@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.13 refs T320518
  • 12:10 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
  • 11:22 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
  • 11:20 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
  • 10:59 kostajh: UTC morning deploys done
  • 10:56 moritzm: installing freetype security updates
  • 10:48 kharlan@deploy1002: Finished scap: Backport for NewImpact: Show "999+" when we could not count edits/thanks (T324286) (duration: 31m 25s)
  • 10:36 kharlan@deploy1002: kharlan and kharlan: Backport for NewImpact: Show "999+" when we could not count edits/thanks (T324286) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 10:16 kharlan@deploy1002: Started scap: Backport for NewImpact: Show "999+" when we could not count edits/thanks (T324286)
  • 09:37 kharlan@deploy1002: Finished scap: Backport for User impact: Do not show impact module if user has no mainspace edits (T324285) (duration: 28m 05s)
  • 09:11 kharlan@deploy1002: kharlan and kharlan: Backport for User impact: Do not show impact module if user has no mainspace edits (T324285) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 09:09 kharlan@deploy1002: Started scap: Backport for User impact: Do not show impact module if user has no mainspace edits (T324285)
  • 06:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 06:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 06:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T323907)', diff saved to https://phabricator.wikimedia.org/P42426 and previous config saved to /var/cache/conftool/dbconfig/20221206-064402-ladsgroup.json
  • 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42425 and previous config saved to /var/cache/conftool/dbconfig/20221206-062856-ladsgroup.json
  • 06:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42424 and previous config saved to /var/cache/conftool/dbconfig/20221206-061349-ladsgroup.json
  • 05:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T323907)', diff saved to https://phabricator.wikimedia.org/P42423 and previous config saved to /var/cache/conftool/dbconfig/20221206-055843-ladsgroup.json
  • 05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T323907)', diff saved to https://phabricator.wikimedia.org/P42422 and previous config saved to /var/cache/conftool/dbconfig/20221206-054030-ladsgroup.json
  • 05:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T323907)', diff saved to https://phabricator.wikimedia.org/P42421 and previous config saved to /var/cache/conftool/dbconfig/20221206-053911-ladsgroup.json
  • 05:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 05:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T323907)', diff saved to https://phabricator.wikimedia.org/P42420 and previous config saved to /var/cache/conftool/dbconfig/20221206-053850-ladsgroup.json
  • 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42419 and previous config saved to /var/cache/conftool/dbconfig/20221206-052523-ladsgroup.json
  • 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42418 and previous config saved to /var/cache/conftool/dbconfig/20221206-052343-ladsgroup.json
  • 05:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42417 and previous config saved to /var/cache/conftool/dbconfig/20221206-051016-ladsgroup.json
  • 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42416 and previous config saved to /var/cache/conftool/dbconfig/20221206-050837-ladsgroup.json
  • 04:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T323907)', diff saved to https://phabricator.wikimedia.org/P42415 and previous config saved to /var/cache/conftool/dbconfig/20221206-045510-ladsgroup.json
  • 04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T323907)', diff saved to https://phabricator.wikimedia.org/P42414 and previous config saved to /var/cache/conftool/dbconfig/20221206-045330-ladsgroup.json
  • 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T323907)', diff saved to https://phabricator.wikimedia.org/P42413 and previous config saved to /var/cache/conftool/dbconfig/20221206-043348-ladsgroup.json
  • 04:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 04:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T323907)', diff saved to https://phabricator.wikimedia.org/P42412 and previous config saved to /var/cache/conftool/dbconfig/20221206-043326-ladsgroup.json
  • 04:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T323907)', diff saved to https://phabricator.wikimedia.org/P42411 and previous config saved to /var/cache/conftool/dbconfig/20221206-042850-ladsgroup.json
  • 04:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 04:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 04:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42410 and previous config saved to /var/cache/conftool/dbconfig/20221206-042828-ladsgroup.json
  • 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42409 and previous config saved to /var/cache/conftool/dbconfig/20221206-041820-ladsgroup.json
  • 04:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42408 and previous config saved to /var/cache/conftool/dbconfig/20221206-041322-ladsgroup.json
  • 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42407 and previous config saved to /var/cache/conftool/dbconfig/20221206-040313-ladsgroup.json
  • 04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.13 refs T320518
  • 03:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42406 and previous config saved to /var/cache/conftool/dbconfig/20221206-035815-ladsgroup.json
  • 03:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T323907)', diff saved to https://phabricator.wikimedia.org/P42405 and previous config saved to /var/cache/conftool/dbconfig/20221206-034806-ladsgroup.json
  • 03:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42404 and previous config saved to /var/cache/conftool/dbconfig/20221206-034309-ladsgroup.json
  • 03:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T323907)', diff saved to https://phabricator.wikimedia.org/P42403 and previous config saved to /var/cache/conftool/dbconfig/20221206-032818-ladsgroup.json
  • 03:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 03:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 03:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T323907)', diff saved to https://phabricator.wikimedia.org/P42402 and previous config saved to /var/cache/conftool/dbconfig/20221206-032756-ladsgroup.json
  • 03:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42401 and previous config saved to /var/cache/conftool/dbconfig/20221206-031250-ladsgroup.json
  • 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42400 and previous config saved to /var/cache/conftool/dbconfig/20221206-025831-ladsgroup.json
  • 02:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 02:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42399 and previous config saved to /var/cache/conftool/dbconfig/20221206-025821-ladsgroup.json
  • 02:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42398 and previous config saved to /var/cache/conftool/dbconfig/20221206-025743-ladsgroup.json
  • 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42397 and previous config saved to /var/cache/conftool/dbconfig/20221206-024314-ladsgroup.json
  • 02:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T323907)', diff saved to https://phabricator.wikimedia.org/P42396 and previous config saved to /var/cache/conftool/dbconfig/20221206-024236-ladsgroup.json
  • 02:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 02:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T322618)', diff saved to https://phabricator.wikimedia.org/P42395 and previous config saved to /var/cache/conftool/dbconfig/20221206-022817-ladsgroup.json
  • 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42394 and previous config saved to /var/cache/conftool/dbconfig/20221206-022808-ladsgroup.json
  • 02:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T323907)', diff saved to https://phabricator.wikimedia.org/P42393 and previous config saved to /var/cache/conftool/dbconfig/20221206-021638-ladsgroup.json
  • 02:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 02:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 02:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T323907)', diff saved to https://phabricator.wikimedia.org/P42392 and previous config saved to /var/cache/conftool/dbconfig/20221206-021617-ladsgroup.json
  • 02:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42391 and previous config saved to /var/cache/conftool/dbconfig/20221206-021310-ladsgroup.json
  • 02:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42390 and previous config saved to /var/cache/conftool/dbconfig/20221206-021301-ladsgroup.json
  • 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42389 and previous config saved to /var/cache/conftool/dbconfig/20221206-020110-ladsgroup.json
  • 01:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42388 and previous config saved to /var/cache/conftool/dbconfig/20221206-015757-ladsgroup.json
  • 01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42387 and previous config saved to /var/cache/conftool/dbconfig/20221206-014604-ladsgroup.json
  • 01:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T322618)', diff saved to https://phabricator.wikimedia.org/P42386 and previous config saved to /var/cache/conftool/dbconfig/20221206-014251-ladsgroup.json
  • 01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T322618)', diff saved to https://phabricator.wikimedia.org/P42385 and previous config saved to /var/cache/conftool/dbconfig/20221206-014046-ladsgroup.json
  • 01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T322618)', diff saved to https://phabricator.wikimedia.org/P42384 and previous config saved to /var/cache/conftool/dbconfig/20221206-014038-ladsgroup.json
  • 01:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 01:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T322618)', diff saved to https://phabricator.wikimedia.org/P42383 and previous config saved to /var/cache/conftool/dbconfig/20221206-014017-ladsgroup.json
  • 01:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T323907)', diff saved to https://phabricator.wikimedia.org/P42382 and previous config saved to /var/cache/conftool/dbconfig/20221206-013057-ladsgroup.json
  • 01:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42381 and previous config saved to /var/cache/conftool/dbconfig/20221206-012812-ladsgroup.json
  • 01:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 01:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 01:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T323907)', diff saved to https://phabricator.wikimedia.org/P42380 and previous config saved to /var/cache/conftool/dbconfig/20221206-012750-ladsgroup.json
  • 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42379 and previous config saved to /var/cache/conftool/dbconfig/20221206-012539-ladsgroup.json
  • 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42378 and previous config saved to /var/cache/conftool/dbconfig/20221206-012510-ladsgroup.json
  • 01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42377 and previous config saved to /var/cache/conftool/dbconfig/20221206-011244-ladsgroup.json
  • 01:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T323907)', diff saved to https://phabricator.wikimedia.org/P42376 and previous config saved to /var/cache/conftool/dbconfig/20221206-011128-ladsgroup.json
  • 01:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 01:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 01:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42375 and previous config saved to /var/cache/conftool/dbconfig/20221206-011033-ladsgroup.json
  • 01:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42374 and previous config saved to /var/cache/conftool/dbconfig/20221206-011003-ladsgroup.json
  • 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42373 and previous config saved to /var/cache/conftool/dbconfig/20221206-005737-ladsgroup.json
  • 00:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T322618)', diff saved to https://phabricator.wikimedia.org/P42372 and previous config saved to /var/cache/conftool/dbconfig/20221206-005526-ladsgroup.json
  • 00:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T322618)', diff saved to https://phabricator.wikimedia.org/P42371 and previous config saved to /var/cache/conftool/dbconfig/20221206-005457-ladsgroup.json
  • 00:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T322618)', diff saved to https://phabricator.wikimedia.org/P42370 and previous config saved to /var/cache/conftool/dbconfig/20221206-005401-ladsgroup.json
  • 00:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 00:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42369 and previous config saved to /var/cache/conftool/dbconfig/20221206-005339-ladsgroup.json
  • 00:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T322618)', diff saved to https://phabricator.wikimedia.org/P42368 and previous config saved to /var/cache/conftool/dbconfig/20221206-005244-ladsgroup.json
  • 00:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 00:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 00:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T322618)', diff saved to https://phabricator.wikimedia.org/P42367 and previous config saved to /var/cache/conftool/dbconfig/20221206-005223-ladsgroup.json
  • 00:51 cstone: payments-wiki upgraded from b613ddfb to 0cd7e779
  • 00:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T323907)', diff saved to https://phabricator.wikimedia.org/P42366 and previous config saved to /var/cache/conftool/dbconfig/20221206-004231-ladsgroup.json
  • 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42365 and previous config saved to /var/cache/conftool/dbconfig/20221206-003833-ladsgroup.json
  • 00:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42364 and previous config saved to /var/cache/conftool/dbconfig/20221206-003716-ladsgroup.json
  • 00:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 00:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 00:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T323907)', diff saved to https://phabricator.wikimedia.org/P42363 and previous config saved to /var/cache/conftool/dbconfig/20221206-002945-ladsgroup.json
  • 00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42362 and previous config saved to /var/cache/conftool/dbconfig/20221206-002326-ladsgroup.json
  • 00:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42361 and previous config saved to /var/cache/conftool/dbconfig/20221206-002210-ladsgroup.json
  • 00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42360 and previous config saved to /var/cache/conftool/dbconfig/20221206-001438-ladsgroup.json
  • 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42359 and previous config saved to /var/cache/conftool/dbconfig/20221206-000820-ladsgroup.json
  • 00:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T322618)', diff saved to https://phabricator.wikimedia.org/P42358 and previous config saved to /var/cache/conftool/dbconfig/20221206-000703-ladsgroup.json
  • 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42357 and previous config saved to /var/cache/conftool/dbconfig/20221206-000654-ladsgroup.json
  • 00:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 00:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T322618)', diff saved to https://phabricator.wikimedia.org/P42356 and previous config saved to /var/cache/conftool/dbconfig/20221206-000633-ladsgroup.json
  • 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T322618)', diff saved to https://phabricator.wikimedia.org/P42355 and previous config saved to /var/cache/conftool/dbconfig/20221206-000444-ladsgroup.json
  • 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42354 and previous config saved to /var/cache/conftool/dbconfig/20221206-000329-ladsgroup.json

2022-12-05

  • 23:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42353 and previous config saved to /var/cache/conftool/dbconfig/20221205-235932-ladsgroup.json
  • 23:57 tzatziki: removing 2 files for legal compliance
  • 23:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T323907)', diff saved to https://phabricator.wikimedia.org/P42352 and previous config saved to /var/cache/conftool/dbconfig/20221205-235724-ladsgroup.json
  • 23:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 23:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 23:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 23:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42351 and previous config saved to /var/cache/conftool/dbconfig/20221205-235126-ladsgroup.json
  • 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42350 and previous config saved to /var/cache/conftool/dbconfig/20221205-234822-ladsgroup.json
  • 23:47 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@1d3ba41]: import_cirrus: Update doc cleaning to match cirrus updates (duration: 02m 30s)
  • 23:44 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@1d3ba41]: import_cirrus: Update doc cleaning to match cirrus updates
  • 23:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T323907)', diff saved to https://phabricator.wikimedia.org/P42349 and previous config saved to /var/cache/conftool/dbconfig/20221205-234425-ladsgroup.json
  • 23:41 tzatziki: removing 5 files for legal compliance
  • 23:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42348 and previous config saved to /var/cache/conftool/dbconfig/20221205-233620-ladsgroup.json
  • 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42347 and previous config saved to /var/cache/conftool/dbconfig/20221205-233316-ladsgroup.json
  • 23:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T323907)', diff saved to https://phabricator.wikimedia.org/P42346 and previous config saved to /var/cache/conftool/dbconfig/20221205-232453-ladsgroup.json
  • 23:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 23:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 23:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42345 and previous config saved to /var/cache/conftool/dbconfig/20221205-232432-ladsgroup.json
  • 23:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T322618)', diff saved to https://phabricator.wikimedia.org/P42344 and previous config saved to /var/cache/conftool/dbconfig/20221205-232113-ladsgroup.json
  • 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T322618)', diff saved to https://phabricator.wikimedia.org/P42343 and previous config saved to /var/cache/conftool/dbconfig/20221205-231948-ladsgroup.json
  • 23:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 23:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42342 and previous config saved to /var/cache/conftool/dbconfig/20221205-231926-ladsgroup.json
  • 23:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42341 and previous config saved to /var/cache/conftool/dbconfig/20221205-231809-ladsgroup.json
  • 23:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 23:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 23:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T323907)', diff saved to https://phabricator.wikimedia.org/P42340 and previous config saved to /var/cache/conftool/dbconfig/20221205-231608-ladsgroup.json
  • 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42339 and previous config saved to /var/cache/conftool/dbconfig/20221205-231556-ladsgroup.json
  • 23:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 23:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42338 and previous config saved to /var/cache/conftool/dbconfig/20221205-231535-ladsgroup.json
  • 23:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P42337 and previous config saved to /var/cache/conftool/dbconfig/20221205-230925-ladsgroup.json
  • 23:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42336 and previous config saved to /var/cache/conftool/dbconfig/20221205-230419-ladsgroup.json
  • 23:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P42335 and previous config saved to /var/cache/conftool/dbconfig/20221205-230102-ladsgroup.json
  • 23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42334 and previous config saved to /var/cache/conftool/dbconfig/20221205-230028-ladsgroup.json
  • 22:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P42333 and previous config saved to /var/cache/conftool/dbconfig/20221205-225419-ladsgroup.json
  • 22:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42332 and previous config saved to /var/cache/conftool/dbconfig/20221205-224913-ladsgroup.json
  • 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P42331 and previous config saved to /var/cache/conftool/dbconfig/20221205-224555-ladsgroup.json
  • 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42330 and previous config saved to /var/cache/conftool/dbconfig/20221205-224522-ladsgroup.json
  • 22:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42329 and previous config saved to /var/cache/conftool/dbconfig/20221205-223912-ladsgroup.json
  • 22:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42328 and previous config saved to /var/cache/conftool/dbconfig/20221205-223406-ladsgroup.json
  • 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42326 and previous config saved to /var/cache/conftool/dbconfig/20221205-223140-ladsgroup.json
  • 22:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 22:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T322618)', diff saved to https://phabricator.wikimedia.org/P42325 and previous config saved to /var/cache/conftool/dbconfig/20221205-223119-ladsgroup.json
  • 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T323907)', diff saved to https://phabricator.wikimedia.org/P42324 and previous config saved to /var/cache/conftool/dbconfig/20221205-223049-ladsgroup.json
  • 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42323 and previous config saved to /var/cache/conftool/dbconfig/20221205-223015-ladsgroup.json
  • 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42322 and previous config saved to /var/cache/conftool/dbconfig/20221205-222903-ladsgroup.json
  • 22:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 22:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T322618)', diff saved to https://phabricator.wikimedia.org/P42321 and previous config saved to /var/cache/conftool/dbconfig/20221205-222852-ladsgroup.json
  • 22:24 tzatziki: removing 1 file for legal compliance
  • 22:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cephosd1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42320 and previous config saved to /var/cache/conftool/dbconfig/20221205-221612-ladsgroup.json
  • 22:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42319 and previous config saved to /var/cache/conftool/dbconfig/20221205-221346-ladsgroup.json
  • 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42317 and previous config saved to /var/cache/conftool/dbconfig/20221205-220105-ladsgroup.json
  • 22:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cephosd1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:59 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:59 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: deleted phab1001-vcs.eqiad.wmnet IPs - dzahn@cumin2002"
  • 21:59 mutante: deleting special DNS entries for "phab1001-vcs.eqiad.wmnet", IPv4 and IPv6 (Role: VIP), from netbox and syncing netbox data - T296022
  • 21:58 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: deleted phab1001-vcs.eqiad.wmnet IPs - dzahn@cumin2002"
  • 21:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42316 and previous config saved to /var/cache/conftool/dbconfig/20221205-215839-ladsgroup.json
  • 21:55 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 21:55 mutante: deleting special DNS entries for "phab1001-vcs.eqiad.wmnet", IPv4 and IPv6 (Role: VIP), from netbox - T280597
  • 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42315 and previous config saved to /var/cache/conftool/dbconfig/20221205-215436-ladsgroup.json
  • 21:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 21:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42314 and previous config saved to /var/cache/conftool/dbconfig/20221205-215415-ladsgroup.json
  • 21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T323907)', diff saved to https://phabricator.wikimedia.org/P42313 and previous config saved to /var/cache/conftool/dbconfig/20221205-214801-ladsgroup.json
  • 21:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 21:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 21:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T323907)', diff saved to https://phabricator.wikimedia.org/P42312 and previous config saved to /var/cache/conftool/dbconfig/20221205-214740-ladsgroup.json
  • 21:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T322618)', diff saved to https://phabricator.wikimedia.org/P42311 and previous config saved to /var/cache/conftool/dbconfig/20221205-214558-ladsgroup.json
  • 21:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T322618)', diff saved to https://phabricator.wikimedia.org/P42310 and previous config saved to /var/cache/conftool/dbconfig/20221205-214333-ladsgroup.json
  • 21:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T322618)', diff saved to https://phabricator.wikimedia.org/P42309 and previous config saved to /var/cache/conftool/dbconfig/20221205-214332-ladsgroup.json
  • 21:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cephosd1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 21:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 21:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 21:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 21:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T322618)', diff saved to https://phabricator.wikimedia.org/P42308 and previous config saved to /var/cache/conftool/dbconfig/20221205-214255-ladsgroup.json
  • 21:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T322618)', diff saved to https://phabricator.wikimedia.org/P42307 and previous config saved to /var/cache/conftool/dbconfig/20221205-214120-ladsgroup.json
  • 21:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 21:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T322618)', diff saved to https://phabricator.wikimedia.org/P42306 and previous config saved to /var/cache/conftool/dbconfig/20221205-214058-ladsgroup.json
  • 21:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P42305 and previous config saved to /var/cache/conftool/dbconfig/20221205-213908-ladsgroup.json
  • 21:33 TheresNoTime: close UTC late backport window
  • 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P42304 and previous config saved to /var/cache/conftool/dbconfig/20221205-213233-ladsgroup.json
  • 21:31 samtar@deploy1002: Finished scap: Backport for Adjust to changes to redlink behavior from parsoid (T324352) (duration: 09m 05s)
  • 21:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42303 and previous config saved to /var/cache/conftool/dbconfig/20221205-212748-ladsgroup.json
  • 21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42302 and previous config saved to /var/cache/conftool/dbconfig/20221205-212552-ladsgroup.json
  • 21:24 samtar@deploy1002: samtar and matmarex: Backport for Adjust to changes to redlink behavior from parsoid (T324352) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P42301 and previous config saved to /var/cache/conftool/dbconfig/20221205-212402-ladsgroup.json
  • 21:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cephosd1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:23 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:22 samtar@deploy1002: Started scap: Backport for Adjust to changes to redlink behavior from parsoid (T324352)
  • 21:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P42300 and previous config saved to /var/cache/conftool/dbconfig/20221205-211727-ladsgroup.json
  • 21:17 samtar@deploy1002: Finished scap: Backport for Use new DiscussionTools heading markup on group0 wikis (T314714) (duration: 09m 55s)
  • 21:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T312984)', diff saved to https://phabricator.wikimedia.org/P42299 and previous config saved to /var/cache/conftool/dbconfig/20221205-211405-ladsgroup.json
  • 21:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42298 and previous config saved to /var/cache/conftool/dbconfig/20221205-211242-ladsgroup.json
  • 21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42297 and previous config saved to /var/cache/conftool/dbconfig/20221205-211045-ladsgroup.json
  • 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42296 and previous config saved to /var/cache/conftool/dbconfig/20221205-210855-ladsgroup.json
  • 21:08 samtar@deploy1002: samtar and matmarex: Backport for Use new DiscussionTools heading markup on group0 wikis (T314714) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:07 samtar@deploy1002: Started scap: Backport for Use new DiscussionTools heading markup on group0 wikis (T314714)
  • 21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T323907)', diff saved to https://phabricator.wikimedia.org/P42295 and previous config saved to /var/cache/conftool/dbconfig/20221205-210220-ladsgroup.json
  • 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P42294 and previous config saved to /var/cache/conftool/dbconfig/20221205-205859-ladsgroup.json
  • 20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T322618)', diff saved to https://phabricator.wikimedia.org/P42293 and previous config saved to /var/cache/conftool/dbconfig/20221205-205735-ladsgroup.json
  • 20:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T322618)', diff saved to https://phabricator.wikimedia.org/P42292 and previous config saved to /var/cache/conftool/dbconfig/20221205-205610-ladsgroup.json
  • 20:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 20:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T322618)', diff saved to https://phabricator.wikimedia.org/P42291 and previous config saved to /var/cache/conftool/dbconfig/20221205-205547-ladsgroup.json
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T322618)', diff saved to https://phabricator.wikimedia.org/P42290 and previous config saved to /var/cache/conftool/dbconfig/20221205-205537-ladsgroup.json
  • 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T322618)', diff saved to https://phabricator.wikimedia.org/P42289 and previous config saved to /var/cache/conftool/dbconfig/20221205-205324-ladsgroup.json
  • 20:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 20:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42288 and previous config saved to /var/cache/conftool/dbconfig/20221205-205303-ladsgroup.json
  • 20:47 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts phab1001.eqiad.wmnet
  • 20:47 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:47 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: phab1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
  • 20:44 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: phab1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
  • 20:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P42287 and previous config saved to /var/cache/conftool/dbconfig/20221205-204352-ladsgroup.json
  • 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42286 and previous config saved to /var/cache/conftool/dbconfig/20221205-204034-ladsgroup.json
  • 20:38 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 20:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42285 and previous config saved to /var/cache/conftool/dbconfig/20221205-203756-ladsgroup.json
  • 20:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T312984)', diff saved to https://phabricator.wikimedia.org/P42284 and previous config saved to /var/cache/conftool/dbconfig/20221205-202846-ladsgroup.json
  • 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42283 and previous config saved to /var/cache/conftool/dbconfig/20221205-202528-ladsgroup.json
  • 20:25 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts phab1001.eqiad.wmnet
  • 20:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42282 and previous config saved to /var/cache/conftool/dbconfig/20221205-202250-ladsgroup.json
  • 20:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42281 and previous config saved to /var/cache/conftool/dbconfig/20221205-202029-ladsgroup.json
  • 20:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 20:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 20:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42280 and previous config saved to /var/cache/conftool/dbconfig/20221205-202008-ladsgroup.json
  • 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T323907)', diff saved to https://phabricator.wikimedia.org/P42279 and previous config saved to /var/cache/conftool/dbconfig/20221205-201831-ladsgroup.json
  • 20:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 20:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T323907)', diff saved to https://phabricator.wikimedia.org/P42278 and previous config saved to /var/cache/conftool/dbconfig/20221205-201810-ladsgroup.json
  • 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T322618)', diff saved to https://phabricator.wikimedia.org/P42277 and previous config saved to /var/cache/conftool/dbconfig/20221205-201021-ladsgroup.json
  • 20:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T322618)', diff saved to https://phabricator.wikimedia.org/P42276 and previous config saved to /var/cache/conftool/dbconfig/20221205-200755-ladsgroup.json
  • 20:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 20:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42275 and previous config saved to /var/cache/conftool/dbconfig/20221205-200743-ladsgroup.json
  • 20:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 20:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 20:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 20:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T322618)', diff saved to https://phabricator.wikimedia.org/P42274 and previous config saved to /var/cache/conftool/dbconfig/20221205-200530-ladsgroup.json
  • 20:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 20:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 20:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P42273 and previous config saved to /var/cache/conftool/dbconfig/20221205-200501-ladsgroup.json
  • 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P42272 and previous config saved to /var/cache/conftool/dbconfig/20221205-200303-ladsgroup.json
  • 20:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on phab1001.eqiad.wmnet with reason: decom, replaced by phab1004
  • 20:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on phab1001.eqiad.wmnet with reason: decom, replaced by phab1004
  • 19:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T312984)', diff saved to https://phabricator.wikimedia.org/P42271 and previous config saved to /var/cache/conftool/dbconfig/20221205-195842-ladsgroup.json
  • 19:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 19:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 19:57 mutante: phab1004 (prod) - removing phab1001 from firewall rules, rsync config | phab1001 (formerly prod) - removing prod role T323418 T280597
  • 19:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P42270 and previous config saved to /var/cache/conftool/dbconfig/20221205-194955-ladsgroup.json
  • 19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P42269 and previous config saved to /var/cache/conftool/dbconfig/20221205-194757-ladsgroup.json
  • 19:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T323907)', diff saved to https://phabricator.wikimedia.org/P42268 and previous config saved to /var/cache/conftool/dbconfig/20221205-193949-ladsgroup.json
  • 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42267 and previous config saved to /var/cache/conftool/dbconfig/20221205-193448-ladsgroup.json
  • 19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T323907)', diff saved to https://phabricator.wikimedia.org/P42266 and previous config saved to /var/cache/conftool/dbconfig/20221205-193250-ladsgroup.json
  • 19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T323827)', diff saved to https://phabricator.wikimedia.org/P42265 and previous config saved to /var/cache/conftool/dbconfig/20221205-193203-ladsgroup.json
  • 19:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P42264 and previous config saved to /var/cache/conftool/dbconfig/20221205-192442-ladsgroup.json
  • 19:24 mutante: phab1001, previous long time phabricator host, is about to be shut down, made a final copy of /srv/deployment, /root, /home, /etc and synced it to phab1004 - T323418
  • 19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P42263 and previous config saved to /var/cache/conftool/dbconfig/20221205-191656-ladsgroup.json
  • 19:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P42262 and previous config saved to /var/cache/conftool/dbconfig/20221205-190935-ladsgroup.json
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P42261 and previous config saved to /var/cache/conftool/dbconfig/20221205-190710-ladsgroup.json
  • 19:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P42260 and previous config saved to /var/cache/conftool/dbconfig/20221205-190150-ladsgroup.json
  • 18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T323907)', diff saved to https://phabricator.wikimedia.org/P42259 and previous config saved to /var/cache/conftool/dbconfig/20221205-185429-ladsgroup.json
  • 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P42258 and previous config saved to /var/cache/conftool/dbconfig/20221205-185205-ladsgroup.json
  • 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T323907)', diff saved to https://phabricator.wikimedia.org/P42257 and previous config saved to /var/cache/conftool/dbconfig/20221205-184950-ladsgroup.json
  • 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T323907)', diff saved to https://phabricator.wikimedia.org/P42256 and previous config saved to /var/cache/conftool/dbconfig/20221205-184944-ladsgroup.json
  • 18:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 18:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T323827)', diff saved to https://phabricator.wikimedia.org/P42255 and previous config saved to /var/cache/conftool/dbconfig/20221205-184643-ladsgroup.json
  • 18:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T323827)', diff saved to https://phabricator.wikimedia.org/P42254 and previous config saved to /var/cache/conftool/dbconfig/20221205-183851-ladsgroup.json
  • 18:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 18:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T323907)', diff saved to https://phabricator.wikimedia.org/P42253 and previous config saved to /var/cache/conftool/dbconfig/20221205-183712-ladsgroup.json
  • 18:37 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1033.eqiad.wmnet with OS bullseye
  • 18:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P42252 and previous config saved to /var/cache/conftool/dbconfig/20221205-183700-ladsgroup.json
  • 18:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P42251 and previous config saved to /var/cache/conftool/dbconfig/20221205-182155-ladsgroup.json
  • 18:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 18:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 18:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 18:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:42 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp5016.eqsin.wmnet
  • 17:42 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:42 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5016.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 17:40 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5016.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 17:39 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 17:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:34 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp5016.eqsin.wmnet
  • 17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp5016.eqsin.wmnet with reason: downtimed, to be depooled
  • 17:30 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp5016.eqsin.wmnet with reason: downtimed, to be depooled
  • 17:30 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5016.eqsin.wmnet,service=varnish-fe
  • 17:30 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5016.eqsin.wmnet,service=ats-be
  • 17:30 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5016.eqsin.wmnet,service=ats-tls
  • 17:28 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5024.eqsin.wmnet,service=varnish-fe
  • 17:28 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5024.eqsin.wmnet,service=ats-tls
  • 17:28 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5024.eqsin.wmnet,service=ats-be
  • 17:28 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5024.eqsin.wmnet,service=varnish-fe
  • 17:28 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5024.eqsin.wmnet,service=ats-tls
  • 17:28 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp5024.eqsin.wmnet,service=ats-be
  • 17:21 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1034.eqiad.wmnet with OS bullseye
  • 17:21 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1035.eqiad.wmnet with OS bullseye
  • 17:02 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1033.eqiad.wmnet with reason: host reimage
  • 16:59 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1033.eqiad.wmnet with reason: host reimage
  • 16:59 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1034.eqiad.wmnet with reason: host reimage
  • 16:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp5015.eqsin.wmnet
  • 16:57 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:57 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5015.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 16:56 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5015.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 16:56 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1035.eqiad.wmnet with reason: host reimage
  • 16:56 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1034.eqiad.wmnet with reason: host reimage
  • 16:53 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1035.eqiad.wmnet with reason: host reimage
  • 16:53 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 16:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 16:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 16:48 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp5015.eqsin.wmnet
  • 16:44 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1033.eqiad.wmnet with OS bullseye
  • 16:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp5015.eqsin.wmnet with reason: downtimed, to be depooled
  • 16:43 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp5015.eqsin.wmnet with reason: downtimed, to be depooled
  • 16:41 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1034.eqiad.wmnet with OS bullseye
  • 16:40 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5015.eqsin.wmnet,service=varnish-fe
  • 16:40 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5015.eqsin.wmnet,service=ats-be
  • 16:40 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5015.eqsin.wmnet,service=ats-tls
  • 16:40 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1010.eqiad.wmnet with OS bullseye
  • 16:38 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1035.eqiad.wmnet with OS bullseye
  • 16:38 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5027.eqsin.wmnet,service=varnish-fe
  • 16:38 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5027.eqsin.wmnet,service=ats-tls
  • 16:38 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5027.eqsin.wmnet,service=ats-be
  • 16:38 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5027.eqsin.wmnet,service=varnish-fe
  • 16:38 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5027.eqsin.wmnet,service=ats-tls
  • 16:38 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp5027.eqsin.wmnet,service=ats-be
  • 16:38 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5023.eqsin.wmnet,service=varnish-fe
  • 16:38 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5023.eqsin.wmnet,service=ats-tls
  • 16:38 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5023.eqsin.wmnet,service=ats-be
  • 16:38 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5023.eqsin.wmnet,service=varnish-fe
  • 16:38 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5023.eqsin.wmnet,service=ats-tls
  • 16:38 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp5023.eqsin.wmnet,service=ats-be
  • 16:27 klausman: restarted kube-apiserver on ml-staging-ctrl2001 to address high latency
  • 16:14 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1010.eqiad.wmnet with reason: host reimage
  • 16:11 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1010.eqiad.wmnet with reason: host reimage
  • 16:06 klausman: restarted kube-apiserver on ml-serve-ctrl1001 to address high latency and large number of 504s
  • 16:06 moritzm: installing glibc security updates on buster
  • 15:46 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1010.eqiad.wmnet with OS bullseye
  • 15:45 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp[5012,5014].eqsin.wmnet
  • 15:45 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:45 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[5012,5014].eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 15:44 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[5012,5014].eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 15:41 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 15:36 moritzm: installing apache2 security updates on buster
  • 15:35 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp[5012,5014].eqsin.wmnet
  • 15:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp[5012,5014].eqsin.wmnet with reason: downtimed, to be depooled
  • 15:30 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp[5012,5014].eqsin.wmnet with reason: downtimed, to be depooled
  • 15:28 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5014.eqsin.wmnet,service=varnish-fe
  • 15:28 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5014.eqsin.wmnet,service=ats-be
  • 15:28 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5014.eqsin.wmnet,service=ats-tls
  • 15:28 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5012.eqsin.wmnet,service=varnish-fe
  • 15:28 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5012.eqsin.wmnet,service=ats-be
  • 15:28 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5012.eqsin.wmnet,service=ats-tls
  • 15:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5026.eqsin.wmnet,service=varnish-fe
  • 15:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5026.eqsin.wmnet,service=ats-tls
  • 15:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5026.eqsin.wmnet,service=ats-be
  • 15:25 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5026.eqsin.wmnet,service=varnish-fe
  • 15:25 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5026.eqsin.wmnet,service=ats-tls
  • 15:25 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp5026.eqsin.wmnet,service=ats-be
  • 15:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5022.eqsin.wmnet,service=varnish-fe
  • 15:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5022.eqsin.wmnet,service=ats-tls
  • 15:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5022.eqsin.wmnet,service=ats-be
  • 15:25 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5022.eqsin.wmnet,service=varnish-fe
  • 15:25 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5022.eqsin.wmnet,service=ats-tls
  • 15:25 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp5022.eqsin.wmnet,service=ats-be
  • 15:14 andrewbogott: deleted wikitech-static-ord-prebuster image backup in rackspace cloud. Here concludes the wikitech-static upgrade to Buster and php7.4
  • 15:07 root@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:06 root@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:06 root@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:05 root@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp[5011,5013].eqsin.wmnet
  • 14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[5011,5013].eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 14:56 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[5011,5013].eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 14:55 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 14:55 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 14:54 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 14:54 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 14:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 14:48 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp[5011,5013].eqsin.wmnet
  • 14:42 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp[5011,5013].eqsin.wmnet with reason: downtimed, to be depooled
  • 14:42 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp[5011,5013].eqsin.wmnet with reason: downtimed, to be depooled
  • 14:41 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5013.eqsin.wmnet,service=varnish-fe
  • 14:41 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5013.eqsin.wmnet,service=ats-be
  • 14:41 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5013.eqsin.wmnet,service=ats-tls
  • 14:41 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5011.eqsin.wmnet,service=varnish-fe
  • 14:41 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5011.eqsin.wmnet,service=ats-be
  • 14:41 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5011.eqsin.wmnet,service=ats-tls
  • 14:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:37 TheresNoTime: closing UTC afternoon backport window
  • 14:36 samtar@deploy1002: Finished scap: Backport for logos: icon could be not square, trwiki: Add 20 years celebration logos (T324393) (duration: 08m 37s)
  • 14:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=varnish-fe
  • 14:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=ats-tls
  • 14:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=ats-be
  • 14:34 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5025.eqsin.wmnet,service=varnish-fe
  • 14:34 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5025.eqsin.wmnet,service=ats-tls
  • 14:34 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp5025.eqsin.wmnet,service=ats-be
  • 14:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet,service=varnish-fe
  • 14:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet,service=ats-tls
  • 14:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet,service=ats-be
  • 14:34 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5021.eqsin.wmnet,service=varnish-fe
  • 14:34 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp5021.eqsin.wmnet,service=ats-tls
  • 14:34 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp5021.eqsin.wmnet,service=ats-be
  • 14:29 samtar@deploy1002: samtar and stang: Backport for logos: icon could be not square, trwiki: Add 20 years celebration logos (T324393) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P42249 and previous config saved to /var/cache/conftool/dbconfig/20221205-142752-marostegui.json
  • 14:27 samtar@deploy1002: Started scap: Backport for logos: icon could be not square, trwiki: Add 20 years celebration logos (T324393)
  • 14:26 samtar@deploy1002: Finished scap: Backport for Add Property (120) to Wikidata content Namespace (T321282) (duration: 16m 59s)
  • 14:18 samtar@deploy1002: samtar and gtzatchkova: Backport for Add Property (120) to Wikidata content Namespace (T321282) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 14:09 samtar@deploy1002: Started scap: Backport for Add Property (120) to Wikidata content Namespace (T321282)
  • 14:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 13:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2127 T324180', diff saved to https://phabricator.wikimedia.org/P42247 and previous config saved to /var/cache/conftool/dbconfig/20221205-135932-ladsgroup.json
  • 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2105 to s3 primary T324180', diff saved to https://phabricator.wikimedia.org/P42246 and previous config saved to /var/cache/conftool/dbconfig/20221205-135539-ladsgroup.json
  • 13:55 Amir1: Starting s3 codfw failover from db2127 to db2105 - T324180
  • 13:51 dcausse: repooling wdqs1004
  • 13:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 55818
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2105 with weight 0 T324180', diff saved to https://phabricator.wikimedia.org/P42245 and previous config saved to /var/cache/conftool/dbconfig/20221205-134346-ladsgroup.json
  • 13:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s3 T324180
  • 13:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Primary switchover s3 T324180
  • 13:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 55818
  • 13:31 TheresNoTime: T302486 : [samtar@mwmaint1002 ~]$ mwscript maintenance/fixMergeHistoryCorruption.php --wiki enwiki --ns 828 --delete
  • 13:24 moritzm: installing postgresql-common bugfix updates from Buster 10.13 point release
  • 13:17 moritzm: installing distro-info-data bugfix updates from Buster 10.13 point release
  • 13:12 moritzm: installing libnet-ssleay-perl bugfix updates from Buster 10.13 point release
  • 12:50 moritzm: installing python-keystoneauth1 bugfix updates from Buster 10.13 point release
  • 12:41 root@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:41 root@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 12:41 root@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 12:39 root@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 11:59 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 11:59 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 11:59 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 11:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 11:53 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 11:52 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 11:51 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 11:50 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42243 and previous config saved to /var/cache/conftool/dbconfig/20221205-113746-marostegui.json
  • 11:31 moritzm: installing librsvg bugfix updates from buster point release
  • 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42242 and previous config saved to /var/cache/conftool/dbconfig/20221205-111836-marostegui.json
  • 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on idp-test1002.wikimedia.org with reason: Various tests which may cause temporary breakage on idp-test.w.o
  • 11:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on idp-test1002.wikimedia.org with reason: Various tests which may cause temporary breakage on idp-test.w.o
  • 11:07 hashar: Restarted Zuul to clear a stuck ssh connection with Gerrit - T309376
  • 10:33 kostajh: UTC morning deploys done
  • 10:32 godog: contint1001 - racadm serveraction powercycle - crashed
  • 10:31 kharlan@deploy1002: Finished scap: Backport for User impact: Show discovery notice to mobile users (T323619) (duration: 09m 30s)
  • 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42241 and previous config saved to /var/cache/conftool/dbconfig/20221205-103028-marostegui.json
  • 10:23 kharlan@deploy1002: kharlan and kharlan: Backport for User impact: Show discovery notice to mobile users (T323619) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 10:22 kharlan@deploy1002: Started scap: Backport for User impact: Show discovery notice to mobile users (T323619)
  • 10:14 Emperor: rebalance thanos rings T311690
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42240 and previous config saved to /var/cache/conftool/dbconfig/20221205-100607-marostegui.json
  • 10:05 kharlan@deploy1002: Finished scap: Backport for User impact: Show discovery tour to desktop users who had old module (T323619) (duration: 27m 33s)
  • 09:50 kharlan@deploy1002: kharlan and kharlan: Backport for User impact: Show discovery tour to desktop users who had old module (T323619) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 09:39 moritzm: restarting mediawiki canaries to pick up freetype security updates
  • 09:38 godog: force a puppet run on physical hosts to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/860572
  • 09:37 kharlan@deploy1002: Started scap: Backport for User impact: Show discovery tour to desktop users who had old module (T323619)
  • 09:36 moritzm: installing freetype security updates
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42239 and previous config saved to /var/cache/conftool/dbconfig/20221205-091547-marostegui.json
  • 09:15 kharlan@deploy1002: backport aborted: (duration: 00m 25s)
  • 09:14 kharlan@deploy1002: Finished scap: Backport for Fix ExpensiveUserImpact input validation (T324312) (duration: 09m 10s)
  • 09:06 kharlan@deploy1002: kharlan and kharlan: Backport for Fix ExpensiveUserImpact input validation (T324312) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 09:05 kharlan@deploy1002: Started scap: Backport for Fix ExpensiveUserImpact input validation (T324312)
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42238 and previous config saved to /var/cache/conftool/dbconfig/20221205-090214-marostegui.json
  • 09:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 59689
  • 09:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 59689
  • 09:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 58308
  • 08:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 58308
  • 08:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 141731
  • 08:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 141731
  • 08:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52580
  • 08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 52580
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42237 and previous config saved to /var/cache/conftool/dbconfig/20221205-085235-marostegui.json
  • 08:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 136907
  • 08:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 136907
  • 08:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 55818
  • 08:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 55818
  • 08:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38623
  • 08:48 kharlan@deploy1002: Finished scap: Backport for GrowthExperiments: End imagerecommendation experiment (T323686) (duration: 09m 26s)
  • 08:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38623
  • 08:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 4788
  • 08:40 kharlan@deploy1002: kharlan and kharlan: Backport for GrowthExperiments: End imagerecommendation experiment (T323686) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 08:38 kharlan@deploy1002: Started scap: Backport for GrowthExperiments: End imagerecommendation experiment (T323686)
  • 08:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4788
  • 08:35 kartik@deploy1002: Finished scap: Backport for Enable Section Translation on 8 Wikipedias (T319176) (duration: 09m 57s)
  • 08:29 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-web,name=eqiad
  • 08:27 kartik@deploy1002: kartik and kartik: Backport for Enable Section Translation on 8 Wikipedias (T319176) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:25 kartik@deploy1002: Started scap: Backport for Enable Section Translation on 8 Wikipedias (T319176)
  • 08:24 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet,service=thanos-web
  • 08:24 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2003.codfw.wmnet,service=thanos-web
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with more weight', diff saved to https://phabricator.wikimedia.org/P42236 and previous config saved to /var/cache/conftool/dbconfig/20221205-082320-marostegui.json
  • 08:22 filippo@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=thanos-web,name=eqiad
  • 08:21 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-2002.codfw.wmnet,service=thanos-web
  • 08:21 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-2003.codfw.wmnet,service=thanos-web
  • 08:20 kartik@deploy1002: Finished scap: Backport for testwiki: Enable Section Translation for 15 Wikipedias (T323825 T319177) (duration: 17m 25s)
  • 08:11 kartik@deploy1002: kartik and kartik: Backport for testwiki: Enable Section Translation for 15 Wikipedias (T323825 T319177) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:05 dcausse: restarting blazegraph on wdqs1004 (stuck with 2000+ threads, T242453)
  • 08:02 kartik@deploy1002: Started scap: Backport for testwiki: Enable Section Translation for 15 Wikipedias (T323825 T319177)
  • 07:57 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe1002.eqiad.wmnet,service=thanos-web
  • 07:56 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe1003.eqiad.wmnet,service=thanos-web
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P42234 and previous config saved to /var/cache/conftool/dbconfig/20221205-074804-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2173 (re)pooling @ 100%: After HW issues', diff saved to https://phabricator.wikimedia.org/P42233 and previous config saved to /var/cache/conftool/dbconfig/20221205-074655-root.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P42232 and previous config saved to /var/cache/conftool/dbconfig/20221205-073259-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2173 (re)pooling @ 75%: After HW issues', diff saved to https://phabricator.wikimedia.org/P42231 and previous config saved to /var/cache/conftool/dbconfig/20221205-073150-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P42230 and previous config saved to /var/cache/conftool/dbconfig/20221205-071754-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2173 (re)pooling @ 50%: After HW issues', diff saved to https://phabricator.wikimedia.org/P42229 and previous config saved to /var/cache/conftool/dbconfig/20221205-071645-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P42228 and previous config saved to /var/cache/conftool/dbconfig/20221205-070250-root.json
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2173 (re)pooling @ 25%: After HW issues', diff saved to https://phabricator.wikimedia.org/P42227 and previous config saved to /var/cache/conftool/dbconfig/20221205-070140-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with minimal weight', diff saved to https://phabricator.wikimedia.org/P42226 and previous config saved to /var/cache/conftool/dbconfig/20221205-065151-marostegui.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P42225 and previous config saved to /var/cache/conftool/dbconfig/20221205-064745-root.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2173 (re)pooling @ 10%: After HW issues', diff saved to https://phabricator.wikimedia.org/P42224 and previous config saved to /var/cache/conftool/dbconfig/20221205-064635-root.json
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 with minimal weight', diff saved to https://phabricator.wikimedia.org/P42223 and previous config saved to /var/cache/conftool/dbconfig/20221205-063743-marostegui.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P42222 and previous config saved to /var/cache/conftool/dbconfig/20221205-063240-root.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2173 (re)pooling @ 5%: After HW issues', diff saved to https://phabricator.wikimedia.org/P42221 and previous config saved to /var/cache/conftool/dbconfig/20221205-063130-root.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 to dbctl (depooled)', diff saved to https://phabricator.wikimedia.org/P42220 and previous config saved to /var/cache/conftool/dbconfig/20221205-063020-marostegui.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After schema change', diff saved to https://phabricator.wikimedia.org/P42219 and previous config saved to /var/cache/conftool/dbconfig/20221205-061735-root.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2173 (re)pooling @ 1%: After HW issues', diff saved to https://phabricator.wikimedia.org/P42218 and previous config saved to /var/cache/conftool/dbconfig/20221205-061625-root.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P42217 and previous config saved to /var/cache/conftool/dbconfig/20221205-061616-marostegui.json

2022-12-04

  • 04:19 TheresNoTime: T302486 : `[samtar@mwmaint1002 ~]$ mwscript maintenance/fixMergeHistoryCorruption.php --wiki enwiki --dry-run --ns 828`

2022-12-03

  • 00:17 cwhite: draining shards from logstash1010, logstash1033, logstash1034, logstash1035 - T321410

2022-12-02

  • 19:42 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:42 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force run after a permission problem - volans@cumin1001"
  • 19:41 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force run after a permission problem - volans@cumin1001"
  • 19:39 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 19:38 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:37 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 19:36 volans: fixed git checkout permissions T324334
  • 19:11 sukhe: restart pybal on lvs5004
  • 19:07 mutante: gitlab-runner* - upgrading gitlab-runner package version
  • 18:55 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 863383"
  • 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs5001.eqsin.wmnet
  • 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:51 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:49 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 18:44 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs5001.eqsin.wmnet
  • 18:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs5001.eqsin.wmnet with reason: downtimed, in the process of decom
  • 18:21 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs5001.eqsin.wmnet with reason: downtimed, in the process of decom
  • 18:20 sukhe: decomm lvs5001: restarting pybal
  • 18:14 sukhe: cr[23]-eqsin*: set routing-options static route 103.102.166.224/28 next-hop 10.132.0.39
  • 18:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:05 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Test run after git gc - volans@cumin1001"
  • 18:03 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Test run after git gc - volans@cumin1001"
  • 18:01 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 18:00 volans: performed git gc on all (auth)dns hosts in /srv/git/netbox_dns_snippets - T324334
  • 17:36 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862944"
  • 16:56 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 16:53 jnuche@deploy1002: Finished scap: testing k8s deployment (duration: 08m 35s)
  • 16:49 bking@cumin2002: START - Cookbook sre.wdqs.restart
  • 16:49 bblack: (above agent runs completed on all text nodes for requestctl-for-misc patch)
  • 16:44 jnuche@deploy1002: Started scap: testing k8s deployment
  • 16:44 bblack: running agent on A:cp-text for https://gerrit.wikimedia.org/r/c/operations/puppet/+/863375 (requestctl for misc)
  • 16:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 16:28 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs5004.eqsin.wmnet with OS buster
  • 16:21 bking@cumin2002: START - Cookbook sre.wdqs.restart
  • 16:03 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 16:02 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
  • 15:59 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
  • 15:55 bking@cumin2002: START - Cookbook sre.wdqs.restart
  • 15:48 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862998"
  • 15:47 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 15:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS buster
  • 15:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 15:40 bking@cumin2002: START - Cookbook sre.wdqs.restart
  • 15:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:33 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 15:28 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:22 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 15:22 bking@cumin2002: START - Cookbook sre.wdqs.restart
  • 15:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 15:13 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 15:06 volans: run `git gc` on /srv/netbox-exports/dns.git on netbox[12]002 - T324334
  • 14:48 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host lvs5004.eqsin.wmnet with OS buster
  • 14:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS buster
  • 12:09 jynus: dropping all databases from db1133
  • 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti5001.eqsin.wmnet
  • 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:02 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti5001.eqsin.wmnet
  • 10:56 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti5001.eqsin.wmnet with reason: Remove from cluster for decom
  • 10:34 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti5001.eqsin.wmnet with reason: Remove from cluster for decom
  • 10:01 vgutierrez: upload acme-chief 0.36 to apt.wm.o (bullseye) - T321309
  • 09:58 moritzm: installing publicsuffix updates from bullseye/buster point releases
  • 09:54 moritzm: installing debootstrap updates from bullseye point release
  • 09:53 moritzm: rebalance ganeti codfw/C T323222
  • 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2013.codfw.wmnet to cluster codfw and group C
  • 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2013.codfw.wmnet to cluster codfw and group C
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42215 and previous config saved to /var/cache/conftool/dbconfig/20221202-091126-root.json
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42214 and previous config saved to /var/cache/conftool/dbconfig/20221202-085621-root.json
  • 08:41 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:41 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42213 and previous config saved to /var/cache/conftool/dbconfig/20221202-084116-root.json
  • 08:41 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:40 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42212 and previous config saved to /var/cache/conftool/dbconfig/20221202-082611-root.json
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42211 and previous config saved to /var/cache/conftool/dbconfig/20221202-081106-root.json
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42210 and previous config saved to /var/cache/conftool/dbconfig/20221202-075601-root.json
  • 07:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:49 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:49 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 07:49 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P42209 and previous config saved to /var/cache/conftool/dbconfig/20221202-074300-ladsgroup.json
  • 07:41 moritzm: draining ganeti5001 for eventual decom T322048
  • 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P42208 and previous config saved to /var/cache/conftool/dbconfig/20221202-072755-ladsgroup.json
  • 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P42207 and previous config saved to /var/cache/conftool/dbconfig/20221202-071250-ladsgroup.json
  • 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P42206 and previous config saved to /var/cache/conftool/dbconfig/20221202-065745-ladsgroup.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P42204 and previous config saved to /var/cache/conftool/dbconfig/20221202-061259-marostegui.json
  • 00:09 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw14(45|46).eqiad.wmnet,cluster=jobrunner
  • 00:09 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw14(39|40).eqiad.wmnet,cluster=videoscaler
  • 00:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS buster

2022-12-01

  • 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1347-1348].eqiad.wmnet
  • 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1347-1348].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
  • 23:45 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1347-1348].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
  • 23:43 rzl@cumin1001: START - Cookbook sre.dns.netbox
  • 23:37 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1347-1348].eqiad.wmnet
  • 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1327-1346].eqiad.wmnet
  • 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1327-1346].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
  • 23:34 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1327-1346].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
  • 23:31 rzl@cumin1001: START - Cookbook sre.dns.netbox
  • 22:59 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1327-1346].eqiad.wmnet
  • 22:57 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: Remove unused config variable GEMentorDashboardUseVue (duration: 07m 28s)
  • 22:57 rzl: rzl@puppetmaster1001:~$ sudo puppet node deactivate mw1320.eqiad.wmnet # T306162
  • 22:56 rzl: rzl@puppetmaster1001:~$ sudo puppet node deactivate mw1312.eqiad.wmnet # T306162
  • 22:54 rzl@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw[1307-1326].eqiad.wmnet
  • 22:54 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:54 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1307-1326].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
  • 22:50 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: Remove unused config variable GEMentorDashboardUseVue
  • 22:49 urbanecm@deploy1002: backport aborted: (duration: 00m 03s)
  • 22:42 andrewbogott: upgraded wikitech-static-ord (aka wikitech-static) to Debian Buster, installed php7.4, upgraded MW to 1_39. Will delete the rackspace backup image in a few days.
  • 22:19 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1307-1326].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
  • 22:07 rzl@cumin1001: START - Cookbook sre.dns.netbox
  • 22:02 cwhite: restart swift-proxy on thanos::frontend eqiad
  • 22:01 brennen: end of utc late backport & config window
  • 21:46 brennen@deploy1002: Finished scap: Backport for GrowthExperiments: Enable user impact refresh script on pilot wikis (T322541) (duration: 07m 48s)
  • 21:40 brennen@deploy1002: brennen and kharlan: Backport for GrowthExperiments: Enable user impact refresh script on pilot wikis (T322541) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 21:38 brennen@deploy1002: Started scap: Backport for GrowthExperiments: Enable user impact refresh script on pilot wikis (T322541)
  • 21:34 brennen@deploy1002: Finished scap: Backport for New configs for android schemas (duration: 09m 49s)
  • 21:26 brennen@deploy1002: brennen and sharvaniharan: Backport for New configs for android schemas synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:25 andrewbogott: saving an image of wikitech-static-ord (aka wikitech-static) before upgrading the host to Buster
  • 21:25 brennen@deploy1002: Started scap: Backport for New configs for android schemas
  • 21:22 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1307-1326].eqiad.wmnet
  • 21:21 brennen@deploy1002: Finished scap: Backport for Start writing to cul_actor on test wikis (T233004) (duration: 14m 56s)
  • 21:13 rzl@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts mw[1307-1326].eqiad.wmnet
  • 21:10 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1307-1326].eqiad.wmnet
  • 21:08 brennen@deploy1002: brennen and zabe: Backport for Start writing to cul_actor on test wikis (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:06 brennen@deploy1002: Started scap: Backport for Start writing to cul_actor on test wikis (T233004)
  • 20:47 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for gitlab1004.wikimedia.org
  • 20:47 aokoth@cumin1001: START - Cookbook sre.hosts.remove-downtime for gitlab1004.wikimedia.org
  • 20:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1061.eqiad.wmnet with OS bullseye
  • 20:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 20:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
  • 20:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 20:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
  • 20:00 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version https://phabricator.wikimedia.org/T324195
  • 19:59 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version https://phabricator.wikimedia.org/T324195
  • 19:56 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1061.eqiad.wmnet with OS bullseye
  • 19:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1061']
  • 19:44 mutante: gitlab-runner1002 - upgrading gitlab-runner package
  • 19:44 rzl@cumin2002: conftool action : set/pooled=inactive; selector: name=mw13(0[7-9]|[1-3]\d|4[0-8])\..*
  • 19:43 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 42 hosts with reason: decom
  • 19:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T323907)', diff saved to https://phabricator.wikimedia.org/P42201 and previous config saved to /var/cache/conftool/dbconfig/20221201-194301-ladsgroup.json
  • 19:42 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 42 hosts with reason: decom
  • 19:41 mutante: gitlab2002 (gitlab-replica) - upgrading gitlab-ce
  • 19:40 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS buster
  • 19:39 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns5004.wikimedia.org with OS buster
  • 19:38 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1061']
  • 19:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1061']
  • 19:28 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1061']
  • 19:28 dancy@deploy1002: Finished scap: testing k8s deployment (duration: 06m 17s)
  • 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42200 and previous config saved to /var/cache/conftool/dbconfig/20221201-192755-ladsgroup.json
  • 19:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 19:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1061']
  • 19:27 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs5004.eqsin.wmnet with OS buster
  • 19:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1061']
  • 19:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1060.eqiad.wmnet with OS bullseye
  • 19:21 dancy@deploy1002: Started scap: testing k8s deployment
  • 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 19:16 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.12 refs T320517
  • 19:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1061']
  • 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42199 and previous config saved to /var/cache/conftool/dbconfig/20221201-191248-ladsgroup.json
  • 19:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1057.eqiad.wmnet with OS bullseye
  • 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 19:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
  • 19:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
  • 19:02 dancy@deploy1002: Installation of scap version "4.30.0" completed for 601 hosts
  • 19:01 dancy@deploy1002: Installing scap version "4.30.0" for 601 hosts
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T323907)', diff saved to https://phabricator.wikimedia.org/P42197 and previous config saved to /var/cache/conftool/dbconfig/20221201-185742-ladsgroup.json
  • 18:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
  • 18:51 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
  • 18:43 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1061']
  • 18:38 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1057.eqiad.wmnet with OS bullseye
  • 18:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1061']
  • 18:37 rzl@cumin2002: conftool action : set/pooled=no; selector: name=mw13(0[7-9]|[1-3]\d|4[0-8])\..*
  • 18:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1057.eqiad.wmnet with OS bullseye
  • 18:27 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 18:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 18:27 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 18:26 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 18:25 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 18:25 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 18:21 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:19 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:19 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:17 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:17 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:16 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 18:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1059.eqiad.wmnet with OS bullseye
  • 18:14 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1061']
  • 18:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1060.eqiad.wmnet with OS bullseye
  • 18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T323907)', diff saved to https://phabricator.wikimedia.org/P42196 and previous config saved to /var/cache/conftool/dbconfig/20221201-181215-ladsgroup.json
  • 18:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 18:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T323907)', diff saved to https://phabricator.wikimedia.org/P42195 and previous config saved to /var/cache/conftool/dbconfig/20221201-181153-ladsgroup.json
  • 18:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1060']
  • 18:11 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1060']
  • 18:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1058.eqiad.wmnet with OS bullseye
  • 18:01 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs5004.eqsin.wmnet with OS buster
  • 18:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
  • 17:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
  • 17:57 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42194 and previous config saved to /var/cache/conftool/dbconfig/20221201-175647-ladsgroup.json
  • 17:55 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
  • 17:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 17:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1060']
  • 17:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1060']
  • 17:47 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 17:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1060']
  • 17:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1060']
  • 17:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1060']
  • 17:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1059.eqiad.wmnet with OS bullseye
  • 17:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1058.eqiad.wmnet with OS bullseye
  • 17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42193 and previous config saved to /var/cache/conftool/dbconfig/20221201-174140-ladsgroup.json
  • 17:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1058']
  • 17:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1059']
  • 17:38 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1057.eqiad.wmnet with OS bullseye
  • 17:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1057']
  • 17:34 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1060']
  • 17:33 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1057']
  • 17:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1056.eqiad.wmnet with OS bullseye
  • 17:31 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1057']
  • 17:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1059']
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T323907)', diff saved to https://phabricator.wikimedia.org/P42192 and previous config saved to /var/cache/conftool/dbconfig/20221201-172634-ladsgroup.json
  • 17:26 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1058']
  • 17:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1058']
  • 17:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1059']
  • 17:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
  • 17:14 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS buster
  • 17:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T323907)', diff saved to https://phabricator.wikimedia.org/P42191 and previous config saved to /var/cache/conftool/dbconfig/20221201-171335-ladsgroup.json
  • 17:08 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1059']
  • 17:07 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1058']
  • 17:02 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1056.eqiad.wmnet with OS bullseye
  • 17:01 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:59 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1057']
  • 16:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1055.eqiad.wmnet with OS bullseye
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42190 and previous config saved to /var/cache/conftool/dbconfig/20221201-165828-ladsgroup.json
  • 16:56 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:55 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1054.eqiad.wmnet with OS bullseye
  • 16:50 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns5004
  • 16:50 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns5004
  • 16:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1057']
  • 16:49 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:49 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5004 fix - robh@cumin2002"
  • 16:48 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5004 fix - robh@cumin2002"
  • 16:46 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T323907)', diff saved to https://phabricator.wikimedia.org/P42189 and previous config saved to /var/cache/conftool/dbconfig/20221201-164509-ladsgroup.json
  • 16:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T323907)', diff saved to https://phabricator.wikimedia.org/P42188 and previous config saved to /var/cache/conftool/dbconfig/20221201-164437-ladsgroup.json
  • 16:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 16:43 moritzm: installing ini4j security updates
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42187 and previous config saved to /var/cache/conftool/dbconfig/20221201-164322-ladsgroup.json
  • 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1056']
  • 16:40 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 16:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 16:36 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 16:34 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1057']
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42185 and previous config saved to /var/cache/conftool/dbconfig/20221201-162930-ladsgroup.json
  • 16:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1055.eqiad.wmnet with OS bullseye
  • 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T323907)', diff saved to https://phabricator.wikimedia.org/P42184 and previous config saved to /var/cache/conftool/dbconfig/20221201-162815-ladsgroup.json
  • 16:26 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1056']
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42183 and previous config saved to /var/cache/conftool/dbconfig/20221201-161424-ladsgroup.json
  • 16:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1055']
  • 16:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1056']
  • 16:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1054.eqiad.wmnet with OS bullseye
  • 16:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1054']
  • 16:00 effie: php7.4 upgrade + apache upgrade + rolling restarts of parsoid servers - T323358
  • 16:00 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1055']
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T323907)', diff saved to https://phabricator.wikimedia.org/P42182 and previous config saved to /var/cache/conftool/dbconfig/20221201-155917-ladsgroup.json
  • 15:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1055']
  • 15:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1056']
  • 15:57 effie: php7.4 upgrade + apache upgrade + rolling restarts of jobrunners/videoscalers servers - T323358
  • 15:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1054']
  • 15:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1054']
  • 15:45 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1055']
  • 15:41 effie: php7.4 upgrade + apache upgrade + rolling restarts of api servers - T323358
  • 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T323907)', diff saved to https://phabricator.wikimedia.org/P42181 and previous config saved to /var/cache/conftool/dbconfig/20221201-153918-ladsgroup.json
  • 15:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 15:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42180 and previous config saved to /var/cache/conftool/dbconfig/20221201-153856-ladsgroup.json
  • 15:38 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns5001.wikimedia.org
  • 15:38 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:38 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 15:37 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1054']
  • 15:36 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 15:34 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 15:28 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns5001.wikimedia.org
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42179 and previous config saved to /var/cache/conftool/dbconfig/20221201-152350-ladsgroup.json
  • 15:12 effie: php7.4 upgrade + apache upgrade + rolling restarts of app servers - T323358
  • 15:11 sukhe: [done] homer "cr*-eqsin*" commit "running homer for Gerrit: 862321"
  • 15:10 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862321"
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42178 and previous config saved to /var/cache/conftool/dbconfig/20221201-150843-ladsgroup.json
  • 15:01 Lucas_WMDE: UTC afternoon backport+config window done
  • 15:00 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable limited width on plwikisource MAIN namespace (T323185) (duration: 08m 06s)
  • 14:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:53 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and soda: Backport for Enable limited width on plwikisource MAIN namespace (T323185) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42177 and previous config saved to /var/cache/conftool/dbconfig/20221201-145337-ladsgroup.json
  • 14:52 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable limited width on plwikisource MAIN namespace (T323185)
  • 14:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:50 moritzm: installing krb5 security updates
  • 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:45 kharlan@deploy1002: Finished scap: Backport for GrowthExperiments: Enable new impact module on testwiki (T323526) (duration: 06m 12s)
  • 14:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:42 XioNoX: add BGP sessions to RIPE RIS in drmrs
  • 14:40 kharlan@deploy1002: kharlan and kharlan: Backport for GrowthExperiments: Enable new impact module on testwiki (T323526) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:39 kharlan@deploy1002: Started scap: Backport for GrowthExperiments: Enable new impact module on testwiki (T323526)
  • 14:36 kharlan@deploy1002: Finished scap: Backport for [no-op] GrowthExperiments: Enable D3 in production (T318854) (duration: 06m 04s)
  • 14:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:31 kharlan@deploy1002: kharlan and tgr: Backport for [no-op] GrowthExperiments: Enable D3 in production (T318854) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:30 kharlan@deploy1002: Started scap: Backport for [no-op] GrowthExperiments: Enable D3 in production (T318854)
  • 14:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:27 kharlan@deploy1002: Finished scap: Backport for DatabaseUserImpactStore: Fix parameter style for upsert keys (T324188) (duration: 07m 25s)
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T323907)', diff saved to https://phabricator.wikimedia.org/P42176 and previous config saved to /var/cache/conftool/dbconfig/20221201-142735-ladsgroup.json
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 14:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 14:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:21 kharlan@deploy1002: kharlan and kharlan: Backport for DatabaseUserImpactStore: Fix parameter style for upsert keys (T324188) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 14:20 kharlan@deploy1002: Started scap: Backport for DatabaseUserImpactStore: Fix parameter style for upsert keys (T324188)
  • 14:00 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:00 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adjust DNS for LVS eqsin. - cmooney@cumin1001"
  • 13:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adjust DNS for LVS eqsin. - cmooney@cumin1001"
  • 13:28 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42175 and previous config saved to /var/cache/conftool/dbconfig/20221201-132000-ladsgroup.json
  • 13:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T323907)', diff saved to https://phabricator.wikimedia.org/P42174 and previous config saved to /var/cache/conftool/dbconfig/20221201-131950-ladsgroup.json
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42172 and previous config saved to /var/cache/conftool/dbconfig/20221201-130443-ladsgroup.json
  • 12:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42171 and previous config saved to /var/cache/conftool/dbconfig/20221201-125821-ladsgroup.json
  • 12:50 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 12:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 12:50 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 12:49 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42170 and previous config saved to /var/cache/conftool/dbconfig/20221201-124936-ladsgroup.json
  • 12:48 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 12:48 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 12:47 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 12:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 12:43 moritzm: installing glibc security updates on buster
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42169 and previous config saved to /var/cache/conftool/dbconfig/20221201-124314-ladsgroup.json
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T323907)', diff saved to https://phabricator.wikimedia.org/P42168 and previous config saved to /var/cache/conftool/dbconfig/20221201-123430-ladsgroup.json
  • 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42167 and previous config saved to /var/cache/conftool/dbconfig/20221201-122807-ladsgroup.json
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42166 and previous config saved to /var/cache/conftool/dbconfig/20221201-121301-ladsgroup.json
  • 12:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 12:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 12:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T318605)', diff saved to https://phabricator.wikimedia.org/P42165 and previous config saved to /var/cache/conftool/dbconfig/20221201-120102-ladsgroup.json
  • 11:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
  • 11:55 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
  • 11:47 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
  • 11:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
  • 11:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42164 and previous config saved to /var/cache/conftool/dbconfig/20221201-114555-ladsgroup.json
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
  • 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
  • 11:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42163 and previous config saved to /var/cache/conftool/dbconfig/20221201-113049-ladsgroup.json
  • 11:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:18 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Fix broken search with vector-2022 on www.wikidata.org (T324148) (duration: 06m 56s)
  • 11:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T318605)', diff saved to https://phabricator.wikimedia.org/P42162 and previous config saved to /var/cache/conftool/dbconfig/20221201-111542-ladsgroup.json
  • 11:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:12 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and migr: Backport for Fix broken search with vector-2022 on www.wikidata.org (T324148) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 11:11 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Fix broken search with vector-2022 on www.wikidata.org (T324148)
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T318605)', diff saved to https://phabricator.wikimedia.org/P42161 and previous config saved to /var/cache/conftool/dbconfig/20221201-110938-ladsgroup.json
  • 11:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T318605)', diff saved to https://phabricator.wikimedia.org/P42160 and previous config saved to /var/cache/conftool/dbconfig/20221201-110916-ladsgroup.json
  • 11:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 11:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T323907)', diff saved to https://phabricator.wikimedia.org/P42159 and previous config saved to /var/cache/conftool/dbconfig/20221201-105938-ladsgroup.json
  • 10:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 10:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42158 and previous config saved to /var/cache/conftool/dbconfig/20221201-105916-ladsgroup.json
  • 10:57 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-web
  • 10:56 elukey: deleted knative controller + net-istio controllers on ml-serve-eqiad to clear out some weird state (causing high latencies for the k8s api)
  • 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
  • 10:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42157 and previous config saved to /var/cache/conftool/dbconfig/20221201-105410-ladsgroup.json
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42156 and previous config saved to /var/cache/conftool/dbconfig/20221201-104409-ladsgroup.json
  • 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42155 and previous config saved to /var/cache/conftool/dbconfig/20221201-103903-ladsgroup.json
  • 10:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42154 and previous config saved to /var/cache/conftool/dbconfig/20221201-103448-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42153 and previous config saved to /var/cache/conftool/dbconfig/20221201-103426-ladsgroup.json
  • 10:34 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
  • 10:34 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
  • 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42152 and previous config saved to /var/cache/conftool/dbconfig/20221201-102903-ladsgroup.json
  • 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T318605)', diff saved to https://phabricator.wikimedia.org/P42151 and previous config saved to /var/cache/conftool/dbconfig/20221201-102357-ladsgroup.json
  • 10:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42150 and previous config saved to /var/cache/conftool/dbconfig/20221201-101920-ladsgroup.json
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T318605)', diff saved to https://phabricator.wikimedia.org/P42149 and previous config saved to /var/cache/conftool/dbconfig/20221201-101754-ladsgroup.json
  • 10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 10:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T318605)', diff saved to https://phabricator.wikimedia.org/P42148 and previous config saved to /var/cache/conftool/dbconfig/20221201-101733-ladsgroup.json
  • 10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42147 and previous config saved to /var/cache/conftool/dbconfig/20221201-101356-ladsgroup.json
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42146 and previous config saved to /var/cache/conftool/dbconfig/20221201-100413-ladsgroup.json
  • 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42145 and previous config saved to /var/cache/conftool/dbconfig/20221201-100227-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42144 and previous config saved to /var/cache/conftool/dbconfig/20221201-094907-ladsgroup.json
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42143 and previous config saved to /var/cache/conftool/dbconfig/20221201-094720-ladsgroup.json
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T318605)', diff saved to https://phabricator.wikimedia.org/P42142 and previous config saved to /var/cache/conftool/dbconfig/20221201-093214-ladsgroup.json
  • 09:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 09:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T318605)', diff saved to https://phabricator.wikimedia.org/P42141 and previous config saved to /var/cache/conftool/dbconfig/20221201-092455-ladsgroup.json
  • 09:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 09:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 09:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T318605)', diff saved to https://phabricator.wikimedia.org/P42140 and previous config saved to /var/cache/conftool/dbconfig/20221201-092434-ladsgroup.json
  • 09:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 09:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 09:19 kostajh: UTC morning deploys done
  • 09:18 kharlan@deploy1002: Finished scap: Backport for User impact: Fix per-page pageview numbers (T323253) (duration: 08m 31s)
  • 09:15 Emperor: depool, restart, repool swift-proxy on ms-fe1011
  • 09:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 09:11 kharlan@deploy1002: kharlan and kharlan: Backport for User impact: Fix per-page pageview numbers (T323253) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 09:09 kharlan@deploy1002: Started scap: Backport for User impact: Fix per-page pageview numbers (T323253)
  • 09:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42139 and previous config saved to /var/cache/conftool/dbconfig/20221201-090927-ladsgroup.json
  • 09:07 moritzm: rebuilding raid on ganeti2013 T323222
  • 09:01 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2013.codfw.wmnet
  • 08:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42138 and previous config saved to /var/cache/conftool/dbconfig/20221201-085421-ladsgroup.json
  • 08:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
  • 08:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 08:49 volans: restart idrac on mw1334, ipmi and remote ipmi work fine, ssh not responding
  • 08:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 08:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42137 and previous config saved to /var/cache/conftool/dbconfig/20221201-084147-ladsgroup.json
  • 08:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 08:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T323907)', diff saved to https://phabricator.wikimedia.org/P42136 and previous config saved to /var/cache/conftool/dbconfig/20221201-084125-ladsgroup.json
  • 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T318605)', diff saved to https://phabricator.wikimedia.org/P42135 and previous config saved to /var/cache/conftool/dbconfig/20221201-084026-ladsgroup.json
  • 08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T318605)', diff saved to https://phabricator.wikimedia.org/P42134 and previous config saved to /var/cache/conftool/dbconfig/20221201-083914-ladsgroup.json
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42131 and previous config saved to /var/cache/conftool/dbconfig/20221201-082619-ladsgroup.json
  • 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P42130 and previous config saved to /var/cache/conftool/dbconfig/20221201-082519-ladsgroup.json
  • 08:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T318605)', diff saved to https://phabricator.wikimedia.org/P42129 and previous config saved to /var/cache/conftool/dbconfig/20221201-082215-ladsgroup.json
  • 08:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 08:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 08:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T318605)', diff saved to https://phabricator.wikimedia.org/P42128 and previous config saved to /var/cache/conftool/dbconfig/20221201-082154-ladsgroup.json
  • 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42127 and previous config saved to /var/cache/conftool/dbconfig/20221201-081444-ladsgroup.json
  • 08:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 08:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T323907)', diff saved to https://phabricator.wikimedia.org/P42126 and previous config saved to /var/cache/conftool/dbconfig/20221201-081433-ladsgroup.json
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42125 and previous config saved to /var/cache/conftool/dbconfig/20221201-081112-ladsgroup.json
  • 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P42124 and previous config saved to /var/cache/conftool/dbconfig/20221201-081013-ladsgroup.json
  • 08:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42123 and previous config saved to /var/cache/conftool/dbconfig/20221201-080647-ladsgroup.json
  • 07:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42122 and previous config saved to /var/cache/conftool/dbconfig/20221201-075927-ladsgroup.json
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T323907)', diff saved to https://phabricator.wikimedia.org/P42120 and previous config saved to /var/cache/conftool/dbconfig/20221201-075606-ladsgroup.json
  • 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T318605)', diff saved to https://phabricator.wikimedia.org/P42119 and previous config saved to /var/cache/conftool/dbconfig/20221201-075506-ladsgroup.json
  • 07:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 400474
  • 07:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42118 and previous config saved to /var/cache/conftool/dbconfig/20221201-075140-ladsgroup.json
  • 07:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 400474
  • 07:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42117 and previous config saved to /var/cache/conftool/dbconfig/20221201-074420-ladsgroup.json
  • 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T318605)', diff saved to https://phabricator.wikimedia.org/P42116 and previous config saved to /var/cache/conftool/dbconfig/20221201-073634-ladsgroup.json
  • 07:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T318605)', diff saved to https://phabricator.wikimedia.org/P42115 and previous config saved to /var/cache/conftool/dbconfig/20221201-073015-ladsgroup.json
  • 07:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T323907)', diff saved to https://phabricator.wikimedia.org/P42114 and previous config saved to /var/cache/conftool/dbconfig/20221201-072914-ladsgroup.json
  • 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T318605)', diff saved to https://phabricator.wikimedia.org/P42113 and previous config saved to /var/cache/conftool/dbconfig/20221201-072659-ladsgroup.json
  • 07:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T323907)', diff saved to https://phabricator.wikimedia.org/P42111 and previous config saved to /var/cache/conftool/dbconfig/20221201-071641-ladsgroup.json
  • 07:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 07:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 07:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 07:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 07:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T323907)', diff saved to https://phabricator.wikimedia.org/P42110 and previous config saved to /var/cache/conftool/dbconfig/20221201-071615-ladsgroup.json
  • 07:14 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 07:13 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 07:13 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 07:13 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 07:12 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 07:12 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42109 and previous config saved to /var/cache/conftool/dbconfig/20221201-071153-ladsgroup.json
  • 07:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1163 T323547', diff saved to https://phabricator.wikimedia.org/P42108 and previous config saved to /var/cache/conftool/dbconfig/20221201-070758-ladsgroup.json
  • 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1118 to s1 primary and set section read-write T323547', diff saved to https://phabricator.wikimedia.org/P42107 and previous config saved to /var/cache/conftool/dbconfig/20221201-070203-ladsgroup.json
  • 07:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - T323547', diff saved to https://phabricator.wikimedia.org/P42106 and previous config saved to /var/cache/conftool/dbconfig/20221201-070131-ladsgroup.json
  • 07:01 Amir1: Starting s1 eqiad failover from db1163 to db1118 - T323547
  • 07:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42105 and previous config saved to /var/cache/conftool/dbconfig/20221201-070108-ladsgroup.json
  • 06:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 06:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T318605)', diff saved to https://phabricator.wikimedia.org/P42104 and previous config saved to /var/cache/conftool/dbconfig/20221201-065737-ladsgroup.json
  • 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42103 and previous config saved to /var/cache/conftool/dbconfig/20221201-065646-ladsgroup.json
  • 06:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42102 and previous config saved to /var/cache/conftool/dbconfig/20221201-064602-ladsgroup.json
  • 06:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42101 and previous config saved to /var/cache/conftool/dbconfig/20221201-064230-ladsgroup.json
  • 06:42 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 06:42 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T318605)', diff saved to https://phabricator.wikimedia.org/P42100 and previous config saved to /var/cache/conftool/dbconfig/20221201-064140-ladsgroup.json
  • 06:41 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 06:40 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T318605)', diff saved to https://phabricator.wikimedia.org/P42099 and previous config saved to /var/cache/conftool/dbconfig/20221201-063930-ladsgroup.json
  • 06:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 06:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42098 and previous config saved to /var/cache/conftool/dbconfig/20221201-063908-ladsgroup.json
  • 06:36 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 06:35 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 06:31 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 06:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T323907)', diff saved to https://phabricator.wikimedia.org/P42097 and previous config saved to /var/cache/conftool/dbconfig/20221201-063055-ladsgroup.json
  • 06:30 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 06:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42096 and previous config saved to /var/cache/conftool/dbconfig/20221201-062724-ladsgroup.json
  • 06:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42095 and previous config saved to /var/cache/conftool/dbconfig/20221201-062402-ladsgroup.json
  • 06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T318605)', diff saved to https://phabricator.wikimedia.org/P42094 and previous config saved to /var/cache/conftool/dbconfig/20221201-061218-ladsgroup.json
  • 06:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42093 and previous config saved to /var/cache/conftool/dbconfig/20221201-060855-ladsgroup.json
  • 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T318605)', diff saved to https://phabricator.wikimedia.org/P42092 and previous config saved to /var/cache/conftool/dbconfig/20221201-060230-ladsgroup.json
  • 06:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 06:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42091 and previous config saved to /var/cache/conftool/dbconfig/20221201-060206-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1118 with weight 0 T323547', diff saved to https://phabricator.wikimedia.org/P42090 and previous config saved to /var/cache/conftool/dbconfig/20221201-060157-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 37 hosts with reason: Primary switchover s1 T323547
  • 06:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 37 hosts with reason: Primary switchover s1 T323547
  • 05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T323907)', diff saved to https://phabricator.wikimedia.org/P42089 and previous config saved to /var/cache/conftool/dbconfig/20221201-055359-ladsgroup.json
  • 05:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 05:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42088 and previous config saved to /var/cache/conftool/dbconfig/20221201-055349-ladsgroup.json
  • 05:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 05:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T323907)', diff saved to https://phabricator.wikimedia.org/P42087 and previous config saved to /var/cache/conftool/dbconfig/20221201-055337-ladsgroup.json
  • 05:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42086 and previous config saved to /var/cache/conftool/dbconfig/20221201-055239-ladsgroup.json
  • 05:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 05:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 05:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42085 and previous config saved to /var/cache/conftool/dbconfig/20221201-055218-ladsgroup.json
  • 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T323907)', diff saved to https://phabricator.wikimedia.org/P42084 and previous config saved to /var/cache/conftool/dbconfig/20221201-055142-ladsgroup.json
  • 05:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 05:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T323907)', diff saved to https://phabricator.wikimedia.org/P42083 and previous config saved to /var/cache/conftool/dbconfig/20221201-055120-ladsgroup.json
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P42082 and previous config saved to /var/cache/conftool/dbconfig/20221201-054653-ladsgroup.json
  • 05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42081 and previous config saved to /var/cache/conftool/dbconfig/20221201-053831-ladsgroup.json
  • 05:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42080 and previous config saved to /var/cache/conftool/dbconfig/20221201-053711-ladsgroup.json
  • 05:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42079 and previous config saved to /var/cache/conftool/dbconfig/20221201-053613-ladsgroup.json
  • 05:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P42078 and previous config saved to /var/cache/conftool/dbconfig/20221201-053147-ladsgroup.json
  • 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T322618)', diff saved to https://phabricator.wikimedia.org/P42077 and previous config saved to /var/cache/conftool/dbconfig/20221201-052524-ladsgroup.json
  • 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42076 and previous config saved to /var/cache/conftool/dbconfig/20221201-052325-ladsgroup.json
  • 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T322618)', diff saved to https://phabricator.wikimedia.org/P42075 and previous config saved to /var/cache/conftool/dbconfig/20221201-052223-ladsgroup.json
  • 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42074 and previous config saved to /var/cache/conftool/dbconfig/20221201-052205-ladsgroup.json
  • 05:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42073 and previous config saved to /var/cache/conftool/dbconfig/20221201-052107-ladsgroup.json
  • 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T322618)', diff saved to https://phabricator.wikimedia.org/P42072 and previous config saved to /var/cache/conftool/dbconfig/20221201-052014-ladsgroup.json
  • 05:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 05:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T322618)', diff saved to https://phabricator.wikimedia.org/P42071 and previous config saved to /var/cache/conftool/dbconfig/20221201-051942-ladsgroup.json
  • 05:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42070 and previous config saved to /var/cache/conftool/dbconfig/20221201-051640-ladsgroup.json
  • 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T323907)', diff saved to https://phabricator.wikimedia.org/P42069 and previous config saved to /var/cache/conftool/dbconfig/20221201-050818-ladsgroup.json
  • 05:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42068 and previous config saved to /var/cache/conftool/dbconfig/20221201-050658-ladsgroup.json
  • 05:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T323907)', diff saved to https://phabricator.wikimedia.org/P42067 and previous config saved to /var/cache/conftool/dbconfig/20221201-050600-ladsgroup.json
  • 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42066 and previous config saved to /var/cache/conftool/dbconfig/20221201-050548-ladsgroup.json
  • 05:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 05:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T318605)', diff saved to https://phabricator.wikimedia.org/P42065 and previous config saved to /var/cache/conftool/dbconfig/20221201-050527-ladsgroup.json
  • 05:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P42064 and previous config saved to /var/cache/conftool/dbconfig/20221201-050435-ladsgroup.json
  • 04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42063 and previous config saved to /var/cache/conftool/dbconfig/20221201-045020-ladsgroup.json
  • 04:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P42062 and previous config saved to /var/cache/conftool/dbconfig/20221201-044929-ladsgroup.json
  • 04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42061 and previous config saved to /var/cache/conftool/dbconfig/20221201-044053-ladsgroup.json
  • 04:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 04:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42060 and previous config saved to /var/cache/conftool/dbconfig/20221201-044031-ladsgroup.json
  • 04:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42059 and previous config saved to /var/cache/conftool/dbconfig/20221201-043514-ladsgroup.json
  • 04:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T322618)', diff saved to https://phabricator.wikimedia.org/P42058 and previous config saved to /var/cache/conftool/dbconfig/20221201-043422-ladsgroup.json
  • 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T322618)', diff saved to https://phabricator.wikimedia.org/P42057 and previous config saved to /var/cache/conftool/dbconfig/20221201-043315-ladsgroup.json
  • 04:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 04:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 04:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T322618)', diff saved to https://phabricator.wikimedia.org/P42056 and previous config saved to /var/cache/conftool/dbconfig/20221201-043253-ladsgroup.json
  • 04:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P42055 and previous config saved to /var/cache/conftool/dbconfig/20221201-042525-ladsgroup.json
  • 04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T323907)', diff saved to https://phabricator.wikimedia.org/P42054 and previous config saved to /var/cache/conftool/dbconfig/20221201-042251-ladsgroup.json
  • 04:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 04:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42053 and previous config saved to /var/cache/conftool/dbconfig/20221201-042229-ladsgroup.json
  • 04:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T318605)', diff saved to https://phabricator.wikimedia.org/P42052 and previous config saved to /var/cache/conftool/dbconfig/20221201-042008-ladsgroup.json
  • 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T318605)', diff saved to https://phabricator.wikimedia.org/P42051 and previous config saved to /var/cache/conftool/dbconfig/20221201-041758-ladsgroup.json
  • 04:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 04:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 04:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 04:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P42050 and previous config saved to /var/cache/conftool/dbconfig/20221201-041747-ladsgroup.json
  • 04:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 04:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 04:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 04:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T318605)', diff saved to https://phabricator.wikimedia.org/P42049 and previous config saved to /var/cache/conftool/dbconfig/20221201-041652-ladsgroup.json
  • 04:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T322618)', diff saved to https://phabricator.wikimedia.org/P42048 and previous config saved to /var/cache/conftool/dbconfig/20221201-041322-ladsgroup.json
  • 04:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P42047 and previous config saved to /var/cache/conftool/dbconfig/20221201-041018-ladsgroup.json
  • 04:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42046 and previous config saved to /var/cache/conftool/dbconfig/20221201-040723-ladsgroup.json
  • 04:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P42045 and previous config saved to /var/cache/conftool/dbconfig/20221201-040240-ladsgroup.json
  • 04:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P42044 and previous config saved to /var/cache/conftool/dbconfig/20221201-040145-ladsgroup.json
  • 03:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P42043 and previous config saved to /var/cache/conftool/dbconfig/20221201-035816-ladsgroup.json
  • 03:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42042 and previous config saved to /var/cache/conftool/dbconfig/20221201-035512-ladsgroup.json
  • 03:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42041 and previous config saved to /var/cache/conftool/dbconfig/20221201-035216-ladsgroup.json
  • 03:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T322618)', diff saved to https://phabricator.wikimedia.org/P42040 and previous config saved to /var/cache/conftool/dbconfig/20221201-034734-ladsgroup.json
  • 03:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P42039 and previous config saved to /var/cache/conftool/dbconfig/20221201-034639-ladsgroup.json
  • 03:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T322618)', diff saved to https://phabricator.wikimedia.org/P42038 and previous config saved to /var/cache/conftool/dbconfig/20221201-034627-ladsgroup.json
  • 03:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T322618)', diff saved to https://phabricator.wikimedia.org/P42037 and previous config saved to /var/cache/conftool/dbconfig/20221201-034527-ladsgroup.json
  • 03:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P42036 and previous config saved to /var/cache/conftool/dbconfig/20221201-034309-ladsgroup.json
  • 03:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42035 and previous config saved to /var/cache/conftool/dbconfig/20221201-033710-ladsgroup.json
  • 03:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5027.eqsin.wmnet with OS buster
  • 03:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T323907)', diff saved to https://phabricator.wikimedia.org/P42034 and previous config saved to /var/cache/conftool/dbconfig/20221201-033449-ladsgroup.json
  • 03:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 03:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 03:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T318605)', diff saved to https://phabricator.wikimedia.org/P42033 and previous config saved to /var/cache/conftool/dbconfig/20221201-033132-ladsgroup.json
  • 03:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P42032 and previous config saved to /var/cache/conftool/dbconfig/20221201-033020-ladsgroup.json
  • 03:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T318605)', diff saved to https://phabricator.wikimedia.org/P42031 and previous config saved to /var/cache/conftool/dbconfig/20221201-032922-ladsgroup.json
  • 03:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 03:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 03:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T318605)', diff saved to https://phabricator.wikimedia.org/P42030 and previous config saved to /var/cache/conftool/dbconfig/20221201-032901-ladsgroup.json
  • 03:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T322618)', diff saved to https://phabricator.wikimedia.org/P42029 and previous config saved to /var/cache/conftool/dbconfig/20221201-032803-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T322618)', diff saved to https://phabricator.wikimedia.org/P42028 and previous config saved to /var/cache/conftool/dbconfig/20221201-032553-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 03:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T322618)', diff saved to https://phabricator.wikimedia.org/P42027 and previous config saved to /var/cache/conftool/dbconfig/20221201-032531-ladsgroup.json
  • 03:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42026 and previous config saved to /var/cache/conftool/dbconfig/20221201-031608-ladsgroup.json
  • 03:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 03:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 03:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42025 and previous config saved to /var/cache/conftool/dbconfig/20221201-031546-ladsgroup.json
  • 03:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P42024 and previous config saved to /var/cache/conftool/dbconfig/20221201-031514-ladsgroup.json
  • 03:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P42023 and previous config saved to /var/cache/conftool/dbconfig/20221201-031354-ladsgroup.json
  • 03:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P42022 and previous config saved to /var/cache/conftool/dbconfig/20221201-031024-ladsgroup.json
  • 03:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5027.eqsin.wmnet with reason: host reimage
  • 03:03 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5027.eqsin.wmnet with reason: host reimage
  • 03:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P42021 and previous config saved to /var/cache/conftool/dbconfig/20221201-030040-ladsgroup.json
  • 03:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T322618)', diff saved to https://phabricator.wikimedia.org/P42020 and previous config saved to /var/cache/conftool/dbconfig/20221201-030007-ladsgroup.json
  • 02:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T322618)', diff saved to https://phabricator.wikimedia.org/P42019 and previous config saved to /var/cache/conftool/dbconfig/20221201-025900-ladsgroup.json
  • 02:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P42018 and previous config saved to /var/cache/conftool/dbconfig/20221201-025848-ladsgroup.json
  • 02:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T322618)', diff saved to https://phabricator.wikimedia.org/P42017 and previous config saved to /var/cache/conftool/dbconfig/20221201-025838-ladsgroup.json
  • 02:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P42016 and previous config saved to /var/cache/conftool/dbconfig/20221201-025517-ladsgroup.json
  • 02:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P42015 and previous config saved to /var/cache/conftool/dbconfig/20221201-024533-ladsgroup.json
  • 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T318605)', diff saved to https://phabricator.wikimedia.org/P42014 and previous config saved to /var/cache/conftool/dbconfig/20221201-024341-ladsgroup.json
  • 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P42013 and previous config saved to /var/cache/conftool/dbconfig/20221201-024331-ladsgroup.json
  • 02:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T318605)', diff saved to https://phabricator.wikimedia.org/P42012 and previous config saved to /var/cache/conftool/dbconfig/20221201-024131-ladsgroup.json
  • 02:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 02:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 02:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T318605)', diff saved to https://phabricator.wikimedia.org/P42011 and previous config saved to /var/cache/conftool/dbconfig/20221201-024110-ladsgroup.json
  • 02:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T322618)', diff saved to https://phabricator.wikimedia.org/P42010 and previous config saved to /var/cache/conftool/dbconfig/20221201-024011-ladsgroup.json
  • 02:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T322618)', diff saved to https://phabricator.wikimedia.org/P42009 and previous config saved to /var/cache/conftool/dbconfig/20221201-023801-ladsgroup.json
  • 02:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 02:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 02:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T322618)', diff saved to https://phabricator.wikimedia.org/P42008 and previous config saved to /var/cache/conftool/dbconfig/20221201-023750-ladsgroup.json
  • 02:33 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5027.eqsin.wmnet with OS buster
  • 02:33 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5027.eqsin.wmnet with OS buster
  • 02:32 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 02:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42007 and previous config saved to /var/cache/conftool/dbconfig/20221201-023027-ladsgroup.json
  • 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P42006 and previous config saved to /var/cache/conftool/dbconfig/20221201-022825-ladsgroup.json
  • 02:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P42005 and previous config saved to /var/cache/conftool/dbconfig/20221201-022603-ladsgroup.json
  • 02:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P42004 and previous config saved to /var/cache/conftool/dbconfig/20221201-022244-ladsgroup.json
  • 02:22 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5027.eqsin.wmnet with OS buster
  • 02:21 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5027.eqsin.wmnet with OS buster
  • 02:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5027.eqsin.wmnet with OS buster
  • 02:20 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5027.eqsin.wmnet with OS buster
  • 02:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T322618)', diff saved to https://phabricator.wikimedia.org/P42003 and previous config saved to /var/cache/conftool/dbconfig/20221201-021318-ladsgroup.json
  • 02:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-coord - cmjohnson@cumin1001"
  • 02:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T322618)', diff saved to https://phabricator.wikimedia.org/P42002 and previous config saved to /var/cache/conftool/dbconfig/20221201-021211-ladsgroup.json
  • 02:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 02:12 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-coord - cmjohnson@cumin1001"
  • 02:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 02:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T322618)', diff saved to https://phabricator.wikimedia.org/P42001 and previous config saved to /var/cache/conftool/dbconfig/20221201-021149-ladsgroup.json
  • 02:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P42000 and previous config saved to /var/cache/conftool/dbconfig/20221201-021057-ladsgroup.json
  • 02:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 02:09 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 02:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 02:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P41999 and previous config saved to /var/cache/conftool/dbconfig/20221201-020737-ladsgroup.json
  • 02:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 02:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 02:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P41998 and previous config saved to /var/cache/conftool/dbconfig/20221201-020308-ladsgroup.json
  • 02:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 02:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 01:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cephosd - cmjohnson@cumin1001"
  • 01:58 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cephosd - cmjohnson@cumin1001"
  • 01:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P41997 and previous config saved to /var/cache/conftool/dbconfig/20221201-015643-ladsgroup.json
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T318605)', diff saved to https://phabricator.wikimedia.org/P41996 and previous config saved to /var/cache/conftool/dbconfig/20221201-015550-ladsgroup.json
  • 01:55 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 01:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T318605)', diff saved to https://phabricator.wikimedia.org/P41995 and previous config saved to /var/cache/conftool/dbconfig/20221201-015340-ladsgroup.json
  • 01:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 01:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P41994 and previous config saved to /var/cache/conftool/dbconfig/20221201-015332-ladsgroup.json
  • 01:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 01:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 01:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 01:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T322618)', diff saved to https://phabricator.wikimedia.org/P41993 and previous config saved to /var/cache/conftool/dbconfig/20221201-015230-ladsgroup.json
  • 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T318605)', diff saved to https://phabricator.wikimedia.org/P41992 and previous config saved to /var/cache/conftool/dbconfig/20221201-015115-ladsgroup.json
  • 01:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 01:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T322618)', diff saved to https://phabricator.wikimedia.org/P41991 and previous config saved to /var/cache/conftool/dbconfig/20221201-015020-ladsgroup.json
  • 01:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 01:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T322618)', diff saved to https://phabricator.wikimedia.org/P41990 and previous config saved to /var/cache/conftool/dbconfig/20221201-015010-ladsgroup.json
  • 01:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P41989 and previous config saved to /var/cache/conftool/dbconfig/20221201-014136-ladsgroup.json
  • 01:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P41988 and previous config saved to /var/cache/conftool/dbconfig/20221201-013503-ladsgroup.json
  • 01:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5027.eqsin.wmnet with OS buster
  • 01:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T322618)', diff saved to https://phabricator.wikimedia.org/P41987 and previous config saved to /var/cache/conftool/dbconfig/20221201-012630-ladsgroup.json
  • 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T322618)', diff saved to https://phabricator.wikimedia.org/P41986 and previous config saved to /var/cache/conftool/dbconfig/20221201-012522-ladsgroup.json
  • 01:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 01:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T322618)', diff saved to https://phabricator.wikimedia.org/P41985 and previous config saved to /var/cache/conftool/dbconfig/20221201-012500-ladsgroup.json
  • 01:24 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5026.eqsin.wmnet with OS buster
  • 01:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P41984 and previous config saved to /var/cache/conftool/dbconfig/20221201-011957-ladsgroup.json
  • 01:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P41983 and previous config saved to /var/cache/conftool/dbconfig/20221201-010954-ladsgroup.json
  • 01:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T322618)', diff saved to https://phabricator.wikimedia.org/P41982 and previous config saved to /var/cache/conftool/dbconfig/20221201-010450-ladsgroup.json
  • 01:04 ejegg: payments-wiki upgraded from 96c74911 to c52a6a39
  • 01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T322618)', diff saved to https://phabricator.wikimedia.org/P41981 and previous config saved to /var/cache/conftool/dbconfig/20221201-010240-ladsgroup.json
  • 01:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 01:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T322618)', diff saved to https://phabricator.wikimedia.org/P41980 and previous config saved to /var/cache/conftool/dbconfig/20221201-010219-ladsgroup.json
  • 00:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5026.eqsin.wmnet with reason: host reimage
  • 00:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P41979 and previous config saved to /var/cache/conftool/dbconfig/20221201-005447-ladsgroup.json
  • 00:53 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5026.eqsin.wmnet with reason: host reimage
  • 00:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P41978 and previous config saved to /var/cache/conftool/dbconfig/20221201-004712-ladsgroup.json
  • 00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T322618)', diff saved to https://phabricator.wikimedia.org/P41977 and previous config saved to /var/cache/conftool/dbconfig/20221201-003941-ladsgroup.json
  • 00:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T322618)', diff saved to https://phabricator.wikimedia.org/P41976 and previous config saved to /var/cache/conftool/dbconfig/20221201-003533-ladsgroup.json
  • 00:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 00:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 00:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T322618)', diff saved to https://phabricator.wikimedia.org/P41975 and previous config saved to /var/cache/conftool/dbconfig/20221201-003511-ladsgroup.json
  • 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P41974 and previous config saved to /var/cache/conftool/dbconfig/20221201-003205-ladsgroup.json
  • 00:25 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5026.eqsin.wmnet with OS buster
  • 00:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1206.eqiad.wmnet with OS bullseye
  • 00:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P41973 and previous config saved to /var/cache/conftool/dbconfig/20221201-002005-ladsgroup.json
  • 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T322618)', diff saved to https://phabricator.wikimedia.org/P41972 and previous config saved to /var/cache/conftool/dbconfig/20221201-001659-ladsgroup.json
  • 00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T322618)', diff saved to https://phabricator.wikimedia.org/P41971 and previous config saved to /var/cache/conftool/dbconfig/20221201-001449-ladsgroup.json
  • 00:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 00:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T322618)', diff saved to https://phabricator.wikimedia.org/P41970 and previous config saved to /var/cache/conftool/dbconfig/20221201-001427-ladsgroup.json
  • 00:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1206.eqiad.wmnet with reason: host reimage
  • 00:07 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1206.eqiad.wmnet with reason: host reimage
  • 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P41969 and previous config saved to /var/cache/conftool/dbconfig/20221201-000458-ladsgroup.json
