
Portal:Toolforge/Admin/Kubernetes/Upgrading Kubernetes/1.26 to 1.27 notes


Working etherpad: https://etherpad.wikimedia.org/p/k8s-1.26-to-1.27-upgrade

Prepare packages

Toolsbeta

  • get list of nodes
 root@toolsbeta-test-k8s-control-10:~# for node in $(kubectl get nodes -o json | jq '.items[].metadata.name' -r); do echo "* [] $node"; done
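
This prints one checklist line per node, e.g. (names from this cluster):

 * [] toolsbeta-test-k8s-control-10
 * [] toolsbeta-test-k8s-worker-12
 * [] toolsbeta-test-k8s-ingress-10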

prep

  • [x] run prepare upgrade cookbook
 ~ $ sudo cookbook wmcs.toolforge.k8s.prepare_upgrade --cluster-name toolsbeta --src-version 1.26.15 --dst-version 1.27.16 --task-id <id_of_task>

control nodes

  • run the upgrade node cookbook on each control node
 sudo cookbook wmcs.toolforge.k8s.worker.upgrade --task-id <id_of_task> --src-version 1.26.15 --dst-version 1.27.16 --cluster-name toolsbeta --hostname <control_node_name>
  • check that services start healthy (see the health-check sketch after the revert block below)
  • depool control-<x> and control-<y> via haproxy and check that control-<z> is still serving the API on its own
 ssh toolsbeta-test-k8s-haproxy-6.toolsbeta.eqiad1.wikimedia.cloud
 sudo puppet agent --disable "<user> k8s upgrade"
 sudo vim /etc/haproxy/conf.d/k8s-api-servers.cfg
 sudo systemctl reload haproxy
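
Depooling is just commenting out the server lines of the control nodes being taken out of rotation; the backend below is only a sketch of the idea, not a copy of the real file:

 # /etc/haproxy/conf.d/k8s-api-servers.cfg (illustrative)
 backend k8s-api-servers
     server toolsbeta-test-k8s-control-10 <ip>:6443 check
     # server toolsbeta-test-k8s-control-11 <ip>:6443 check   # depooled
     # server toolsbeta-test-k8s-control-12 <ip>:6443 check   # depooled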

check:

 echo "show stat" | sudo socat stdio /run/haproxy/haproxy.sock | grep k8s-api

revert:

 sudo puppet agent --enable
 sudo run-puppet-agent
 sudo systemctl reload haproxy
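
A minimal sketch of the "services start healthy" check on an upgraded control node (generic kubectl/systemd commands, not part of any cookbook):

 kubectl get nodes                                  # node Ready and reporting v1.27.16
 kubectl get pods -n kube-system -o wide            # apiserver, controller-manager, scheduler, etcd Running
 sudo systemctl status kubelet                      # kubelet active (running)
 sudo journalctl -u kubelet --since "10 min ago"    # no crash loops or certificate errors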

toolsbeta-test-k8s-control-10

  • [x] run upgrade node cookbook
  • [x] check that services start healthy
  • [x] depool control-11 and -12 via haproxy, check that control-10 is still doing ok - not done, since all control nodes have to be upgraded at once

toolsbeta-test-k8s-control-11

  • [x] run upgrade node cookbook
  • [x] check that services start healthy
  • [x] depool control-12 and -10 via haproxy, check that control-11 is still doing ok - not done, since all control nodes have to be upgraded at once

toolsbeta-test-k8s-control-12

  • [x] run upgrade node cookbook
  • [x] check that services start healthy
  • [x] depool control-10 and -11 via haproxy, check that control-12 is still doing ok - not done, since all control nodes have to be upgraded at once

worker nodes

  • run the upgrade node cookbook for each (see the version check after the list below)
 sudo cookbook wmcs.toolforge.k8s.worker.upgrade --task-id <id_of_task> --src-version 1.26.15 --dst-version 1.27.16 --cluster-name toolsbeta --hostname <worker_node_name>
  • [x] toolsbeta-test-k8s-worker-nfs-5
  • [x] toolsbeta-test-k8s-worker-nfs-7
  • [x] toolsbeta-test-k8s-worker-nfs-8
  • [x] toolsbeta-test-k8s-worker-nfs-9
  • [x] toolsbeta-test-k8s-worker-12
  • [x] toolsbeta-test-k8s-worker-13
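
To confirm the whole cluster ended up on the target version, something like this from a control node works (plain kubectl, just a convenience):

 kubectl get nodes -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion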

ingress nodes

  • run the upgrade node cookbook for each
 sudo cookbook wmcs.toolforge.k8s.worker.upgrade --task-id <id_of_task> --src-version 1.26.15 --dst-version 1.27.16 --cluster-name toolsbeta --hostname <ingress_node_name>
  • [x] toolsbeta-test-k8s-ingress-10
  • [x] toolsbeta-test-k8s-ingress-11
  • [x] toolsbeta-test-k8s-ingress-9

cleanup

  • [x] remove downtime
  • [x] revert topic change


Tools

  • get list of nodes
 root@tools-k8s-control-7:~# for node in $(kubectl get nodes -o json | jq '.items[].metadata.name' -r); do echo "* [] $node"; done

prep

  • [x] run prepare upgrade cookbook
 ~ $ sudo cookbook wmcs.toolforge.k8s.prepare_upgrade --cluster-name tools --src-version 1.26.15 --dst-version 1.27.16 --task-id <id_of_task>

control nodes

  • run the upgrade node cookbook on each control node
 sudo cookbook wmcs.toolforge.k8s.worker.upgrade --task-id <id_of_task> --src-version 1.26.15 --dst-version 1.27.16 --cluster-name tools --hostname <control_node_name>
  • check that services start healthy (same checks as for toolsbeta)
  • depool control-<x> and control-<y> via haproxy and check that control-<z> is still serving the API on its own
 ssh tools-k8s-haproxy-5.tools.eqiad1.wikimedia.cloud
 sudo puppet agent --disable "<user> k8s upgrade"
 sudo nano /etc/haproxy/conf.d/k8s-api-servers.cfg
 sudo systemctl reload haproxy

check:

 echo "show stat" | sudo socat stdio /run/haproxy/haproxy.sock | grep k8s-api

revert:

 sudo puppet agent --enable
 sudo run-puppet-agent
 sudo systemctl reload haproxy

tools-k8s-control-7

  • [x] run upgrade node cookbook
  • [x] check that services start healthy
  • [x] depool control-8 and -9 via haproxy, check that control-7 is still doing ok

tools-k8s-control-8

  • [x] run upgrade node cookbook
  • [x] check that services start healthy
  • [x] depool control-7 and -9 via haproxy, check that control-8 is still doing ok

tools-k8s-control-9

  • [x] run upgrade node cookbook
  • [x] check that services start healthy
  • [x] depool control-7 and -8 via haproxy, check that control-9 is still doing ok
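
With all three control nodes done, a quick look at the static control-plane pods confirms everything restarted on the new version (assumes the usual kubeadm labels, which Toolforge uses):

 kubectl -n kube-system get pods -l tier=control-plane -o wide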

worker nodes

  • run the upgrade node cookbook for each; it's OK to do a couple in parallel (see the sanity check after the list below)
 sudo cookbook wmcs.toolforge.k8s.worker.upgrade --task-id <id_of_task> --src-version 1.26.15 --dst-version 1.27.16 --cluster-name tools --hostname <worker_node_name>
  • [x] tools-k8s-worker-102
  • [x] tools-k8s-worker-103
  • [x] tools-k8s-worker-105
  • [x] tools-k8s-worker-106
  • [x] tools-k8s-worker-107
  • [x] tools-k8s-worker-108
  • [x] tools-k8s-worker-nfs-1
  • [x] tools-k8s-worker-nfs-10
  • [x] tools-k8s-worker-nfs-11
  • [x] tools-k8s-worker-nfs-12
  • [x] tools-k8s-worker-nfs-13
  • [x] tools-k8s-worker-nfs-14
  • [x] tools-k8s-worker-nfs-16
  • [x] tools-k8s-worker-nfs-17
  • [x] tools-k8s-worker-nfs-19
  • [x] tools-k8s-worker-nfs-2
  • [x] tools-k8s-worker-nfs-21
  • [x] tools-k8s-worker-nfs-22
  • [x] tools-k8s-worker-nfs-23
  • [x] tools-k8s-worker-nfs-24
  • [x] tools-k8s-worker-nfs-26
  • [x] tools-k8s-worker-nfs-27
  • [x] tools-k8s-worker-nfs-3
  • [x] tools-k8s-worker-nfs-32
  • [x] tools-k8s-worker-nfs-33
  • [x] tools-k8s-worker-nfs-34
  • [x] tools-k8s-worker-nfs-35
  • [x] tools-k8s-worker-nfs-36
  • [x] tools-k8s-worker-nfs-37
  • [x] tools-k8s-worker-nfs-38
  • [x] tools-k8s-worker-nfs-39
  • [x] tools-k8s-worker-nfs-40
  • [x] tools-k8s-worker-nfs-41
  • [x] tools-k8s-worker-nfs-42
  • [x] tools-k8s-worker-nfs-43
  • [x] tools-k8s-worker-nfs-44
  • [x] tools-k8s-worker-nfs-45
  • [x] tools-k8s-worker-nfs-46
  • [x] tools-k8s-worker-nfs-47
  • [x] tools-k8s-worker-nfs-48
  • [x] tools-k8s-worker-nfs-5
  • [x] tools-k8s-worker-nfs-50
  • [x] tools-k8s-worker-nfs-53
  • [x] tools-k8s-worker-nfs-54
  • [x] tools-k8s-worker-nfs-55
  • [x] tools-k8s-worker-nfs-57
  • [x] tools-k8s-worker-nfs-58
  • [x] tools-k8s-worker-nfs-61
  • [x] tools-k8s-worker-nfs-65
  • [x] tools-k8s-worker-nfs-66
  • [x] tools-k8s-worker-nfs-67
  • [x] tools-k8s-worker-nfs-68
  • [x] tools-k8s-worker-nfs-69
  • [x] tools-k8s-worker-nfs-7
  • [x] tools-k8s-worker-nfs-70
  • [x] tools-k8s-worker-nfs-71
  • [x] tools-k8s-worker-nfs-72
  • [x] tools-k8s-worker-nfs-73
  • [x] tools-k8s-worker-nfs-74
  • [x] tools-k8s-worker-nfs-75
  • [x] tools-k8s-worker-nfs-76
  • [x] tools-k8s-worker-nfs-8
  • [x] tools-k8s-worker-nfs-9
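
Between batches (and after the last one) it's cheap to sanity-check that nothing got stuck; a sketch using plain kubectl:

 kubectl get pods -A --field-selector=status.phase=Pending    # pods evicted by a drain that never rescheduled
 kubectl get nodes --no-headers | grep -v ' Ready '           # nodes NotReady or still cordoned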

ingress nodes

  • [x] kubectl -n ingress-nginx-gen2 scale deployment ingress-nginx-gen2-controller --replicas=2

run the upgrade node cookbook for each ingress node:

  • [x] tools-k8s-ingress-7
  • [x] tools-k8s-ingress-8
  • [x] tools-k8s-ingress-9
  • [x] revert afterwards: kubectl -n ingress-nginx-gen2 scale deployment ingress-nginx-gen2-controller --replicas=3
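
After scaling back up, it's worth confirming the controller pods are Running again and spread across the upgraded ingress nodes (plain kubectl, nothing cookbook-specific):

 kubectl -n ingress-nginx-gen2 get deployment ingress-nginx-gen2-controller
 kubectl -n ingress-nginx-gen2 get pods -o wide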

cleanup

  • [x] remove downtime
  • [x] revert topic change