
Kubernetes/Administration/containerd migration

From Wikitech

Tracking task for the prep work that led to this migration: https://phabricator.wikimedia.org/T362408

You can migrate your cluster to containerd right away by making the following changes:

  • Update the partman recipe in preseed.yaml to use partman/custom/kubernetes-node-containerd.cfg instead of partman/custom/kubernetes-node-overlay.cfg
    • Bigger /var/lib/kubelet partition (more space for emptyDir volumes and container logs)
    • /var/lib/containerd instead of /var/lib/docker
  • Update the nrpe_check_disk_options to also exclude the containerd partition:
profile::monitoring::nrpe_check_disk_options: -w 10% -c 5% -W 6% -K 3% -l -e -A -i '/(var/lib|run)/(docker|kubelet|containerd)/*' --exclude-type=tracefs
  • If your cluster has access to restricted container images, add the following to private hiera wherever you set profile::kubernetes::node::docker_kubernetes_user_password:
profile::containerd::registry_password: "%{lookup('kubernetes_docker_password')}"
  • Update your puppet roles for control planes and workers, replacing include profile::docker::engine with include profile::kubernetes::container_runtime
    • The include has to come after the include of profile::dragonfly::dfdaemon
    • profile::kubernetes::container_runtime will set up containerd on bookworm hosts and docker on bullseye and below
    • You can override that by hard coding the runtime on a role or host level: profile::kubernetes::container_runtime: docker
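To check which mount points the -i pattern in nrpe_check_disk_options causes check_disk to skip, the following Python sketch can help. It assumes that -i behaves as in the monitoring-plugins check_disk (a case-insensitive regular expression that may match anywhere in the mount path) and approximates POSIX extended regex matching with Python's re module, which behaves the same for this particular pattern. The sample mount points are illustrative, not taken from a real host.

```python
import re

# Pattern copied from profile::monitoring::nrpe_check_disk_options above.
# Assumption: check_disk's -i option matches it case-insensitively anywhere
# in the mount path, so re.search with IGNORECASE approximates that here.
IGNORE_PATTERN = re.compile(
    r"/(var/lib|run)/(docker|kubelet|containerd)/*", re.IGNORECASE
)


def is_ignored(mount_point: str) -> bool:
    """Return True if the pattern would exclude this mount point from the check."""
    return IGNORE_PATTERN.search(mount_point) is not None


if __name__ == "__main__":
    # Illustrative mount points only.
    samples = [
        "/",
        "/srv",
        "/var/lib/kubelet",
        "/var/lib/containerd",
        "/run/containerd/io.containerd.runtime.v2.task/k8s.io/abc123/rootfs",
    ]
    for path in samples:
        status = "ignored" if is_ignored(path) else "checked"
        print(f"{path}: {status}")
```

Because the trailing /* allows zero slashes, the pattern excludes the /var/lib/containerd partition itself as well as everything mounted below /var/lib/containerd and /run/containerd, while / and /srv are still checked.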

Ideally your nodes are still on bullseye, so everything above is more or less a no-op until you kick off the reimage to bookworm. After the reimage, the node comes back up with bookworm and containerd (no further action required).
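The runtime selection rule described for profile::kubernetes::container_runtime can be summarised as a small decision function. This is a hypothetical Python sketch of the documented behaviour, not the Puppet code itself; the function name select_runtime and the list of Debian codenames are assumptions made for illustration.

```python
from typing import Optional

# Debian codenames in release order (assumption: only these are relevant here).
DEBIAN_RELEASES = ["stretch", "buster", "bullseye", "bookworm"]


def select_runtime(codename: str, override: Optional[str] = None) -> str:
    """Hypothetical sketch of the documented container runtime rule.

    A hard-coded hiera value for profile::kubernetes::container_runtime
    (passed here as `override`) always wins. Otherwise bookworm hosts get
    containerd, and bullseye and older hosts get docker.
    """
    if override is not None:
        return override
    if codename not in DEBIAN_RELEASES:
        raise ValueError(f"unknown Debian codename: {codename}")
    if DEBIAN_RELEASES.index(codename) >= DEBIAN_RELEASES.index("bookworm"):
        return "containerd"
    return "docker"


if __name__ == "__main__":
    # Before the reimage the node stays on docker, so the change is a no-op.
    print(select_runtime("bullseye"))
    # After the reimage to bookworm the node comes back with containerd.
    print(select_runtime("bookworm"))
    # A role- or host-level override keeps docker even on bookworm.
    print(select_runtime("bookworm", override="docker"))
```

This is why the Puppet and hiera changes can be merged ahead of time: the OS release, not the merge, decides when a node switches runtime.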

There is a cookbook for rolling reimages of stacked control planes: Add a cookbook to roll-reimage stacked k8s control planes (1081377) · Gerrit Code Review (probably of no interest outside ServiceOps, as other teams don't run stacked control planes).

When you're done with a cluster, please make sure to remove the hiera keys that are no longer required (basically everything under profile::docker::engine::*, as well as profile::kubernetes::node::docker_kubernetes_user_password in private hiera).
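To find leftover keys in public or private hiera files, a rough Python sketch can scan YAML files for the obsolete top-level keys. It assumes keys are written unquoted and unindented at the start of a line (key: value); it does not fully parse YAML and does not handle quoted keys, so treat its output as a checklist rather than an authoritative result. The script name and file paths in the usage note are illustrative.

```python
import re
import sys
from typing import Iterable, List

# Keys that are no longer needed once a cluster runs containerd.
OBSOLETE_PREFIX = "profile::docker::engine::"
OBSOLETE_KEYS = {"profile::kubernetes::node::docker_kubernetes_user_password"}

# Matches an unquoted, unindented key such as "profile::foo::bar:".
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z0-9_]+(?:::[A-Za-z0-9_]+)*)\s*:")


def find_obsolete_keys(lines: Iterable[str]) -> List[str]:
    """Return the obsolete top-level hiera keys found in the given lines."""
    found = []
    for line in lines:
        match = TOP_LEVEL_KEY.match(line)
        if not match:
            continue
        key = match.group(1)
        if key.startswith(OBSOLETE_PREFIX) or key in OBSOLETE_KEYS:
            found.append(key)
    return found


if __name__ == "__main__":
    # Usage: python3 find_obsolete_keys.py FILE.yaml [FILE.yaml ...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for key in find_obsolete_keys(handle):
                print(f"{path}: {key}")
```

Run it against the role, host and private hiera files of the migrated cluster, then delete the reported keys by hand.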