
Kubernetes/Clusters/Add or remove nodes

This guide assumes you have a basic understanding of the various kubernetes components. If you don't, please refer to https://kubernetes.io/docs/concepts/overview/components/
This guide has been written to instruct a WMF SRE; it is NOT meant to be followed by non-SRE people.

Intro

This is a guide for adding or removing nodes from existing Kubernetes clusters.

Adding a node

Adding a node is a four-step process: after verifying DNS, we add the node to BGP via our network configuration manager, Homer, and then create three puppet patches, which we merge one by one.

Step 0: DNS

Make sure that the node's DNS is properly configured both for IPv4 and IPv6.
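
A quick way to check both records, assuming the node lives in eqiad (the hostname is illustrative):

dig +short foo-node1001.eqiad.wmnet A
dig +short foo-node1001.eqiad.wmnet AAAA

Both commands should return an address, and the matching PTR records should resolve back to the hostname.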

Step 1: Add node to BGP

We have a Calico setup, so nodes need to be able to establish a BGP session with their top of rack switch or with the core routers.

To do so:

  1. In Netbox, set the server's BGP custom field to True
  2. On a Cumin host, run homer. The exact target depends on the node's location:
    • If it's a VM or a physical server connected to eqiad/codfw rows A-D, target the core routers (cr*eqiad* or cr*codfw*)
    • If it's a physical server in eqiad row E/F, target its top of rack switch (e.g. lsw1-e1-eqiad)
Doing so before the reimage will cause BGP status alerts. They are not a big deal, but be aware of them, and either proceed with the reimage as soon as possible or do your Homer commit after the reimage is done.
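
For reference, a minimal sketch of the Homer run from a Cumin host; the device pattern and commit message are illustrative:

sudo homer 'lsw1-e1-eqiad*' commit 'Enable BGP for foo-node1001'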

Step 2: Node installation

Reimaging will make the node join the cluster, but it will be cordoned. For the new node to be fully functional, we need a puppet run on the docker-registry nodes (see: task T273521) as well as on all other k8s nodes of the cluster (to set the proper ferm rules). If possible, avoid joining nodes during deployment windows.
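
If the node still needs to be (re)imaged, a hedged sketch using the reimage cookbook from a Cumin host; the OS codename and task ID are illustrative:

sudo cookbook sre.hosts.reimage --os bullseye -t T123456 foo-node1001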
  • Command help:
# Update the docker-registry nodes (see task T273521)
sudo cumin 'A:docker-registry' 'run-puppet-agent -q'
# Update the ferm rules on the control plane nodes
sudo cumin 'A:wikikube-master and A:eqiad' 'run-puppet-agent -q'
# Update the ferm rules on the workers, in batches of 15 with a 5s sleep between batches
sudo cumin -b 15 -s 5 'A:wikikube-worker and A:eqiad' 'run-puppet-agent -q'
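
If the node is still cordoned after the puppet runs have completed, it can be made schedulable again from a deploy server; a minimal sketch (hostname illustrative):

kubectl uncordon foo-node1001.eqiad.wmnet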

Add node specific hiera data

If the node has some kubernetes-related special features, you can add them via hiera.

This can be done by creating the file hieradata/hosts/foo-node1001.yaml:

profile::kubernetes::node::kubelet_node_labels:
  - label-bar/foo=value1
  - label-foo/bar=value2

Note: In the past, we used this to populate the region (datacentre) and zone (rack row) labels. This is no longer needed; it is done automatically.
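
Once puppet has run on the node and the kubelet has re-registered, a quick way to verify the labels from a deploy server (hostname illustrative):

kubectl get node foo-node1001.eqiad.wmnet --show-labels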

Step 3: Add to conftool/LVS

If the Kubernetes cluster is exposing services via LVS, you need to add the node's FQDN to the cluster in conftool-data as well. For eqiad, that is conftool-data/node/eqiad.yaml:

eqiad:
  foo:
    [...]
    foo_node1001.eqiad.wmnet: [kubesvc]

# example: https://gerrit.wikimedia.org/r/c/operations/puppet/+/894701

Merge the change, and run puppet on the datacentre's LVS hosts.
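
A hedged sketch of that puppet run, assuming the A:lvs Cumin alias covers the datacentre's LVS hosts:

sudo cumin 'A:lvs and A:eqiad' 'run-puppet-agent -q'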

Then, pool your nodes using conftool (check the weight of your cluster's nodes first, as shown below):

sudo confctl select 'name=foo_node1001.eqiad.wmnet,cluster=kubernetes,service=kubesvc' set/weight=10
sudo confctl select 'name=foo_node1001.eqiad.wmnet,cluster=kubernetes,service=kubesvc' set/pooled=yes
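
To inspect the weights of the existing nodes, confctl's get action can be used; a minimal sketch:

sudo confctl select 'cluster=kubernetes,service=kubesvc' get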

Done! You made it!

Please ensure you've followed all necessary steps from Server_Lifecycle#Staged_->_Active

Your node should now join the cluster and have workload scheduled automatically (like the calico daemonsets). You can log in to a deploy server and check the status:

kubectl get nodes

Removing a node

Drain workload

The first step to remove a node is to drain the workload from it. This also verifies that the workload still fits on the rest of the cluster:

kubectl drain --ignore-daemonsets foo-node1001.datacenter.wmnet

However, some workloads might be using volumes mounted to local storage, and to drain those you need to add a second option:

kubectl drain --ignore-daemonsets --delete-emptydir-data foo-node1001.datacenter.wmnet

You can verify success by looking at what is still scheduled on the node:

kubectl describe node foo-node1001.datacenter.wmnet
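
Since the describe output is verbose, a hedged alternative is to list only the pods still bound to the node, using a field selector:

kubectl get pods --all-namespaces --field-selector spec.nodeName=foo-node1001.datacenter.wmnet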

Decommission

You can now follow the steps outlined in Server_Lifecycle#Active_->_Decommissioned

Ensure you also revert what was set up when the node was added: its conftool-data entry, any node-specific hiera data, and the BGP configuration (the Netbox BGP custom field, followed by a Homer run).

Delete the node from Kubernetes API

The last remaining step is to delete the node from Kubernetes:

kubectl delete node foo-node1001.datacenter.wmnet
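
As a final check, the node should no longer appear in the node list; an empty result means it is gone:

kubectl get nodes | grep foo-node1001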