
PAWS/Tools/Admin/Chico's notes


Notes on setting up a PAWS staging env in toolsbeta VPS project (T188428)

Using https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/ as docs

  • Doesn't seem like PAWS is properly puppetized
    • I see the apt-pinning definitions, but not where tools-paws-master-01 installs the needed packages (k8s, Docker, etc.)
  • Docker and k8s repos are defined for Xenial, though tools-paws-master-01 is stretch
    • Docker does have current Stretch versions, k8s does not (left Docker as Stretch, k8s as Xenial)
  • Unsure about the cgroup driver used by Docker. The official docs say to place { "exec-opts": ["native.cgroupdriver=systemd"] } in /etc/docker/daemon.json; since the prod version does not have that, I'm omitting it for now (a sketch is after this list)
  • swap needs to be turned off (kubelet requires it), done manually (see the sketch after this list)
  • started the k8s cluster with flannel
    • kubeadm init --pod-network-cidr=10.244.0.0/16
  • Allow user chicocvenancio to use k8s
    • chicocvenancio@toolsbeta-paws-master-01:~$ mkdir -p $HOME/.kube
    • chicocvenancio@toolsbeta-paws-master-01:~$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    • chicocvenancio@toolsbeta-paws-master-01:~$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
  • Docs say to set /proc/sys/net/bridge/bridge-nf-call-iptables to 1, it already was 1
  • get flannel pods running (the docs mention v0.9.1; I used the latest version; see the sketch after this list)
  • exported Yuvi's git-crypt key
    • yuvipanda@tools-paws-master-01:~/paws$ sudo -E git-crypt export-key /tmp/paws-key
    • chicocvenancio@tools-paws-master-01:~$ sudo chown chicocvenancio /tmp/paws-key
    • chicocvenancio@tools-paws-master-01:~$ scp /tmp/paws-key toolsbeta-paws-master-01.toolsbeta:~/paws-key
  • Installed git-crypt
    • chicocvenancio@toolsbeta-paws-master-01:~$ sudo apt-get install git-crypt
  • ran into Error: could not find tiller
    • fixed with `helm init`
  • Tiller won't start due to a lack of nodes
  • Create new node
    • Is there a way to adhere to the naming convention when using the instance count in Horizon?
    • toolsbeta-paws-worker-1001
      • Since it's not puppetized, going manual again (the join commands are sketched after this list)
    • Node joining brings up tiller
  • Once tiller is up we can run "sudo ./build.py deploy prod --install" to install PAWS
    • We need to fix two things Yuvi did on the CLI and did not push to the repo (Yuvi's .bash_history was invaluable to get these right)
      • Tiller RBAC
        • Done for now by setting a non-ideal, permissive ClusterRoleBinding
          • chicocvenancio@toolsbeta-paws-master-01:~$ kubectl create clusterrolebinding permissive-binding --clusterrole=cluster-admin --user=admin --user=kubelet --group=system:serviceaccounts
      • the hub_db PVC is not defined at all in the repo; found the definition in Yuvi's .bash_history
        • chicocvenancio@toolsbeta-paws-master-01:~$ kubectl -n prod apply -f /mnt/nfs/labstore-secondary-home/yuvipanda/paw-c/hub-pv.yaml
  • TODO: setup a new OAuth consumer for PAWS-beta
  • TODO: stop these annoying pre-puller daemonsets
    • They are actually not annoying, and are useful, once I pointed them at working Docker repositories
  • Right now PAWS-beta uses a completely different way (WMCS-wise) of handling traffic ingress. I thought this simpler than copying the paws-proxy instance; in fact we can probably drop those instances (in the VPS cloud project) and improve production after some testing
    • I did not get the ideal k8s LoadBalancer service to work with external IPs; instead I used a NodePort service and pointed a webproxy at one of the nodes (any will do; see the sketch after this list)
      • This does mean that if that node fails the proxy will fail, which is NOT OK for production
  • Differences between PAWS-beta and prod:
    • Already without the query-killer image
    • Deploy-hook image uses artful and not zesty
    • Per above, traffic ingress is different
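
The cgroup-driver change mentioned above was omitted, but for reference this is roughly what applying the upstream recommendation would look like (a sketch only; neither toolsbeta nor prod carries it):

  # Sketch of the kubeadm docs' recommendation; NOT applied in toolsbeta or prod
  echo '{ "exec-opts": ["native.cgroupdriver=systemd"] }' | sudo tee /etc/docker/daemon.json
  sudo systemctl restart docker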
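
Turning swap off manually amounts to something like the following; the sed pattern for /etc/fstab is a generic sketch, the actual edit was done by hand:

  sudo swapoff -a
  # keep swap off across reboots by commenting out any swap entry in /etc/fstab
  sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab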
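
The cluster bring-up with flannel boils down to roughly the following; the flannel manifest URL is the usual upstream one and is an assumption here, since I only noted that I used the latest version:

  # on the master
  sudo kubeadm init --pod-network-cidr=10.244.0.0/16
  # flannel pod network; URL assumed (the notes above just say "latest version")
  kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml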
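
Joining a new worker such as toolsbeta-paws-worker-1001 is the standard kubeadm flow; the token and hash below are placeholders, this is only the shape of the commands:

  # on the master: print a join command with a fresh token
  sudo kubeadm token create --print-join-command
  # on the new worker: run the printed command, which looks like
  sudo kubeadm join toolsbeta-paws-master-01:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>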
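
The NodePort-based ingress is roughly the following shape; the deployment name, service name, and port here are illustrative guesses, not the exact objects used:

  # expose the prod proxy deployment on a NodePort (names/ports illustrative)
  kubectl -n prod expose deployment proxy --type=NodePort --port=8000 --name=paws-public
  # find the assigned node port, then point a Horizon webproxy at <any-node>:<nodePort>
  kubectl -n prod get svc paws-public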


Notes on K8S upgrade in PAWS-beta


cluster control plane

  • Upgrade kubeadm (the apt commands are sketched after this list)
  • Verify kubeadm version
    • chicocvenancio@toolsbeta-paws-master-01:~$ kubeadm version
    • kubeadm version: &version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.4", GitCommit:"bee2d1505c4fe820744d26d41ecd3fdd4a3d6546", GitTreeState:"clean", BuildDate:"2018-03-12T16:21:35Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
  • root@toolsbeta-paws-master-01:~# kubeadm upgrade plan
  • kubeadm upgrade apply v1.9.4
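
With the apt pinning in place, the kubeadm upgrade itself has to request the version explicitly, along the lines of the node upgrade commands in the next section:

  sudo apt-get update
  sudo apt-get install kubeadm=1.9.4-00
  kubeadm version   # should now report v1.9.4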

Upgrade nodes

  • drain each node
    • chicocvenancio@toolsbeta-paws-master-01:~$ kubectl get nodes
    • NAME STATUS ROLES AGE VERSION
    • toolsbeta-paws-master-01 Ready master 13d v1.9.3
    • toolsbeta-paws-worker-1001 Ready <none> 13d v1.9.3
    • toolsbeta-paws-worker-1002 Ready <none> 13d v1.9.3
    • toolsbeta-paws-worker-1003 Ready <none> 13d v1.9.3
    • chicocvenancio@toolsbeta-paws-master-01:~$ kubectl drain toolsbeta-paws-master-01 --ignore-daemonsets
  • upgrade k8s:
    • sudo apt-get update
    • sudo apt-get install kubeadm=1.9.4-00 kubectl=1.9.4-00 kubelet=1.9.4-00
  • verify kubelet is running
    • systemctl status kubelet
  • Uncordon node and verify it is ready
    • kubectl uncordon toolsbeta-paws-master-01
    • kubectl get nodes
  • Rinse, repeat for each node until all nodes are at version v1.9.4:
    • chicocvenancio@toolsbeta-paws-master-01:~$ kubectl get nodes
    • NAME STATUS ROLES AGE VERSION
    • toolsbeta-paws-master-01 Ready master 13d v1.9.4
    • toolsbeta-paws-worker-1001 Ready <none> 13d v1.9.4
    • toolsbeta-paws-worker-1002 Ready <none> 13d v1.9.4
    • toolsbeta-paws-worker-1003 Ready <none> 13d v1.9.4

k8s upgrade in PAWS

  • Went through the same steps as in beta, but there are more nodes and I ran into a few new issues
  • apt-pinning by puppet
    • This is done by raising a version's priority. To keep the workflow and guarantee the "drain => upgrade => uncordon => next node" order, I used "=1.9.4-00" to request the specific version from apt-get install
  • nodes taking a very long time to drain
    • Google led me to https://medium.com/@felipedutratine/when-you-try-to-drain-a-kubernetes-node-but-it-blocks-5aba9592d7c9
      • get the pods on the node and delete the ones still there
        • kubectl get pods -o wide --all-namespaces|grep worker-1001
          • kube-system kube-flannel-ds-stbq6 1/1 Running 1 63d 10.68.23.135 tools-paws-worker-1001
          • kube-system kube-proxy-zzrj2 1/1 Running 0 3h 10.68.23.135 tools-paws-worker-1001
          • prod proxy-5cd7d56555-tm4p6 2/2 Running 0 20d 10.244.7.45 tools-paws-worker-1001
          • support support-nginx-ingress-controller-sbl64 1/1 Running 4 63d 10.68.23.135 tools-paws-worker-1001
        • Mind that kube-flannel, support-nginx-ingress-controller, and kube-proxy are DaemonSet-controlled, so they wouldn't interfere with the drain
          • chicocvenancio@tools-paws-master-01:~$ kubectl delete pod proxy-5cd7d56555-tm4p6 -n prod
  • Created a bash script to go through each one (/home/chicocvenancio/update_nodes.sh); a sketch of that loop is below
  • Sent Change 419599 to pin k8s to version 1.9.4 in PAWS.
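
I did not paste update_nodes.sh into these notes; a minimal sketch of the per-node loop it automates could look like the following, assuming it runs on the master with passwordless SSH to each worker (the node list, ssh invocation, and flags are placeholders, not the actual script):

  #!/bin/bash
  # Sketch only, NOT the actual /home/chicocvenancio/update_nodes.sh:
  # drain => upgrade => uncordon, one node at a time
  set -e
  VERSION="1.9.4-00"
  for node in tools-paws-worker-1001 tools-paws-worker-1002; do  # placeholder node list
      # drain may block on non-DaemonSet pods (see the proxy pod deletion above)
      kubectl drain "$node" --ignore-daemonsets
      ssh "$node" "sudo apt-get update && sudo apt-get install -y kubeadm=$VERSION kubectl=$VERSION kubelet=$VERSION"
      ssh "$node" "systemctl is-active kubelet"
      kubectl uncordon "$node"
  done
  kubectl get nodes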