
PAWS/Tools/Admin/Chico's notes


Notes on setting up a PAWS staging env in toolsbeta VPS project (T188428)

Using https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/ as docs

  • Doesn't seem like PAWS is properly puppetized
    • I see the apt-pinning definitions, but not where tools-paws-master-01 installs the needed packages (k8s, Docker, etc.)
  • Docker and k8s repos are defined for Xenial, though tools-paws-master-01 is stretch
    • Docker does have current Stretch versions, k8s does not (left Docker as Stretch, k8s as Xenial)
  • Unsure about the cgroup driver used by Docker. The official docs say to place { "exec-opts": ["native.cgroupdriver=systemd"] } in /etc/docker/daemon.json; since the prod version does not have that, I'm omitting it for now (a sketch is after this list)
  • swap needs to be turned off (kubelet requires it), done manually (see the sketch after this list)
  • started the k8s cluster with flannel
    • kubeadm init --pod-network-cidr=10.244.0.0/16
  • Allow user chicocvenancio to use k8s
    • chicocvenancio@toolsbeta-paws-master-01:~$ mkdir -p $HOME/.kube
    • chicocvenancio@toolsbeta-paws-master-01:~$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    • chicocvenancio@toolsbeta-paws-master-01:~$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
  • Docs say to set /proc/sys/net/bridge/bridge-nf-call-iptables to 1, it already was 1
  • get flannel pods running (the docs mention v0.9.1; I used the latest version; see the sketch after this list)
  • exported Yuvi's git-crypt key
    • yuvipanda@tools-paws-master-01:~/paws$ sudo -E git-crypt export-key /tmp/paws-key
    • chicocvenancio@tools-paws-master-01:~$ sudo chown chicocvenancio /tmp/paws-key
    • chicocvenancio@tools-paws-master-01:~$ scp /tmp/paws-key toolsbeta-paws-master-01.toolsbeta:~/paws-key
  • Installed git-crypt
    • chicocvenancio@toolsbeta-paws-master-01:~$ sudo apt-get install git-crypt
  • ran into Error: could not find tiller
    • fixed with `helm init`
  • Tiller won't start due to a lack of nodes
  • Create new node
    • Is there a way to adhere to the naming convention when using the instance count in Horizon?
    • toolsbeta-paws-worker-1001
      • Since it's not puppetized, going manual again (the join commands are sketched after this list)
    • Node joining brings up tiller
  • Once tiller is up we can run "sudo ./build.py deploy prod --install" to install PAWS
    • We need to fix two things Yuvi did on the CLI and did not push to the repo (Yuvi's .bash_history was invaluable to get these right)
      • Tiller RBAC
        • Done for now by setting a non-ideal, permissive ClusterRoleBinding
          • chicocvenancio@toolsbeta-paws-master-01:~$ kubectl create clusterrolebinding permissive-binding --clusterrole=cluster-admin --user=admin --user=kubelet --group=system:serviceaccounts
      • the hub_db PVC is not defined at all in the repo; found the definition in Yuvi's .bash_history
        • chicocvenancio@toolsbeta-paws-master-01:~$ kubectl -n prod apply -f /mnt/nfs/labstore-secondary-home/yuvipanda/paw-c/hub-pv.yaml
  • TODO: setup a new OAuth consumer for PAWS-beta
  • TODO: stop these annoying pre-puller daemonsets
    • They are actually not annoying, and are useful, once I pointed them at working Docker repositories
  • Right now PAWS-beta uses a completely different way (WMCS-wise) of handling traffic ingress. I thought this simpler than copying the paws-proxy instance; in fact we can probably drop those instances (in the VPS cloud project) and improve production after some testing
    • I did not get the ideal k8s LoadBalancer service to work with external IPs; instead I used a NodePort service and pointed a webproxy at one of the nodes (any will do; see the sketch after this list)
      • This does mean that if that node fails the proxy will fail, which is NOT OK for production
  • Differences between PAWS-beta and prod:
    • Already without the query-killer image
    • Deploy-hook image uses artful and not zesty
    • Per above, traffic ingress is different
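
The cgroup-driver change mentioned above was omitted, but for reference this is roughly what applying the upstream recommendation would look like (a sketch only; neither toolsbeta nor prod carries it):

  # Sketch of the kubeadm docs' recommendation; NOT applied in toolsbeta or prod
  echo '{ "exec-opts": ["native.cgroupdriver=systemd"] }' | sudo tee /etc/docker/daemon.json
  sudo systemctl restart docker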
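
Turning swap off manually amounts to something like the following; the sed pattern for /etc/fstab is a generic sketch, the actual edit was done by hand:

  sudo swapoff -a
  # keep swap off across reboots by commenting out any swap entry in /etc/fstab
  sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab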
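
The cluster bring-up with flannel boils down to roughly the following; the flannel manifest URL is the usual upstream one and is an assumption here, since I only noted that I used the latest version:

  # on the master
  sudo kubeadm init --pod-network-cidr=10.244.0.0/16
  # flannel pod network; URL assumed (the notes above just say "latest version")
  kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml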
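
Joining a new worker such as toolsbeta-paws-worker-1001 is the standard kubeadm flow; the token and hash below are placeholders, this is only the shape of the commands:

  # on the master: print a join command with a fresh token
  sudo kubeadm token create --print-join-command
  # on the new worker: run the printed command, which looks like
  sudo kubeadm join toolsbeta-paws-master-01:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>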
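
The NodePort-based ingress is roughly the following shape; the deployment name, service name, and port here are illustrative guesses, not the exact objects used:

  # expose the prod proxy deployment on a NodePort (names/ports illustrative)
  kubectl -n prod expose deployment proxy --type=NodePort --port=8000 --name=paws-public
  # find the assigned node port, then point a Horizon webproxy at <any-node>:<nodePort>
  kubectl -n prod get svc paws-public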


Notes on K8S upgrade in PAWS-beta


cluster control plane

  • Upgrade kubeadm (the apt commands are sketched after this list)
  • Verify kubeadm version
    • chicocvenancio@toolsbeta-paws-master-01:~$ kubeadm version
    • kubeadm version: &version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.4", GitCommit:"bee2d1505c4fe820744d26d41ecd3fdd4a3d6546", GitTreeState:"clean", BuildDate:"2018-03-12T16:21:35Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
  • root@toolsbeta-paws-master-01:~# kubeadm upgrade plan
  • kubeadm upgrade apply v1.9.4
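
With the apt pinning in place, the kubeadm upgrade itself has to request the version explicitly, along the lines of the node upgrade commands in the next section:

  sudo apt-get update
  sudo apt-get install kubeadm=1.9.4-00
  kubeadm version   # should now report v1.9.4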

Upgrade nodes

  • drain each node
    • chicocvenancio@toolsbeta-paws-master-01:~$ kubectl get nodes
    • NAME STATUS ROLES AGE VERSION
    • toolsbeta-paws-master-01 Ready master 13d v1.9.3
    • toolsbeta-paws-worker-1001 Ready <none> 13d v1.9.3
    • toolsbeta-paws-worker-1002 Ready <none> 13d v1.9.3
    • toolsbeta-paws-worker-1003 Ready <none> 13d v1.9.3
    • chicocvenancio@toolsbeta-paws-master-01:~$ kubectl drain toolsbeta-paws-master-01 --ignore-daemonsets
  • upgrade k8s:
    • sudo apt-get update
    • sudo apt-get install kubeadm=1.9.4-00 kubectl=1.9.4-00 kubelet=1.9.4-00
  • verify kubelet is running
    • systemctl status kubelet
  • Uncordon node and verify it is ready
    • kubectl uncordon toolsbeta-paws-master-01
    • kubectl get nodes
  • Rinse, repeat for each node until all nodes are at version v1.9.4:
    • chicocvenancio@toolsbeta-paws-master-01:~$ kubectl get nodes
    • NAME STATUS ROLES AGE VERSION
    • toolsbeta-paws-master-01 Ready master 13d v1.9.4
    • toolsbeta-paws-worker-1001 Ready <none> 13d v1.9.4
    • toolsbeta-paws-worker-1002 Ready <none> 13d v1.9.4
    • toolsbeta-paws-worker-1003 Ready <none> 13d v1.9.4

k8s upgrade in PAWS

  • Went through the same steps as in beta, but there are more nodes and I ran into a few new issues
  • apt-pinning by puppet
    • This is done by raising a version's priority. To keep the workflow and guarantee the "drain => upgrade => uncordon => next node" order, I used "=1.9.4-00" to request the specific version from apt-get install
  • nodes taking a very long time to drain
    • Google led me to https://medium.com/@felipedutratine/when-you-try-to-drain-a-kubernetes-node-but-it-blocks-5aba9592d7c9
      • get the pods on the node and delete the ones still there
        • kubectl get pods -o wide --all-namespaces|grep worker-1001
          • kube-system kube-flannel-ds-stbq6 1/1 Running 1 63d 10.68.23.135 tools-paws-worker-1001
          • kube-system kube-proxy-zzrj2 1/1 Running 0 3h 10.68.23.135 tools-paws-worker-1001
          • prod proxy-5cd7d56555-tm4p6 2/2 Running 0 20d 10.244.7.45 tools-paws-worker-1001
          • support support-nginx-ingress-controller-sbl64 1/1 Running 4 63d 10.68.23.135 tools-paws-worker-1001
        • Mind that kube-flannel, support-nginx-ingress-controller, and kube-proxy are DaemonSet-controlled, so they wouldn't interfere with the drain
          • chicocvenancio@tools-paws-master-01:~$ kubectl delete pod proxy-5cd7d56555-tm4p6 -n prod
  • Created a bash script to go through each one (/home/chicocvenancio/update_nodes.sh); a sketch of that loop is below
  • Sent Change 419599 to pin k8s to version 1.9.4 in PAWS.
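
I did not paste update_nodes.sh into these notes; a minimal sketch of the per-node loop it automates could look like the following, assuming it runs on the master with passwordless SSH to each worker (the node list, ssh invocation, and flags are placeholders, not the actual script):

  #!/bin/bash
  # Sketch only, NOT the actual /home/chicocvenancio/update_nodes.sh:
  # drain => upgrade => uncordon, one node at a time
  set -e
  VERSION="1.9.4-00"
  for node in tools-paws-worker-1001 tools-paws-worker-1002; do  # placeholder node list
      # drain may block on non-DaemonSet pods (see the proxy pod deletion above)
      kubectl drain "$node" --ignore-daemonsets
      ssh "$node" "sudo apt-get update && sudo apt-get install -y kubeadm=$VERSION kubectl=$VERSION kubelet=$VERSION"
      ssh "$node" "systemctl is-active kubelet"
      kubectl uncordon "$node"
  done
  kubectl get nodes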