
User:Effie Mouzeli (WMF)/Howtos/New LVS Kubernetes Service

From Wikitech

Before adding a new LVS service, the service should be:

  • running and responding properly to its healthchecks
  • listening with TLS enabled (tls.enabled: true)

Puppet Private - Certs

Certificates

Follow the process in Kubernetes/Enabling_TLS#Create_and_place_certificates

DNS and Netbox Patch #1

Make an allocation following DNS/Netbox#How_to_manually_allocate_a_special_purpose_IP_address_in_Netbox

Note: change the netmask to /32.

Add svc records in operations/dns

  • templates/10.in-addr.arpa
  • templates/wmnet
  • Review and merge
  • log in to ns0.wikimedia.org and run sudo authdns-update. This will pull from operations/dns and generate zonefiles and gdnsd configs on each nameserver.
  • Verify: for i in 0 1 2 ; do dig @ns${i}.wikimedia.org -t any my-changed-record.wikimedia.org ; done
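When adding the reverse entries in templates/10.in-addr.arpa, the PTR record name is just the VIP's octets reversed under in-addr.arpa. A minimal sketch, using the echostore eqiad VIP that appears in the examples on this page:

```shell
# Derive the PTR record name for a new svc VIP (illustration only;
# all svc VIPs on this page live under 10.in-addr.arpa).
vip="10.2.2.49"
# Reverse the octets and append .in-addr.arpa
ptr=$(echo "$vip" | awk -F. '{print $4"."$3"."$2"."$1".in-addr.arpa"}')
echo "$ptr"   # 49.2.2.10.in-addr.arpa
```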

Finish up DNS

  • cumin1001:~$ sudo cookbook sre.dns.netbox "Add VIPs for X services"

Puppet Patch #1 - LVS prep

'''hieradata/common/service.yaml:'''
service::catalog:
  echostore:
    description: Echo store, echostore.svc.%{::site}.wmnet  <-- '''change'''
    encryption: true 
    ip: 
      codfw:
        default: 10.2.1.49                                  <-- '''change'''
      eqiad:
        default: 10.2.2.49                                  <-- '''change'''
    lvs: # Properties that are related to LVS setup.
      class: low-traffic
      conftool: 
        cluster: kubernetes
        service: kubesvc
      depool_threshold: '.5'
      enabled: true
      monitors:
        IdleConnection:
          max-delay: 300
          timeout-clean-reconnect: 3
      scheduler: wrr
      protocol: tcp
    monitoring: 
      check_command: check_https_port_status!8082!200!/healthz # <-- '''change''' the port; this is the Icinga check command
      critical: false                                          # True in Prod
      sites: 
        codfw:
          hostname: echostore.svc.codfw.wmnet <-- '''change'''
        eqiad:
          hostname: echostore.svc.eqiad.wmnet  <-- '''change'''
    port: 8082                                                 # <-- '''change''' https://wikitech.wikimedia.org/wiki/Service_ports
    sites:
    - eqiad
    - codfw
    state: service_setup
    discovery:                                                 
    - dnsdisc: echostore                                        # <-- '''change'''
      active_active: true
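One value in the lvs section worth understanding is depool_threshold. A rough sketch of its semantics (assumption, not confirmed by this page: PyBal refuses to depool backends once fewer than ceil(N × depool_threshold) would remain pooled):

```shell
# Illustration only: rough depool_threshold arithmetic (assumption:
# PyBal keeps at least ceil(N * threshold) backends pooled).
total=8              # example: 8 kubernetes workers in the pool
threshold_tenths=5   # depool_threshold '.5', expressed in tenths
min_pooled=$(( (total * threshold_tenths + 9) / 10 ))  # ceil(8 * 0.5)
echo "with ${total} backends, PyBal keeps at least ${min_pooled} pooled"
```

With depool_threshold '.5' and 8 backends, PyBal would keep at least 4 pooled even if more fail their healthchecks.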


'''hieradata/role/common/kubernetes/worker.yaml:'''
profile::lvs::realserver::pools:
  echostore: {}


'''conftool-data/discovery/services.yaml:'''
echostore: [eqiad, codfw]                                      # <-- '''change'''
  • Run puppet on the load balancers: sudo cumin 'O:lvs::balancer' 'run-puppet-agent'

Puppet Patch #2 - LVS setup

'''hieradata/common/service.yaml:'''
[...]
    sites:
    - eqiad
    - codfw
    state: lvs_setup <-----
    discovery:
    - dnsdisc: echostore
      active_active: true
  • Run puppet: sudo cumin 'O:lvs::balancer' 'run-puppet-agent'
  • Ack PyBal diff checks in Icinga
  • Find the primary and secondary low-traffic LVS hosts in modules/lvs/manifests/configuration.pp
  • Log to SAL and restart PyBal (secondaries first): sudo systemctl restart pybal
  • Checks: sudo ipvsadm -L -n and curl -v -k https://eventgate-analytics.svc.eqiad.wmnet:31192/_info
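In the ipvsadm output, the new VIP should appear with one "->" line per pooled realserver. A sketch of what to look for, with sample output hard-coded for illustration (on an LVS host you would pipe sudo ipvsadm -L -n instead; the realserver IPs here are made up):

```shell
# Illustration only: count realservers behind the new VIP in
# ipvsadm-style output. Sample output is hard-coded for the sketch.
ipvs_output='TCP  10.2.2.49:8082 wrr
  -> 10.64.0.1:8082    Route   1   0   0
  -> 10.64.0.2:8082    Route   1   0   0'
vip="10.2.2.49:8082"
realservers=$(echo "$ipvs_output" | grep -c '^  ->')
echo "${vip} has ${realservers} realservers"
```

Zero "->" lines for the VIP usually means the conftool cluster/service names in service.yaml do not match any pooled backends.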

Puppet Patch #3 - LVS production

'''hieradata/common/service.yaml:'''
[...]
    sites:
    - eqiad
    - codfw
    state: production <-----
    discovery:
    - dnsdisc: echostore
      active_active: true
  • sudo cumin 'A:icinga or A:dns-auth' 'run-puppet-agent'

DNS Patch #2

Add discovery (DYNA) records in operations/dns

  • templates/wmnet
  • utils/mock_etc/discovery-geo-resources
  • Review and merge
  • log in to ns0.wikimedia.org and run sudo authdns-update. This will pull from operations/dns and generate zonefiles and gdnsd configs on each nameserver.

Pool!

$ confctl --object-type discovery select 'dnsdisc=echostore' set/pooled=true
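To confirm the pooling took effect, confctl can read the object back with get and the same selector. A sketch of checking for pooled=true, with illustrative sample output hard-coded (the real confctl get output format may differ):

```shell
# Illustration only: verify the discovery object reports pooled=true.
# On a cumin host you would instead run:
#   confctl --object-type discovery select 'dnsdisc=echostore' get
confctl_out='{"codfw": {"pooled": true}, "tags": "dnsdisc=echostore"}'
if echo "$confctl_out" | grep -q '"pooled": true'; then
  echo "echostore pooled"
fi
```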