
Portal:Cloud VPS/Admin/notes/Neutron Migration

This page contains historical information: it was only relevant while we were migrating from nova-network to Neutron.

Clearly a lot of this can be automated. Once we've done a few projects without errors, we can mash all this into one big super-script.
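A very rough sketch of what such a super-script might look like, using only the commands from the steps below (untested; the filename and structure are just an illustration):

#!/bin/bash
# migrate-project.sh <project-name> -- rough sketch only, not an existing tool
set -e
project="$1"
source /root/novaenv.sh

# Copy quotas and security groups for the project
wmcs-region-migrate-quotas "$project"
wmcs-region-migrate-security-groups "$project"

# Migrate every VM in the project, one at a time.
# In practice you probably want to check each VM before moving on to the next.
for id in $(OS_TENANT_NAME="$project" openstack server list -f value -c ID); do
    wmcs-region-migrate "$id"
done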

Steps

  • On labcontrol1001, source the nova credentials, then copy the project's quotas and security groups to the new region
root@labcontrol1001:~# cd /root
root@labcontrol1001:~# source ~/novaenv.sh
root@labcontrol1001:~# wmcs-region-migrate-quotas <project-name>
Updated quotas using <QuotaSet cores=12, fixed_ips=200, floating_ips=0, injected_file_content_bytes=10240, injected_file_path_bytes=255, injected_files=5, instances=8, key_pairs=100, metadata_items=128, ram=24576, security_group_rules=20, security_groups=10, server_group_members=10, server_groups=10>
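  • To double-check, compare the quotas against the output above (standard openstack client command; assumes your sourced environment points at the region you want to inspect)
root@labcontrol1001:~# openstack quota show <project-name>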
root@labcontrol1001:~# wmcs-region-migrate-security-groups <project-name>
deleting rule {u'remote_group_id': u'2c908284-84ef-4a4a-8f1a-11e84b6256db', u'direction': u'ingress', u'protocol': None, u'description': u'', u'ethertype': u'IPv4', u'remote_ip_prefix': None, u'port_range_max': None, u'security_group_id': u'2c908284-84ef-4a4a-8f1a-11e84b6256db', u'port_range_min': None, u'tenant_id': u'hhvm', u'id': u'6ea1ac47-3876-4676-bb01-bf89cb6f4363'}
deleting rule {u'remote_group_id': u'2c908284-84ef-4a4a-8f1a-11e84b6256db', u'direction': u'ingress', u'protocol': None, u'description': u'', u'ethertype': u'IPv6', u'remote_ip_prefix': None, u'port_range_max': None, u'security_group_id': u'2c908284-84ef-4a4a-8f1a-11e84b6256db', u'port_range_min': None, u'tenant_id': u'hhvm', u'id': u'85eabca5-63f4-43f2-aeb2-64d2e70779f1'}
Updating group default in dest
copying rule: {u'from_port': None, u'group': {u'tenant_id': u'hhvm', u'name': u'default'}, u'ip_protocol': None, u'to_port': None, u'parent_group_id': 357, u'ip_range': {}, u'id': 1526}
copying rule: {u'from_port': -1, u'group': {}, u'ip_protocol': u'icmp', u'to_port': -1, u'parent_group_id': 357, u'ip_range': {u'cidr': u'0.0.0.0/0'}, u'id': 1527}
copying rule: {u'from_port': 22, u'group': {}, u'ip_protocol': u'tcp', u'to_port': 22, u'parent_group_id': 357, u'ip_range': {u'cidr': u'10.0.0.0/8'}, u'id': 1528}
copying rule: {u'from_port': 5666, u'group': {}, u'ip_protocol': u'tcp', u'to_port': 5666, u'parent_group_id': 357, u'ip_range': {u'cidr': u'10.0.0.0/8'}, u'id': 1529}
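  • Optionally, list the copied security groups and rules in the destination to confirm they match (standard openstack client commands; exact output depends on the installed client version)
root@labcontrol1001:~# OS_TENANT_NAME=<project-name> openstack security group list
root@labcontrol1001:~# OS_TENANT_NAME=<project-name> openstack security group rule list default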
  • Start 'screen' because the next bit is going to take a while
root@labcontrol1001:~# screen
  • Get a list of all VMs in the project
root@labcontrol1001:~# OS_TENANT_NAME=<project-name> openstack server list
+--------------------------------------+------------------+--------+--------------------+
| ID                                   | Name             | Status | Networks           |
+--------------------------------------+------------------+--------+--------------------+
| d4730c86-a6cc-4cb1-9ebe-a84f26926f24 | hhvm-jmm-vp9     | ACTIVE | public=10.68.19.57 |
| 34522cd3-9628-4035-9faa-6d12e55b0f9f | hhvm-stretch-jmm | ACTIVE | public=10.68.20.46 |
| db3a0098-8707-49bd-846f-9b9629c63658 | hhvm-jmm         | ACTIVE | public=10.68.16.91 |
+--------------------------------------+------------------+--------+--------------------+
  • Migrate VMs one by one
root@labcontrol1001:~# wmcs-region-migrate d4730c86-a6cc-4cb1-9ebe-a84f26926f24
  • See what broke
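  • One way to check: confirm the migrated instance is ACTIVE again and skim its console log (standard openstack client commands, not part of the migration tooling)
root@labcontrol1001:~# OS_TENANT_NAME=<project-name> openstack server show d4730c86-a6cc-4cb1-9ebe-a84f26926f24
root@labcontrol1001:~# OS_TENANT_NAME=<project-name> openstack console log show d4730c86-a6cc-4cb1-9ebe-a84f26926f24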

Special Concerns for Kubernetes Nodes

When moving a Kubernetes worker (or anything that connects to the flannel network for that matter), you must reload ferm on every flannel etcd node (currently that means tools-flannel-etcd-0[1-3].tools.eqiad.wmflabs). After that, run puppet on the worker node to put everything to rights.

Don't forget that some worker nodes were built from a broken image and still have a bad /etc/resolv.conf, so check for that as well.

Therefore, the process for moving a worker node is (see the example command sequence after the list):

  1. drain and cordon
  2. move with wmcs-region-migrate
  3. fix resolv.conf if needed
  4. sudo systemctl reload ferm on tools-flannel-etcd-0[1-3]
  5. run puppet
  6. uncordon after validating the node is "Ready" in kubectl get nodes
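A sketch of the whole sequence (host and node names are only examples; run kubectl wherever you normally administer the cluster):

# on the kubernetes master: drain the worker (drain also cordons it)
kubectl drain tools-worker-1001 --ignore-daemonsets
# on labcontrol1001, with novaenv.sh sourced: migrate the instance
wmcs-region-migrate <instance-id-of-tools-worker-1001>
# on the worker: check /etc/resolv.conf and fix it if it came from the broken image
# on each of tools-flannel-etcd-01 through -03:
sudo systemctl reload ferm
# back on the worker:
sudo puppet agent -t -v
# on the master: wait until the node reports Ready, then uncordon
kubectl get nodes
kubectl uncordon tools-worker-1001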

Common issues

  • Puppet not working because of certificate issues. Run sudo rm -rf /var/lib/puppet/ssl on the instance and then run sudo puppet agent -t -v again.
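For example, on the affected instance (hostname is a placeholder):
<user>@<instance>:~$ sudo rm -rf /var/lib/puppet/ssl
<user>@<instance>:~$ sudo puppet agent -t -v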