MariaDB/Switch Datacenter

From Wikitech

The week before the switchover

  • 7 days before: no more maintenance on the database clusters.
  • 6 days before: Enable circular replication between eqiad and codfw.
    • This requires updating section_params in hieradata/common/profile/mariadb.yaml. E.g. gerrit:719168
    • Run the sre.switchdc.databases.prepare cookbook.
  • In the new DC:
    • Check and disable GTID on primaries.
    • Check that all replicas have GTID enabled.
    • Check for disabled notifications (Icinga) and silences (Alertmanager).
    • Check that the query killers are installed and enabled.
    • Review the MediaWiki (MW) weights, comparing them to the old DC.
    • Warm up the caches using queries from the old DC.
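
The GTID checks in the list above can be scripted. The following is a minimal sketch, not the tooling this procedure actually uses: it assumes the vertical output of MariaDB's SHOW SLAVE STATUS has already been captured as text for each host, and it reads the Using_Gtid field, which MariaDB reports as No, Slave_Pos or Current_Pos. The function names and the "primary"/"replica" role labels are hypothetical.

```python
from typing import Optional


def using_gtid(slave_status_text: str) -> Optional[str]:
    """Return the Using_Gtid value from vertical SHOW SLAVE STATUS output.

    Returns None when the field is absent (for example, when the host has
    no replication configured at all).
    """
    for line in slave_status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Using_Gtid":
            return value.strip()
    return None


def gtid_state_ok(role: str, slave_status_text: str) -> bool:
    """Check the expectation from the checklist for one host.

    Assumption: role is "primary" (GTID must be disabled) or "replica"
    (GTID must be enabled). These labels are illustrative only.
    """
    value = using_gtid(slave_status_text)
    if value is None:
        return False  # field missing: treat as a failed check
    enabled = value.lower() != "no"
    if role == "primary":
        return not enabled
    if role == "replica":
        return enabled
    raise ValueError(f"unknown role: {role!r}")


# Standard MariaDB statements to switch GTID off on a host that replicates
# from another one (shown for reference, not run by this sketch):
#   STOP SLAVE; CHANGE MASTER TO MASTER_USE_GTID=no; START SLAVE;

SAMPLE = """
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                   Using_Gtid: Slave_Pos
"""

if __name__ == "__main__":
    print("replica ok:", gtid_state_ok("replica", SAMPLE))
    print("primary ok:", gtid_state_ok("primary", SAMPLE))
```

A host that fails the check still needs a manual look. The sketch only flags it.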

The day of the switchover

Before the switchover

  • Downtime all db primaries just before the switch, so that read-only alerts won't fire (T285803).

After the switchover

  • Manually fix parsercache hosts and x2 in tendril: T266723
  • Submit a puppet patch changing host-down alerting:
    • Background: gerrit:736415
    • Move profile::monitoring::is_critical: true from hieradata/role/<old dc>/mariadb/* to hieradata/role/<new dc>/mariadb/
    • Re-run puppet: sudo cumin 'A:db-core or A:db-parsercache' 'run-puppet-agent -q'
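
When preparing the puppet patch, it can help to list which hieradata files currently carry the setting. The helper below is a hypothetical sketch that assumes a local checkout of the puppet repository and a plain "key: value" line in each file. It does a simple text match rather than parsing YAML, so it should be read as a starting point only.

```python
from pathlib import Path
from typing import List

KEY = "profile::monitoring::is_critical"


def files_marked_critical(repo_root: str, dc: str) -> List[str]:
    """List YAML files under hieradata/role/<dc>/mariadb/ that set KEY to true.

    Paths are returned relative to repo_root. The function name and the
    line-based matching are assumptions made for this sketch.
    """
    role_dir = Path(repo_root) / "hieradata" / "role" / dc / "mariadb"
    hits = []
    for path in sorted(role_dir.rglob("*.yaml")):
        for line in path.read_text().splitlines():
            # Drop trailing comments, then compare key and value.
            stripped = line.split("#", 1)[0].strip()
            if stripped.startswith(KEY + ":"):
                if stripped[len(KEY) + 1:].strip() == "true":
                    hits.append(str(path.relative_to(repo_root)))
                break
    return hits
```

Running it for the old DC before the patch and for the new DC after it gives a quick check that the setting moved and was not duplicated.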

The days after the switchover

  • 2 days after: Disable circular replication again:
    • Update section_params in hieradata/common/profile/mariadb.yaml again. E.g. gerrit:721421
    • Run the sre.switchdc.databases.finalize cookbook.
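
The day offsets on this page (7 and 6 days before, 2 days after) can be turned into calendar dates once the switchover day is known. This is a small illustrative sketch. The step names are hypothetical labels, and the date used in the example is arbitrary.

```python
from datetime import date, timedelta
from typing import Dict


def switchover_timeline(switchover_day: date) -> Dict[str, date]:
    """Map each dated step of the checklist to a calendar date."""
    return {
        "maintenance_freeze": switchover_day - timedelta(days=7),
        "enable_circular_replication": switchover_day - timedelta(days=6),
        "switchover": switchover_day,
        "disable_circular_replication": switchover_day + timedelta(days=2),
    }


if __name__ == "__main__":
    # Arbitrary example date, chosen only for illustration.
    for step, day in switchover_timeline(date(2021, 9, 14)).items():
        print(f"{day.isoformat()}  {step}")
```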