Portal:Toolforge/Admin/Runbooks/Redis

From Wikitech

This runbook explains how to debug and fix common issues with Toolforge Redis.

Error / Incident

  • checker.tools.wmflabs.org/toolschecker: Redis set/get is a paging alert that triggers when toolschecker is unable to talk to Redis.

Debugging

SSH to Redis hosts at tools-redis-X.tools.eqiad1.wikimedia.cloud and check the output of redis-cli info replication.

The Redis service uses a non-standard systemd unit name: redis-instance-tcp_6379.service
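As a rough aid for reading the replication output, the sketch below condenses it into one line. The function name summarize_replication is hypothetical and not part of any Toolforge tooling. It assumes the usual role, connected_slaves and master_link_status fields of the Redis INFO replication section, and strips the carriage returns that redis-cli emits when its output is piped.

```shell
# Hypothetical helper: reads `redis-cli info replication` output on stdin
# and prints a one-line summary of the host's replication state.
summarize_replication() {
    # INFO output uses CRLF line endings; drop the CR before parsing.
    tr -d '\r' | awk -F: '
        $1 == "role"               { role = $2 }
        $1 == "connected_slaves"   { replicas = $2 }
        $1 == "master_link_status" { link = $2 }
        END {
            if (role == "master")
                printf "master with %s connected replica(s)\n", replicas
            else if (role == "slave")
                printf "replica, link to master is %s\n", link
            else
                print "unknown role"
        }'
}

# Usage on a Redis host:
#   redis-cli info replication | summarize_replication
#   sudo systemctl status redis-instance-tcp_6379.service
```

A replica whose link to the master is reported as down, or a master with fewer connected replicas than expected, is a reasonable place to start looking.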

Common issues

max number of clients reached

When this happens, the following message is logged: ERR max number of clients reached. Check the logs on all servers with sudo journalctl -g "max number of clients". If the message appears repeatedly, restart the systemd unit on each host where it is logged:

$ sudo systemctl restart redis-instance-tcp_6379.service
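To judge whether the message is appearing repeatedly before restarting, a small counting helper can be useful. This is a minimal sketch: the function name count_max_clients_errors and the one-hour window in the usage comment are assumptions, not part of the runbook.

```shell
# Hypothetical helper: counts "max number of clients" log lines read from stdin.
count_max_clients_errors() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the function usable under "set -e".
    grep -c "max number of clients" || true
}

# Usage on a Redis host (the one-hour window is an arbitrary choice):
#   sudo journalctl -u redis-instance-tcp_6379.service --since "1 hour ago" --no-pager \
#       | count_max_clients_errors
```

A count that keeps growing between two runs suggests the client limit is still being hit on that host.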

Portal:Toolforge/Admin/Redis

Support contacts

#wikimedia-cloud-admin is the main communication channel for Toolforge admins.

If Redis is down, follow the Wikimedia Cloud Services team/Incident Response Process.

Old incidents