Incidents/2021-09-26 appserver latency

From Wikitech
document status: final

Summary

Increased database load for enwiki (s1) slowed responses, which in turn caused the overall php-fpm worker limits to be reached, affecting requests for all wikis. Requests above the limit failed with the error "upstream connect error or disconnect/reset before headers. reset reason: overflow".

Impact: For about 15 minutes, backend appservers were slower or unable to respond for all wikis. This mainly affected logged-in users and most bot/API queries. Some page views from unregistered users were also affected, namely those for pages that had been recently edited or had otherwise expired from the CDN cache.
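
The cascade in the summary (slow responses for one wiki exhausting a worker pool shared by all wikis) can be illustrated with Little's law, which says that average concurrency equals arrival rate times mean time in the system. The following is a minimal sketch only: the pool size, request rate, enwiki share, and latencies are hypothetical values chosen for illustration and are not measurements from this incident, and the function names are invented for the example.

```python
def blended_latency(slow_share: float, slow_latency: float, normal_latency: float) -> float:
    """Mean latency when a fraction of requests hits a slow backend."""
    return slow_share * slow_latency + (1.0 - slow_share) * normal_latency


def workers_needed(requests_per_second: float, mean_latency_seconds: float) -> float:
    """Little's law: average busy workers = arrival rate * mean service time."""
    return requests_per_second * mean_latency_seconds


# All values below are hypothetical and for illustration only.
POOL_SIZE = 1000          # assumed total php-fpm workers across the cluster
REQUESTS_PER_SECOND = 4000
ENWIKI_SHARE = 0.4        # assumed fraction of requests that touch s1
NORMAL_LATENCY = 0.15     # seconds
DEGRADED_LATENCY = 1.0    # seconds, enwiki requests while the database is loaded

healthy = workers_needed(
    REQUESTS_PER_SECOND,
    blended_latency(ENWIKI_SHARE, NORMAL_LATENCY, NORMAL_LATENCY),
)
degraded = workers_needed(
    REQUESTS_PER_SECOND,
    blended_latency(ENWIKI_SHARE, DEGRADED_LATENCY, NORMAL_LATENCY),
)

for label, demand in (("healthy", healthy), ("degraded", degraded)):
    state = "within pool" if demand <= POOL_SIZE else "exceeds pool"
    print(f"{label}: about {demand:.0f} busy workers ({state})")
```

Under these assumed numbers, slowing only the enwiki requests pushes the demand for workers past the size of the shared pool, so requests for other wikis also have no free worker. That matches the shape of the failure described above, though the real thresholds and traffic mix are not given on this page.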

Documentation:

Actionables