
Data Platform/Systems/EventLogging/Architecture



This page explains the topology of WMF's EventLogging system and how its parts interact, using the following diagram as a reference:

[Diagram: EventLogging architecture]

  • varnishkafka sends client-side raw events (URL-encoded JSON in the query string) from Varnish to the eventlogging-client-side Kafka topic.
  • An eventlogging-processor consumes these raw events, decodes and validates them, and produces them back to Kafka as JSON strings. Valid events are produced to two kinds of topics: eventlogging-valid-mixed, which contains the valid events from all schemas except blacklisted high-volume schemas, and eventlogging_<schemaName>, which holds all events for a given schema (see the processor sketch after this list).
  • eventlogging-valid-mixed is consumed by eventlogging-consumer processes and stored in MySQL and in the EventLogging log files (see the consumer sketch below). The eventlogging_<schemaName> topics are consumed by Camus and stored in HDFS, partitioned by <schemaName>/<year>/<month>/<day>/<hour>.
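
As a rough illustration of the processing step, here is a minimal sketch in Python using only the standard library. The raw line format, the event field names, and the contents of the high-volume blacklist are illustrative assumptions, not the exact formats eventlogging-processor uses:

  import json
  from urllib.parse import unquote, urlparse

  # Schemas excluded from the mixed topic; contents are illustrative.
  HIGH_VOLUME_BLACKLIST = {'SomeHighVolumeSchema'}

  def process_raw_event(raw_url):
      """Decode the URL-encoded JSON payload carried in the query string."""
      query = urlparse(raw_url).query
      return json.loads(unquote(query))

  def route(event):
      """Return the Kafka topics a validated event would be produced to."""
      schema = event['schema']  # field name assumed for illustration
      topics = ['eventlogging_{}'.format(schema)]
      if schema not in HIGH_VOLUME_BLACKLIST:
          topics.append('eventlogging-valid-mixed')
      return topics

  # '{"schema":"Edit","event":{"action":"save"}}', URL-encoded:
  raw = ('/beacon/event?%7B%22schema%22%3A%22Edit%22%2C'
         '%22event%22%3A%7B%22action%22%3A%22save%22%7D%7D')
  print(route(process_raw_event(raw)))
  # -> ['eventlogging_Edit', 'eventlogging-valid-mixed']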
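The consumer step can be sketched the same way, assuming the kafka-python and PyMySQL libraries are available. Broker addresses, credentials, and the single-JSON-column table layout below are illustrative assumptions; in production each schema maps to its own table and columns:

  import json
  from kafka import KafkaConsumer   # kafka-python, assumed available
  import pymysql                    # PyMySQL, assumed available

  consumer = KafkaConsumer(
      'eventlogging-valid-mixed',
      bootstrap_servers=['kafka1001.example:9092'],  # illustrative broker
      value_deserializer=lambda v: json.loads(v.decode('utf-8')),
  )

  db = pymysql.connect(host='db1046.example', user='eventlog',
                       password='PASSWORD', database='log')  # illustrative

  for message in consumer:
      event = message.value
      # One JSON column per table is a simplification for this sketch.
      table = 'eventlogging_{}'.format(event['schema'])
      with db.cursor() as cur:
          cur.execute('INSERT INTO `{}` (event_json) VALUES (%s)'.format(table),
                      (json.dumps(event),))
      db.commit()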

The EventLogging back-end is composed of several pieces that consume from and produce to Kafka, which makes it a single-purpose, standalone stream processor. The /etc/eventlogging.d file hierarchy contains the process instance definitions, with a subfolder for each service type. A systemd task uses this file hierarchy to provision a job for each instance definition. Instance definition files contain command-line arguments for the service program, one argument per line; an illustrative example follows.
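For concreteness, here is a hypothetical consumer instance definition. EventLogging services take reader/writer URIs as arguments, but the path, file name, hosts, and URIs below are assumptions for illustration, not copied from production:

  # /etc/eventlogging.d/consumers/mysql-m4-master  (hypothetical path)
  # Input: the mixed topic of valid events.
  kafka:///kafka1001.example:9092?topic=eventlogging-valid-mixed
  # Output: the MySQL log database.
  mysql://eventlog:PASSWORD@db1046.example/log?charset=utf8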

An 'eventloggingctl' shell script provides a convenient wrapper for managing EventLogging processes.
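Typical invocations might look like the following; the subcommands are assumed from common init-style wrappers, so check the script itself on the host for the actual set:

  eventloggingctl status     # report the state of all EventLogging jobs
  eventloggingctl restart    # restart every provisioned instance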