User:Cwhite/Logstash/ECS Schema Guide for Developers
Rationale
Software is often opinionated about what constitutes a log entry. Since WMF's centralized logging infrastructure became generally available, it has grown rapidly and organically. This growth presents challenges in the storage, ingest, and presentation domains. One such issue is that nothing defines how many fields can be set, and consequently no type information is provided for them. Without control over the types of these fields, Elasticsearch must guess each type, which makes type collisions a regular occurrence. Without control over which fields are available, fields remain largely undefined and meaningless to the outside observer. As we strive to boost signal, reduce noise, scale, simplify, and improve the user experience of the centralized logging system, we see the need to agree on a Common Logging Schema. The Observability team has evaluated the options and decided to adopt the Elastic Common Schema (ECS).
Required Fields
ECS Version
ECS logs are identified by including the ECS version in the structured log event. This field is ecs.version and should contain the ECS version the log event is targeting.
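As an illustration, a minimal sketch of an event that carries the version marker. The helper name `new_event` and the use of a dotted top-level key are assumptions made for this example; 1.6.0 is simply the version mentioned elsewhere in this guide, not a required value.

```python
import json

# Assumption: the ECS version being targeted. 1.6.0 is the version this
# guide mentions; use whichever version your events actually follow.
ECS_VERSION = "1.6.0"


def new_event(message):
    """Return a minimal structured log event tagged with the targeted ECS version."""
    return {
        "ecs.version": ECS_VERSION,  # identifies the event as an ECS log
        "message": message,
    }


# Serialize the event as a single JSON object, one per log line.
print(json.dumps(new_event("Cache warmed")))
```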
Common Fields
The structured log object (a JSON object) consists of a set of attributes. There are a few common attributes[1] that almost every log source will want to populate. When possible, please follow the field content recommendations in this document.
Timestamp
Ideally, the timestamp attribute contains an ISO-8601 formatted timestamp, in UTC, indicating when the log was generated. This field will be translated to the native date type and moved to @timestamp.[2]
If timestamp is not provided, the logging pipeline will generate the @timestamp field from the time at which it received the event.
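A producer-side sketch of generating such a value with the Python standard library. The function name `utc_timestamp` and the millisecond precision are choices made for this example, not requirements stated in this guide.

```python
from datetime import datetime, timezone


def utc_timestamp(now=None):
    """Return an ISO-8601 timestamp in UTC for the given (or current) time.

    `now` should be timezone-aware; a naive datetime would be interpreted
    as local time by astimezone().
    """
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).isoformat(timespec="milliseconds")


event = {
    "timestamp": utc_timestamp(),  # the pipeline moves this to @timestamp
    "message": "Job finished",
}
```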
Message
message is a short summary or message optimized for viewing in a log viewer.[3] When a message is not provided, it can be constructed from other fields to provide a human-readable summary of the log entry.
The message field is often the first field a user will look at when searching for diagnostic information. While there are no restrictions on what data is allowed in the message field, we recommend optimizing the field for human consumption by keeping the message short and putting diagnostic data in the proper place.[4]
How to tell if a piece of information is diagnostic data and not a good fit for the message field:
- Would this information be glossed over when a user reads the message?
- Is the piece of information useful for measurement?
- Is the piece of information useful to correlate with other log entries?
- Would it take multiple lines to render the data in the message?
If the answer to any of the above questions is "yes," consider moving the datapoint(s) to their own field as defined in the ECS documentation or to the labels object.
Common datapoints with their own fields:
- Event (UU)IDs: event.id field.
- Stack traces: error.stack_trace field.
- HTTP data: http object field.
- URL data: url object field.
- (... this list is incomplete)
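The split between a short message and dedicated diagnostic fields could look like the following sketch. The helper `error_event` and the dotted top-level keys are assumptions for illustration; `url.full` is one field of the ECS url object, chosen here as an example.

```python
import traceback
import uuid


def error_event(summary, exc, url):
    """Build an event with a short message and diagnostic data in its own fields."""
    stack = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    return {
        "message": summary,              # short, human-readable summary only
        "event.id": str(uuid.uuid4()),   # correlation ID, not embedded in the message
        "error.stack_trace": stack,      # multi-line data kept out of the message
        "url.full": url,                 # URL data goes in the url object
    }


try:
    int("not a number")
except ValueError as exc:
    event = error_event("Failed to parse page ID", exc, "https://example.org/page")
```

The message stays on one line, while the stack trace and URL remain searchable in their own fields.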
Log Level[5]
The log.level field is a human-readable string and is indexed as a keyword. If log.level is omitted, the logging pipeline will attempt to populate it with:
- The value at log.syslog.severity.name.
- The human-readable definition of log.syslog.severity.code.
- NOTSET if no other level indicator could be found.[6]
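The fallback described above can be sketched as follows. This is not the pipeline's actual implementation: the function name `resolve_log_level`, the flat dotted keys, and the assumption that the fallbacks are tried in the listed order are all illustrative. The severity names follow the RFC5424 severity table in this section.

```python
# Severity names keyed by RFC5424 severity code.
SEVERITY_NAMES = {
    0: "emergency", 1: "alert", 2: "critical", 3: "error",
    4: "warning", 5: "notice", 6: "informational", 7: "debug",
}


def resolve_log_level(event):
    """Return log.level, falling back to syslog severity fields, then NOTSET."""
    level = event.get("log.level")
    if level:
        return level
    name = event.get("log.syslog.severity.name")
    if name:
        return name
    code = event.get("log.syslog.severity.code")
    try:
        return SEVERITY_NAMES[int(code)]
    except (TypeError, ValueError, KeyError):
        # Missing, non-numeric, or out-of-range code: no level indicator found.
        return "NOTSET"
```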
For log producers that emit JSON-formatted messages and define their own level, log.level is used to populate log.syslog.severity.name and log.syslog.severity.code per this table:
| Lowercase log.level | RFC5424 definition | Lowercase RFC5424 Severity | RFC5424 Severity code | PHP[7] | Java[8] | NodeJS[9] | Python[10] | Syslog[11] |
|---|---|---|---|---|---|---|---|---|
| trace, debug | debug-level messages | debug | 7 | | | | | |
| info, informational | informational messages | informational | 6 | | | | | |
| notice | normal but significant condition | notice | 5 | | | | | |
| warning, warn | warning conditions | warning | 4 | | | | | |
| error, err | error conditions | error | 3 | | | | | |
| critical, crit | critical conditions | critical | 2 | | | | | |
| alert | action must be taken immediately | alert | 1 | | | | | |
| emerg, emergency, fatal | system is unusable | emergency | 0 | | | | | |
If log.level cannot be mapped to an RFC5424 severity, then log.syslog.severity.name will be set to "alert" and log.syslog.severity.code will be set to "1".
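The mapping in the table, including the "alert" fallback, can be expressed as a small lookup. This is a sketch rather than the pipeline's code: the names `LEVEL_ALIASES`, `SEVERITY_CODES`, and `severity_for` are hypothetical, and lowercasing and trimming the input before lookup is an assumption based on the "Lowercase log.level" column.

```python
# Accepted lowercase log.level values mapped to RFC5424 severity names.
LEVEL_ALIASES = {
    "trace": "debug", "debug": "debug",
    "info": "informational", "informational": "informational",
    "notice": "notice",
    "warning": "warning", "warn": "warning",
    "error": "error", "err": "error",
    "critical": "critical", "crit": "critical",
    "alert": "alert",
    "emerg": "emergency", "emergency": "emergency", "fatal": "emergency",
}

# RFC5424 severity names mapped to their numeric codes.
SEVERITY_CODES = {
    "emergency": 0, "alert": 1, "critical": 2, "error": 3,
    "warning": 4, "notice": 5, "informational": 6, "debug": 7,
}


def severity_for(log_level):
    """Return (severity name, severity code) for a producer-defined log.level."""
    # Unmappable levels fall back to "alert" so that they stand out.
    name = LEVEL_ALIASES.get(str(log_level).strip().lower(), "alert")
    return name, SEVERITY_CODES[name]
```

For example, `severity_for("WARN")` yields `("warning", 4)`, while an unrecognized level such as `"verbose"` yields `("alert", 1)`.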
Service Name[12]
service.name is a combination of service and cluster. The intent of this field is to indicate not just the service that emitted the log entry, but also which cluster in the overall system the log came from.
- For Kubernetes: this is the namespace name.
- For all others: this is usually the application name and cluster concatenated with a hyphen (-).
Examples:
- elasticsearch-logging
- blazegraph-wdqs
- elasticsearch-wdqs
- mediawiki-api_appserver
- mediawiki-jobrunner
- memcached-memcached_gutter
- memcached-memcached
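The naming convention above can be sketched as a small helper. The function `service_name` and its parameters are hypothetical; the convention itself is described as "usually" applying, so treat this as one reasonable reading rather than a rule.

```python
def service_name(application, cluster, k8s_namespace=None):
    """Build service.name from the application and cluster names.

    On Kubernetes the namespace name is used as-is; elsewhere the
    application name and cluster are joined with a hyphen.
    """
    if k8s_namespace:
        return k8s_namespace
    return f"{application}-{cluster}"


print(service_name("mediawiki", "jobrunner"))
print(service_name("elasticsearch", "logging"))
```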
Service Type[12]
service.type is the application name.
- For Kubernetes: this is the app label.
- For all others: this is the application name.
Examples:
- elasticsearch
- kafka
- blazegraph
- mediawiki
- restbase
Diagnostic Data
A log entry often needs diagnostic data to accompany it. Diagnostic data gives the log entry context, more detail, and sometimes a path to reproduction. ECS defines fields to hold this diagnostic data.
Hostname
Use host.name and the related fields in the host object.
URL Object
HTTP Object
Custom Fields
ECS defines the labels field for custom key-value data.
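Because labels holds flat key-value pairs, structured application context has to be flattened before it is emitted. A minimal sketch, assuming a hypothetical helper `to_labels`, an underscore as the key separator (chosen so that keys are not read as nested paths), and string conversion of all values:

```python
def to_labels(data, prefix=""):
    """Flatten a nested dict into a single-level dict of string values."""
    labels = {}
    for key, value in data.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Fold nested objects into the parent key instead of nesting.
            labels.update(to_labels(value, prefix=f"{name}_"))
        else:
            labels[name] = str(value)
    return labels


event = {
    "message": "Job retried",
    "labels": to_labels({"job": {"type": "refresh", "attempt": 2}, "wiki": "enwiki"}),
}
```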
The labels field does not support nested objects. All keys and values are stored as keyword.
Deprecated Fields
These fields are commonly used, but have no clear analogue in ECS.
Channel
Use log.logger, event.module, or a custom label in the labels object.
Type
Use service.type and/or service.name.
Program
Use service.type and/or service.name.
Missing Fields
HTTP Headers
As of this writing (ECS 1.6.0), ECS has no suitable place for HTTP headers. (See this PR).
Notes
- ↑ The terms "attribute" and "field" are used interchangeably.
- ↑ Presence of the timestamp field (without the @) in Kibana indicates a problem in the logging pipeline and must be rectified.
- ↑ In Kibana, 180 characters shows comfortably on one line on a 1920x1080 widescreen monitor.
- ↑ The message field is analyzed as a natural language text type. This means that the message field will be:
  - tokenized -- the text is broken up on whitespace, stop words, and optionally non-letter characters
  - filtered -- the tokens are downcased and stemmed. Based on a set of rules, the base word is extracted from the word. For example, "running" is stemmed to "run" and "browsers" is stemmed to "browser".
  - indexed -- the filtered tokens are then indexed into an inverted index indicating which documents the token can be found in.
- ↑ The WMF uses a number of programming languages in production. Each programming language has its own opinion on how to indicate logging level. Logging level can be customized by the developer further complicating the issue of finding errors. We see the need to agree on a defined set of log levels to make it easier for log consumers not always familiar with the programming language or developer preferences to find what they need. The Observability team has decided to standardize on RFC5424 Syslog severity.
- ↑ NOTSET indicates a problem either in the log producer or the pipeline and must be rectified.
- ↑ https://www.php-fig.org/psr/psr-3/
- ↑ https://en.wikipedia.org/wiki/Log4j#Log4j_log_levels
- ↑ https://github.com/trentm/node-bunyan#levels
- ↑ https://docs.python.org/3/library/logging.html#levels
- ↑ https://tools.ietf.org/html/rfc5424#section-6.2.1
- ↑ In some cases, this field can be generated by the pipeline.