
Data Platform/Data Lake/Edits/Metrics

From Wikitech

This table stores daily and monthly metrics computed over the denormalized MediaWiki history dataset. It is partitioned by snapshot, metric name, and wiki_db (see the schema below), which makes its data easier to use outside of Hive, namely for display in Dashiki.

Schema


col_name	data_type	comment
dt	string	The date of this measurement, as YYYY-MM-DD
value	bigint	The measurement
snapshot	string	Versioning information to keep multiple datasets (YYYY-MM for regular labs imports)
metric	string	The metric being computed to measure
wiki_db	string	The wiki this measurement pertains to

# Partition Information
# col_name	data_type	comment

snapshot	string	Versioning information to keep multiple datasets (YYYY-MM for regular labs imports)
metric	string	The metric being computed to measure
wiki_db	string	The wiki this measurement pertains to
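
To illustrate why this partitioning helps consumers outside of Hive, here is a minimal Python sketch of the directory that Hive's default layout would use for one partition. It assumes the table follows the conventional key=value directory naming, in the partition order shown above. The base directory is a hypothetical placeholder, not a location stated on this page, and the real storage layout may differ.

```python
from pathlib import PurePosixPath

# Partition columns, in the order listed under "Partition Information".
PARTITION_COLUMNS = ("snapshot", "metric", "wiki_db")


def partition_path(base_dir: str, snapshot: str, metric: str, wiki_db: str) -> str:
    """Return the conventional Hive-style directory for one partition.

    base_dir is a hypothetical table location (an assumption for this sketch).
    """
    values = {"snapshot": snapshot, "metric": metric, "wiki_db": wiki_db}
    path = PurePosixPath(base_dir)
    for column in PARTITION_COLUMNS:
        # Hive's default layout names each partition directory "column=value".
        path = path / f"{column}={values[column]}"
    return str(path)


if __name__ == "__main__":
    # "/example/metrics" is a placeholder base directory.
    print(partition_path("/example/metrics", "2023-05", "daily_edits", "enwiki"))
```

Under that assumption, a dashboard that needs one metric for one wiki only has to read the files in a single directory.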

As of May 2023, possible values for metric include:

1. daily_edits
2. daily_edits_by_anonymous_users
3. daily_edits_by_bot_users
4. daily_edits_by_registered_users
5. daily_unique_anonymous_editors
6. daily_unique_bot_editors
7. daily_unique_editors
8. daily_unique_page_creators
9. daily_unique_registered_editors
10. monthly_new_editors
11. monthly_new_registered_users
12. monthly_surviving_new_editors
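
As a rough sketch of how these metric names might be used, the Python helper below builds a query that filters on all three partition columns, so that only one partition is read. The table name is passed in by the caller because this page does not state it; the function name and the validation patterns are illustrative assumptions, and the metric list is the May 2023 list above.

```python
import re

# Metric names as listed above (as of May 2023).
KNOWN_METRICS = frozenset({
    "daily_edits",
    "daily_edits_by_anonymous_users",
    "daily_edits_by_bot_users",
    "daily_edits_by_registered_users",
    "daily_unique_anonymous_editors",
    "daily_unique_bot_editors",
    "daily_unique_editors",
    "daily_unique_page_creators",
    "daily_unique_registered_editors",
    "monthly_new_editors",
    "monthly_new_registered_users",
    "monthly_surviving_new_editors",
})

# Conservative patterns (an assumption of this sketch) so that values can be
# placed into the query text without quoting problems.
_SAFE_VALUE = re.compile(r"[A-Za-z0-9_.-]+")
_SAFE_TABLE = re.compile(r"[A-Za-z0-9_.]+")


def build_metric_query(table: str, snapshot: str, metric: str, wiki_db: str) -> str:
    """Build a query for one metric of one wiki in one snapshot.

    table is supplied by the caller; its real name is not given on this page.
    """
    if metric not in KNOWN_METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    if not _SAFE_TABLE.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    for name, value in (("snapshot", snapshot), ("wiki_db", wiki_db)):
        if not _SAFE_VALUE.fullmatch(value):
            raise ValueError(f"unsafe {name}: {value!r}")
    # Filtering on every partition column lets Hive prune to a single partition.
    return (
        f"SELECT dt, value FROM {table} "
        f"WHERE snapshot = '{snapshot}' "
        f"AND metric = '{metric}' "
        f"AND wiki_db = '{wiki_db}' "
        "ORDER BY dt"
    )


if __name__ == "__main__":
    # "example_db.example_metrics" is a placeholder, not the real table name.
    print(build_metric_query("example_db.example_metrics", "2023-05", "daily_edits", "enwiki"))
```

Rejecting unknown metric names up front avoids silently querying a partition that does not exist, which would return an empty result rather than an error.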

Definition

The Hive queries that generate these metrics are in wikimedia/analytics-refinery/oozie/mediawiki/history/metrics.