Talk:Analytics/Data Lake/Traffic/Pageview hourly/Identity reconstruction analysis

From Wikitech

UA Approach Recommendation

I see a couple of options listed for the UA: removing user_agent_map altogether, or applying some threshold for inclusion. I recommend that we retain the following fields:

  • os_family
  • os_major
  • os_minor
  • browser_family

for every distinct combination of these values that has at least 1000 daily members.
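To make the proposal concrete, here is a rough sketch of the thresholding step. It assumes each input record is a user_agent_map dict and counts one "member" per record; whether members should instead be distinct users or requests is left open. The names `RETAINED_FIELDS`, `MIN_DAILY_MEMBERS`, `reduce_ua` and `retained_ua_maps` are made up for illustration and do not come from any existing job.

```python
from collections import Counter

# Fields proposed for retention; everything else in user_agent_map is dropped.
RETAINED_FIELDS = ("os_family", "os_major", "os_minor", "browser_family")
# Proposed minimum number of daily members per distinct reduced map.
MIN_DAILY_MEMBERS = 1000


def reduce_ua(ua_map):
    """Project a user_agent_map onto the retained fields ("-" when missing)."""
    return tuple(ua_map.get(field, "-") for field in RETAINED_FIELDS)


def retained_ua_maps(daily_ua_maps, threshold=MIN_DAILY_MEMBERS):
    """Return the reduced maps seen at least `threshold` times in one day.

    Assumption: each element of `daily_ua_maps` counts as one member.
    """
    counts = Counter(reduce_ua(ua_map) for ua_map in daily_ua_maps)
    return {key for key, n in counts.items() if n >= threshold}
```

Any record whose reduced map falls outside the returned set would then have its UA fields blanked or bucketed; which of the two is preferable is not decided here.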

As part of the UA transformation, the Wikipedia apps' UA info should be transformed so that browser_family (can we rename it to "browser" to avoid confusion?) becomes "WikipediaApp" (instead of, for example, "Mobile Safari" or "Android"). Here are examples of some user_agent_map values for the apps today.

{"browser_major":"-","os_family":"iOS","device_family":"Generic Smartphone","os_major":"7","browser_family":"Mobile Safari","wmf_app_version":"4.1.3","os_minor":"1"}

{"browser_major":"4","os_family":"Android","device_family":"Generic Smartphone","os_major":"4","browser_family":"Android","wmf_app_version":"2.0.110-r-2015-08-31","os_minor":"4"}
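A possible shape for that transformation is sketched below. It assumes that a wmf_app_version value other than "-" or empty is what marks a record as app traffic; that detection rule and the function name `normalize_app_ua` are my assumptions, not something taken from the existing pipeline.

```python
def normalize_app_ua(ua_map):
    """Return a copy of ua_map with browser_family set to "WikipediaApp"
    for app traffic.

    Assumption: a wmf_app_version other than "-", "" or missing
    identifies a Wikipedia app request.
    """
    out = dict(ua_map)  # do not mutate the caller's map
    if out.get("wmf_app_version") not in (None, "", "-"):
        out["browser_family"] = "WikipediaApp"
    return out
```

Applied to the two examples above, both "Mobile Safari" and "Android" would become "WikipediaApp", while os_family, os_major and os_minor stay unchanged.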

That covers part of this example but leaves other attack vectors open. This is one instance of a wider problem, explained here: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly/K_Anonymity_Threshold_Analysis Nuria (talk) 17:38, 11 May 2017 (UTC)