
User:Elukey/Analytics/Oozie

From Wikitech

Unexpected values in pageview for workflow

Error from Oozie:

oozie@analytics1003.eqiad.wmnet via wikimedia.org 
12:46 PM (2 hours ago)

to analytics-aler. 
Values were found in pageview that were not in the whitelist.

Please have a look and take necessary action !
Thanks :)
-- Oozie

Check the unexpected values for the day of the alert with the following Hive query (adjust the date as needed):

ADD JAR /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
use wmf;
select * from pageview_unexpected_values where year = 2016 and month = 10 and day = 7;

If you need to speed up your job, move it to the production queue (replace the application ID with your own):

yarn application --movetoqueue application_1476969128131_44799 --queue production

Data Loss ERROR - Workflow webrequest-load-check_sequence_statistics

Error from Oozie:

oozie@analytics1003.eqiad.wmnet via wikimedia.org 
8:26 PM (18 hours ago)

to analytics-aler. 
Please check wmf_raw.webrequest_sequence_stats_hourly.
This is an ERROR.
This job has failed, refine has not been launched.

Please have a look and take necessary action !
Thanks :)
-- Oozie

The first step is to investigate why this is happening. Useful resources:

- https://grafana.wikimedia.org/dashboard/db/varnish-aggregate-client-status-codes, since Varnish errors that end up as 503s might contribute to the loss.

- Check the percentage of loss registered:

ADD JAR /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
use wmf_raw;
select webrequest_source, percent_lost from webrequest_sequence_stats_hourly where day = 6 and month = 10 and year = 2016 and hour = 17;
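
Since the date and hour in this query change with every alert, a small shell helper can save some retyping. The following is only a sketch: the function name `loss_query` is hypothetical, and it merely prints the query text for the year, month, day and hour passed as plain integers.

```shell
# Hypothetical helper: print the loss-check query for a single hour.
# Arguments: year month day hour (plain integers, e.g. 2016 10 6 17).
loss_query() {
  printf '%s\n' 'ADD JAR /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;'
  printf '%s\n' 'use wmf_raw;'
  # The table and column names are the ones used in the query above.
  printf 'select webrequest_source, percent_lost from webrequest_sequence_stats_hourly where year = %s and month = %s and day = %s and hour = %s;\n' \
    "$1" "$2" "$3" "$4"
}

# Example: print the query for 2016-10-06, hour 17.
loss_query 2016 10 6 17
```

Assuming the `hive` command line client is available on the host, the printed text can be saved to a file and run with `hive -f`.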

If there is a valid reason, please re-run the Oozie job from stat1004 with the following command (WARNING: the start/stop time values need to be changed!):

sudo -u hdfs oozie job --oozie $OOZIE_URL \
  -Drefinery_directory=hdfs://analytics-hadoop$(hdfs dfs -ls -d /wmf/refinery/2016* | tail -n 1 | awk '{print $NF}') \
  -Dqueue_name=production -Doozie_launcher_queue_name=production -Doozie_launcher_memory=256 \
  -Dstart_time=2016-10-06T17:00Z -Dstop_time=2016-10-06T17:59Z \
  -config coord_load_webrequest_upload.properties -run
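
Because the start and stop times are the part most likely to be left unchanged by mistake, it may help to generate them from the failed hour. The sketch below uses a hypothetical function `hour_window` and assumes that the window for one hourly run goes from minute 00 to minute 59 of that hour, as in the example above.

```shell
# Hypothetical helper: build the start/stop options for a one-hour re-run.
# $1 = date as YYYY-MM-DD, $2 = two-digit hour in UTC (00-23).
hour_window() {
  printf '%s\n' "-Dstart_time=${1}T${2}:00Z -Dstop_time=${1}T${2}:59Z"
}

# Example: options for the hour reported in the alert above.
hour_window 2016-10-06 17
```

The output can be pasted into the command in place of the `-Dstart_time` and `-Dstop_time` values.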