
Obsolete:Data Dump Redesign/Throughput

From Wikitech

This page is kept for historical purposes only.

Stubs

  • 200K Samples
    • ESWIKI
      • real 20m19.417s / 25M
      • user 13m40.800s
      • sys 0m31.240s
      • ESWIKI has just under 1.2 million entries
      • 1,170,721 pages as of 2009-06-15
      • Currently takes a little more than an hour to dump stubs
    • ENWIKI
      • 17,194,115 Pages
      • Split the stubs from ENWIKI 20090604 into 200k chunks and process
      • 100 stubs @ 8 cores per machine = 13 machines
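
The ENWIKI sizing above can be checked with a few lines of arithmetic. This is only a sketch: it assumes one 200k stub chunk is processed per core, which the notes imply but do not state, and the function names are made up for illustration.

```python
import math

ENWIKI_PAGES = 17_194_115   # page count from the notes above
CHUNK_SIZE = 200_000        # pages per stub chunk
CORES_PER_MACHINE = 8       # assumption: one chunk runs on one core


def chunks_needed(total_pages, chunk_size):
    """Number of stub chunks needed to cover all pages (last one may be partial)."""
    return math.ceil(total_pages / chunk_size)


def machines_needed(chunks, cores_per_machine):
    """Machines needed to run every chunk at once, one chunk per core."""
    return math.ceil(chunks / cores_per_machine)


enwiki_chunks = chunks_needed(ENWIKI_PAGES, CHUNK_SIZE)
print(enwiki_chunks)                                      # 86
print(machines_needed(100, CORES_PER_MACHINE))            # 13
print(machines_needed(enwiki_chunks, CORES_PER_MACHINE))  # 11
```

The 100-stub figure in the notes looks like a rounded-up estimate: the 20090604 page count gives 86 chunks, which would fit on 11 machines under the same assumption.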

Splitting Stubs

Wed Jun 24 00:50:25 2009 Processed 10000 pages out of 200000 for segment 0
Wed Jun 24 01:18:41 2009 Closing Segment 0 after 200000 total pages processed. Opening next segment
  • About 30 minutes per stub
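
The per-stub time can be read straight off the two log lines above. A minimal sketch, assuming the timestamp is always the first five whitespace-separated fields of a line (as in the sample); `parse_stamp` is a hypothetical helper, not part of the splitting script.

```python
from datetime import datetime

# Timestamp layout used in the log lines, e.g. "Wed Jun 24 00:50:25 2009"
LOG_FORMAT = "%a %b %d %H:%M:%S %Y"

LOG_LINES = [
    "Wed Jun 24 00:50:25 2009 Processed 10000 pages out of 200000 for segment 0",
    "Wed Jun 24 01:18:41 2009 Closing Segment 0 after 200000 total pages processed. Opening next segment",
]


def parse_stamp(line):
    """Parse the leading timestamp (first five fields) of a log line."""
    stamp = " ".join(line.split()[:5])
    return datetime.strptime(stamp, LOG_FORMAT)


start = parse_stamp(LOG_LINES[0])   # logged at 10,000 pages
end = parse_stamp(LOG_LINES[1])     # logged at 200,000 pages
elapsed = end - start

print(elapsed)  # 0:28:16

# Pages handled between the two log lines, and the resulting rate
pages = 200_000 - 10_000
print(round(pages / elapsed.total_seconds()))  # 112 pages per second
```

The first line is logged after 10,000 pages rather than at segment start, so the 28 minutes measured here slightly undercounts the full segment, which is consistent with the rounded 30-minute figure.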