SRE/Dc-operations/Platform-specific documentation/Dell PowerEdge RN10/R510 Performance Testing

Dell PowerEdge R510 machines are used for the es1001-es1010 hosts in eqiad.

Hardware stats

12 disks behind a hardware RAID card

Performance statistics

These tests were done with the disks configured as follows (a sketch for double-checking these settings follows the list):

  • RAID 10
  • 256 KB stripe
  • no read-ahead
  • write-back cache
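
The settings above can also be double-checked from the operating system. This is only a minimal sketch, assuming the controller is the usual LSI-based PERC and that the MegaCli utility is installed (the binary may be named MegaCli, MegaCli64, or megacli depending on the package):

# Logical drive settings: RAID level, strip size, read-ahead and write cache policy
megacli -LDInfo -Lall -aALL
# Count the physical disks behind the controller (should report 12 slots)
megacli -PDList -aALL | grep -c 'Slot Number'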

Tests were performed with sysbench 0.4.10-1build1, the standard Debian package (installed with 'aptitude install sysbench').

cpu, memory, threads, mutex

root@es1003:/a/tmp/sysbench# for i in cpu memory threads mutex ; do sysbench --test=$i run; done
sysbench 0.4.10:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000


Test execution summary:
    total time:                          10.0859s
    total number of events:              10000
    total time taken by event execution: 10.0851
    per-request statistics:
         min:                                  0.96ms
         avg:                                  1.01ms
         max:                                  1.91ms
         approx.  95 percentile:               1.76ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   10.0851/0.00

sysbench 0.4.10:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing memory operations speed test
Memory block size: 1K

Memory transfer size: 102400M

Memory operations type: write
Memory scope type: global
Threads started!
Done.
Operations performed: 104857600 (2259393.46 ops/sec)
102400.00 MB transferred (2206.44 MB/sec)


Test execution summary:
    total time:                          46.4096s
    total number of events:              104857600
    total time taken by event execution: 39.3877
    per-request statistics:
         min:                                  0.00ms
         avg:                                  0.00ms
         max:                                  0.19ms
         approx.  95 percentile:               0.00ms

Threads fairness:
    events (avg/stddev):           104857600.0000/0.00
    execution time (avg/stddev):   39.3877/0.00

sysbench 0.4.10:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing thread subsystem performance test
Thread yields per test: 1000 Locks used: 8
Threads started!
Done.


Test execution summary:
    total time:                          2.0137s
    total number of events:              10000
    total time taken by event execution: 2.0128
    per-request statistics:
         min:                                  0.20ms
         avg:                                  0.20ms
         max:                                  0.33ms
         approx.  95 percentile:               0.20ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   2.0128/0.00

sysbench 0.4.10:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing mutex performance test
Threads started!
Done.


Test execution summary:
    total time:                          0.0021s
    total number of events:              1
    total time taken by event execution: 0.0020
    per-request statistics:
         min:                                  2.01ms
         avg:                                  2.01ms
         max:                                  2.01ms
         approx.  95 percentile:         10000000.00ms

Threads fairness:
    events (avg/stddev):           1.0000/0.00
    execution time (avg/stddev):   0.0020/0.00
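
All of the runs above use sysbench's default of a single thread. For a rough picture of how the box scales across cores, the same loop can be repeated with a higher --num-threads value (the sysbench 0.4.x option name); the thread count of 16 below is an arbitrary choice, and these runs are not part of the results above:

for i in cpu memory threads mutex ; do
    sysbench --test=$i --num-threads=16 run
done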

disks

This machine is also what currently (May 2014) runs Graphite, so it makes sense to have a synthetic benchmark of the RAID array itself. Graphite (carbon-cache) produces many small writes, so fio was run with a config like the one below. Note that numjobs=8 matches the number of carbon-cache processes that write to disk.

# cat carbon-cache.job 
[global]
bs=4k
rw=randwrite
write_bw_log
write_lat_log

[carbon-cache]
group_reporting=1
directory=/a/tmp/
numjobs=8
size=10g

The job was executed like this: fio --output carbon-cache.out --bandwidth-log --latency-log carbon-cache.job

# cat carbon-cache.out 
carbon-cache: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
...
carbon-cache: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
fio 1.59
Starting 8 processes

carbon-cache: (groupid=0, jobs=8): err= 0: pid=13789
  write: io=81920MB, bw=29500KB/s, iops=7375 , runt=2843564msec
    clat (usec): min=2 , max=132888 , avg=1065.97, stdev=4886.38
     lat (usec): min=2 , max=132888 , avg=1066.14, stdev=4886.38
    bw (KB/s) : min=    1, max=964232, per=12.68%, avg=3739.50, stdev=18964.43
  cpu          : usr=0.06%, sys=0.82%, ctx=1093250, majf=0, minf=247055
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w/d: total=0/20971520/0, short=0/0/0
     lat (usec): 4=8.98%, 10=79.06%, 20=6.70%, 50=0.06%, 100=0.03%
     lat (usec): 250=0.01%, 500=0.01%, 750=0.01%
     lat (msec): 2=0.01%, 4=0.01%, 20=4.61%, 50=0.51%, 100=0.04%
     lat (msec): 250=0.01%

Run status group 0 (all jobs):
  WRITE: io=81920MB, aggrb=29500KB/s, minb=30208KB/s, maxb=30208KB/s, mint=2843564msec, maxt=2843564msec

Disk stats (read/write):
  dm-0: ios=1/13680669, merge=0/0, ticks=1768/415097120, in_queue=415108824, util=99.96%, aggrios=330/13701997, aggrmerge=10/329684, aggrticks=474592/413980824, aggrin_queue=414435312, aggrutil=99.96%
    sda: ios=330/13701997, merge=10/329684, ticks=474592/413980824, in_queue=414435312, util=99.96%

So with 4k random writes and 8 processes using plain read/write syscalls (the sync ioengine), it looks like the RAID controller can sustain ~7.3k IOPS.
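
That figure is simply the aggregate write bandwidth divided by the block size; a quick check against the numbers fio reported above:

# 29500 KB/s aggregate bandwidth / 4 KB per random write
echo $((29500 / 4))    # prints 7375, matching iops=7375 in the fio summary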

TODO: more comprehensive tests, with varying block sizes, sync/async IO, etc., and a performance comparison (a possible starting job file is sketched below).
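
The job file below is one possible starting point for that comparison. It is only a sketch and has not been run on these hosts: the block sizes, the 2g file size, and the libaio queue depth are arbitrary placeholders, and stonewall makes each section run on its own.

# hypothetical follow-up job: block size sweep, sync vs. async direct IO
[global]
directory=/a/tmp/
rw=randwrite
numjobs=8
size=2g
group_reporting=1

[sync-4k]
ioengine=sync
bs=4k
stonewall

[sync-64k]
ioengine=sync
bs=64k
stonewall

[libaio-4k]
ioengine=libaio
direct=1
iodepth=32
bs=4k
stonewall

[libaio-64k]
ioengine=libaio
direct=1
iodepth=32
bs=64k
stonewall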