Dell PowerEdge R510 Performance Testing
Dell PowerEdge R510 servers are used for the es1001-es1010 hosts in eqiad.
Hardware stats
12 disks behind a hardware RAID card
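A quick way to confirm the controller and the number of physical disks it sees; a minimal sketch, assuming a PERC-style controller managed with MegaCli and that the binary is installed as MegaCli64 (the exact binary name varies per install):
# identify the RAID controller on the PCI bus
lspci | grep -i raid
# count the physical drives the controller reports (one "Device Id" line per disk)
MegaCli64 -PDList -aAll | grep -c 'Device Id'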
Performance statistics
These tests were done with the disks configured as follows (see the controller check sketched after the list):
- RAID 10
- 256k stripe
- no read-ahead
- write-back cache
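The stripe size, read-ahead and write-cache settings above can be cross-checked against the controller's logical drive configuration; a minimal sketch, again assuming MegaCli is installed as MegaCli64 and filtering on the usual MegaCli field names:
# show logical drive settings; "Strip Size" is the stripe size and the
# "Current Cache Policy" line includes the read-ahead and write-back state
MegaCli64 -LDInfo -Lall -aAll | egrep -i 'RAID Level|Strip Size|Cache Policy'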
Tests were performed with sysbench 0.4.10-1build1, the standard Debian package (installed with 'aptitude install sysbench').
cpu, memory, threads, mutex
root@es1003:/a/tmp/sysbench# for i in cpu memory threads mutex ; do sysbench --test=$i run; done
sysbench 0.4.10: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Doing CPU performance benchmark
Threads started!
Done.
Maximum prime number checked in CPU test: 10000
Test execution summary:
total time: 10.0859s
total number of events: 10000
total time taken by event execution: 10.0851
per-request statistics:
min: 0.96ms
avg: 1.01ms
max: 1.91ms
approx. 95 percentile: 1.76ms
Threads fairness:
events (avg/stddev): 10000.0000/0.00
execution time (avg/stddev): 10.0851/0.00
sysbench 0.4.10: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Doing memory operations speed test
Memory block size: 1K
Memory transfer size: 102400M
Memory operations type: write
Memory scope type: global
Threads started!
Done.
Operations performed: 104857600 (2259393.46 ops/sec)
102400.00 MB transferred (2206.44 MB/sec)
Test execution summary:
total time: 46.4096s
total number of events: 104857600
total time taken by event execution: 39.3877
per-request statistics:
min: 0.00ms
avg: 0.00ms
max: 0.19ms
approx. 95 percentile: 0.00ms
Threads fairness:
events (avg/stddev): 104857600.0000/0.00
execution time (avg/stddev): 39.3877/0.00
sysbench 0.4.10: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Doing thread subsystem performance test
Thread yields per test: 1000 Locks used: 8
Threads started!
Done.
Test execution summary:
total time: 2.0137s
total number of events: 10000
total time taken by event execution: 2.0128
per-request statistics:
min: 0.20ms
avg: 0.20ms
max: 0.33ms
approx. 95 percentile: 0.20ms
Threads fairness:
events (avg/stddev): 10000.0000/0.00
execution time (avg/stddev): 2.0128/0.00
sysbench 0.4.10: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Doing mutex performance test
Threads started!
Done.
Test execution summary:
total time: 0.0021s
total number of events: 1
total time taken by event execution: 0.0020
per-request statistics:
min: 2.01ms
avg: 2.01ms
max: 2.01ms
approx. 95 percentile: 10000000.00ms
Threads fairness:
events (avg/stddev): 1.0000/0.00
execution time (avg/stddev): 0.0020/0.00
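All of the runs above use the sysbench default of a single thread. To see how the box behaves under concurrency closer to a real workload, the same loop can be rerun with one thread per core; a minimal sketch using the 0.4.x --num-threads option (results not recorded here):
# repeat the four tests with one sysbench thread per CPU core
NPROC=$(grep -c ^processor /proc/cpuinfo)
for i in cpu memory threads mutex ; do sysbench --test=$i --num-threads=$NPROC run ; done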
disks
This machine is also what currently (May 2014) runs Graphite, so it makes sense to have a synthetic benchmark of the RAID array itself. Graphite (carbon-cache) results in many small writes, so fio was run with a config like the one below. Note that numjobs=8 matches the number of carbon-cache processes that write to disk.
# cat carbon-cache.job
[global]
bs=4k
rw=randwrite
write_bw_log
write_lat_log

[carbon-cache]
group_reporting=1
directory=/a/tmp/
numjobs=8
size=10g
And executed like this: fio --output carbon-cache.out --bandwidth-log --latency-log carbon-cache.job
# cat carbon-cache.out
carbon-cache: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
...
carbon-cache: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
fio 1.59
Starting 8 processes
carbon-cache: (groupid=0, jobs=8): err= 0: pid=13789
write: io=81920MB, bw=29500KB/s, iops=7375 , runt=2843564msec
clat (usec): min=2 , max=132888 , avg=1065.97, stdev=4886.38
lat (usec): min=2 , max=132888 , avg=1066.14, stdev=4886.38
bw (KB/s) : min= 1, max=964232, per=12.68%, avg=3739.50, stdev=18964.43
cpu : usr=0.06%, sys=0.82%, ctx=1093250, majf=0, minf=247055
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=0/20971520/0, short=0/0/0
lat (usec): 4=8.98%, 10=79.06%, 20=6.70%, 50=0.06%, 100=0.03%
lat (usec): 250=0.01%, 500=0.01%, 750=0.01%
lat (msec): 2=0.01%, 4=0.01%, 20=4.61%, 50=0.51%, 100=0.04%
lat (msec): 250=0.01%
Run status group 0 (all jobs):
WRITE: io=81920MB, aggrb=29500KB/s, minb=30208KB/s, maxb=30208KB/s, mint=2843564msec, maxt=2843564msec
Disk stats (read/write):
dm-0: ios=1/13680669, merge=0/0, ticks=1768/415097120, in_queue=415108824, util=99.96%, aggrios=330/13701997, aggrmerge=10/329684, aggrticks=474592/413980824, aggrin_queue=414435312, aggrutil=99.96%
sda: ios=330/13701997, merge=10/329684, ticks=474592/413980824, in_queue=414435312, util=99.96%
So with 4k random writes and 8 processes using synchronous read/write (ioengine=sync), it looks like ~7.3k IOPS can be sustained from the RAID controller.
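As a rough cross-check of that figure, the write rate hitting the device can be watched while the benchmark runs; a minimal sketch, assuming the array shows up as sda as in the fio disk stats above (the w/s column is the sustained write IOPS):
# extended per-device stats for sda, refreshed every 5 seconds
iostat -dxk sda 5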
TODO: more comprehensive tests with varying block sizes, sync/async IO, etc., and a performance comparison (a possible sweep is sketched below).
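A possible starting point for that TODO is sweeping block size and IO engine with fio's command-line options; a minimal, untested sketch reusing /a/tmp/ from above. Note that --direct=1 bypasses the page cache, unlike the buffered run documented here, and that an async engine such as libaio would also want a larger --iodepth:
# sweep block sizes for both sync and async engines, one output file per run
for engine in sync libaio ; do
  for bs in 4k 16k 64k 256k 1m ; do
    fio --name=sweep-${engine}-${bs} --directory=/a/tmp/ --rw=randwrite \
        --ioengine=${engine} --direct=1 --bs=${bs} --size=2g --numjobs=8 \
        --group_reporting --output=sweep-${engine}-${bs}.out
  done
done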