Raid and MegaCli
Raid setup at Wikimedia
RAID setup for database hosts:
- RAID 10
- 256k stripe
- writeback cache
- no read ahead
How to set up the initial raid group
Querying the RAID card
Get event logs
# installs megacli if not present and writes the event logs to all.txt
sudo apt install -qqy megacli && sudo megacli -AdpAliLog -a0 > all.txt
Logical device info
Things to look for here:
- Stripe size
- Cache policy
root@db1047:/a/sqldata# megacli -LDInfo -Lall -Aall
Adapter 0 -- Virtual Drive Information:
Virtual Disk: 0 (Target Id: 0)
Name:
RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
Size:1713408MB
State: Optimal
Stripe Size: 256kB
Number Of Drives:2
Span Depth:6
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Exit Code: 0x00
root@db1047:/a/sqldata#
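To pull out only the fields listed under "Things to look for", the output can be filtered. The following is a minimal sketch, not an existing tool: `ld_summary` is a hypothetical helper name, and it assumes the field labels match the sample output above (it also accepts the "Strip Size" spelling, on the assumption that some versions print it that way).

```shell
# Hypothetical helper: reads `megacli -LDInfo -Lall -Aall` output on stdin
# and prints only the virtual disk header, stripe size and current cache
# policy lines. Field labels are assumed to match the sample output above.
ld_summary() {
    grep -E '^(Virtual (Disk|Drive)|Stripe? Size|Current Cache Policy)'
}

# Usage on a host with megacli installed:
#   sudo megacli -LDInfo -Lall -Aall | ld_summary
```

For the host shown above this would leave three lines per logical drive, which makes it easy to compare against the expected 256k stripe and WriteBack, ReadAheadNone policy.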
Firmware, serial number, and boatloads of other default config
root@db1047:/a/sqldata# megacli -AdpAllInfo -aALL | less
Adapter #0
==============================================================================
Versions
================
Product Name : PERC H700 Integrated
Serial No : 11L006J
FW Package Build: 12.10.0-0025
Mfg. Data
================
Mfg. Date : 01/22/11
Rework Date : 01/22/11
Revision No : A00
Battery FRU : N/A
Image Versions In Flash:
================
BIOS Version : 3.18.00_4.09.05.00_0x0416A000
FW Version : 2.100.03-1046
Preboot CLI Version: 04.04-010:#%00008
Ctrl-R Version : 2.02-0025
Boot Block Version : 2.02.00.00-0000
MORE...
Icinga checks
As of 2017, the check_raid Python script only checks the state of the controller (optimal, failed, rebuilding, etc.), the logical disks, and the physical disks:
sudo /usr/local/lib/nagios/plugins/check_raid [megacli]
OK: optimal, 1 logical, 2 physical
An optional parameter was introduced to also check the write cache policy:
sudo /usr/local/lib/nagios/plugins/check_raid --policy WriteBack megacli
OK: optimal, 1 logical, 2 physical, WriteBack policy
It will complain if, for any reason, the policy differs from the one given:
/usr/local/lib/nagios/plugins/check_raid --policy WriteThrough
CRITICAL: 1 LD(s) must have write cache policy WriteThrough, currently using: WriteBack
It does not take into account the state of the BBU, the size or existence of the cache, etc.
It is enabled in Puppet by setting the ::raid class parameter $write_cache_policy, normally set through profile::base::check_raid_policy.
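The write cache policy comparison can also be approximated by hand when debugging an alert. The sketch below is not the check_raid plugin's code: `check_write_policy` is a hypothetical function, it assumes the `Current Cache Policy:` line format shown in the LDInfo output earlier on this page, and its messages only loosely mimic the plugin's. Exit codes follow the usual Nagios convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```shell
# Hypothetical manual check, NOT the check_raid plugin itself.
# Reads `megacli -LDInfo -Lall -Aall` output on stdin and fails if any
# logical drive's current write cache policy differs from "$1".
check_write_policy() {
    awk -v want="$1" '
        /^Current Cache Policy/ {
            # Drop the label; the first comma-separated field is the write policy.
            sub(/^[^:]*:[ \t]*/, "")
            split($0, fields, ",")
            total++
            if (fields[1] != want) bad++
        }
        END {
            if (total == 0) { print "UNKNOWN: no logical drives found"; exit 3 }
            if (bad > 0) { printf "CRITICAL: %d LD(s) not using %s\n", bad, want; exit 2 }
            printf "OK: %d LD(s) using %s\n", total, want
        }'
}

# Usage on a host with megacli installed:
#   sudo megacli -LDInfo -Lall -Aall | check_write_policy WriteBack
```

Like the plugin, this only looks at the configured policy; it says nothing about the BBU or the cache itself.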