Raid and MegaCli

From Wikitech

Raid setup at Wikimedia

RAID setup for databases:

  • raid-10
  • 256k stripe
  • writeback cache
  • no read ahead

How to set up the initial raid group

Querying the RAID card

Get event logs

# Install megacli if not present, then write the adapter event log to all.txt
sudo apt install -qqy megacli && sudo megacli -AdpAliLog -a0>all.txt
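
The dumped log can be long, so a keyword search is a quick first pass. This is only a sketch: `log_scan` is a hypothetical helper name, and the keyword list is an assumption to adjust, not something megacli defines.

```shell
# log_scan FILE: hypothetical helper. Prints the lines of a dumped event log
# that contain any of a few keywords (an assumed list), prefixed with their
# line numbers. The match is case-insensitive.
log_scan() {
    grep -inE 'fail|error|degraded|predictive' "$1"
}
```

Usage, assuming the log was written to all.txt as above: `log_scan all.txt | less`.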

Logical device info

Things to look for here:

  • Stripe size
  • Cache policy

root@db1047:/a/sqldata# megacli -LDInfo -Lall -Aall

Adapter 0 -- Virtual Drive Information:
Virtual Disk: 0 (Target Id: 0)
Name:
RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
Size:1713408MB
State: Optimal
Stripe Size: 256kB
Number Of Drives:2
Span Depth:6
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default

Exit Code: 0x00
root@db1047:/a/sqldata# 
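
To look at just those fields, the output can be filtered. The sketch below assumes the field labels exactly as printed above (`Virtual Disk`, `Stripe Size`, `Current Cache Policy`); other megacli versions may label them differently. `ld_summary` and `ld_check` are hypothetical helper names, and `ld_check` compares against the database setup listed at the top of this page (256k stripe, writeback, no read ahead). It does not check the RAID level.

```shell
# ld_summary: hypothetical helper. Reads `megacli -LDInfo` output on stdin and
# keeps only the drive header, stripe size and current cache policy lines.
ld_summary() {
    grep -E '^(Virtual Disk|Stripe Size|Current Cache Policy)[[:space:]]*:'
}

# ld_check: hypothetical helper. Reads the same output on stdin and returns 0
# only if every logical drive reports a 256kB stripe and a current cache
# policy starting with "WriteBack, ReadAheadNone". Returns 1 otherwise,
# including when either field is missing from the input.
ld_check() {
    awk -F': *' '
        /^Stripe Size/          { seen_s = 1; if ($2 !~ /^256kB/) bad = 1 }
        /^Current Cache Policy/ { seen_c = 1; if ($2 !~ /^WriteBack, ReadAheadNone,/) bad = 1 }
        END { exit (bad || !seen_s || !seen_c) }
    '
}
```

Usage: `sudo megacli -LDInfo -Lall -Aall | ld_summary`, or `sudo megacli -LDInfo -Lall -Aall | ld_check && echo "matches database setup"`.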

Firmware, serial number, and boatloads of other default config

root@db1047:/a/sqldata# megacli -AdpAllInfo -aALL | less

Adapter #0

==============================================================================
                    Versions
                ================
Product Name    : PERC H700 Integrated
Serial No       : 11L006J
FW Package Build: 12.10.0-0025

                    Mfg. Data
                ================
Mfg. Date       : 01/22/11
Rework Date     : 01/22/11
Revision No     : A00   
Battery FRU     : N/A   

                Image Versions In Flash:
                ================
BIOS Version       : 3.18.00_4.09.05.00_0x0416A000
FW Version         : 2.100.03-1046
Preboot CLI Version: 04.04-010:#%00008
Ctrl-R Version     : 2.02-0025
Boot Block Version : 2.02.00.00-0000

MORE...
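
Since the full output is long, a filter helps when only the identifying fields are needed. A minimal sketch, assuming the labels shown in the excerpt above; `adapter_ident` is a hypothetical helper name.

```shell
# adapter_ident: hypothetical helper. Reads `megacli -AdpAllInfo -aALL` output
# on stdin and keeps only the product name, serial number and firmware lines.
adapter_ident() {
    grep -E '^(Product Name|Serial No|FW Package Build|FW Version)[[:space:]]*:'
}
```

Usage: `sudo megacli -AdpAllInfo -aALL | adapter_ident`.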

Icinga checks

As of 2017, the check_raid Python script only checks the state of the controller (optimal, failed, rebuilding, etc.), the logical disks, and the physical disks:

sudo /usr/local/lib/nagios/plugins/check_raid [megacli]
OK: optimal, 1 logical, 2 physical

An optional parameter was introduced to also check the write cache policy:

sudo /usr/local/lib/nagios/plugins/check_raid --policy WriteBack megacli
OK: optimal, 1 logical, 2 physical, WriteBack policy

It will complain if the policy differs from the one given:

/usr/local/lib/nagios/plugins/check_raid --policy WriteThrough
CRITICAL: 1 LD(s) must have write cache policy WriteThrough, currently using: WriteBack
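
The comparison behind that message can be sketched in a few lines. This is not the check_raid code (which is a Python script); it is an illustration that assumes the `Current Cache Policy` line format shown in the LDInfo output earlier on this page and the usual Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN). `policy_check` is a hypothetical name, and the OK and UNKNOWN messages are made up for the sketch.

```shell
# policy_check EXPECTED: hypothetical sketch. Reads `megacli -LDInfo -Lall -Aall`
# output on stdin and compares the write policy of each logical drive with
# EXPECTED. If several drives mismatch, only the last mismatching policy is
# reported.
policy_check() {
    awk -v want="$1" -F': *' '
        /^Current Cache Policy/ {
            seen++
            split($2, p, /, */)      # p[1] holds the write policy
            if (p[1] != want) { bad++; got = p[1] }
        }
        END {
            if (!seen) { print "UNKNOWN: no logical drives found"; exit 3 }
            if (bad) {
                printf "CRITICAL: %d LD(s) must have write cache policy %s, currently using: %s\n", bad, want, got
                exit 2
            }
            printf "OK: %d logical, %s policy\n", seen, want
        }
    '
}
```

Usage: `sudo megacli -LDInfo -Lall -Aall | policy_check WriteBack; echo "exit status: $?"`.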

It does not take into account the state of the BBU, the size or existence of the cache, etc.

It is enabled in Puppet by setting the ::raid class parameter $write_cache_policy, which is normally set through profile::base::check_raid_policy.

See Also