Monitoring software RAID status (mdstat)
Mdadm is a Linux utility to manage and monitor software RAID. The mdadm monitoring plugin keeps track of failed disks in a RAID setup.
Mdadm metrics
- Faulty
- Active
- Resync
- Read only
Mdadm dependencies
This plugin needs the mdstat Python package, which you can install by running pip install mdstat.
To fetch the status, the plugin also needs sudo access. Edit /etc/sudoers and add the following line at the end of the file.
nixstats ALL=(ALL) NOPASSWD: /usr/local/bin/mdjson
This assumes mdjson is located at /usr/local/bin/mdjson. To verify its location, run whereis mdjson and adjust the path in the sudoers line if it differs.
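The location check can also be scripted. The following is a minimal sketch using Python's standard shutil.which, which searches the directories on PATH (a slightly narrower search than whereis). The helper names find_mdjson and sudoers_line are illustrative and are not part of the agent or the mdstat package.

```python
import shutil

# Path used in the sudoers line above; adjust it if mdjson lives elsewhere.
EXPECTED_PATH = "/usr/local/bin/mdjson"


def find_mdjson(command="mdjson"):
    """Return the full path of the command if it is on PATH, otherwise None."""
    return shutil.which(command)


def sudoers_line(path, user="nixstats"):
    """Build the sudoers entry for the given mdjson path (hypothetical helper)."""
    return f"{user} ALL=(ALL) NOPASSWD: {path}"


if __name__ == "__main__":
    found = find_mdjson()
    if found is None:
        print("mdjson was not found on PATH; is the mdstat package installed?")
    else:
        if found != EXPECTED_PATH:
            print(f"Note: mdjson is at {found}, not {EXPECTED_PATH}")
        # Print the line to add to /etc/sudoers for the detected path.
        print(sudoers_line(found))
```

If the script reports a different path, use that path in the sudoers entry instead of /usr/local/bin/mdjson.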
Testing the mdadm plugin
Run sudo -u nixstats nixstatsagent test mdstat to check whether the plugin returns any data. The output should list each RAID device with its metrics, for example:
root@nixstats:~# sudo -u nixstats nixstatsagent test mdstat
mdstat:
{
    "md0": {
        "active": 1,
        "faulty": 0,
        "read_only": 0,
        "resync": 0
    },
    "md1": {
        "active": 1,
        "faulty": 0,
        "read_only": 0,
        "resync": 0
    },
    "md2": {
        "active": 1,
        "faulty": 0,
        "read_only": 0,
        "resync": 0
    }
}
Enable the plugin
Open /etc/nixstats.ini and append the following lines at the end of the file.
[mdstat]
enabled = yes
Restart the agent so it picks up the change by running service nixstatsagent restart.
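To double-check the configuration snippet before restarting, a small sketch like the one below can help. It assumes the [mdstat] section is plain INI that Python's standard configparser can read; how the agent itself parses /etc/nixstats.ini is not covered here, and the helper name plugin_enabled is hypothetical.

```python
import configparser

# The two lines from the step above, as they would appear in /etc/nixstats.ini.
SAMPLE = """\
[mdstat]
enabled = yes
"""


def plugin_enabled(ini_text, section="mdstat"):
    """Return True if the section exists and its 'enabled' option is truthy."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # fallback=False covers a missing section or a missing 'enabled' option.
    return parser.getboolean(section, "enabled", fallback=False)


if __name__ == "__main__":
    print("mdstat enabled:", plugin_enabled(SAMPLE))
```

configparser accepts yes, true, on, and 1 as truthy values, so enabled = yes is read as enabled in this sketch.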
Creating charts
Click the Metrics link in the top menu, then select "mdstat" as the metric type and active, faulty, read_only, or resync as the metric. Choose the servers you would like to graph and save the chart to your dashboard.
You can also create an alert that triggers when the faulty metric is higher than zero, which indicates that a server has a bad drive.
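The alert condition is simple enough to illustrate in a few lines. The sketch below applies the "faulty higher than zero" rule to data shaped like the plugin's test output. The sample values are made up (md1 is marked faulty purely for demonstration), and faulty_arrays is a hypothetical helper, not part of the agent.

```python
import json

# Hypothetical sample in the same shape as the plugin's test output.
SAMPLE_OUTPUT = """
{
  "md0": {"active": 1, "faulty": 0, "read_only": 0, "resync": 0},
  "md1": {"active": 1, "faulty": 1, "read_only": 0, "resync": 0}
}
"""


def faulty_arrays(metrics):
    """Return the sorted names of arrays whose faulty metric is above zero."""
    return sorted(
        name for name, values in metrics.items()
        if values.get("faulty", 0) > 0
    )


if __name__ == "__main__":
    bad = faulty_arrays(json.loads(SAMPLE_OUTPUT))
    if bad:
        print("Arrays with a faulty drive:", ", ".join(bad))
    else:
        print("No faulty drives reported.")
```

The same comparison is what the alert evaluates for you on each reporting server, so a script like this is only useful for ad hoc checks.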
Updated on: 15/03/2018