I have an Intel Xeon-based system running Red Hat Enterprise Linux 9 with an Intel Software RAID configured. I'm using the mdadm utility for managing and monitoring software RAID devices, and I need to understand the possible RAID failure statuses. My goal is to monitor the status of the RAID arrays and be able to identify when they are in a degraded state, missing drives, or have failed devices.

Useful Commands:

  1. cat /proc/mdstat
    This command shows the md devices known to the kernel and their current states. For example, I can see several arrays with states such as "active" and "inactive."

    Example output:

    # cat /proc/mdstat
    Personalities : [raid1]
    md125 : inactive nvme2n1[0](S)
          1105 blocks super external:imsm

    md126 : active raid1 nvme0n1[1] nvme1n1[0]
          890806272 blocks super external:/md127/0 [2/2] [UU]

    md127 : inactive nvme1n1[1](S) nvme0n1[0](S)
          10402 blocks super external:imsm

    unused devices: <none>
  2. mdadm --detail /dev/md126
    This command provides detailed RAID volume information. The State field in the output indicates the health of the RAID volume.

    Example output:

    # mdadm --detail /dev/md126
    /dev/md126:
                Raid Level : raid1
                Array Size : 890806272 (849.54 GiB 912.19 GB)
                     State : active
            Active Devices : 2
            Failed Devices : 0
        Consistency Policy : resync
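
To avoid eyeballing these outputs, I have also been experimenting with a small health-check script (just a sketch based on my layout; /dev/md126 and the sysfs path /sys/block/md126/md/degraded are assumptions taken from my own system, and I have not confirmed that the degraded flag behaves identically with IMSM external metadata):

    #!/bin/bash
    # Hypothetical quick check for the RAID1 volume md126.
    # The md sysfs attribute "degraded" reports 0 (healthy) or 1 (degraded).
    MD=md126
    if [ "$(cat /sys/block/${MD}/md/degraded 2>/dev/null)" = "1" ]; then
        echo "WARNING: /dev/${MD} is degraded"
        mdadm --detail "/dev/${MD}" | grep -E 'State :|Active Devices|Failed Devices'
    else
        echo "/dev/${MD} looks healthy"
    fi

Is relying on that sysfs flag reasonable, or should I stick to parsing the two commands above?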

RAID Failure and Degradation Scenarios:

  • If a Hard Disk is Missing:
    What will be the output and RAID status in the mdadm --detail and cat /proc/mdstat commands? Specifically, how will the RAID array reflect the missing disk's status?

  • If a Hard Disk is Offline:
    How will this affect the RAID status shown by these commands? Will the status change to something like "offline", "syncing", or "degraded"?

  • If a RAID Array is Degraded:
    What status will be reflected in the output? Specifically, how does the term "degraded" appear in the RAID status in both mdadm output and /proc/mdstat?

  • If a RAID is in a FailSpare State:
    What is the output in mdadm --detail for a RAID array in this state? How is this state reflected in the status of the RAID array and individual drives?
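
To observe these scenarios safely, I am planning to inject a failure on a non-production clone of this setup and watch how the state changes (a sketch only; the device names are from my system, and I have not verified whether marking a member faulty works the same way for IMSM external-metadata volumes):

    # Mark one RAID1 member as faulty on a test system, then watch the state
    mdadm --manage /dev/md126 --fail /dev/nvme1n1
    cat /proc/mdstat
    mdadm --detail /dev/md126

    # Remove and re-add the member once done, which should trigger a rebuild
    mdadm --manage /dev/md126 --remove /dev/nvme1n1
    mdadm --manage /dev/md126 --add /dev/nvme1n1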

I have found the following statuses in the mdadm manual:

  • Critical Severity:
    • Fail, FailSpare, DeviceDisappeared, DegradedArray
  • Warning Severity:
    • RebuildStarted, RebuildNN, RebuildFinished, SparesMissing

Could you explain what these status values mean, especially when monitoring RAID arrays for failure or degradation?
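
For context, this is how I intend to consume those events once I understand them (a minimal sketch; the handler path /usr/local/sbin/md-alert.sh is a placeholder of mine, and my reading of the man page is that mdadm calls the program with the event name, the md device, and optionally the component device, so please correct me if that is wrong):

    # Run the monitor as a daemon and call a handler for every event
    mdadm --monitor --scan --daemonise --delay=60 --program=/usr/local/sbin/md-alert.sh

and handle the events with something like:

    #!/bin/bash
    # /usr/local/sbin/md-alert.sh (placeholder name)
    # Arguments passed by mdadm: <event> <md device> [<component device>]
    EVENT="$1"; ARRAY="$2"; COMPONENT="$3"
    case "$EVENT" in
        Fail|FailSpare|DeviceDisappeared|DegradedArray)
            logger -p daemon.crit "mdadm: $EVENT on $ARRAY${COMPONENT:+ (device $COMPONENT)}"
            ;;
        RebuildStarted|Rebuild??|RebuildFinished|SparesMissing)
            logger -p daemon.warning "mdadm: $EVENT on $ARRAY"
            ;;
    esac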

Additional Queries:

  1. Is it better to use the mdadm --detail /dev/md<number> command to check the complete RAID status, or should I use mdadm --examine /dev/<disk> on any RAID member disk? For example:

    # mdadm --examine /dev/nvme1n1 

    The output from this command provides information about the disk and its current state, but I am unsure about the relevance of this command compared to mdadm --detail.
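
What I have in mind is combining the two, roughly like this (a sketch; my current understanding, which may be wrong, is that --detail reports the state of the assembled volume while --examine reads the IMSM metadata stored on each member disk):

    # Volume-level view of the assembled array
    mdadm --detail /dev/md126

    # Member-level view: on-disk metadata of each component disk
    for disk in /dev/nvme0n1 /dev/nvme1n1; do
        echo "== ${disk} =="
        mdadm --examine "${disk}"
    done

Does one of these views report degradation earlier or more reliably than the other?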

Thank you for your assistance.
