I have a server running VMware ESXi v4.1.0 348481. It has a hardware RAID10 and a SATA backup drive. I have a VM running which has it's primary boot vmdk on the RAID10 datastore, and a 600 GB vmdk on the SATA backup drive's datastore. The VM runs Debian linux with the FreeBSD kernel, and uses ZFS for the backup drive.
EDIT: The drive is not directly attached to the VM. It is used as a VMware Datastore, and the VM has a vmdk on the SATA drive's datastore. The datastore is not full (only 65% full)
I logged in to the server using SSH and found that last night backup was hung, and zfs list or zpool list both hung. So I opened the virtual console in ESXi and was sad to see:

(da1:mpt0:0:1:0): READ(10). CDC: 28 0 19 97 3a 50 0 0 2d 0 (da1:mpt0:0:1:0): CAM status: SCSI Status Error (da1:mpt0:0:1:0): SCSI status: Check Condition (da1:mpt0:0:1:0): SCSI sense: MEDIUM ERROR info:4862ec asc:11,4 (Unrecovered read error - auto reallocate failed) (da1:mpt0:0:1:0): READ(10). CDC: 28 0 19 97 3a 50 0 0 2d 0 (da1:mpt0:0:1:0): CAM status: SCSI Status Error (da1:mpt0:0:1:0): SCSI status: Check Condition (da1:mpt0:0:1:0): SCSI sense: MEDIUM ERROR info:4862ec asc:11,4 (Unrecovered read error - auto reallocate failed) I tried to reboot the VM and I received a message that the system was going down for reboot, and then that hung. (^C appears but does not kill shutdown). I cannot interrupt or kill -9 the zpool list zfs list or rsync processes -- Nothing happens when I try.
- Does this ndicate the backup SATA drive is failing? Or could this just be an ESXi error?
- How in the vSphere client could I tell if the drive is failing? I didn't see any indication, everything under Hardware Health Status looks good, and I saw nothing under the Storage config.
- How should I proceed from here? Should I just hard reboot the VM?
UPDATE: I just hard rebooted the VM. After it came back online, the backup zpool was online, however:
root@timestandstill:/home/jnet# zpool status -v pool: backup state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scrub: none requested config: NAME STATE READ WRITE CKSUM backup ONLINE 0 0 0 da1 ONLINE 0 0 0 errors: Permanent errors have been detected in the following files: /backups/someserver/home/someuser/public_html/somedir/calendar/someuser/calendars/somefile.ics I am leaning heavily towards replacing the drive...