All of lore.kernel.org
 help / color / mirror / Atom feed
* DAC960 w/SAF-TE enclosure problems
@ 2004-01-13 21:20 Tony J. White
  0 siblings, 0 replies; only message in thread
From: Tony J. White @ 2004-01-13 21:20 UTC (permalink / raw)
  To: linux-kernel


I recently picked up a Mylex AcceleRAID 170 RAID controller and a 
SuperMicro CSE-M35S SCSI Accessed Fault-Tolerant Enclosure (SAF-TE) and have 
found a problem with my setup in testing.   When I simulate a disk failure
by pulling out one of the drives following things happen: 

1. Alarm sounds.
2. /proc/rd/status becomes ALERT
3. /proc/rd/c0/current_status displays "Disk Status: Dead, 17747968 blocks"
   for the physical device and "No Rebuild or Consistency Check in Progress"
4. The following is logged:
   Physical Device 0:0 Failed because Device Cannot Be Accessed
   Logical Drive 0 (/dev/rd/c0d0) Critical
   Physical Device 0:0 is now DEAD
   Logical Drive 0 (/dev/rd/c0d0) is now CRITICAL


When I reinsert the drive, the following things happen:

1. Alarm stops.
2. /proc/rd/c0/current_status displays "Disk Status: Rebuild, 17747968 blocks"
   for the physical device and continues to display 
   "No Rebuild or Consistency Check in Progress"
3. The following is logged:
   Physical Device 0:0 Found
   Physical Device 0:0 Automatic Rebuild Started
   Physical Device 0:0 is now REBUILD
   Logical Drive 0 (/dev/rd/c0d0) Automatic Rebuild Started
   Logical Drive 0 (/dev/rd/c0d0) Rebuild Completed
   Physical Device 0:0 Rebuild Completed
   Physical Device 0:0 Online
   Logical Drive 0 (/dev/rd/c0d0) Online

At this point, /proc/rd/c0/current_status changes the "No Rebuild or 
Consistency Check in Progress" message to "Logical Drive 0 (/dev/rd/c0d0) 
Rebuild Completed".  However the following problems occur:

1. /proc/rd/status is never set back to OK, it remains ALERT
2. The activity indicator light on the sled for the drive that has been 
   rebuilt never goes off, although the red error light does.
3. The Physical Device information in /proc/rd/c0/current_status remains as
   "Disk Status: Rebuild, 17747968 blocks" 

If I reboot the machine, all is well.  That sort of defeats the purpose of
all this expensive hardware though :)  I also tried to:
echo "flush-cache" > /proc/rd/c0/user_command
but that just locked up the terminal I was typing in with no messages
anywhere.

Is it possible that the DAC960 driver should be doing something here that
it's not?  Or is all this functionality supposed to be completely taken care 
of at the firmware level of the RAID card?  I noticed DAC960.c does do some 
things differently if a SAF-TE is present, but only in the _V1 functions,
this card uses the _V2 functions.  I have tested this with Linux 2.4.23 and 
2.6.1 which produce exactly the same results.

Please CC me on any responses.

-Tony

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2004-01-13 21:20 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-01-13 21:20 DAC960 w/SAF-TE enclosure problems Tony J. White

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.