On Sat, 17 Sep 2011 19:16:50 -0600 Jim Schatzman wrote:

> Mike-
>
> I have seen very similar problems. I regret that electronics engineers
> cannot design more secure connectors. eSATA connectors are terrible -
> they come loose at the slightest tug. For this reason, I am gradually
> abandoning eSATA enclosures and going to internal drives only.
> Fortunately, there are some inexpensive RAID chassis available now.
>
> I tried the same thing as you. I removed the array(s) from mdadm.conf
> and wrote a script for "/etc/cron.reboot" which assembles the array
> with "no-degraded". Doing this seems to minimize the damage caused by
> drives going missing prior to a reboot. However, if the drives are
> disconnected while Linux is up, then either the array will stay up but
> some drives will become stale, or the array will be stopped. The
> behavior I usually see is that all the drives that went offline become
> "spare".
>
> It would be nice if md would just reassemble the array once all the
> drives come back online. Unfortunately, it doesn't. I would run
> mdadm -E against all the drives/partitions, verifying that the
> metadata all indicates that they are/were part of the expected array.
> At that point, you should be able to re-create the RAID. Be sure you
> list the drives in the correct order. Once the array is going again,
> mount the resulting partitions RO and verify that the data is o.k.
> before going RW.

mdadm certainly can "just reassemble the array once all the drives come
... online". If you have udev configured to run "mdadm -I device-name"
when a device appears, then as soon as all required devices have
appeared the array will be started.

It would be good to have better handling of "half the devices
disappeared", particularly if this is noticed while trying to read or
while trying to mark the array "dirty" in preparation for a write. If
it happens during an actual write it is a bit harder to handle cleanly.

I should add that to my list :-)

NeilBrown
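
The udev hook described above can be sketched as a rule like the
following (the file path and exact match keys here are assumptions -
most distributions ship their own variant of this rule, so check for
an existing md-raid-assembly rules file before adding one):

```
# /etc/udev/rules.d/65-md-incremental.rules  (hypothetical path)
#
# When a block device carrying Linux RAID member metadata appears,
# hand it to mdadm's incremental assembly.  mdadm records the member
# and starts the array automatically once all required devices have
# shown up.
SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
    RUN+="/sbin/mdadm --incremental $env{DEVNAME}"
```

With a rule like this in place, plugging the enclosure back in (or
re-seating the cable) should cause each member device to be fed to
"mdadm -I" as it reappears, and the array starts without manual
intervention once the set is complete.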