Re: multipath md devices

* Re: multipath md devices
@ 2004-09-22 21:55 Doug Griswold
From: Doug Griswold @ 2004-09-22 21:55 UTC (permalink / raw)
  To: linux-raid

>>> Doug Ledford <dledford@redhat.com> 09/22/04 5:35 PM >>>
On Wed, 2004-09-22 at 16:41, Anu Matthew wrote:
> Hi,
> 
> We have multipath devices created on SAN LUNs. Say md0 is created on
> /dev/sdj and /dev/sde, the latter being the alternate path for /dev/sdj.
> 
> I've noticed the following:
> 
> 1) With little IO going to the md device, if I pull out the cable to,
> say, /dev/sdj, /proc/mdstat still shows both devices. /proc/mdstat does
> not get updated until I start some considerable IO to the md device.
> Even the mdadm scan/query output shows both paths, which is not true.
> Once we start IO, /proc/mdstat reflects that one of the devices,
> /dev/sdj in this case, has failed. Thereafter the mdadm output is
> correct too.
> 
> The entries (link down) in syslog and dmesg are almost instantaneous
> when the cable is pulled out. This makes it very difficult to monitor
> multipath devices, as we cannot rely on reading /proc/mdstat.

> /proc/mdstat will be correct once the first physical read/write on the
> yanked path fails.

Is this true even if the lightpath is not dead?  
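
For monitoring, the failed-member marker in /proc/mdstat can be picked
out with a few lines of script. The following is only a minimal Python
sketch: it assumes failed members are flagged with "(F)" after the slot
number, the SAMPLE text is illustrative rather than captured output, and
the function names are made up for this example. As noted above, it can
only report a dead path after md itself has seen an IO error on it.

```python
import re

# Matches member entries such as "sdj[0](F)" or "sde[1]" on an array line.
MEMBER_RE = re.compile(r"(\w+)\[(\d+)\]((?:\([A-Z]\))*)")

# Illustrative sample only; real /proc/mdstat output may differ in detail.
SAMPLE = """\
Personalities : [multipath]
md0 : active multipath sdj[0](F) sde[1]
      1048576 blocks [2/1] [_U]

unused devices: <none>
"""


def parse_mdstat(text):
    """Return {array_name: {member_device: is_failed}} for each md array."""
    arrays = {}
    for line in text.splitlines():
        head = re.match(r"^(md\w+)\s*:\s*(.*)$", line)
        if not head:
            continue  # skip the Personalities, block-count and unused lines
        name, rest = head.groups()
        arrays[name] = {
            dev: "(F)" in flags for dev, _slot, flags in MEMBER_RE.findall(rest)
        }
    return arrays


def failed_paths(text):
    """Return {array_name: [failed members]} for arrays with a failed member."""
    return {
        name: sorted(dev for dev, failed in members.items() if failed)
        for name, members in parse_mdstat(text).items()
        if any(members.values())
    }


def read_mdstat(path="/proc/mdstat"):
    """Read the live status file (Linux only)."""
    with open(path) as fh:
        return fh.read()
```

On a live system one would call failed_paths(read_mdstat()) from cron or
a small polling loop and alert on a non-empty result.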

> 2) Another situation: Device md0 is active, with healthy multipaths
> /dev/sdj and /dev/sde, under reasonable IO activity. If the cable to
> /dev/sdj is yanked out, md0 still remains active, thanks to the
> alternate path, sde. However, it fails to go back and reconstruct the
> spare path allocation even after the fibre link is restored. Here, if I
> pull the cable out for sde, even 30 minutes later, the machine ends up
> failing to write to /dev/md0, as it does not care whether /dev/sdj is
> back online unless I fail, remove, and add /dev/sdj manually from
> the mdadm command line. If something is hard mounted on /dev/md0, it may
> end up in a system crash.
> 
> To conclude, if one path goes off and comes back after a while, and
> then the second path goes off, md0 cannot be read unless someone
> manually fails, removes, and re-adds the first device after it comes
> back online and before the second path goes off.

> Yeah, IBM wrote a little app to help with that.  We stuffed it into the
> mdadm package we ship since that seemed the most appropriate place for
> it.  It's called mdmpd and that's its job basically.  Very simple app,
> but it doesn't run on upstream kernels at the moment (it wants the md
> event interface, which hasn't yet been submitted upstream by Neil).
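
Where mdmpd is not an option, the manual fail/remove/add sequence
described in point 2 could be scripted. This is a rough Python sketch,
not how mdmpd works: it assumes mdadm's manage-mode options --fail,
--remove and --add, the helper names are hypothetical, and the --fail
step may be redundant if md has already marked the path faulty. It
defaults to a dry run that only prints the commands.

```python
import subprocess


def readd_commands(array, path):
    """Build the mdadm invocations for fail, then remove, then add."""
    return [
        ["mdadm", array, "--fail", path],
        ["mdadm", array, "--remove", path],
        ["mdadm", array, "--add", path],
    ]


def readd_path(array, path, dry_run=True):
    """Print the commands, or run them (root required) if dry_run is False."""
    cmds = readd_commands(array, path)
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the sequence at the first failing step.
            subprocess.run(cmd, check=True)
    return cmds
```

For the setup in this thread the call would be
readd_path("/dev/md0", "/dev/sdj"), run once the fibre link to sdj is
known to be back.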

> Any help towards this will be much appreciated.
> 
> Thanks,
> 
> --AM.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc.
         1801 Varsity Dr.
         Raleigh, NC 27606

