* Re: multipath md devices
@ 2004-09-22 21:55 Doug Griswold
  0 siblings, 0 replies; 7+ messages in thread
From: Doug Griswold @ 2004-09-22 21:55 UTC (permalink / raw)
  To: linux-raid

>>> Doug Ledford <dledford@redhat.com> 09/22/04 5:35 PM >>>
On Wed, 2004-09-22 at 16:41, Anu Matthew wrote:
> Hi,
> 
> We have multipath devices created on SAN Luns. Say md0 is created on
> /dev/sdj and /dev/sde, the latter being the alternate path for /dev/sdj.
>
> I've noticed the following:
>
> 1) With little IO to the md device, if I pull out the cable to, say,
> /dev/sdj, /proc/mdstat still shows both devices. /proc/mdstat won't
> get updated unless I start some considerable IO to the md device. Even
> mdadm scan/query output shows both paths, which is not true. As we
> start IO, /proc/mdstat reflects that one of the devices, /dev/sdj in
> this case, has failed. Thereafter mdadm output is correct too.
>
> The entries (link down) in syslog and dmesg are almost instantaneous
> when the cable is pulled out. This makes it very difficult to monitor
> multipath devices, as we cannot rely on /proc/mdstat.

/proc/mdstat will be correct once the first physical read/write on the
yanked path fails.

Is this true even if the lightpath is not dead?

> 2) Another situation: Device md0 is active, with healthy multipaths
> /dev/sdj and /dev/sde, under reasonable IO activity. If the cable to
> /dev/sdj is yanked out, md0 remains active, thanks to the
> alternate path, sde. However, it fails to go back and reconstruct the
> spare path allocation even after the fibre link is restored. Here, if I
> pull the cable out for sde even 30 minutes later, the machine ends up
> failing to write to /dev/md0, as it does not care whether /dev/sdj is
> back online, unless I manually fail, remove and add /dev/sdj from
> the mdadm command line. If something is hard mounted on /dev/md0, it may
> end up in a system crash.
>
> To conclude, if one path goes off, and comes back after a while, and
> then the second path goes off, md0 cannot be read, unless someone
> manually fails, removes and adds the first device, which came back
> online, before the second path goes off.

Yeah, IBM wrote a little app to help with that.  We stuffed it into the
mdadm package we ship since that seemed the most appropriate place for
it.  It's called mdmpd and that's its job basically.  Very simple app,
but doesn't run on upstream kernels at the moment (it wants the md event
interface which hasn't yet been submitted upstream by Neil).

> Any help towards this will be much appreciated.
>
> Thanks,
>
> --AM.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc.
         1801 Varsity Dr.
         Raleigh, NC 27606



* Re: multipath md devices
  2004-09-24  1:10     ` Neil Brown
@ 2004-09-24 16:41       ` Doug Ledford
  0 siblings, 0 replies; 7+ messages in thread
From: Doug Ledford @ 2004-09-24 16:41 UTC (permalink / raw)
  To: Neil Brown; +Cc: Luca Berra, linux-raid

On Thu, 2004-09-23 at 21:10, Neil Brown wrote:
> On Thursday September 23, bluca@comedia.it wrote:
> > On Wed, Sep 22, 2004 at 05:35:23PM -0400, Doug Ledford wrote:
> > >Yeah, IBM wrote a little app to help with that.  We stuffed it into the
> > >mdadm package we ship since that seemed the most appropriate place for
> > >it.  It's called mdmpd and that's its job basically.  Very simple app,
> > >but doesn't run on upstream kernels at the moment (it wants the md event
> > >interface which hasn't yet been submitted upstream by Neil).
> > 
> > iirc it has been submitted and refused
> > http://marc.theaimsgroup.com/?t=109417961300007&r=1&w=2&n=8
> > 
> 
> Well..... it's still alive in -mm, and where there's life, there's
> hope (and need of vitals, as the old Gaffer usually added:-).

I didn't know that.  You know, I resubbed to linux-kernel about 2 days
after this conversation ended, so I read the archives yesterday and
wanted to write some responses to what was in there today.  Now, whether
I'll have the time to get to it today or not is one thing, but I am
planning on responding and it's nice to know that it's not *totally*
dead already.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc.
         1801 Varsity Dr.
         Raleigh, NC 27606




* Re: multipath md devices
  2004-09-23  5:46   ` Luca Berra
@ 2004-09-24  1:10     ` Neil Brown
  2004-09-24 16:41       ` Doug Ledford
  0 siblings, 1 reply; 7+ messages in thread
From: Neil Brown @ 2004-09-24  1:10 UTC (permalink / raw)
  To: Luca Berra; +Cc: linux-raid

On Thursday September 23, bluca@comedia.it wrote:
> On Wed, Sep 22, 2004 at 05:35:23PM -0400, Doug Ledford wrote:
> >Yeah, IBM wrote a little app to help with that.  We stuffed it into the
> >mdadm package we ship since that seemed the most appropriate place for
> >it.  It's called mdmpd and that's its job basically.  Very simple app,
> >but doesn't run on upstream kernels at the moment (it wants the md event
> >interface which hasn't yet been submitted upstream by Neil).
> 
> iirc it has been submitted and refused
> http://marc.theaimsgroup.com/?t=109417961300007&r=1&w=2&n=8
> 

Well..... it's still alive in -mm, and where there's life, there's
hope (and need of vitals, as the old Gaffer usually added:-).

NeilBrown


* Re: multipath md devices
  2004-09-22 21:35 ` Doug Ledford
  2004-09-22 22:20   ` Anu Matthew
@ 2004-09-23  5:46   ` Luca Berra
  2004-09-24  1:10     ` Neil Brown
  1 sibling, 1 reply; 7+ messages in thread
From: Luca Berra @ 2004-09-23  5:46 UTC (permalink / raw)
  To: linux-raid

On Wed, Sep 22, 2004 at 05:35:23PM -0400, Doug Ledford wrote:
>Yeah, IBM wrote a little app to help with that.  We stuffed it into the
>mdadm package we ship since that seemed the most appropriate place for
>it.  It's called mdmpd and that's its job basically.  Very simple app,
>but doesn't run on upstream kernels at the moment (it wants the md event
>interface which hasn't yet been submitted upstream by Neil).

iirc it has been submitted and refused
http://marc.theaimsgroup.com/?t=109417961300007&r=1&w=2&n=8

-- 
Luca Berra -- bluca@comedia.it
        Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \


* Re: multipath md devices
  2004-09-22 21:35 ` Doug Ledford
@ 2004-09-22 22:20   ` Anu Matthew
  2004-09-23  5:46   ` Luca Berra
  1 sibling, 0 replies; 7+ messages in thread
From: Anu Matthew @ 2004-09-22 22:20 UTC (permalink / raw)
  To: linux-raid

// 
Yeah, IBM wrote a little app to help with that.  We stuffed it into the
mdadm package we ship since that seemed the most appropriate place for
it.  It's called mdmpd and that's its job basically.  Very simple app,
but doesn't run on upstream kernels at the moment (it wants the md event
interface which hasn't yet been submitted upstream by Neil). 
//


Doug, thanks a lot, we appreciate your response.

One of our boxes with the multipath problem (sic) runs on RHEL ES3.0 
with active RHN subscription and is at 2.4.21-15.ELsmp, and has 
mdadm-1.5.0-3, with mdmpd running. Yet it crashed last week when the paths
went away during a SAN switch f/w upgrade: the paths were taken down one
after the other, 4 hours apart, but never both at the same time.

I saw your post on Bugzilla, dated 2004-06-07, about the md Event
Interface Patch. Do we still have to apply that patch manually? Or
does 2.4.21-20.ELsmp have it already? I tried reading the Change Log
and File lists, but could gather nothing much.

Thanks,

--Anu Matthew



* Re: multipath md devices
  2004-09-22 20:41 Anu Matthew
@ 2004-09-22 21:35 ` Doug Ledford
  2004-09-22 22:20   ` Anu Matthew
  2004-09-23  5:46   ` Luca Berra
  0 siblings, 2 replies; 7+ messages in thread
From: Doug Ledford @ 2004-09-22 21:35 UTC (permalink / raw)
  To: Anu Matthew; +Cc: linux-raid

On Wed, 2004-09-22 at 16:41, Anu Matthew wrote:
> Hi,
> 
> We have multipath devices created on SAN Luns. Say md0 is created on 
> /dev/sdj and /dev/sde, the latter being the alternate path for /dev/sdj.
> 
> I've noticed the following:
> 
> 1) With little IO to the md device, if I pull out the cable to, say,
> /dev/sdj, /proc/mdstat still shows both devices. /proc/mdstat won't
> get updated unless I start some considerable IO to the md device. Even
> mdadm scan/query output shows both paths, which is not true. As we
> start IO, /proc/mdstat reflects that one of the devices, /dev/sdj in
> this case, has failed. Thereafter mdadm output is correct too.
>
> The entries (link down) in syslog and dmesg are almost instantaneous
> when the cable is pulled out. This makes it very difficult to monitor
> multipath devices, as we cannot rely on /proc/mdstat.

/proc/mdstat will be correct once the first physical read/write on the
yanked path fails.

> 2) Another situation: Device md0 is active, with healthy multipaths
> /dev/sdj and /dev/sde, under reasonable IO activity. If the cable to
> /dev/sdj is yanked out, md0 remains active, thanks to the
> alternate path, sde. However, it fails to go back and reconstruct the
> spare path allocation even after the fibre link is restored. Here, if I
> pull the cable out for sde even 30 minutes later, the machine ends up
> failing to write to /dev/md0, as it does not care whether /dev/sdj is
> back online, unless I manually fail, remove and add /dev/sdj from
> the mdadm command line. If something is hard mounted on /dev/md0, it may
> end up in a system crash.
>
> To conclude, if one path goes off, and comes back after a while, and
> then the second path goes off, md0 cannot be read, unless someone
> manually fails, removes and adds the first device, which came back
> online, before the second path goes off.

Yeah, IBM wrote a little app to help with that.  We stuffed it into the
mdadm package we ship since that seemed the most appropriate place for
it.  It's called mdmpd and that's its job basically.  Very simple app,
but doesn't run on upstream kernels at the moment (it wants the md event
interface which hasn't yet been submitted upstream by Neil).

> Any help towards this will be much appreciated.
> 
> Thanks,
> 
> --AM.
-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc.
         1801 Varsity Dr.
         Raleigh, NC 27606
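
The reply above notes that /proc/mdstat only becomes correct once a
physical read or write fails on the pulled path, so a monitor that reads
only /proc/mdstat can lag behind the link-down entries in syslog. As a
minimal sketch of the reading side, the Python below parses mdstat-style
text and lists which member paths carry the (F) failed marker. The sample
text, the function name parse_mdstat and the exact file layout are
assumptions for illustration, not output captured from the systems in
this thread.

```python
import re

# Illustrative /proc/mdstat content; device names follow the thread, the
# exact layout of the real file on a given kernel is an assumption.
SAMPLE_MDSTAT = """\
Personalities : [multipath]
md0 : active multipath sde[1] sdj[0](F)
      10485760 blocks [2/1] [_U]

unused devices: <none>
"""

# An array line starts with the md device name followed by a colon.
ARRAY_LINE = re.compile(r"^(md\d+)\s*:\s*(.*)$")
# A member looks like "sdj[0]" optionally followed by flags such as "(F)".
MEMBER = re.compile(r"(\w+)\[\d+\]((?:\([A-Z]\))*)")


def parse_mdstat(text):
    """Map each md array to its healthy and failed member devices."""
    arrays = {}
    for line in text.splitlines():
        match = ARRAY_LINE.match(line)
        if not match:
            continue
        name, rest = match.groups()
        healthy, failed = [], []
        for device, flags in MEMBER.findall(rest):
            (failed if "(F)" in flags else healthy).append(device)
        arrays[name] = {"healthy": sorted(healthy), "failed": sorted(failed)}
    return arrays


if __name__ == "__main__":
    # On a live system, read /proc/mdstat instead of the sample text.
    for name, paths in parse_mdstat(SAMPLE_MDSTAT).items():
        print(name, "healthy:", paths["healthy"], "failed:", paths["failed"])
```

Because the (F) marker appears only after a failed I/O, a check like this
is probably best combined with the syslog link-down messages rather than
used alone.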




* multipath md devices
@ 2004-09-22 20:41 Anu Matthew
  2004-09-22 21:35 ` Doug Ledford
  0 siblings, 1 reply; 7+ messages in thread
From: Anu Matthew @ 2004-09-22 20:41 UTC (permalink / raw)
  To: linux-raid

Hi,

We have multipath devices created on SAN Luns. Say md0 is created on 
/dev/sdj and /dev/sde, the latter being the alternate path for /dev/sdj.

I've noticed the following:

1) With little IO to the md device, if I pull out the cable to, say,
/dev/sdj, /proc/mdstat still shows both devices. /proc/mdstat won't
get updated unless I start some considerable IO to the md device. Even
mdadm scan/query output shows both paths, which is not true. As we
start IO, /proc/mdstat reflects that one of the devices, /dev/sdj in
this case, has failed. Thereafter mdadm output is correct too.

The entries (link down) in syslog and dmesg are almost instantaneous
when the cable is pulled out. This makes it very difficult to monitor
multipath devices, as we cannot rely on /proc/mdstat.

2) Another situation: Device md0 is active, with healthy multipaths
/dev/sdj and /dev/sde, under reasonable IO activity. If the cable to
/dev/sdj is yanked out, md0 remains active, thanks to the
alternate path, sde. However, it fails to go back and reconstruct the
spare path allocation even after the fibre link is restored. Here, if I
pull the cable out for sde even 30 minutes later, the machine ends up
failing to write to /dev/md0, as it does not care whether /dev/sdj is
back online, unless I manually fail, remove and add /dev/sdj from
the mdadm command line. If something is hard mounted on /dev/md0, it may
end up in a system crash.

To conclude, if one path goes off, and comes back after a while, and
then the second path goes off, md0 cannot be read, unless someone
manually fails, removes and adds the first device, which came back
online, before the second path goes off.

Any help towards this will be much appreciated.

Thanks,

--AM.
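
The manual recovery described in the message above (fail, remove and add
the restored path from the mdadm command line) can be sketched as a small
Python wrapper. This is only an illustration of that manual workaround,
not of what mdmpd does. The function names readd_commands and readd_path
are hypothetical, the device names are taken from the message, and the
sketch assumes mdadm's manage-mode options --fail, --remove and --add and
a caller with root privileges.

```python
import subprocess


def readd_commands(array, path):
    """Build the three mdadm manage-mode commands, in order."""
    return [["mdadm", array, flag, path]
            for flag in ("--fail", "--remove", "--add")]


def readd_path(array, path, run=subprocess.run):
    """Fail, remove and re-add one path; stop at the first error.

    `run` is injectable so the sequence can be exercised without
    touching a real array.
    """
    for argv in readd_commands(array, path):
        run(argv, check=True)


if __name__ == "__main__":
    # Print the commands rather than executing them.
    for argv in readd_commands("/dev/md0", "/dev/sdj"):
        print(" ".join(argv))
```

Before running such a sequence for real, it would be sensible to confirm
that the restored path actually answers I/O again, since re-adding a path
that is still down gains nothing.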


