* [PING / RESEND] handling reservation conflicts in dm-mpath
@ 2016-08-02 12:36 Christoph Hellwig
2016-08-02 12:36 ` [PATCH] dm-mpath: always return reservation conflict Christoph Hellwig
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2016-08-02 12:36 UTC (permalink / raw)
To: dm-devel, linux-scsi; +Cc: hare
Hannes sent this patch a bit more than a year ago, but it got silently
dropped. When using the pNFS SCSI layout we can easily hit a
failover "livelock" without it, as reservation conflicts on a
newly detected device that doesn't have layouts yet, or after a fence,
will keep failing over from one path to another for no reason at all.
Any chance to get it into mainline now?
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH] dm-mpath: always return reservation conflict
2016-08-02 12:36 [PING / RESEND] handling reservation conflicts in dm-mpath Christoph Hellwig
@ 2016-08-02 12:36 ` Christoph Hellwig
2016-08-11 18:38 ` [dm-devel] " Christoph Hellwig
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2016-08-02 12:36 UTC (permalink / raw)
To: dm-devel, linux-scsi; +Cc: hare, Hannes Reinecke
From: Hannes Reinecke <hare@suse.de>
If dm-mpath encounters a reservation conflict it should not fail the
path (as communication with the target is not affected) but should
rather retry on another path. However, in doing so we might be inducing
a ping-pong between paths, with no guarantee of any forward progress.
And arguably a reservation conflict is an unexpected error, so we should
be passing it upwards to allow the application to take appropriate steps.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/dm-mpath.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 7eac080..8d2f916 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1555,16 +1555,22 @@ static int do_end_io(struct multipath *m, struct request *clone,
if (noretry_error(error))
return error;
- if (mpio->pgpath)
+ /*
+ * EBADE signals a reservation conflict.
+ * We shouldn't fail the path here as we can communicate with
+ * the target. We should failover to the next path, but in
+ * doing so we might be causing a ping-pong between paths.
+ * So just return the reservation conflict error.
+ */
+ if (error == -EBADE)
+ r = error;
+ else if (mpio->pgpath)
fail_path(mpio->pgpath);
if (!atomic_read(&m->nr_valid_paths)) {
if (!test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags)) {
if (!must_push_back_rq(m))
r = -EIO;
- } else {
- if (error == -EBADE)
- r = error;
}
}
--
2.1.4
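The before/after decision logic in the diff above can be modelled outside the kernel. This is a minimal sketch, not the dm-mpath code: `model_mpath`, `end_io_old`, `end_io_new` and `MODEL_ENDIO_REQUEUE` are hypothetical names, the `must_push_back_rq()` check is left out, and failing a path is reduced to decrementing a counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef EBADE
#define EBADE 52 /* Linux value; fallback for systems that lack it */
#endif

/* Hypothetical stand-in for the kernel's "requeue the request" result. */
enum { MODEL_ENDIO_REQUEUE = 1 };

struct model_mpath {
	int  nr_valid_paths;   /* paths that have not been failed */
	bool queue_if_no_path; /* models MPATHF_QUEUE_IF_NO_PATH */
};

/* Before the patch: every retryable error fails the current path. */
int end_io_old(struct model_mpath *m, int error)
{
	int r = MODEL_ENDIO_REQUEUE;

	assert(error < 0);
	if (m->nr_valid_paths > 0)
		m->nr_valid_paths--; /* models fail_path() */

	if (m->nr_valid_paths == 0) {
		if (!m->queue_if_no_path)
			r = -EIO;
		else if (error == -EBADE)
			r = error;
	}
	return r;
}

/* After the patch: a reservation conflict is returned, the path is kept. */
int end_io_new(struct model_mpath *m, int error)
{
	int r = MODEL_ENDIO_REQUEUE;

	assert(error < 0);
	if (error == -EBADE)
		r = error;
	else if (m->nr_valid_paths > 0)
		m->nr_valid_paths--; /* models fail_path() */

	if (m->nr_valid_paths == 0 && !m->queue_if_no_path)
		r = -EIO;
	return r;
}
```

With two valid paths, the old model turns two reservation conflicts into two failed paths and finally -EIO, while the new model returns -EBADE on the first conflict and leaves both paths valid.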
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [dm-devel] [PATCH] dm-mpath: always return reservation conflict
2016-08-02 12:36 ` [PATCH] dm-mpath: always return reservation conflict Christoph Hellwig
@ 2016-08-11 18:38 ` Christoph Hellwig
2016-08-15 13:08 ` Mike Snitzer
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2016-08-11 18:38 UTC (permalink / raw)
To: dm-devel, linux-scsi; +Cc: hare
ping?
On Tue, Aug 02, 2016 at 02:36:32PM +0200, Christoph Hellwig wrote:
> From: Hannes Reinecke <hare@suse.de>
>
> If dm-mpath encounters a reservation conflict it should not fail the
> path (as communication with the target is not affected) but should
> rather retry on another path. However, in doing so we might be inducing
> a ping-pong between paths, with no guarantee of any forward progress.
>
> And arguably a reservation conflict is an unexpected error, so we should
> be passing it upwards to allow the application to take appropriate steps.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> Acked-by: Christoph Hellwig <hch@lst.de>
> Tested-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/md/dm-mpath.c | 14 ++++++++++----
> 1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index 7eac080..8d2f916 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -1555,16 +1555,22 @@ static int do_end_io(struct multipath *m, struct request *clone,
> if (noretry_error(error))
> return error;
>
> - if (mpio->pgpath)
> + /*
> + * EBADE signals a reservation conflict.
> + * We shouldn't fail the path here as we can communicate with
> + * the target. We should failover to the next path, but in
> + * doing so we might be causing a ping-pong between paths.
> + * So just return the reservation conflict error.
> + */
> + if (error == -EBADE)
> + r = error;
> + else if (mpio->pgpath)
> fail_path(mpio->pgpath);
>
> if (!atomic_read(&m->nr_valid_paths)) {
> if (!test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags)) {
> if (!must_push_back_rq(m))
> r = -EIO;
> - } else {
> - if (error == -EBADE)
> - r = error;
> }
> }
>
> --
> 2.1.4
---end quoted text---
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: dm-mpath: always return reservation conflict
2016-08-11 18:38 ` [dm-devel] " Christoph Hellwig
@ 2016-08-15 13:08 ` Mike Snitzer
2016-08-15 13:40 ` Mike Snitzer
0 siblings, 1 reply; 20+ messages in thread
From: Mike Snitzer @ 2016-08-15 13:08 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: dm-devel, linux-scsi, hare
Not sure how Hannes' original patch was overlooked but...
One issue I see with the patch is it will return -EBADE regardless of
whether 'queue_if_no_path' is set. That's fine (since path isn't being
failed for this case any more). But why not just return error
immediately?
But taking a step back, shouldn't all paths be tried once before
returning an error? Obviously that'd impose the use of a new
'conflict_seen' (or whatever) flag at the end of 'struct pgpath'. And
then only return error if the flag is set.
I threw together the following RFC patch to illustrate what I'm
thinking, but thinking about this further it is tough to know all paths
have seen the reservation conflict (my patch assumes if 'conflict_seen'
is set then the conflict iterated through all paths.. but if paths
aren't being failed there isn't a guarantee that the path selector
didn't just hand us back the same path that just experienced the
conflict). So this is throw-away for now (and I'll get Hannes' patch
applied for 4.8-rc3, with the tweak of returning -EBADE immediately):
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index ac734e5..c3d92db 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -41,6 +41,7 @@ struct pgpath {
struct delayed_work activate_path;
bool is_active:1; /* Path status */
+ bool conflict_seen:1;
};
#define path_to_pgpath(__pgp) container_of((__pgp), struct pgpath, path)
@@ -1569,6 +1570,33 @@ static int do_end_io(struct multipath *m, struct request *clone,
if (noretry_error(error))
return error;
+ if (error == -EBADE) {
+ /*
+ * EBADE signals a reservation conflict.
+ * We shouldn't fail the path here as we can communicate with
+ * the target. We should failover to the next path, but in
+ * doing so we might be causing a ping-pong between paths.
+ * Avoid this by only returning the reservation conflict error
+ * if a conflict has been seen on all paths.
+ */
+ if (!mpio->pgpath || mpio->pgpath->conflict_seen) {
+ struct priority_group *pg;
+ struct pgpath *p;
+
+ /* clear 'conflict_seen' for all pgpaths */
+ list_for_each_entry(pg, &m->priority_groups, list) {
+ list_for_each_entry(p, &pg->pgpaths, list) {
+ p->conflict_seen = false;
+ }
+ }
+ return error;
+ }
+ else if (mpio->pgpath) {
+ mpio->pgpath->conflict_seen = true;
+ return r;
+ }
+ }
+
if (mpio->pgpath)
fail_path(mpio->pgpath);
@@ -1576,9 +1604,6 @@ static int do_end_io(struct multipath *m, struct request *clone,
if (!test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags)) {
if (!must_push_back_rq(m))
r = -EIO;
- } else {
- if (error == -EBADE)
- r = error;
}
}
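The weakness described in the message above (the path selector may hand back the same path, so a set 'conflict_seen' flag does not prove that every path was tried) can be reproduced with a small user-space model. This is only a sketch under stated assumptions: `sketch_path`, `sketch_conflict` and `SKETCH_REQUEUE` are hypothetical names, and the caller picks the path index by hand in place of a real path selector.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef EBADE
#define EBADE 52 /* Linux value; fallback for systems that lack it */
#endif

/* Hypothetical stand-in for "requeue and retry on whatever path is chosen". */
enum { SKETCH_REQUEUE = 1 };

struct sketch_path {
	bool conflict_seen; /* models the per-pgpath flag from the RFC */
};

/*
 * Handle a reservation conflict reported on path 'cur'.
 * Mirrors the RFC: if this path already saw a conflict, clear all flags
 * and return the error; otherwise mark the path and ask for a retry.
 */
int sketch_conflict(struct sketch_path *paths, size_t npaths, size_t cur)
{
	assert(cur < npaths);

	if (paths[cur].conflict_seen) {
		for (size_t i = 0; i < npaths; i++)
			paths[i].conflict_seen = false;
		return -EBADE;
	}
	paths[cur].conflict_seen = true;
	return SKETCH_REQUEUE;
}
```

If the selector returns path 0 twice in a row on a three-path device, the second call already returns -EBADE although paths 1 and 2 were never tried, which is the gap the message points out.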
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: dm-mpath: always return reservation conflict
2016-08-15 13:08 ` Mike Snitzer
@ 2016-08-15 13:40 ` Mike Snitzer
2016-09-26 16:52 ` [dm-devel] " Christoph Hellwig
0 siblings, 1 reply; 20+ messages in thread
From: Mike Snitzer @ 2016-08-15 13:40 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: dm-devel, linux-scsi
On Mon, Aug 15 2016 at 9:08P -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
> Not sure how Hannes' original patch was overlooked but...
It wasn't overlooked. It was very much unresolved. The original thread
unraveled to all sorts of PR edge case concerns (and doubt about whether
anything relies on the current multipath handling of reservation
conflicts). See patchwork thread below.
Obviously you have found a problematic case which requires Hannes'
patch. So there is definitely increased pressure to fix this.
> One issue I see with the patch is it will return -EBADE regardless of
> whether 'queue_if_no_path' is set. That's fine (since path isn't being
> failed for this case any more). But why not just return error
> immediately?
>
> But taking a step back, shouldn't all paths be tried once before
> returning an error? Obviously that'd impose the use of a new
> 'conflict_seen' (or whatever) flag at the end of 'struct pgpath'. And
> then only return error if the flag is set.
>
> I threw together the following RFC patch to illustrate what I'm
> thinking, but thinking about this further it is tough to know all paths
> have seen the reservation conflict (my patch assumes if 'conflict_seen'
> is set then the conflict iterated through all paths.. but if paths
> aren't being failed there isn't a guarantee that the path selector
> didn't just hand us back the same path that just experienced the
> conflict).
Seems we still need a more sophisticated approach. But I'm left
wondering: if we didn't do it would anything notice? Sadly, the same
big question from the original thread from a year ago:
https://patchwork.kernel.org/patch/6797111/
> So this is throw-away for now (and I'll get Hannes' patch applied for
> 4.8-rc3, with the tweak of returning -EBADE immediately):
Unfortunately, I'm _not_ staging Hannes' patch until I have James
Bottomley's Ack (given his original issues with the patch haven't been
explained away AFAICT).
Mike
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dm-devel] dm-mpath: always return reservation conflict
2016-08-15 13:40 ` Mike Snitzer
@ 2016-09-26 16:52 ` Christoph Hellwig
2016-09-26 19:06 ` James Bottomley
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2016-09-26 16:52 UTC (permalink / raw)
To: Mike Snitzer
Cc: Christoph Hellwig, dm-devel, linux-scsi, Hannes Reinecke,
James.Bottomley
Getting back to this after Hannes recovered from his vacation
and I had a chat with him..
On Mon, Aug 15, 2016 at 09:40:39AM -0400, Mike Snitzer wrote:
> Seems we still need a more sophisticated approach. But I'm left
> wondering: if we didn't do it would anything notice? Sadly, the same
> big question from the original thread from a year ago:
Yes. I have a customer looking to push the pNFS SCSI layout into
a product, and the major show stopper right now is that we can
trivially get into failover loops without this (or an equivalent)
fix.
A year ago SCSI layout was still work in progress in the IETF,
people use the similar block layout instead that doesn't use
PRs and we also didn't have the in-kernel PR API, so you effectively
couldn't use PRs with multipathing.
> https://patchwork.kernel.org/patch/6797111/
>
> > So this is throw-away for now (and I'll get Hannes' patch applied for
> > 4.8-rc3, with the tweak of returning -EBADE immediately):
>
> Unfortunately, I'm _not_ staging Hannes' patch until I have James
> Bottomley's Ack (given his original issues with the patch haven't been
> explained away AFAICT).
I've added James to the Cc. His argument was that the old behavior
could be implemented to use some non-standard use of reservations
without a specific example. I don't really think his example even
is practical - once we use dm-mpath it exclusively claims the underlying
block devices, so any sort of selective reservations would have had
to happen before even starting dm-multipath. So a dynamic SAN
controller would have to tear down and rebuild the dm-multipath setup
all the time.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dm-devel] dm-mpath: always return reservation conflict
2016-09-26 16:52 ` [dm-devel] " Christoph Hellwig
@ 2016-09-26 19:06 ` James Bottomley
2016-09-27 6:34 ` Hannes Reinecke
0 siblings, 1 reply; 20+ messages in thread
From: James Bottomley @ 2016-09-26 19:06 UTC (permalink / raw)
To: Christoph Hellwig, Mike Snitzer
Cc: Christoph Hellwig, dm-devel, linux-scsi, Hannes Reinecke
On Mon, 2016-09-26 at 09:52 -0700, Christoph Hellwig wrote:
> Getting back to this after Hannes recovered from his vacation
> and I had a chat with him..
>
> On Mon, Aug 15, 2016 at 09:40:39AM -0400, Mike Snitzer wrote:
> > Seems we still need a more sophisticated approach. But I'm left
> > wondering: if we didn't do it would anything notice? Sadly, the
> > same
> > big question from the original thread from a year ago:
>
> Yes. I have a customer looking to push the pNFS SCSI layout into
> a product, and the major show stopper right now is that we can
> trivially get into failover loops without this (or an equivalent)
> fix.
>
> A year ago SCSI layout was still work in progress in the IETF,
> people use the similar block layout instead that doesn't use
> PRs and we also didn't have the in-kernel PR API, so you effectively
> couldn't use PRs with multipathing.
>
> > https://patchwork.kernel.org/patch/6797111/
> >
> > > So this is throw-away for now (and I'll get Hannes' patch applied
> > > for
> > > 4.8-rc3, with the tweak of returning -EBADE immediately):
> >
> > Unfortunately, I'm _not_ staging Hannes' patch until I have James
> > Bottomley's Ack (given his original issues with the patch haven't
> > been
> > explained away AFAICT).
>
> I've added James to the Cc. His argument was that the old behavior
> could be implemented to use some non-standard use of reservations
> without a specific example. I don't really think his example even
> is practical - once we use dm-mpath it exclusively claims the
> underlying block devices, so any sort of selective reservations would
> have had to happen before even starting dm-multipath.
Well, now that you've made me reread the thread from 14 months ago that
wasn't quite my objection. The objection hinged on the fact that
anything that uses path specific reservations would now fail instead of
retrying on a different path. I thought the IBM SVC did this and
Hannes implied he'd be able to check this ... did anyone check? If
we've checked and there's no issue with the SVC, then I don't have any
other objections.
> So a dynamic SAN controller would have to tear down and rebuild the
> dm-multipath setup all the time.
That was the job of the SVC: it sat in the middle of the SAN and
controlled which node saw what storage.
https://www.ibm.com/support/knowledgecenter/STPVGU/com.ibm.storage.svc.console.720.doc/svc_svcovr_1bcfiq.html
The SVC can issue its own reservations in those circumstances. What
I'm not at all clear on is whether they'll interact badly with the
dm-mp reservations.
James
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dm-devel] dm-mpath: always return reservation conflict
2016-09-26 19:06 ` James Bottomley
@ 2016-09-27 6:34 ` Hannes Reinecke
2016-09-27 18:50 ` James Bottomley
0 siblings, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2016-09-27 6:34 UTC (permalink / raw)
To: James Bottomley, Christoph Hellwig, Mike Snitzer
Cc: dm-devel, Christoph Hellwig, linux-scsi, Hannes Reinecke
On 09/26/2016 09:06 PM, James Bottomley wrote:
> On Mon, 2016-09-26 at 09:52 -0700, Christoph Hellwig wrote:
>> Getting back to this after Hannes recovered from his vacation
>> and I had a chat with him..
>>
>> On Mon, Aug 15, 2016 at 09:40:39AM -0400, Mike Snitzer wrote:
>>> Seems we still need a more sophisticated approach. But I'm left
>>> wondering: if we didn't do it would anything notice? Sadly, the
>>> same
>>> big question from the original thread from a year ago:
>>
>> Yes. I have a customer looking to push the pNFS SCSI layout into
>> a product, and the major show stopper right now is that we can
>> trivially get into failover loops without this (or an equivalent)
>> fix.
>>
>> A year ago SCSI layout was still work in progress in the IETF,
>> people use the similar block layout instead that doesn't use
>> PRs and we also didn't have the in-kernel PR API, so you effectively
>> couldn't use PRs with multipathing.
>>
>>> https://patchwork.kernel.org/patch/6797111/
>>>
>>>> So this is throw-away for now (and I'll get Hannes' patch applied
>>>> for
>>>> 4.8-rc3, with the tweak of returning -EBADE immediately):
>>>
>>> Unfortunately, I'm _not_ staging Hannes' patch until I have James
>>> Bottomley's Ack (given his original issues with the patch haven't
>>> been
>>> explained away AFAICT).
>>
>> I've added James to the Cc. His argument was that the old behavior
>> could be implemented to use some non-standard use of reservations
>> without a specific example. I don't really think his example even
>> is practical - once we use dm-mpath it exclusively claims the
>> underlying block devices, so any sort of selective reservations would
>> have had to happen before even starting dm-multipath.
>
> Well, now that you've made me reread the thread from 14 months ago that
> wasn't quite my objection. The objection hinged on the fact that
> anything that uses path specific reservations would now fail instead of
> retrying on a different path. I thought the IBM SVC did this and
> Hannes implied he'd be able to check this ... did anyone check? If
> we've checked and there's no issue with the SVC, then I don't have any
> other objections.
>
>> So a dynamic SAN controller would have to tear down and rebuild the
>> dm-multipath setup all the time.
>
> That was the job of the SVC: it sat in the middle of the SAN and
> controlled which node saw what storage.
>
> https://www.ibm.com/support/knowledgecenter/STPVGU/com.ibm.storage.svc.console.720.doc/svc_svcovr_1bcfiq.html
>
> The SVC can issue its own reservations in those circumstances. What
> I'm not at all clear on is whether they'll interact badly with the
> dm-mp reservations.
>
In the end SVC is (for us) just another storage array.
If and what SVC does in the background is of no interest to us.
OTOH I'd be very surprised if the SVC would be allowing us to see
remnants of its internal working (like persistent reservation errors);
in doing so third-party applications would be able to see and possibly
modify these persistent reservations and the SVC would find itself in a
very fragile operating scenario.
Also interactions with GPFS (which uses its own set of reservations)
will become very tricky.
So I sincerely doubt we'll ever see SVC-originated persistent
reservations errors.
And as a side-note, this particular patch is included in SLES since
2011. With no noticeable side-effect.
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dm-devel] dm-mpath: always return reservation conflict
2016-09-27 6:34 ` Hannes Reinecke
@ 2016-09-27 18:50 ` James Bottomley
2016-09-29 15:01 ` Mike Snitzer
0 siblings, 1 reply; 20+ messages in thread
From: James Bottomley @ 2016-09-27 18:50 UTC (permalink / raw)
To: Hannes Reinecke, Christoph Hellwig, Mike Snitzer
Cc: dm-devel, Christoph Hellwig, linux-scsi, Hannes Reinecke
On Tue, 2016-09-27 at 08:34 +0200, Hannes Reinecke wrote:
> On 09/26/2016 09:06 PM, James Bottomley wrote:
> > On Mon, 2016-09-26 at 09:52 -0700, Christoph Hellwig wrote:
> > > Getting back to this after Hannes recovered from his vacation
> > > and I had a chat with him..
> > >
> > > On Mon, Aug 15, 2016 at 09:40:39AM -0400, Mike Snitzer wrote:
> > > > Seems we still need a more sophisticated approach. But I'm
> > > > left wondering: if we didn't do it would anything notice?
> > > > Sadly, the same big question from the original thread from a
> > > > year ago:
> > >
> > > Yes. I have a customer looking to push the pNFS SCSI layout into
> > > a product, and the major show stopper right now is that we can
> > > trivially get into failover loops without this (or an equivalent)
> > > fix.
> > >
> > > A year ago SCSI layout was still work in progress in the IETF,
> > > people use the similar block layout instead that doesn't use
> > > PRs and we also didn't have the in-kernel PR API, so you
> > > effectively couldn't use PRs with multipathing.
> > >
> > > > https://patchwork.kernel.org/patch/6797111/
> > > >
> > > > > So this is throw-away for now (and I'll get Hannes' patch
> > > > > applied for 4.8-rc3, with the tweak of returning -EBADE
> > > > > immediately):
> > > >
> > > > Unfortunately, I'm _not_ staging Hannes' patch until I have
> > > > James Bottomley's Ack (given his original issues with the patch
> > > > haven't been explained away AFAICT).
> > >
> > > I've added James to the Cc. His argument was that the old
> > > behavior could be implemented to use some non-standard use of
> > > reservations without a specific example. I don't really think
> > > his example even is practical - once we use dm-mpath it
> > > exclusively claims the underlying block devices, so any sort of
> > > selective reservations would have had to happen before even
> > > starting dm-multipath.
> >
> > Well, now that you've made me reread the thread from 14 months ago
> > that wasn't quite my objection. The objection hinged on the fact
> > that anything that uses path specific reservations would now fail
> > instead of retrying on a different path. I thought the IBM SVC did
> > this and Hannes implied he'd be able to check this ... did anyone
> > check? If we've checked and there's no issue with the SVC, then I
> > don't have any other objections.
> >
> > > So a dynamic SAN controller would have to tear down and rebuild
> > > the dm-multipath setup all the time.
> >
> > That was the job of the SVC: it sat in the middle of the SAN and
> > controlled which node saw what storage.
> >
> > https://www.ibm.com/support/knowledgecenter/STPVGU/com.ibm.storage.
> > svc.console.720.doc/svc_svcovr_1bcfiq.html
> >
> > The SVC can issue its own reservations in those circumstances.
> > What I'm not at all clear on is whether they'll interact badly
> > with the dm-mp reservations.
> >
> In the end SVC is (for us) just another storage array.
> If and what SVC does in the background is of no interest to us.
How can that be true? It sits *on* the SAN and manages devices, it
doesn't sit between the initiators and the devices. It applies
reservations to devices under management, but every node usually sees
everything else, so devices under SVC management are visible to all
initiators unless you zone them off.
The last SVC manual I saw included a procedure for manually releasing
stuck SVC reservations from an initiator, which illustrates the
expectation.
> OTOH I'd be very surprised if the SVC would be allowing us to see
> remnants of its internal working (like persistent reservation
> errors); in doing so third-party applications would be able to see
> and possibly modify these persistent reservations and the SVC would
> find itself in a very fragile operating scenario.
Because unless you zone the fibre, that's precisely what you do see.
> Also interactions with GPFS (which uses its own set of reservations)
> will become very tricky.
>
> So I sincerely doubt we'll ever see SVC-originated persistent
> reservations errors.
>
> And as a side-note, this particular patch is included in SLES since
> 2011. With no noticeable side-effect.
OK, so can you actually say that someone has tested this scenario? If
not, do you have the capacity to test it?
James
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: dm-mpath: always return reservation conflict
2016-09-27 18:50 ` James Bottomley
@ 2016-09-29 15:01 ` Mike Snitzer
2016-09-29 15:35 ` Christoph Hellwig
2016-09-30 0:55 ` James Bottomley
0 siblings, 2 replies; 20+ messages in thread
From: Mike Snitzer @ 2016-09-29 15:01 UTC (permalink / raw)
To: James Bottomley
Cc: Hannes Reinecke, Christoph Hellwig, dm-devel, Christoph Hellwig,
linux-scsi, Hannes Reinecke
On Tue, Sep 27 2016 at 2:50pm -0400,
James Bottomley <James.Bottomley@hansenpartnership.com> wrote:
> On Tue, 2016-09-27 at 08:34 +0200, Hannes Reinecke wrote:
> > On 09/26/2016 09:06 PM, James Bottomley wrote:
> > > On Mon, 2016-09-26 at 09:52 -0700, Christoph Hellwig wrote:
> > > > Getting back to this after Hannes recovered from his vacation
> > > > and I had a chat with him..
> > > >
> > > > On Mon, Aug 15, 2016 at 09:40:39AM -0400, Mike Snitzer wrote:
> > > > > Seems we still need a more sophisticated approach. But I'm
> > > > > left wondering: if we didn't do it would anything notice?
> > > > > Sadly, the same big question from the original thread from a
> > > > > year ago:
> > > >
> > > > Yes. I have a customer looking to push the pNFS SCSI layout into
> > > > a product, and the major show stopper right now is that we can
> > > > trivially get into failover loops without this (or an equivalent)
> > > > fix.
> > > >
> > > > A year ago SCSI layout was still work in progress in the IETF,
> > > > people use the similar block layout instead that doesn't use
> > > > PRs and we also didn't have the in-kernel PR API, so you
> > > > effectively couldn't use PRs with multipathing.
> > > >
> > > > > https://patchwork.kernel.org/patch/6797111/
> > > > >
> > > > > > So this is throw-away for now (and I'll get Hannes' patch
> > > > > > applied for 4.8-rc3, with the tweak of returning -EBADE
> > > > > > immediately):
> > > > >
> > > > > Unfortunately, I'm _not_ staging Hannes' patch until I have
> > > > > James Bottomley's Ack (given his original issues with the patch
> > > > > haven't been explained away AFAICT).
> > > >
> > > > I've added James to the Cc. His argument was that the old
> > > > behavior could be implemented to use some non-standard use of
> > > > reservations without a specific example. I don't really think
> > > > his example even is practical - once we use dm-mpath it
> > > > exclusively claims the underlying block devices, so any sort of
> > > > selective reservations would have had to happen before even
> > > > starting dm-multipath.
> > >
> > > Well, now that you've made me reread the thread from 14 months ago
> > > that wasn't quite my objection. The objection hinged on the fact
> > > that anything that uses path specific reservations would now fail
> > > instead of retrying on a different path. I thought the IBM SVC did
> > > this and Hannes implied he'd be able to check this ... did anyone
> > > check? If we've checked and there's no issue with the SVC, then I
> > > don't have any other objections.
> > >
> > > > So a dynamic SAN controller would have to tear down and rebuild
> > > > the dm-multipath setup all the time.
> > >
> > > That was the job of the SVC: it sat in the middle of the SAN and
> > > controlled which node saw what storage.
> > >
> > > https://www.ibm.com/support/knowledgecenter/STPVGU/com.ibm.storage.
> > > svc.console.720.doc/svc_svcovr_1bcfiq.html
> > >
> > > The SVC can issue its own reservations in those circumstances.
> > > What I'm not at all clear on is whether they'll interact badly
> > > with the dm-mp reservations.
> > >
> > In the end SVC is (for us) just another storage array.
> > If and what SVC does in the background is of no interest to us.
>
> How can that be true? It sits *on* the SAN and manages devices, it
> doesn't sit between the initiators and the devices. It applies
> reservations to devices under management, but every node usually sees
> everything else, so devices under SVC management are visible to all
> initiators unless you zone them off.
>
> The last SVC manual I saw included a procedure for manually releasing
> stuck SVC reservations from an initiator, which illustrates the
> expectation.
>
> > OTOH I'd be very surprised if the SVC would be allowing us to see
> > remnants of its internal working (like persistent reservation
> > errors); in doing so third-party applications would be able to see
> > and possibly modify these persistent reservations and the SVC would
> > find itself in a very fragile operating scenario.
>
> Because unless you zone the fibre, that's precisely what you do see.
>
> > Also interactions with GPFS (which uses its own set of reservations)
> > will become very tricky.
> >
> > So I sincerely doubt we'll ever see SVC-originated persistent
> > reservations errors.
> >
> > And as a side-note, this particular patch is included in SLES since
> > 2011. With no noticeable side-effect.
>
> OK, so can you actually say that someone has tested this scenario? If
> not, do you have the capacity to test it?
I've elected to just take this change for 4.9. Please see:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.9&id=8ff232c1a819c2e98d85974a3bff0b7b8e2970ed
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: dm-mpath: always return reservation conflict
2016-09-29 15:01 ` Mike Snitzer
@ 2016-09-29 15:35 ` Christoph Hellwig
2016-09-30 0:55 ` James Bottomley
1 sibling, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2016-09-29 15:35 UTC (permalink / raw)
To: Mike Snitzer
Cc: James Bottomley, Hannes Reinecke, Christoph Hellwig, dm-devel,
Christoph Hellwig, linux-scsi, Hannes Reinecke
On Thu, Sep 29, 2016 at 11:01:33AM -0400, Mike Snitzer wrote:
> I've elected to just take this change for 4.9. Please see:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.9&id=8ff232c1a819c2e98d85974a3bff0b7b8e2970ed
Thanks Mike.
If any problems show up I will send you an incremental patch that limits
this behavior to devices where we created reservations using the kernel
pr_ops API.
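The fallback mentioned above, limiting the new behavior to devices whose reservations were created through the kernel pr_ops API, could look roughly like the following user-space model. No such patch appears in this thread, so everything here is an assumption for illustration: the `pr_via_kernel` flag, `sketch_dev`, `sketch_on_error` and the `sketch_action` values are hypothetical names, not dm-mpath symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef EBADE
#define EBADE 52 /* Linux value; fallback for systems that lack it */
#endif

/* What the completion handler should do with a failed request. */
enum sketch_action {
	SKETCH_RETURN_ERROR, /* pass the error up, keep the path */
	SKETCH_FAIL_PATH     /* old behavior: fail the path and retry */
};

struct sketch_dev {
	/* Hypothetical: set when a reservation was made via the kernel PR API. */
	bool pr_via_kernel;
};

/* Return the conflict only for devices the kernel itself reserved. */
enum sketch_action sketch_on_error(const struct sketch_dev *dev, int error)
{
	assert(dev != NULL);

	if (error == -EBADE && dev->pr_via_kernel)
		return SKETCH_RETURN_ERROR;
	return SKETCH_FAIL_PATH;
}
```

Under this model a device reserved by something other than the kernel PR API would keep the pre-patch failover behavior on a reservation conflict.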
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: dm-mpath: always return reservation conflict
2016-09-29 15:01 ` Mike Snitzer
2016-09-29 15:35 ` Christoph Hellwig
@ 2016-09-30 0:55 ` James Bottomley
1 sibling, 0 replies; 20+ messages in thread
From: James Bottomley @ 2016-09-30 0:55 UTC (permalink / raw)
To: Mike Snitzer
Cc: Hannes Reinecke, Christoph Hellwig, dm-devel, Christoph Hellwig,
linux-scsi, Hannes Reinecke
On Thu, 2016-09-29 at 11:01 -0400, Mike Snitzer wrote:
> On Tue, Sep 27 2016 at 2:50pm -0400,
> James Bottomley <James.Bottomley@hansenpartnership.com> wrote:
>
> > On Tue, 2016-09-27 at 08:34 +0200, Hannes Reinecke wrote:
> > > On 09/26/2016 09:06 PM, James Bottomley wrote:
> > > > On Mon, 2016-09-26 at 09:52 -0700, Christoph Hellwig wrote:
> > > > > Getting back to this after Hannes recovered from his vacation
> > > > > and I had a chat with him..
> > > > >
> > > > > On Mon, Aug 15, 2016 at 09:40:39AM -0400, Mike Snitzer wrote:
> > > > > > Seems we still need a more sophisticated approach. But I'm
> > > > > > left wondering: if we didn't do it would anything notice?
> > > > > > Sadly, the same big question from the original thread from
> > > > > > a
> > > > > > year ago:
> > > > >
> > > > > Yes. I have a customer looking to push the pNFS SCSI layout
> > > > > into
> > > > > a product, and the major show stopper right now is that we
> > > > > can
> > > > > > trivially get into failover loops without this (or an
> > > > > > equivalent)
> > > > > fix.
> > > > >
> > > > > A year ago SCSI layout was still work in progress in the
> > > > > IETF,
> > > > > people use the similar block layout instead that doesn't use
> > > > > PRs and we also didn't have the in-kernel PR API, so you
> > > > > effectively couldn't use PRs with multipathing.
> > > > >
> > > > > > https://patchwork.kernel.org/patch/6797111/
> > > > > >
> > > > > > > So this is throw-away for now (and I'll get Hannes' patch
> > > > > > > applied for 4.8-rc3, with the tweak of returning -EBADE
> > > > > > > immediately):
> > > > > >
> > > > > > Unfortunately, I'm _not_ staging Hannes' patch until I have
> > > > > > > James Bottomley's Ack (given his original issues with the
> > > > > > > patch haven't been explained away AFAICT).
> > > > >
> > > > > > I've added James to the Cc. His argument was that the old
> > > > > > behavior could be implemented to use some non-standard use of
> > > > > > reservations without a specific example. I don't really think
> > > > > > his example even is practical - once we use dm-mpath it
> > > > > > exclusively claims the underlying block devices, so any sort
> > > > > > of selective reservations would have had to happen before
> > > > > > even starting dm-multipath.
> > > >
> > > > > Well, now that you've made me reread the thread from 14 months
> > > > > ago that wasn't quite my objection. The objection hinged on the
> > > > > fact that anything that uses path specific reservations would
> > > > > now fail instead of retrying on a different path. I thought the
> > > > > IBM SVC did this and Hannes implied he'd be able to check this
> > > > > ... did anyone check? If we've checked and there's no issue with
> > > > > the SVC, then I don't have any other objections.
> > > >
> > > > > > So a dynamic SAN controller would have to tear down and
> > > > > > rebuild the dm-multipath setup all the time.
> > > >
> > > > > That was the job of the SVC: it sat in the middle of the SAN
> > > > > and controlled which node saw what storage.
> > > > >
> > > > > https://www.ibm.com/support/knowledgecenter/STPVGU/com.ibm.storage.svc.console.720.doc/svc_svcovr_1bcfiq.html
> > > >
> > > > The SVC can issue its own reservations in those circumstances.
> > > > What I'm not at all clear on is whether they'll interact badly
> > > > with the dm-mp reservations.
> > > >
> > > In the end SVC is (for us) just another storage array.
> > > If and what SVC does in the background is of no interest to us.
> >
> > > How can that be true? It sits *on* the SAN and manages devices, it
> > > doesn't sit between the initiators and the devices. It applies
> > > reservations to devices under management, but every node usually
> > > sees everything else, so devices under SVC management are visible
> > > to all initiators unless you zone them off.
> > >
> > > The last SVC manual I saw included a procedure for manually
> > > releasing stuck SVC reservations from an initiator, which
> > > illustrates the expectation.
> >
> > > > OTOH I'd be very surprised if the SVC would be allowing us to see
> > > > remnants of its internal working (like persistent reservation
> > > > errors); in doing so third-party applications would be able to
> > > > see and possibly modify these persistent reservations and the SVC
> > > > would find itself in a very fragile operating scenario.
> >
> > > Because unless you zone the fibre, that's precisely what you do see.
> >
> > > > Also interactions with GPFS (which uses its own set of
> > > > reservations) will become very tricky.
> > >
> > > > So I sincerely doubt we'll ever see SVC-originated persistent
> > > > reservation errors.
> > > >
> > > > And as a side-note, this particular patch has been included in
> > > > SLES since 2011, with no noticeable side effects.
> >
> > OK, so can you actually say that someone has tested this scenario?
> > If not, do you have the capacity to test it?
>
> I've elected to just take this change for 4.9. Please see:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.9&id=8ff232c1a819c2e98d85974a3bff0b7b8e2970ed
That's fine. I think the answer is that SVC technology is not around
much so it can't be tested, so I was going to dump it on you to make
the call anyway ...
James
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH] dm-mpath: always return reservation conflict
@ 2015-07-15 11:23 Hannes Reinecke
2015-07-15 11:35 ` James Bottomley
0 siblings, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2015-07-15 11:23 UTC (permalink / raw)
To: Mike Snitzer
Cc: Christoph Hellwig, dm-devel, linux-scsi, James Bottomley,
Hannes Reinecke
If dm-mpath encounters a reservation conflict it should not
fail the path (as communication with the target is not affected)
but should rather retry on another path.
However, in doing so we might be inducing a ping-pong between
paths, with no guarantee of any forward progress.
And arguably a reservation conflict is an unexpected error,
so we should be passing it upwards to allow the application
to take appropriate steps.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
drivers/md/dm-mpath.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 5a67671..e65d266 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1269,7 +1269,16 @@ static int do_end_io(struct multipath *m, struct request *clone,
 	if (noretry_error(error))
 		return error;
 
-	if (mpio->pgpath)
+	/*
+	 * EBADE signals a reservation conflict.
+	 * We shouldn't fail the path here as we can communicate with
+	 * the target. We should failover to the next path, but in
+	 * doing so we might be causing a ping-pong between paths.
+	 * So just return the reservation conflict error.
+	 */
+	if (error == -EBADE)
+		r = error;
+	else if (mpio->pgpath)
 		fail_path(mpio->pgpath);
 
 	spin_lock_irqsave(&m->lock, flags);
@@ -1277,9 +1286,6 @@ static int do_end_io(struct multipath *m, struct request *clone,
 		if (!m->queue_if_no_path) {
 			if (!__must_push_back(m))
 				r = -EIO;
-		} else {
-			if (error == -EBADE)
-				r = error;
 		}
 	}
 	spin_unlock_irqrestore(&m->lock, flags);
--
1.8.5.2
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] dm-mpath: always return reservation conflict
2015-07-15 11:23 [PATCH] " Hannes Reinecke
@ 2015-07-15 11:35 ` James Bottomley
2015-07-15 11:52 ` Hannes Reinecke
2015-07-16 5:07 ` Christophe Varoqui
0 siblings, 2 replies; 20+ messages in thread
From: James Bottomley @ 2015-07-15 11:35 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: Mike Snitzer, Christoph Hellwig, dm-devel, linux-scsi
On Wed, 2015-07-15 at 13:23 +0200, Hannes Reinecke wrote:
> If dm-mpath encounters a reservation conflict it should not
> fail the path (as communication with the target is not affected)
> but should rather retry on another path.
> However, in doing so we might be inducing a ping-pong between
> paths, with no guarantee of any forward progress.
> And arguably a reservation conflict is an unexpected error,
> so we should be passing it upwards to allow the application
> to take appropriate steps.
If I interpret the code correctly, you've changed the behaviour from the
current "try all paths and fail them, ultimately passing the reservation
conflict up if all paths fail" to "return reservation conflict
immediately, keeping all paths running". This assumes that the
reservation isn't path specific because if we encounter a path specific
reservation, you've altered the behaviour from route around to fail.
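The behavioural difference described here can be sketched as a small userspace model. This is not the dm-mpath code: struct model_mpath, model_fail_path() and MODEL_REQUEUE are hypothetical stand-ins, the model assumes the request always has a path attached, and the enclosing "no valid paths left" check is an assumption because the hunk does not show it.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_REQUEUE 1	/* stand-in for DM_ENDIO_REQUEUE; value is arbitrary */

/* Hypothetical, heavily simplified stand-in for struct multipath. */
struct model_mpath {
	int  nr_valid_paths;	/* paths not yet failed */
	bool queue_if_no_path;
	bool must_push_back;	/* stand-in for __must_push_back(m) */
};

/* Stand-in for fail_path(): take the current path out of service. */
static void model_fail_path(struct model_mpath *m)
{
	if (m->nr_valid_paths > 0)
		m->nr_valid_paths--;
}

/* Before the patch: every retryable error, including -EBADE, fails the path. */
static int model_end_io_old(struct model_mpath *m, int error)
{
	int r = MODEL_REQUEUE;

	if (!error)
		return 0;

	model_fail_path(m);

	if (!m->nr_valid_paths) {	/* assumed outer check */
		if (!m->queue_if_no_path) {
			if (!m->must_push_back)
				r = -EIO;
		} else if (error == -EBADE) {
			r = error;	/* conflict surfaces only once all paths are gone */
		}
	}
	return r;
}

/* After the patch: -EBADE is returned at once and the path stays valid. */
static int model_end_io_new(struct model_mpath *m, int error)
{
	int r = MODEL_REQUEUE;

	if (!error)
		return 0;

	if (error == -EBADE)
		r = error;
	else
		model_fail_path(m);

	if (!m->nr_valid_paths) {	/* assumed outer check */
		if (!m->queue_if_no_path && !m->must_push_back)
			r = -EIO;
	}
	return r;
}
```

With two valid paths and queue_if_no_path set, the old model requeues on the first conflict and only reports -EBADE after the second path has been failed as well, while the new model reports -EBADE on the first completion and leaves both paths valid.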
The case I think the original code was for is SAN Volume controllers
which use path specific SCSI-3 reservations effectively to do traffic
control and allow favoured paths. Have you verified that nothing we
encounter in the enterprise uses path specific reservations for
multipath shaping any more?
James
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
> drivers/md/dm-mpath.c | 14 ++++++++++----
> 1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index 5a67671..e65d266 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -1269,7 +1269,16 @@ static int do_end_io(struct multipath *m, struct request *clone,
> if (noretry_error(error))
> return error;
>
> - if (mpio->pgpath)
> + /*
> + * EBADE signals an reservation conflict.
> + * We shouldn't fail the path here as we can communicate with
> + * the target. We should failover to the next path, but in
> + * doing so we might be causing a ping-pong between paths.
> + * So just return the reservation conflict error.
> + */
> + if (error == -EBADE)
> + r = error;
> + else if (mpio->pgpath)
> fail_path(mpio->pgpath);
>
> spin_lock_irqsave(&m->lock, flags);
> @@ -1277,9 +1286,6 @@ static int do_end_io(struct multipath *m, struct request *clone,
> if (!m->queue_if_no_path) {
> if (!__must_push_back(m))
> r = -EIO;
> - } else {
> - if (error == -EBADE)
> - r = error;
> }
> }
> spin_unlock_irqrestore(&m->lock, flags);
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] dm-mpath: always return reservation conflict
2015-07-15 11:35 ` James Bottomley
@ 2015-07-15 11:52 ` Hannes Reinecke
2015-07-15 11:56 ` Christoph Hellwig
2015-07-15 12:01 ` James Bottomley
2015-07-16 5:07 ` Christophe Varoqui
1 sibling, 2 replies; 20+ messages in thread
From: Hannes Reinecke @ 2015-07-15 11:52 UTC (permalink / raw)
To: James Bottomley; +Cc: Mike Snitzer, Christoph Hellwig, dm-devel, linux-scsi
On 07/15/2015 01:35 PM, James Bottomley wrote:
> On Wed, 2015-07-15 at 13:23 +0200, Hannes Reinecke wrote:
>> If dm-mpath encounters a reservation conflict it should not
>> fail the path (as communication with the target is not affected)
>> but should rather retry on another path.
>> However, in doing so we might be inducing a ping-pong between
>> paths, with no guarantee of any forward progress.
>> And arguably a reservation conflict is an unexpected error,
>> so we should be passing it upwards to allow the application
>> to take appropriate steps.
>
> If I interpret the code correctly, you've changed the behaviour from the
> current try all paths and fail them, ultimately passing the reservation
> conflict up if all paths fail to return reservation conflict
> immediately, keeping all paths running. This assumes that the
> reservation isn't path specific because if we encounter a path specific
> reservation, you've altered the behaviour from route around to fail.
>
That is correct.
As mentioned in the patch, the 'correct' solution would be to retry
the offending I/O on another path.
However, the current multipath design doesn't allow us to do that
without failing the path first.
If we were just retrying I/O on another path without failing the
path first (and all paths would return a reservation conflict) we
wouldn't know when we've exhausted all paths.
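The termination problem can be illustrated with a toy model, offered as a sketch under stated assumptions rather than a picture of the real path selector. All names (struct model_path, select_path, submit_with_retry) and the attempt cap are hypothetical. The point is only that, when nothing records which paths were tried, a round-robin retry has no natural stopping point.

```c
#include <errno.h>
#include <stdbool.h>

#define NR_PATHS 3

/* Hypothetical path state: still selectable, and whether I/O on it conflicts. */
struct model_path {
	bool valid;
	bool conflicts;
};

/* Round-robin over paths still marked valid; returns -1 if none is left. */
static int select_path(const struct model_path *p, int last)
{
	for (int i = 1; i <= NR_PATHS; i++) {
		int idx = (last + i) % NR_PATHS;

		if (p[idx].valid)
			return idx;
	}
	return -1;
}

/*
 * Submit one request, retrying on another path after each conflict.
 * With fail_on_conflict the failed paths act as the "already tried"
 * record and the loop ends with -EBADE once none is left. Without it,
 * only the artificial max_attempts cap stops the ping-pong (-EAGAIN).
 */
static int submit_with_retry(struct model_path *p, bool fail_on_conflict,
			     int max_attempts, int *attempts)
{
	int last = -1;

	*attempts = 0;
	while (*attempts < max_attempts) {
		int idx = select_path(p, last);

		if (idx < 0)
			return -EBADE;	/* every path was tried and failed */
		(*attempts)++;
		if (!p[idx].conflicts)
			return 0;	/* I/O went through on this path */
		if (fail_on_conflict)
			p[idx].valid = false;
		last = idx;
	}
	return -EAGAIN;		/* gave up without knowing whether all paths conflict */
}
```

When all three model paths conflict, failing them ends the loop after three attempts, whereas retrying without failing runs until the cap is reached.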
> The case I think the original code was for is SAN Volume controllers
> which use path specific SCSI-3 reservations effectively to do traffic
> control and allow favoured paths. Have you verified that nothing we
> encounter in the enterprise uses path specific reservations for
> multipath shaping any more?
>
Ah. That was some input I was looking for.
With that patch I've assumed that persistent reservations are done
primarily from userland / filesystem, where the reservation would
effectively be done on a per-LUN basis.
If it's being used from the storage array internally this is a
different matter.
(Although I'd be very interested how this behaviour would play
together with applications which use persistent reservations
internally; GPFS springs to mind here ...)
But apparently this specific behaviour wasn't seen that often in the
field; I certainly never got any customer reports about mysteriously
failing paths.
Anyway. I'll see if I can come up with something to restore the
original behaviour.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] dm-mpath: always return reservation conflict
2015-07-15 11:52 ` Hannes Reinecke
@ 2015-07-15 11:56 ` Christoph Hellwig
2015-07-15 12:02 ` Hannes Reinecke
2015-07-15 12:01 ` James Bottomley
1 sibling, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2015-07-15 11:56 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: James Bottomley, Mike Snitzer, dm-devel, linux-scsi
An array can't issue a reservation, the initiator needs to register
it. Right now the only way to do it is through SG_IO passthrough,
which is a best-effort approach if the I/O isn't also using SG_IO and
can't be properly supported because of that.
However I will submit an in-kernel reservation API soon which will
allow us to have that sort of control. My current prototype only allows
for all-path reservations as I couldn't come up with a use case for
per-path reservations, but if such a need should arise we can add it
and take that into account in the multipathing code.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] dm-mpath: always return reservation conflict
2015-07-15 11:56 ` Christoph Hellwig
@ 2015-07-15 12:02 ` Hannes Reinecke
0 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2015-07-15 12:02 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: James Bottomley, Mike Snitzer, dm-devel, linux-scsi
On 07/15/2015 01:56 PM, Christoph Hellwig wrote:
> An array can't issue a reservation, the initiator needs to register
> it. Right now the only way to do it is through SG_IO passthrough,
> which is a best-effort approach if the I/O isn't also using SG_IO and
> can't be properly supported because of that.
>
> However I will submit an in-kernel reservation API soon which will
> allow us to have that sort of control. My current prototype only allows
> for all-path reservations as I couldn't come up with a use case for
> per-path reservations, but if such a need should arise we can add it
> and take that into account in the multipathing code.
>
Which was my reasoning as well.
I would consider a per-path reservation in a multipath setup an
error, as the current multipath code is not able to handle this.
With the current code we will fail a path due to the reservation
conflict error, but whatever happens next depends on the type of
reservation and the used prioritizer/path checker.
It can be everything from 'just working' to recurrent path drops to
an I/O stall (as SET TARGET PORT GROUPS might return a reservation
conflict, too, so we wouldn't be able to switch to a working path...)
And implementing a per-path reservation in multipath is far from
trivial, so I'd rather not attempt this.
_Especially_ not as you're working on in-kernel reservation code.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] dm-mpath: always return reservation conflict
2015-07-15 11:52 ` Hannes Reinecke
2015-07-15 11:56 ` Christoph Hellwig
@ 2015-07-15 12:01 ` James Bottomley
2015-07-15 12:15 ` Hannes Reinecke
1 sibling, 1 reply; 20+ messages in thread
From: James Bottomley @ 2015-07-15 12:01 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: Mike Snitzer, Christoph Hellwig, dm-devel, linux-scsi
On Wed, 2015-07-15 at 13:52 +0200, Hannes Reinecke wrote:
> On 07/15/2015 01:35 PM, James Bottomley wrote:
> > On Wed, 2015-07-15 at 13:23 +0200, Hannes Reinecke wrote:
> >> If dm-mpath encounters a reservation conflict it should not
> >> fail the path (as communication with the target is not affected)
> >> but should rather retry on another path.
> >> However, in doing so we might be inducing a ping-pong between
> >> paths, with no guarantee of any forward progress.
> >> And arguably a reservation conflict is an unexpected error,
> >> so we should be passing it upwards to allow the application
> >> to take appropriate steps.
> >
> > If I interpret the code correctly, you've changed the behaviour from the
> > current try all paths and fail them, ultimately passing the reservation
> > conflict up if all paths fail to return reservation conflict
> > immediately, keeping all paths running. This assumes that the
> > reservation isn't path specific because if we encounter a path specific
> > reservation, you've altered the behaviour from route around to fail.
> >
> That is correct.
> As mentioned in the patch, the 'correct' solution would be to retry
> the offending I/O on another path.
> However, the current multipath design doesn't allow us to do that
> without failing the path first.
> If we were just retrying I/O on another path without failing the
> path first (and all paths would return a reservation conflict) we
> wouldn't know when we've exhausted all paths.
>
> > The case I think the original code was for is SAN Volume controllers
> > which use path specific SCSI-3 reservations effectively to do traffic
> > control and allow favoured paths. Have you verified that nothing we
> > encounter in the enterprise uses path specific reservations for
> > multipath shaping any more?
> >
> Ah. That was some input I was looking for.
> With that patch I've assumed that persistent reservations are done
> primarily from userland / filesystem, where the reservation would
> effectively be done on a per-LUN basis.
> If it's being used from the storage array internally this is a
> different matter.
> (Although I'd be very interested how this behaviour would play
> together with applications which use persistent reservations
> internally; GPFS springs to mind here ...)
>
> But apparently this specific behaviour wasn't seen that often in the
> field; I certainly never got any customer reports about mysteriously
> failing paths.
Have you already got this patch in SLES, if so, for how long?
> Anyway. I'll see if I can come up with something to restore the
> original behaviour.
Or a way of verifying that nothing in the current enterprise uses path
specific reservations ... we can change the current behaviour as long
as nothing notices.
James
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] dm-mpath: always return reservation conflict
2015-07-15 12:01 ` James Bottomley
@ 2015-07-15 12:15 ` Hannes Reinecke
0 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2015-07-15 12:15 UTC (permalink / raw)
To: James Bottomley; +Cc: Mike Snitzer, Christoph Hellwig, dm-devel, linux-scsi
On 07/15/2015 02:01 PM, James Bottomley wrote:
> On Wed, 2015-07-15 at 13:52 +0200, Hannes Reinecke wrote:
>> On 07/15/2015 01:35 PM, James Bottomley wrote:
>>> On Wed, 2015-07-15 at 13:23 +0200, Hannes Reinecke wrote:
>>>> If dm-mpath encounters a reservation conflict it should not
>>>> fail the path (as communication with the target is not affected)
>>>> but should rather retry on another path.
>>>> However, in doing so we might be inducing a ping-pong between
>>>> paths, with no guarantee of any forward progress.
>>>> And arguably a reservation conflict is an unexpected error,
>>>> so we should be passing it upwards to allow the application
>>>> to take appropriate steps.
>>>
>>> If I interpret the code correctly, you've changed the behaviour from the
>>> current try all paths and fail them, ultimately passing the reservation
>>> conflict up if all paths fail to return reservation conflict
>>> immediately, keeping all paths running. This assumes that the
>>> reservation isn't path specific because if we encounter a path specific
>>> reservation, you've altered the behaviour from route around to fail.
>>>
>> That is correct.
>> As mentioned in the patch, the 'correct' solution would be to retry
>> the offending I/O on another path.
>> However, the current multipath design doesn't allow us to do that
>> without failing the path first.
>> If we were just retrying I/O on another path without failing the
>> path first (and all paths would return a reservation conflict) we
>> wouldn't know when we've exhausted all paths.
>>
>>> The case I think the original code was for is SAN Volume controllers
>>> which use path specific SCSI-3 reservations effectively to do traffic
>>> control and allow favoured paths. Have you verified that nothing we
>>> encounter in the enterprise uses path specific reservations for
>>> multipath shaping any more?
>>>
>> Ah. That was some input I was looking for.
>> With that patch I've assumed that persistent reservations are done
>> primarily from userland / filesystem, where the reservation would
>> effectively be done on a per-LUN basis.
>> If it's being used from the storage array internally this is a
>> different matter.
>> (Although I'd be very interested how this behaviour would play
>> together with applications which use persistent reservations
>> internally; GPFS springs to mind here ...)
>>
>> But apparently this specific behaviour wasn't seen that often in the
>> field; I certainly never got any customer reports about mysteriously
>> failing paths.
>
> Have you already got this patch in SLES, if so, for how long?
>
We haven't as of yet; I've come across this behaviour due to another
issue. And before I were to put this into SLES I thought I should be
asking those in the know ... persistent reservations _is_ an arcane
topic, after all.
I was just referring to the fact that I rarely got customer issues
with persistent reservations; and those I get tend to be tape-centric.
>> Anyway. I'll see if I can come up with something to restore the
>> original behaviour.
>
> Or a way of verifying that nothing in the current enterprise uses path
> specific reservations ... we can change the current behaviour as long
> as nothing notices.
>
The only instance I know of is GPFS; someone in our company once
wrote an HA agent using persistent reservations, but I'm not sure if
it's deployed anywhere. But that agent is certainly aware of
multipathing, and doesn't issue per-path reservations.
(Well, actually it does, but it does it for every path :-)
I would think the same goes for GPFS.
Incidentally, the SVC docs have a section about persistent
reservations, but do not mention anything about internal use.
So if it does it'll be opaque to the user, otherwise I would assume
it to be mentioned there.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] dm-mpath: always return reservation conflict
2015-07-15 11:35 ` James Bottomley
2015-07-15 11:52 ` Hannes Reinecke
@ 2015-07-16 5:07 ` Christophe Varoqui
1 sibling, 0 replies; 20+ messages in thread
From: Christophe Varoqui @ 2015-07-16 5:07 UTC (permalink / raw)
To: device-mapper development; +Cc: linux-scsi, Christoph Hellwig, Mike Snitzer
If your patch implements direct path-to-multipath propagation of the
conflict error, won't the common scenario where admins add new (initially
unregistered) paths to a reserved multipath result in application-visible
I/O errors? If so, those admins might hold a grudge against us :/
For reference, the opensvc crm does use type 5 PRs, and aims to have all
paths registered. It still does not make use of the multipathd PR
janitoring features, and uses sg_persist directly for PR status and actions.
Best regards,
Christophe Varoqui
OpenSVC
On Wed, Jul 15, 2015 at 1:35 PM, James Bottomley <
James.Bottomley@hansenpartnership.com> wrote:
> On Wed, 2015-07-15 at 13:23 +0200, Hannes Reinecke wrote:
> > If dm-mpath encounters a reservation conflict it should not
> > fail the path (as communication with the target is not affected)
> > but should rather retry on another path.
> > However, in doing so we might be inducing a ping-pong between
> > paths, with no guarantee of any forward progress.
> > And arguably a reservation conflict is an unexpected error,
> > so we should be passing it upwards to allow the application
> > to take appropriate steps.
>
> If I interpret the code correctly, you've changed the behaviour from the
> current try all paths and fail them, ultimately passing the reservation
> conflict up if all paths fail to return reservation conflict
> immediately, keeping all paths running. This assumes that the
> reservation isn't path specific because if we encounter a path specific
> reservation, you've altered the behaviour from route around to fail.
>
> The case I think the original code was for is SAN Volume controllers
> which use path specific SCSI-3 reservations effectively to do traffic
> control and allow favoured paths. Have you verified that nothing we
> encounter in the enterprise uses path specific reservations for
> multipath shaping any more?
>
> James
>
> > Signed-off-by: Hannes Reinecke <hare@suse.de>
> > ---
> > drivers/md/dm-mpath.c | 14 ++++++++++----
> > 1 file changed, 10 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > index 5a67671..e65d266 100644
> > --- a/drivers/md/dm-mpath.c
> > +++ b/drivers/md/dm-mpath.c
> > @@ -1269,7 +1269,16 @@ static int do_end_io(struct multipath *m, struct request *clone,
> > if (noretry_error(error))
> > return error;
> >
> > - if (mpio->pgpath)
> > + /*
> > + * EBADE signals an reservation conflict.
> > + * We shouldn't fail the path here as we can communicate with
> > + * the target. We should failover to the next path, but in
> > + * doing so we might be causing a ping-pong between paths.
> > + * So just return the reservation conflict error.
> > + */
> > + if (error == -EBADE)
> > + r = error;
> > + else if (mpio->pgpath)
> > fail_path(mpio->pgpath);
> >
> > spin_lock_irqsave(&m->lock, flags);
> > @@ -1277,9 +1286,6 @@ static int do_end_io(struct multipath *m, struct request *clone,
> > if (!m->queue_if_no_path) {
> > if (!__must_push_back(m))
> > r = -EIO;
> > - } else {
> > - if (error == -EBADE)
> > - r = error;
> > }
> > }
> > spin_unlock_irqrestore(&m->lock, flags);
>
>
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
>
^ permalink raw reply [flat|nested] 20+ messages in thread
Thread overview: 20+ messages
2016-08-02 12:36 [PING / RESEND] handling reservation conflicts in dm-mpath Christoph Hellwig
2016-08-02 12:36 ` [PATCH] dm-mpath: always return reservation conflict Christoph Hellwig
2016-08-11 18:38 ` [dm-devel] " Christoph Hellwig
2016-08-15 13:08 ` Mike Snitzer
2016-08-15 13:40 ` Mike Snitzer
2016-09-26 16:52 ` [dm-devel] " Christoph Hellwig
2016-09-26 19:06 ` James Bottomley
2016-09-27 6:34 ` Hannes Reinecke
2016-09-27 18:50 ` James Bottomley
2016-09-29 15:01 ` Mike Snitzer
2016-09-29 15:35 ` Christoph Hellwig
2016-09-30 0:55 ` James Bottomley
-- strict thread matches above, loose matches on Subject: below --
2015-07-15 11:23 [PATCH] " Hannes Reinecke
2015-07-15 11:35 ` James Bottomley
2015-07-15 11:52 ` Hannes Reinecke
2015-07-15 11:56 ` Christoph Hellwig
2015-07-15 12:02 ` Hannes Reinecke
2015-07-15 12:01 ` James Bottomley
2015-07-15 12:15 ` Hannes Reinecke
2015-07-16 5:07 ` Christophe Varoqui