* [PATCH] scsi: dm-mpath: do not fail paths when the target returns ALUA state transition
@ 2021-07-15 16:57 Brian Bunker
  2021-07-15 17:01 ` Martin Wilck
  2021-07-16  6:27 ` Hannes Reinecke
  0 siblings, 2 replies; 6+ messages in thread
From: Brian Bunker @ 2021-07-15 16:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: Martin Wilck, Hannes Reinecke

When paths return an ALUA state transition, do not fail those paths.
The expectation is that the transition is short-lived until the new
ALUA state is entered. There might not be any other paths in an online
state to serve the request, which can lead to an unexpected I/O error
on the multipath device.

Signed-off-by: Brian Bunker <brian@purestorage.com>
Acked-by: Krishna Kant <krishna.kant@purestorage.com>
Acked-by: Seamus Connor <sconnor@purestorage.com>
--
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index bced42f082b0..28948cc481f9 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1652,12 +1652,12 @@ static int multipath_end_io(struct dm_target *ti, struct request *clone,
        if (error && blk_path_error(error)) {
                struct multipath *m = ti->private;

-               if (error == BLK_STS_RESOURCE)
+               if (error == BLK_STS_RESOURCE || error == BLK_STS_AGAIN)
                        r = DM_ENDIO_DELAY_REQUEUE;
                else
                        r = DM_ENDIO_REQUEUE;

-               if (pgpath)
+               if (pgpath && (error != BLK_STS_AGAIN))
                        fail_path(pgpath);

                if (!atomic_read(&m->nr_valid_paths) &&
--

Thanks,
Brian

Brian Bunker
PURE Storage, Inc.
brian@purestorage.com

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] scsi: dm-mpath: do not fail paths when the target returns ALUA state transition
  2021-07-15 16:57 [PATCH] scsi: dm-mpath: do not fail paths when the target returns ALUA state transition Brian Bunker
@ 2021-07-15 17:01 ` Martin Wilck
  2021-07-16  6:27 ` Hannes Reinecke
  1 sibling, 0 replies; 6+ messages in thread
From: Martin Wilck @ 2021-07-15 17:01 UTC (permalink / raw)
  To: Brian Bunker, linux-scsi; +Cc: Hannes Reinecke

On Thu, 2021-07-15 at 09:57 -0700, Brian Bunker wrote:
> When paths return an ALUA state transition, do not fail those paths.
> The expectation is that the transition is short lived until the new
> ALUA state is entered. There might not be other paths in an online
> state to serve the request which can lead to an unexpected I/O error
> on the multipath device.
> 
> Signed-off-by: Brian Bunker <brian@purestorage.com>
> Acked-by: Krishna Kant <krishna.kant@purestorage.com>
> Acked-by: Seamus Connor <sconnor@purestorage.com>
> --
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index bced42f082b0..28948cc481f9 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -1652,12 +1652,12 @@ static int multipath_end_io(struct dm_target
> *ti, struct request *clone,
>         if (error && blk_path_error(error)) {
>                 struct multipath *m = ti->private;
> 
> -               if (error == BLK_STS_RESOURCE)
> +               if (error == BLK_STS_RESOURCE || error ==
> BLK_STS_AGAIN)
>                         r = DM_ENDIO_DELAY_REQUEUE;
>                 else
>                         r = DM_ENDIO_REQUEUE;
> 
> -               if (pgpath)
> +               if (pgpath && (error != BLK_STS_AGAIN))
>                         fail_path(pgpath);
> 
>                 if (!atomic_read(&m->nr_valid_paths) &&
> --
> 
> Thanks,
> Brian
> 
> Brian Bunker
> PURE Storage, Inc.
> brian@purestorage.com
> 

Reviewed-by: Martin Wilck <mwilck@suse.com>




* Re: [PATCH] scsi: dm-mpath: do not fail paths when the target returns ALUA state transition
  2021-07-15 16:57 [PATCH] scsi: dm-mpath: do not fail paths when the target returns ALUA state transition Brian Bunker
  2021-07-15 17:01 ` Martin Wilck
@ 2021-07-16  6:27 ` Hannes Reinecke
  2021-07-16  8:25   ` Martin Wilck
  1 sibling, 1 reply; 6+ messages in thread
From: Hannes Reinecke @ 2021-07-16  6:27 UTC (permalink / raw)
  To: Brian Bunker, linux-scsi; +Cc: Martin Wilck, Hannes Reinecke

On 7/15/21 6:57 PM, Brian Bunker wrote:
> When paths return an ALUA state transition, do not fail those paths.
> The expectation is that the transition is short lived until the new
> ALUA state is entered. There might not be other paths in an online
> state to serve the request which can lead to an unexpected I/O error
> on the multipath device.
> 
> Signed-off-by: Brian Bunker <brian@purestorage.com>
> Acked-by: Krishna Kant <krishna.kant@purestorage.com>
> Acked-by: Seamus Connor <sconnor@purestorage.com>
> --
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index bced42f082b0..28948cc481f9 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -1652,12 +1652,12 @@ static int multipath_end_io(struct dm_target
> *ti, struct request *clone,
>          if (error && blk_path_error(error)) {
>                  struct multipath *m = ti->private;
> 
> -               if (error == BLK_STS_RESOURCE)
> +               if (error == BLK_STS_RESOURCE || error == BLK_STS_AGAIN)
>                          r = DM_ENDIO_DELAY_REQUEUE;
>                  else
>                          r = DM_ENDIO_REQUEUE;
> 
> -               if (pgpath)
> +               if (pgpath && (error != BLK_STS_AGAIN))
>                          fail_path(pgpath);
> 
>                  if (!atomic_read(&m->nr_valid_paths) &&
> --

Sorry, but this will lead to regressions during failover for arrays 
that take longer to fail over (some need up to 30 minutes for a 
complete failover). For those it's absolutely crucial to _not_ retry 
I/O on paths in the transitioning state.

And you already admitted that 'queue_if_no_path' would resolve this 
problem, so why not update the device configuration in multipath-tools 
to have the correct setting for your array?
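
For reference, such a setting would live in a per-device section of
multipath.conf; an illustrative fragment (the vendor/product strings and
the chosen value are examples for this discussion, not a tested
recommendation):

```text
devices {
    device {
        vendor  "PURE"
        product "FlashArray"
        # Instead of failing I/O when no usable path remains, queue it.
        # "queue" queues indefinitely; an integer N retries for N
        # polling intervals before failing the I/O back to the caller.
        no_path_retry queue
    }
}
```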

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCH] scsi: dm-mpath: do not fail paths when the target returns ALUA state transition
  2021-07-16  6:27 ` Hannes Reinecke
@ 2021-07-16  8:25   ` Martin Wilck
  2021-07-16 18:39     ` Brian Bunker
  0 siblings, 1 reply; 6+ messages in thread
From: Martin Wilck @ 2021-07-16  8:25 UTC (permalink / raw)
  To: Hannes Reinecke, Brian Bunker, linux-scsi; +Cc: Hannes Reinecke

Hannes,

On Fri, 2021-07-16 at 08:27 +0200, Hannes Reinecke wrote:
> On 7/15/21 6:57 PM, Brian Bunker wrote:
> > When paths return an ALUA state transition, do not fail those paths.
> > The expectation is that the transition is short lived until the new
> > ALUA state is entered. There might not be other paths in an online
> > state to serve the request which can lead to an unexpected I/O error
> > on the multipath device.
> > 
> > Signed-off-by: Brian Bunker <brian@purestorage.com>
> > Acked-by: Krishna Kant <krishna.kant@purestorage.com>
> > Acked-by: Seamus Connor <sconnor@purestorage.com>
> > --
> > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > index bced42f082b0..28948cc481f9 100644
> > --- a/drivers/md/dm-mpath.c
> > +++ b/drivers/md/dm-mpath.c
> > @@ -1652,12 +1652,12 @@ static int multipath_end_io(struct dm_target
> > *ti, struct request *clone,
> >          if (error && blk_path_error(error)) {
> >                  struct multipath *m = ti->private;
> > 
> > -               if (error == BLK_STS_RESOURCE)
> > +               if (error == BLK_STS_RESOURCE || error ==
> > BLK_STS_AGAIN)
> >                          r = DM_ENDIO_DELAY_REQUEUE;
> >                  else
> >                          r = DM_ENDIO_REQUEUE;
> > 
> > -               if (pgpath)
> > +               if (pgpath && (error != BLK_STS_AGAIN))
> >                          fail_path(pgpath);
> > 
> >                  if (!atomic_read(&m->nr_valid_paths) &&
> > --
> 
> Sorry, but this will lead to regressions during failover for arrays 
> taking longer time (some take up to 30 minutes for a complete
> failover).
> And for those it's absolutely crucial to _not_ retry I/O on the paths
> in 
> transitioning.

This won't happen.

As argued in https://marc.info/?l=linux-scsi&m=162625194203635&w=2,
your previous patches avoid the request being requeued on the SCSI
queue, even with Brian's patch on top. IMO that means that the deadlock
situation analyzed earlier can't occur. If you disagree, please explain
in detail.

By not failing the path, the request can be requeued on the dm level,
and dm can decide to try the transitioning path again. But the request
wouldn't be added to the SCSI queue, because alua_prep_fn() would
prevent that. 

So, in the worst case, dm would retry queueing the request on the
transitioning path over and over again. By adding a delay, we avoid
busy-looping. This has basically the same effect as queueing on the dm
layer in the first place: the request stays queued on the dm level most
of the time. Except for the fact that the queueing would stop earlier:
as soon as the kernel notices that the path is not transitioning any
more. By not failing the dm paths, we don't depend on user space to
reinstate them, which is the right thing to do for a transitioning
state IMO.

> And you already admitted that 'queue_if_no_path' would resolve this 
> problem, so why not update the device configuration in multipath-
> tools 
> to have the correct setting for your array?

I think Brian is right that setting transitioning paths to failed state
in dm is highly questionable.

So far, in dm-multipath, we haven't set paths to failed state because
of ALUA state transitions. We've been mapping ALUA states to priorities
instead. We don't even fail paths in ALUA "unavailable" state, so why
do it for "transitioning"?

Thanks
Martin




* Re: [PATCH] scsi: dm-mpath: do not fail paths when the target returns ALUA state transition
  2021-07-16  8:25   ` Martin Wilck
@ 2021-07-16 18:39     ` Brian Bunker
  2021-08-03 22:09       ` Brian Bunker
  0 siblings, 1 reply; 6+ messages in thread
From: Brian Bunker @ 2021-07-16 18:39 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Hannes Reinecke, linux-scsi, Hannes Reinecke

On Fri, Jul 16, 2021 at 1:25 AM Martin Wilck <mwilck@suse.com> wrote:
>
> Hannes,
>
> On Fri, 2021-07-16 at 08:27 +0200, Hannes Reinecke wrote:
> > On 7/15/21 6:57 PM, Brian Bunker wrote:
> > > When paths return an ALUA state transition, do not fail those paths.
> > > The expectation is that the transition is short lived until the new
> > > ALUA state is entered. There might not be other paths in an online
> > > state to serve the request which can lead to an unexpected I/O error
> > > on the multipath device.
> > >
> > > Signed-off-by: Brian Bunker <brian@purestorage.com>
> > > Acked-by: Krishna Kant <krishna.kant@purestorage.com>
> > > Acked-by: Seamus Connor <sconnor@purestorage.com>
> > > --
> > > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > > index bced42f082b0..28948cc481f9 100644
> > > --- a/drivers/md/dm-mpath.c
> > > +++ b/drivers/md/dm-mpath.c
> > > @@ -1652,12 +1652,12 @@ static int multipath_end_io(struct dm_target
> > > *ti, struct request *clone,
> > >          if (error && blk_path_error(error)) {
> > >                  struct multipath *m = ti->private;
> > >
> > > -               if (error == BLK_STS_RESOURCE)
> > > +               if (error == BLK_STS_RESOURCE || error ==
> > > BLK_STS_AGAIN)
> > >                          r = DM_ENDIO_DELAY_REQUEUE;
> > >                  else
> > >                          r = DM_ENDIO_REQUEUE;
> > >
> > > -               if (pgpath)
> > > +               if (pgpath && (error != BLK_STS_AGAIN))
> > >                          fail_path(pgpath);
> > >
> > >                  if (!atomic_read(&m->nr_valid_paths) &&
> > > --
> >
> > Sorry, but this will lead to regressions during failover for arrays
> > taking longer time (some take up to 30 minutes for a complete
> > failover).
> > And for those it's absolutely crucial to _not_ retry I/O on the paths
> > in
> > transitioning.
>
> This won't happen.
>
> As argued in https://marc.info/?l=linux-scsi&m=162625194203635&w=2,
> your previous patches avoid the request being requeued on the SCSI
> queue, even with Brian's patch on top. IMO that means that the deadlock
> situation analyzed earlier can't occur. If you disagree, please explain
> in detail.
>
> By not failing the path, the request can be requeued on the dm level,
> and dm can decide to try the transitioning path again. But the request
> wouldn't be added to the SCSI queue, because alua_prep_fn() would
> prevent that.
>
> So, in the worst case, dm would retry queueing the request on the
> transitioning path over and over again. By adding a delay, we avoid
> busy-looping. This has basically the same effect as queueing on the dm
> layer in the first place: the request stays queued on the dm level most
> of the time. Except for the fact that the queueing would stop earlier:
> as soon as the kernel notices that the path is not transitioning any
> more. By not failing the dm paths, we don't depend on user space to
> reinstate them, which is the right thing to do for a transitioning
> state IMO.
>
> > And you already admitted that 'queue_if_no_path' would resolve this
> > problem, so why not update the device configuration in multipath-
> > tools
> > to have the correct setting for your array?

The reason I don't like queue_if_no_path as a solution is that there
are times we do want to fail all of the paths as quickly as possible
(e.g. a cluster sharing a resource). If we add some nonzero value to
no_path_retry, we would be forcing that configuration to wait
unnecessarily, and those customers will see this delay as a regression.
This is where a distinction between the ALUA standby or unavailable
states and the transitioning ALUA state is attractive.

>
> I think Brian is right that setting transitioning paths to failed state
> in dm is highly questionable.
>
> So far, in dm-multipath, we haven't set paths to failed state because
> of ALUA state transitions. We've been mapping ALUA states to priorities
> instead. We don't even fail paths in ALUA "unavailable" state, so why
> do it for "transitioning"?
>
> Thanks
> Martin
>
>

Thanks,
Brian

-- 
Brian Bunker
PURE Storage, Inc.
brian@purestorage.com


* Re: [PATCH] scsi: dm-mpath: do not fail paths when the target returns ALUA state transition
  2021-07-16 18:39     ` Brian Bunker
@ 2021-08-03 22:09       ` Brian Bunker
  0 siblings, 0 replies; 6+ messages in thread
From: Brian Bunker @ 2021-08-03 22:09 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Hannes Reinecke, linux-scsi, Hannes Reinecke

Martin and Hannes,

Any resolution to this issue?

Thanks,
Brian

On Fri, Jul 16, 2021 at 11:39 AM Brian Bunker <brian@purestorage.com> wrote:
>
> On Fri, Jul 16, 2021 at 1:25 AM Martin Wilck <mwilck@suse.com> wrote:
> >
> > Hannes,
> >
> > On Fri, 2021-07-16 at 08:27 +0200, Hannes Reinecke wrote:
> > > On 7/15/21 6:57 PM, Brian Bunker wrote:
> > > > When paths return an ALUA state transition, do not fail those paths.
> > > > The expectation is that the transition is short lived until the new
> > > > ALUA state is entered. There might not be other paths in an online
> > > > state to serve the request which can lead to an unexpected I/O error
> > > > on the multipath device.
> > > >
> > > > Signed-off-by: Brian Bunker <brian@purestorage.com>
> > > > Acked-by: Krishna Kant <krishna.kant@purestorage.com>
> > > > Acked-by: Seamus Connor <sconnor@purestorage.com>
> > > > --
> > > > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > > > index bced42f082b0..28948cc481f9 100644
> > > > --- a/drivers/md/dm-mpath.c
> > > > +++ b/drivers/md/dm-mpath.c
> > > > @@ -1652,12 +1652,12 @@ static int multipath_end_io(struct dm_target
> > > > *ti, struct request *clone,
> > > >          if (error && blk_path_error(error)) {
> > > >                  struct multipath *m = ti->private;
> > > >
> > > > -               if (error == BLK_STS_RESOURCE)
> > > > +               if (error == BLK_STS_RESOURCE || error ==
> > > > BLK_STS_AGAIN)
> > > >                          r = DM_ENDIO_DELAY_REQUEUE;
> > > >                  else
> > > >                          r = DM_ENDIO_REQUEUE;
> > > >
> > > > -               if (pgpath)
> > > > +               if (pgpath && (error != BLK_STS_AGAIN))
> > > >                          fail_path(pgpath);
> > > >
> > > >                  if (!atomic_read(&m->nr_valid_paths) &&
> > > > --
> > >
> > > Sorry, but this will lead to regressions during failover for arrays
> > > taking longer time (some take up to 30 minutes for a complete
> > > failover).
> > > And for those it's absolutely crucial to _not_ retry I/O on the paths
> > > in
> > > transitioning.
> >
> > This won't happen.
> >
> > As argued in https://marc.info/?l=linux-scsi&m=162625194203635&w=2,
> > your previous patches avoid the request being requeued on the SCSI
> > queue, even with Brian's patch on top. IMO that means that the deadlock
> > situation analyzed earlier can't occur. If you disagree, please explain
> > in detail.
> >
> > By not failing the path, the request can be requeued on the dm level,
> > and dm can decide to try the transitioning path again. But the request
> > wouldn't be added to the SCSI queue, because alua_prep_fn() would
> > prevent that.
> >
> > So, in the worst case, dm would retry queueing the request on the
> > transitioning path over and over again. By adding a delay, we avoid
> > busy-looping. This has basically the same effect as queueing on the dm
> > layer in the first place: the request stays queued on the dm level most
> > of the time. Except for the fact that the queueing would stop earlier:
> > as soon as the kernel notices that the path is not transitioning any
> > more. By not failing the dm paths, we don't depend on user space to
> > reinstate them, which is the right thing to do for a transitioning
> > state IMO.
> >
> > > And you already admitted that 'queue_if_no_path' would resolve this
> > > problem, so why not update the device configuration in multipath-
> > > tools
> > > to have the correct setting for your array?
>
> The reason I don't like queue_if_no_path as a solution is that there
> are times we do want to fail all of the paths as quickly as possible
> (e.g. a cluster sharing a resource). If we add some nonzero value to
> no_path_retry, we would be forcing that configuration to wait
> unnecessarily, and those customers will see this delay as a
> regression. This is where a distinction between the ALUA standby or
> unavailable states and the transitioning ALUA state is attractive.
>
> >
> > I think Brian is right that setting transitioning paths to failed state
> > in dm is highly questionable.
> >
> > So far, in dm-multipath, we haven't set paths to failed state because
> > of ALUA state transitions. We've been mapping ALUA states to priorities
> > instead. We don't even fail paths in ALUA "unavailable" state, so why
> > do it for "transitioning"?
> >
> > Thanks
> > Martin
> >
> >
>
> Thanks,
> Brian
>
> --
> Brian Bunker
> PURE Storage, Inc.
> brian@purestorage.com



-- 
Brian Bunker
PURE Storage, Inc.
brian@purestorage.com

