All of lore.kernel.org
 help / color / mirror / Atom feed
* [dm-devel] [PATCH] [PATCH] libmultipath: return 'ghost' state when port is in transition
@ 2023-02-21 20:56 Brian Bunker
  2023-03-03  7:46 ` Martin Wilck
  2023-03-06 16:54 ` Benjamin Marzinski
  0 siblings, 2 replies; 16+ messages in thread
From: Brian Bunker @ 2023-02-21 20:56 UTC (permalink / raw)
  To: device-mapper development

A test unit ready command sent on a path in standby state will not result
in a failed path. The same should be true for a path in the
transitioning state.

Signed-off-by: Brian Bunker brian@purestorage.com
---
 libmultipath/checkers/tur.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index 551dc4f0..fff7987b 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -179,10 +179,11 @@ retry:
                else if (key == 0x2) {
                        /* Not Ready */
                        /* Note: Other ALUA states are either UP or DOWN */
-                       if( asc == 0x04 && ascq == 0x0b){
+                       if (asc == 0x04 &&
+                           (ascq == 0x0b || ascq == 0x0a)) {
                                /*
                                 * LOGICAL UNIT NOT ACCESSIBLE,
-                                * TARGET PORT IN STANDBY STATE
+                                * TARGET PORT IN STANDBY OR TRANSITION STATE
                                 */
                                *msgid = CHECKER_MSGID_GHOST;
                                return PATH_GHOST;
--

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [dm-devel] [PATCH] [PATCH] libmultipath: return 'ghost' state when port is in transition
  2023-02-21 20:56 [dm-devel] [PATCH] [PATCH] libmultipath: return 'ghost' state when port is in transition Brian Bunker
@ 2023-03-03  7:46 ` Martin Wilck
  2023-03-03 17:18   ` Brian Bunker
  2023-03-06 16:54 ` Benjamin Marzinski
  1 sibling, 1 reply; 16+ messages in thread
From: Martin Wilck @ 2023-03-03  7:46 UTC (permalink / raw)
  To: dm-devel, Brian Bunker

On Tue, 2023-02-21 at 12:56 -0800, Brian Bunker wrote:
> A test unit ready command sent on a path in standby state will not
> result
> in a failed path. The same should be true for a path in the
> transitioning state.
> 
> Signed-off-by: Brian Bunker brian@purestorage.com

In general, I'm somewhat reluctant to change things in this area.
GHOST state is not clearly defined, unfortunately; it has evolved 
from the original "inq succeeds but tur fails" to a somewhat hand-
waving "something in between failed and up" semantics. 

Internally, multipathd treats GHOST paths almost exactly like UP paths,
except that they get a different priority assigned. If for whatever
reason multipathd switches to a PG in this state, the path state is
highly likely to switch to FAILED soon after, as regular IO will fail
on these paths; at least without explicit ALUA. Is that what we want to
see for TRANSITIONING state?

But yes, this change is in the spirit of 5da642f ("Return 'ghost' state
when port is in standby").

CC'ing Ben for confirmation.

Reviewed-by: Martin Wilck <mwilck@suse.com>

> ---
>  libmultipath/checkers/tur.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/libmultipath/checkers/tur.c
> b/libmultipath/checkers/tur.c
> index 551dc4f0..fff7987b 100644
> --- a/libmultipath/checkers/tur.c
> +++ b/libmultipath/checkers/tur.c
> @@ -179,10 +179,11 @@ retry:
>                 else if (key == 0x2) {
>                         /* Not Ready */
>                         /* Note: Other ALUA states are either UP or
> DOWN */
> -                       if( asc == 0x04 && ascq == 0x0b){
> +                       if (asc == 0x04 &&
> +                           (ascq == 0x0b || ascq == 0x0a)) {
>                                 /*
>                                  * LOGICAL UNIT NOT ACCESSIBLE,
> -                                * TARGET PORT IN STANDBY STATE
> +                                * TARGET PORT IN STANDBY OR
> TRANSITION STATE
>                                  */
>                                 *msgid = CHECKER_MSGID_GHOST;
>                                 return PATH_GHOST;
> --
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://listman.redhat.com/mailman/listinfo/dm-devel
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dm-devel] [PATCH] [PATCH] libmultipath: return 'ghost' state when port is in transition
  2023-03-03  7:46 ` Martin Wilck
@ 2023-03-03 17:18   ` Brian Bunker
  2023-03-03 20:41     ` Martin Wilck
  0 siblings, 1 reply; 16+ messages in thread
From: Brian Bunker @ 2023-03-03 17:18 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

> On Mar 2, 2023, at 11:46 PM, Martin Wilck <mwilck@suse.com> wrote:
> 
> On Tue, 2023-02-21 at 12:56 -0800, Brian Bunker wrote:
>> A test unit ready command sent on a path in standby state will not
>> result
>> in a failed path. The same should be true for a path in the
>> transitioning state.
>> 
>> Signed-off-by: Brian Bunker brian@purestorage.com
> 
> In general, I'm somewhat reluctant to change things in this area.
> GHOST state is not clearly defined, unfortunately; it has evolved 
> from the original "inq succeeds but tur fails" to a somewhat hand-
> waving "something in between failed and up" semantics. 
> 
> Internally, multipathd treats GHOST paths almost exactly like UP paths,
> except that they get a different priority assigned. If for whatever
> reason multipathd switches to a PG in this state, the path state is
> highly likely to switch to FAILED soon after, as regular IO will fail
> on these paths; at least without explicit ALUA. Is that what we want to
> see for TRANSITIONING state?
> 
> But yes, this change is in the spirit of 5da642f ("Return 'ghost' state
> when port is in standby").
> 
> CC'ing Ben for confirmation.
> 
> Reviewed-by: Martin Wilck <mwilck@suse.com>
Hello Martin, 

Doesn’t "something in between failed and up” semantics describe ALUA state transitioning pretty well?

Thanks,
Brian
> 
> 
> 
>> ---
>>  libmultipath/checkers/tur.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>> 
>> diff --git a/libmultipath/checkers/tur.c
>> b/libmultipath/checkers/tur.c
>> index 551dc4f0..fff7987b 100644
>> --- a/libmultipath/checkers/tur.c
>> +++ b/libmultipath/checkers/tur.c
>> @@ -179,10 +179,11 @@ retry:
>>                 else if (key == 0x2) {
>>                         /* Not Ready */
>>                         /* Note: Other ALUA states are either UP or
>> DOWN */
>> -                       if( asc == 0x04 && ascq == 0x0b){
>> +                       if (asc == 0x04 &&
>> +                           (ascq == 0x0b || ascq == 0x0a)) {
>>                                 /*
>>                                  * LOGICAL UNIT NOT ACCESSIBLE,
>> -                                * TARGET PORT IN STANDBY STATE
>> +                                * TARGET PORT IN STANDBY OR
>> TRANSITION STATE
>>                                  */
>>                                 *msgid = CHECKER_MSGID_GHOST;
>>                                 return PATH_GHOST;
>> --
>> 
>> --
>> dm-devel mailing list
>> dm-devel@redhat.com
>> https://listman.redhat.com/mailman/listinfo/dm-devel


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dm-devel] [PATCH] [PATCH] libmultipath: return 'ghost' state when port is in transition
  2023-03-03 17:18   ` Brian Bunker
@ 2023-03-03 20:41     ` Martin Wilck
  2023-03-04 20:49       ` Brian Bunker
  0 siblings, 1 reply; 16+ messages in thread
From: Martin Wilck @ 2023-03-03 20:41 UTC (permalink / raw)
  To: Brian Bunker; +Cc: dm-devel

On Fri, 2023-03-03 at 09:18 -0800, Brian Bunker wrote:
> > 
> Hello Martin, 
> 
> Doesn’t "something in between failed and up” semantics describe ALUA
> state transitioning pretty well?

transitioning - yes. standby - ??

Martin

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dm-devel] [PATCH] [PATCH] libmultipath: return 'ghost' state when port is in transition
  2023-03-03 20:41     ` Martin Wilck
@ 2023-03-04 20:49       ` Brian Bunker
  2023-03-06 11:46         ` Martin Wilck
  2023-03-07 10:05         ` Martin Wilck
  0 siblings, 2 replies; 16+ messages in thread
From: Brian Bunker @ 2023-03-04 20:49 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Mar 3, 2023, at 12:41 PM, Martin Wilck <mwilck@suse.com> wrote:
> 
> On Fri, 2023-03-03 at 09:18 -0800, Brian Bunker wrote:
>>> 
>> Hello Martin, 
>> 
>> Doesn’t "something in between failed and up” semantics describe ALUA
>> state transitioning pretty well?
> 
> transitioning - yes. standby - ??
> 
> Martin
> 
The checking for standby is 14 years old, and says that TUR returns
a unit attention when the path is in standby. I am not sure why that
wouldn’t be handled by this code above: I would think there should be
just one unit attention for each I_T_L when an ALUA state change
happens.Not sure how it exceeds the retry count.

if (key == 0x6) {
    /* Unit Attention, retry */
    if (—retry_tur)
        goto retry;
}

From my perspective failing a path for ALUA state standby is reasonable
since it is not an active state.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dm-devel] [PATCH] [PATCH] libmultipath: return 'ghost' state when port is in transition
  2023-03-04 20:49       ` Brian Bunker
@ 2023-03-06 11:46         ` Martin Wilck
  2023-03-06 19:04           ` Benjamin Marzinski
  2023-03-07 10:05         ` Martin Wilck
  1 sibling, 1 reply; 16+ messages in thread
From: Martin Wilck @ 2023-03-06 11:46 UTC (permalink / raw)
  To: Brian Bunker; +Cc: dm-devel

Hi Brian,

On Sat, 2023-03-04 at 12:49 -0800, Brian Bunker wrote:
> > 
> The checking for standby is 14 years old, and says that TUR returns
> a unit attention when the path is in standby. I am not sure why that
> wouldn’t be handled by this code above: I would think there should be
> just one unit attention for each I_T_L when an ALUA state change
> happens.Not sure how it exceeds the retry count.
> 
> if (key == 0x6) {
>     /* Unit Attention, retry */
>     if (—retry_tur)
>         goto retry;
> }
> 
> From my perspective failing a path for ALUA state standby is
> reasonable
> since it is not an active state.

I think the historic rationale for using GHOST is that some storage
arrays, in particular active-passive configurations, may keep certain
port groups in STANDBY state. If STANDBY was classified as FAILED,
"multipath -ll" would show all paths of such port groups as FAILED,
which would confuse users.

That's what I meant before, multipath's GHOST can mean multiple things
depending on the actual hardware in use, explicit/implicit ALUA, etc.
Given that today basically every hardware supports ALUA, we could
probably do better. But changing the semantics in the current situation
is risky and error prone.

Regards
Martin

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dm-devel] [PATCH] [PATCH] libmultipath: return 'ghost' state when port is in transition
  2023-02-21 20:56 [dm-devel] [PATCH] [PATCH] libmultipath: return 'ghost' state when port is in transition Brian Bunker
  2023-03-03  7:46 ` Martin Wilck
@ 2023-03-06 16:54 ` Benjamin Marzinski
  1 sibling, 0 replies; 16+ messages in thread
From: Benjamin Marzinski @ 2023-03-06 16:54 UTC (permalink / raw)
  To: Brian Bunker; +Cc: device-mapper development

On Tue, Feb 21, 2023 at 12:56:43PM -0800, Brian Bunker wrote:
> A test unit ready command sent on a path in standby state will not result
> in a failed path. The same should be true for a path in the
> transitioning state.
> 
> Signed-off-by: Brian Bunker brian@purestorage.com

Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>

> ---
>  libmultipath/checkers/tur.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
> index 551dc4f0..fff7987b 100644
> --- a/libmultipath/checkers/tur.c
> +++ b/libmultipath/checkers/tur.c
> @@ -179,10 +179,11 @@ retry:
>                 else if (key == 0x2) {
>                         /* Not Ready */
>                         /* Note: Other ALUA states are either UP or DOWN */
> -                       if( asc == 0x04 && ascq == 0x0b){
> +                       if (asc == 0x04 &&
> +                           (ascq == 0x0b || ascq == 0x0a)) {
>                                 /*
>                                  * LOGICAL UNIT NOT ACCESSIBLE,
> -                                * TARGET PORT IN STANDBY STATE
> +                                * TARGET PORT IN STANDBY OR TRANSITION STATE
>                                  */
>                                 *msgid = CHECKER_MSGID_GHOST;
>                                 return PATH_GHOST;
> --
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://listman.redhat.com/mailman/listinfo/dm-devel
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dm-devel] [PATCH] [PATCH] libmultipath: return 'ghost' state when port is in transition
  2023-03-06 11:46         ` Martin Wilck
@ 2023-03-06 19:04           ` Benjamin Marzinski
  2023-03-07 10:31             ` Martin Wilck
  0 siblings, 1 reply; 16+ messages in thread
From: Benjamin Marzinski @ 2023-03-06 19:04 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel, Brian Bunker

On Mon, Mar 06, 2023 at 12:46:50PM +0100, Martin Wilck wrote:
> Hi Brian,
> 
> On Sat, 2023-03-04 at 12:49 -0800, Brian Bunker wrote:
> > > 
> > The checking for standby is 14 years old, and says that TUR returns
> > a unit attention when the path is in standby. I am not sure why that
> > wouldn’t be handled by this code above: I would think there should be
> > just one unit attention for each I_T_L when an ALUA state change
> > happens.Not sure how it exceeds the retry count.
> > 
> > if (key == 0x6) {
> >     /* Unit Attention, retry */
> >     if (—retry_tur)
> >         goto retry;
> > }
> > 
> > From my perspective failing a path for ALUA state standby is
> > reasonable
> > since it is not an active state.
> 
> I think the historic rationale for using GHOST is that some storage
> arrays, in particular active-passive configurations, may keep certain
> port groups in STANDBY state. If STANDBY was classified as FAILED,
> "multipath -ll" would show all paths of such port groups as FAILED,
> which would confuse users.
> 
> That's what I meant before, multipath's GHOST can mean multiple things
> depending on the actual hardware in use, explicit/implicit ALUA, etc.
> Given that today basically every hardware supports ALUA, we could
> probably do better. But changing the semantics in the current situation
> is risky and error prone.

I am sympathetic to Martin's view that GHOST is an ambiguous state, and
it's not at all clear that in means "temporarily between states". In
fact, it ususally doesn't.  On the other hand, if we can be pretty
certain that devices won't keep paths in the TRANSITIONING state for an
extended time, but we can't be certain what the end state will be, I do
see the rationale for not failing them preemtively. 

I wonder if PATH_PENDING makes more sense.  We would retain the existing
state until the path left the TRANSITIONING state.  The question is, are
you trying to make paths that are transitioning out of a failed state
come back sooner, or are you trying to keep paths that were in a active
state from being prevemtively failed.  Using PATH_PENDING won't fix the
first case, only the second.

PATH_PENDING makes sure that if IO to the path does start failing, the
checker won't keep setting the path back to an active state again.  It
also avoids the another GHOST issue, where the path would end up being
grouped with any actually passive paths, which isn't what we're looking
for.

The one problem I can think of off the top of my head would be that if
the device was held in the TRANSISTIONING state for a long time,
multipathd would keep checking it constantly, since PATH_PENDING is
really meant for cases where the checker hasn't completed yet, and we
just want to keep looking for the result. I suppose it would be possible
to add another state that worked just like pending (and could even get
converted to PATH_PENDING if there was no other state to be converted
to) but didn't cause us to retigger the checker so quickly.  But if
devices really will only be in TRANSITIONING for a short time, it might
not even be an issue we have to worry about.

Thoughts?

-Ben

> 
> Regards
> Martin
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dm-devel] [PATCH] [PATCH] libmultipath: return 'ghost' state when port is in transition
  2023-03-04 20:49       ` Brian Bunker
  2023-03-06 11:46         ` Martin Wilck
@ 2023-03-07 10:05         ` Martin Wilck
  1 sibling, 0 replies; 16+ messages in thread
From: Martin Wilck @ 2023-03-07 10:05 UTC (permalink / raw)
  To: Brian Bunker; +Cc: dm-devel

On Sat, 2023-03-04 at 12:49 -0800, Brian Bunker wrote:
> On Mar 3, 2023, at 12:41 PM, Martin Wilck <mwilck@suse.com> wrote:
> > 
> > On Fri, 2023-03-03 at 09:18 -0800, Brian Bunker wrote:
> > > > 
> > > Hello Martin, 
> > > 
> > > Doesn’t "something in between failed and up” semantics describe
> > > ALUA
> > > state transitioning pretty well?
> > 
> > transitioning - yes. standby - ??
> > 
> > Martin
> > 
> The checking for standby is 14 years old, and says that TUR returns
> a unit attention when the path is in standby. I am not sure why that
> wouldn’t be handled by this code above: I would think there should be
> just one unit attention for each I_T_L when an ALUA state change
> happens.

SPC6 mandates that UA be established after a (successful or failed)
state transition. The UA would be seen once after STANDBY was entered,
and after that, TUR would finish with CHECK CONDITION, NOT READY,
LOGICAL UNIT NOT ACCESSIBLE, TARGET PORT IN STANDBY STATE.

Our code is indeed correct in this respect.

Martin

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dm-devel] [PATCH] [PATCH] libmultipath: return 'ghost' state when port is in transition
  2023-03-06 19:04           ` Benjamin Marzinski
@ 2023-03-07 10:31             ` Martin Wilck
  2023-03-07 19:41               ` Brian Bunker
  0 siblings, 1 reply; 16+ messages in thread
From: Martin Wilck @ 2023-03-07 10:31 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: dm-devel, Brian Bunker

On Mon, 2023-03-06 at 13:04 -0600, Benjamin Marzinski wrote:
> On Mon, Mar 06, 2023 at 12:46:50PM +0100, Martin Wilck wrote:
> > Hi Brian,
> > 
> > On Sat, 2023-03-04 at 12:49 -0800, Brian Bunker wrote:
> > > > 
> > > The checking for standby is 14 years old, and says that TUR
> > > returns
> > > a unit attention when the path is in standby. I am not sure why
> > > that
> > > wouldn’t be handled by this code above: I would think there
> > > should be
> > > just one unit attention for each I_T_L when an ALUA state change
> > > happens.Not sure how it exceeds the retry count.
> > > 
> > > if (key == 0x6) {
> > >     /* Unit Attention, retry */
> > >     if (—retry_tur)
> > >         goto retry;
> > > }
> > > 
> > > From my perspective failing a path for ALUA state standby is
> > > reasonable
> > > since it is not an active state.
> > 
> > I think the historic rationale for using GHOST is that some storage
> > arrays, in particular active-passive configurations, may keep
> > certain
> > port groups in STANDBY state. If STANDBY was classified as FAILED,
> > "multipath -ll" would show all paths of such port groups as FAILED,
> > which would confuse users.
> > 
> > That's what I meant before, multipath's GHOST can mean multiple
> > things
> > depending on the actual hardware in use, explicit/implicit ALUA,
> > etc.
> > Given that today basically every hardware supports ALUA, we could
> > probably do better. But changing the semantics in the current
> > situation
> > is risky and error prone.
> 
> I am sympathetic to Martin's view that GHOST is an ambiguous state,
> and
> it's not at all clear that in means "temporarily between states". In
> fact, it ususally doesn't.  On the other hand, if we can be pretty
> certain that devices won't keep paths in the TRANSITIONING state for
> an
> extended time, but we can't be certain what the end state will be, I
> do
> see the rationale for not failing them preemtively. 

This is an important point, for which I don't see a general solution.
Unfortunately, if a device is TRANSITIONING, the SCSI spec offers no
means for us to determine what state it's transitioning to, not even
whether the transition is "up" or "down" in the state hierarchy. We can
only guess from the previous state, but it will never be more than just
that, a guess.

> I wonder if PATH_PENDING makes more sense.  We would retain the
> existing
> state until the path left the TRANSITIONING state.  The question is,
> are
> you trying to make paths that are transitioning out of a failed state
> come back sooner, or are you trying to keep paths that were in a
> active
> state from being prevemtively failed.  Using PATH_PENDING won't fix
> the
> first case, only the second.

A very interesting suggestion. I like it.

I think that it makes little sense to try and make such paths "come
back sooner". TRANSITIONING devices aren't usable, and any attempt to
try to use them will cause an IO error and switch to FAILED state
immediately by the kernel driver. PATH_PENDING would cause devices that
are "coming up" to be checked frequently, and thus make them available
within one checker interval of them becoming actually ACTIVE, which is
about the best we can do in the "transitioning up" case.

When the path is going "down" from PATH_UP state, PATH_PENDING would
imply that the kernel DM_STATE would remain as-is (probably "active").
If I/O is happening, the device would sooner or later be used by the
kernel, and the I/O would most probably fail, setting the path to
FAILED. With PATH_GHOST, the path would get a lower priority and thus
the likelyhood of it being used would be decreased, at least with
group_by_prio (although this would mean that this path would be grouped
together with STANDBY paths, see below).

Again, I think the behavior with PATH_PENDING would be the best we can
get. Whether or not the kernel fails the device in the meantime,
multipathd will issue TUR frequently, and eventually see the device
arriving in a new state, which will probably be STANDBY or UNAVAILABLE.

> 
> PATH_PENDING makes sure that if IO to the path does start failing,
> the
> checker won't keep setting the path back to an active state again. 
> It
> also avoids the another GHOST issue, where the path would end up
> being
> grouped with any actually passive paths, which isn't what we're
> looking
> for.

Good point! This causes pointless re-grouping of paths for group-by-
prio for every ALUA transition. OTOH, we see such regrouping anyway, in
particular if the paths don't transition simultaneously (or we don't
detect the transition simultaneously). The only way to avoid this would
be a path grouping algorithm that directly uses RTPG reported path
groups rather than grouping by prio. We don't have such an algorithm
currently.

> The one problem I can think of off the top of my head would be that
> if
> the device was held in the TRANSISTIONING state for a long time,
> multipathd would keep checking it constantly, since PATH_PENDING is
> really meant for cases where the checker hasn't completed yet, and we
> just want to keep looking for the result. I suppose it would be
> possible
> to add another state that worked just like pending (and could even
> get
> converted to PATH_PENDING if there was no other state to be converted
> to) but didn't cause us to retigger the checker so quickly.  But if
> devices really will only be in TRANSITIONING for a short time, it
> might
> not even be an issue we have to worry about.

The default transitioning timeout is 60s, and in my experience, even if
the hardware overrides it, it's rarely more than a few minutes. After
that, the kernel will set the state to STANDBY.

Unless we're both overlooking something, I agree with you that
PATH_PENDING is the right thing to do for TRANSITIONING. When a device
is in transition between states, we _want_ to check it often to make
sure we notice when the target state is reached.

We must then be careful not to overload PATH_PENDING with too many
different meanings. But I don't see this as a big issue currently.

Regards
Martin

> Thoughts?
> 
> -Ben
> 
> > 
> > Regards
> > Martin
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dm-devel] [PATCH] [PATCH] libmultipath: return 'ghost' state when port is in transition
  2023-03-07 10:31             ` Martin Wilck
@ 2023-03-07 19:41               ` Brian Bunker
  2023-03-07 21:15                 ` Martin Wilck
  0 siblings, 1 reply; 16+ messages in thread
From: Brian Bunker @ 2023-03-07 19:41 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel


> On Mar 7, 2023, at 2:31 AM, Martin Wilck <mwilck@suse.com> wrote:
> 
> On Mon, 2023-03-06 at 13:04 -0600, Benjamin Marzinski wrote:
>> On Mon, Mar 06, 2023 at 12:46:50PM +0100, Martin Wilck wrote:
>>> Hi Brian,
>>> 
>>> On Sat, 2023-03-04 at 12:49 -0800, Brian Bunker wrote:
>>>>> 
>>>> The checking for standby is 14 years old, and says that TUR
>>>> returns
>>>> a unit attention when the path is in standby. I am not sure why
>>>> that
>>>> wouldn’t be handled by this code above: I would think there
>>>> should be
>>>> just one unit attention for each I_T_L when an ALUA state change
>>>> happens.Not sure how it exceeds the retry count.
>>>> 
>>>> if (key == 0x6) {
>>>>     /* Unit Attention, retry */
>>>>     if (—retry_tur)
>>>>         goto retry;
>>>> }
>>>> 
>>>> From my perspective failing a path for ALUA state standby is
>>>> reasonable
>>>> since it is not an active state.
>>> 
>>> I think the historic rationale for using GHOST is that some storage
>>> arrays, in particular active-passive configurations, may keep
>>> certain
>>> port groups in STANDBY state. If STANDBY was classified as FAILED,
>>> "multipath -ll" would show all paths of such port groups as FAILED,
>>> which would confuse users.
>>> 
>>> That's what I meant before, multipath's GHOST can mean multiple
>>> things
>>> depending on the actual hardware in use, explicit/implicit ALUA,
>>> etc.
>>> Given that today basically every hardware supports ALUA, we could
>>> probably do better. But changing the semantics in the current
>>> situation
>>> is risky and error prone.
>> 
>> I am sympathetic to Martin's view that GHOST is an ambiguous state,
>> and
>> it's not at all clear that in means "temporarily between states". In
>> fact, it ususally doesn't.  On the other hand, if we can be pretty
>> certain that devices won't keep paths in the TRANSITIONING state for
>> an
>> extended time, but we can't be certain what the end state will be, I
>> do
>> see the rationale for not failing them preemtively. 
> 
> This is an important point, for which I don't see a general solution.
> Unfortunately, if a device is TRANSITIONING, the SCSI spec offers no
> means for us to determine what state it's transitioning to, not even
> whether the transition is "up" or "down" in the state hierarchy. We can
> only guess from the previous state, but it will never be more than just
> that, a guess.
If a device is transitioning, it is quite likely that the other paths to this
device might also be transitioning. Any response that fails the path
could lead to the failing of all of the paths. I think the initiator making
any decision about the path state, up or down, for sure is wrong. In
general, if ALUA transitioning is returned, retry the same path up to
the transition timeout.
> 
>> I wonder if PATH_PENDING makes more sense.  We would retain the
>> existing
>> state until the path left the TRANSITIONING state.  The question is,
>> are
>> you trying to make paths that are transitioning out of a failed state
>> come back sooner, or are you trying to keep paths that were in a
>> active
>> state from being prevemtively failed.  Using PATH_PENDING won't fix
>> the
>> first case, only the second.
More the second. In many cases paths to a device are served by
different parts of a system. You might have two controllers performing
an HA function or even two arrays performing an HA function where
each array is made up of controllers performing an HA function. These
relationships will change based on external factors. There could be
connection disruptions in upgrade cases or individual controller
failures, etc. When this happens, an ALUA state change will happen
where the first step of that change could be returning ALUA state
transitioning until a more permanent state is achieved. That could be
all paths are back online, or some are and some are in some offline state
directing initiators to paths which can serve I/O. The initiator can’t
know what has happened by considering a single path.
> 
> A very interesting suggestion. I like it.
> 
> I think that it makes little sense to try and make such paths "come
> back sooner". TRANSITIONING devices aren't usable, and any attempt to
> try to use them will cause an IO error and switch to FAILED state
> immediately by the kernel driver. PATH_PENDING would cause devices that
> are "coming up" to be checked frequently, and thus make them available
> within one checker interval of them becoming actually ACTIVE, which is
> about the best we can do in the "transitioning up" case.
> 
> When the path is going "down" from PATH_UP state, PATH_PENDING would
> imply that the kernel DM_STATE would remain as-is (probably "active").
> If I/O is happening, the device would sooner or later be used by the
> kernel, and the I/O would most probably fail, setting the path to
> FAILED. With PATH_GHOST, the path would get a lower priority and thus
> the likelyhood of it being used would be decreased, at least with
> group_by_prio (although this would mean that this path would be grouped
> together with STANDBY paths, see below).
> 
> Again, I think the behavior with PATH_PENDING would be the best we can
> get. Whether or not the kernel fails the device in the meantime,
> multipathd will issue TUR frequently, and eventually see the device
> arriving in a new state, which will probably be STANDBY or UNAVAILABLE.
> 
>> 
>> PATH_PENDING makes sure that if IO to the path does start failing,
>> the
>> checker won't keep setting the path back to an active state again. 
>> It
>> also avoids the another GHOST issue, where the path would end up
>> being
>> grouped with any actually passive paths, which isn't what we're
>> looking
>> for.
> 
> Good point! This causes pointless re-grouping of paths for group-by-
> prio for every ALUA transition. OTOH, we see such regrouping anyway, in
> particular if the paths don't transition simultaneously (or we don't
> detect the transition simultaneously). The only way to avoid this would
> be a path grouping algorithm that directly uses RTPG reported path
> groups rather than grouping by prio. We don't have such an algorithm
> currently.
Other OS’s do that. Once you have MPIO, both ESX and Windows
use RTPG and not TUR for determining individual path health. Even
this is not perfect. The ALUA state will be correct for the TPG containing
the index of the I_T nexus that sent it, but the ALUA states of the other
TPGs returned is also a best guess. Sending the RTPG down all paths
gives the full picture.
> 
>> The one problem I can think of off the top of my head would be that
>> if
>> the device was held in the TRANSISTIONING state for a long time,
>> multipathd would keep checking it constantly, since PATH_PENDING is
>> really meant for cases where the checker hasn't completed yet, and we
>> just want to keep looking for the result. I suppose it would be
>> possible
>> to add another state that worked just like pending (and could even
>> get
>> converted to PATH_PENDING if there was no other state to be converted
>> to) but didn't cause us to retigger the checker so quickly.  But if
>> devices really will only be in TRANSITIONING for a short time, it
>> might
>> not even be an issue we have to worry about.
This will likely depend on the kernel version. ALUA state transitioning
has been handled differently in different versions of the kernel. Currently
I believe a TUR sent to a path in ALUA state transition will not return to
the caller until it either succeeds, fails with a different sense code, or the
transition timeout has been exceeded. This hasn’t always been the case,
but I think is the right thing to do unless the multipath layer would be the
only place to fail the path and could handle the retries and not the kernel.
> 
> The default transitioning timeout is 60s, and in my experience, even if
> the hardware overrides it, it's rarely more than a few minutes. After
> that, the kernel will set the state to STANDBY.
Yes. The case of a target keeping a path in the transitioning state
Indefinitely is handled by the device handler.
> 
> Unless we're both overlooking something, I agree with you that
> PATH_PENDING is the right thing to do for TRANSITIONING. When a device
> is in transition between states, we _want_ to check it often to make
> sure we notice when the target state is reached.
> 
> We must then be careful not to overload PATH_PENDING with too many
> different meanings. But I don't see this as a big issue currently.
> 
> Regards
> Martin
> 
>> Thoughts?
>> 
>> -Ben
>> 
>>> 
>>> Regards
>>> Martin


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dm-devel] [PATCH] [PATCH] libmultipath: return 'ghost' state when port is in transition
  2023-03-07 19:41               ` Brian Bunker
@ 2023-03-07 21:15                 ` Martin Wilck
  2023-03-09 21:40                   ` Brian Bunker
  0 siblings, 1 reply; 16+ messages in thread
From: Martin Wilck @ 2023-03-07 21:15 UTC (permalink / raw)
  To: Brian Bunker; +Cc: dm-devel

Brian,

On Tue, 2023-03-07 at 11:41 -0800, Brian Bunker wrote:
> 
> > On Mar 7, 2023, at 2:31 AM, Martin Wilck <mwilck@suse.com> wrote:
> > 
> > On Mon, 2023-03-06 at 13:04 -0600, Benjamin Marzinski wrote:
> > > On Mon, Mar 06, 2023 at 12:46:50PM +0100, Martin Wilck wrote:
> > > > Hi Brian,
> > > > 
> > > > On Sat, 2023-03-04 at 12:49 -0800, Brian Bunker wrote:
> > > > > > 
> > > > > The checking for standby is 14 years old, and says that TUR
> > > > > returns
> > > > > a unit attention when the path is in standby. I am not sure
> > > > > why
> > > > > that
> > > > > wouldn’t be handled by this code above: I would think there
> > > > > should be
> > > > > just one unit attention for each I_T_L when an ALUA state
> > > > > change
> > > > > happens.Not sure how it exceeds the retry count.
> > > > > 
> > > > > if (key == 0x6) {
> > > > >     /* Unit Attention, retry */
> > > > >     if (—retry_tur)
> > > > >         goto retry;
> > > > > }
> > > > > 
> > > > > From my perspective failing a path for ALUA state standby is
> > > > > reasonable
> > > > > since it is not an active state.
> > > > 
> > > > I think the historic rationale for using GHOST is that some
> > > > storage
> > > > arrays, in particular active-passive configurations, may keep
> > > > certain
> > > > port groups in STANDBY state. If STANDBY was classified as
> > > > FAILED,
> > > > "multipath -ll" would show all paths of such port groups as
> > > > FAILED,
> > > > which would confuse users.
> > > > 
> > > > That's what I meant before, multipath's GHOST can mean multiple
> > > > things
> > > > depending on the actual hardware in use, explicit/implicit
> > > > ALUA,
> > > > etc.
> > > > Given that today basically every hardware supports ALUA, we
> > > > could
> > > > probably do better. But changing the semantics in the current
> > > > situation
> > > > is risky and error prone.
> > > 
> > > I am sympathetic to Martin's view that GHOST is an ambiguous
> > > state,
> > > and
> > > it's not at all clear that in means "temporarily between states".
> > > In
> > > fact, it ususally doesn't.  On the other hand, if we can be
> > > pretty
> > > certain that devices won't keep paths in the TRANSITIONING state
> > > for
> > > an
> > > extended time, but we can't be certain what the end state will
> > > be, I
> > > do
> > > see the rationale for not failing them preemtively. 
> > 
> > This is an important point, for which I don't see a general
> > solution.
> > Unfortunately, if a device is TRANSITIONING, the SCSI spec offers
> > no
> > means for us to determine what state it's transitioning to, not
> > even
> > whether the transition is "up" or "down" in the state hierarchy. We
> > can
> > only guess from the previous state, but it will never be more than
> > just
> > that, a guess.
> If a device is transitioning, it is quite likely that the other paths
> to this
> device might also be transitioning. Any response that fails the path
> could lead to the failing of all of the paths. I think the initiator
> making
> any decision about the path state, up or down, for sure is wrong. In
> general, if ALUA transitioning is returned, retry the same path up to
> the transition timeout.
> > 
> > > I wonder if PATH_PENDING makes more sense.  We would retain the
> > > existing
> > > state until the path left the TRANSITIONING state.  The question
> > > is,
> > > are
> > > you trying to make paths that are transitioning out of a failed
> > > state
> > > come back sooner, or are you trying to keep paths that were in a
> > > active
> > > state from being prevemtively failed.  Using PATH_PENDING won't
> > > fix
> > > the
> > > first case, only the second.
> More the second. In many cases paths to a device are served by
> different parts of a system. You might have two controllers
> performing
> an HA function or even two arrays performing an HA function where
> each array is made up of controllers performing an HA function. These
> relationships will change based on external factors. There could be
> connection disruptions in upgrade cases or individual controller
> failures, etc. When this happens, an ALUA state change will happen
> where the first step of that change could be returning ALUA state
> transitioning until a more permanent state is achieved. That could be
> all paths are back online, or some are and some are in some offline
> state
> directing initiators to paths which can serve I/O. The initiator
> can’t
> know what has happened by considering a single path.

Hm. Could it "know what has happened" by looking at multiple paths?
I don't think so.

Are you arguing for or against TRANSITIONING = PATH_PENDING here?

Again: if we use PATH_PENDING for a previously active path that is now
transitioning, it will be used for I/O, and be failed. If we use
PATH_GHOST, it will be regrouped together with STANDBY paths and thus
will only be used for I/O if all other good paths fail.

What is the right behavior, in your opinion?

> > 
> > A very interesting suggestion. I like it.
> > 
> > I think that it makes little sense to try and make such paths "come
> > back sooner". TRANSITIONING devices aren't usable, and any attempt
> > to
> > try to use them will cause an IO error and switch to FAILED state
> > immediately by the kernel driver. PATH_PENDING would cause devices
> > that
> > are "coming up" to be checked frequently, and thus make them
> > available
> > within one checker interval of them becoming actually ACTIVE, which
> > is
> > about the best we can do in the "transitioning up" case.
> > 
> > When the path is going "down" from PATH_UP state, PATH_PENDING
> > would
> > imply that the kernel DM_STATE would remain as-is (probably
> > "active").
> > If I/O is happening, the device would sooner or later be used by
> > the
> > kernel, and the I/O would most probably fail, setting the path to
> > FAILED. With PATH_GHOST, the path would get a lower priority and
> > thus
> > the likelyhood of it being used would be decreased, at least with
> > group_by_prio (although this would mean that this path would be
> > grouped
> > together with STANDBY paths, see below).
> > 
> > Again, I think the behavior with PATH_PENDING would be the best we
> > can
> > get. Whether or not the kernel fails the device in the meantime,
> > multipathd will issue TUR frequently, and eventually see the device
> > arriving in a new state, which will probably be STANDBY or
> > UNAVAILABLE.
> > 
> > > 
> > > PATH_PENDING makes sure that if IO to the path does start
> > > failing,
> > > the
> > > checker won't keep setting the path back to an active state
> > > again. 
> > > It
> > > also avoids the another GHOST issue, where the path would end up
> > > being
> > > grouped with any actually passive paths, which isn't what we're
> > > looking
> > > for.
> > 
> > Good point! This causes pointless re-grouping of paths for group-
> > by-
> > prio for every ALUA transition. OTOH, we see such regrouping
> > anyway, in
> > particular if the paths don't transition simultaneously (or we
> > don't
> > detect the transition simultaneously). The only way to avoid this
> > would
> > be a path grouping algorithm that directly uses RTPG reported path
> > groups rather than grouping by prio. We don't have such an
> > algorithm
> > currently.
> Other OS’s do that. Once you have MPIO, both ESX and Windows
> use RTPG and not TUR for determining individual path health.

We could do that too, but the code hasn't been written so far. It's
further complicated on Linux by the fact that dm-multipath is based on
the generic DM layer, which has no concept of ALUA.

Patches welcome ;-) 

IMO the first step towards a solution like this would be to set up path
groups directly from RTPG data rather than indirectly through
priorities. This would mean that (marginal path logic aside) path
groups would stay constant, except if they change on the storage side
(which happens rarely unless I am mistaken). I don't think we can use
RTPG for path state because we need to track individual path states
(see below).

>  Even
> this is not perfect. The ALUA state will be correct for the TPG
> containing
> the index of the I_T nexus that sent it, but the ALUA states of the
> other
> TPGs returned is also a best guess. Sending the RTPG down all paths
> gives the full picture.

I don't think RTPG will ever give you the full picture. It may provide
insight into how the storage array has organized its ports. But there
are still cases where a single path may fail, or be shaky, because of
fabric issues that have nothing to do with the storage array or ALUA.
And this is also the reason why the initiator can't just rely on ALUA
determining path state. It can happen that all ports change state
simultaneously in a state transition. But it can also happen that just
one path fails because of a broken cable.

> > 
> > > The one problem I can think of off the top of my head would be
> > > that
> > > if
> > > the device was held in the TRANSISTIONING state for a long time,
> > > multipathd would keep checking it constantly, since PATH_PENDING
> > > is
> > > really meant for cases where the checker hasn't completed yet,
> > > and we
> > > just want to keep looking for the result. I suppose it would be
> > > possible
> > > to add another state that worked just like pending (and could
> > > even
> > > get
> > > converted to PATH_PENDING if there was no other state to be
> > > converted
> > > to) but didn't cause us to retigger the checker so quickly.  But
> > > if
> > > devices really will only be in TRANSITIONING for a short time, it
> > > might
> > > not even be an issue we have to worry about.
> This will likely depend on the kernel version. ALUA state
> transitioning
> has been handled differently in different versions of the kernel.
> Currently
> I believe a TUR sent to a path in ALUA state transition will not
> return to
> the caller until it either succeeds, fails with a different sense
> code, or the
> transition timeout has been exceeded. 
> This hasn’t always been the case,

IMO this is wrong. alua_check_sense() returns NEEDS_RETRY for
transitioning, and pass-through commands from multipathd will not be
retried by the mid layer. And that has been so for a while.

> but I think is the right thing to do unless the multipath layer would
> be the
> only place to fail the path and could handle the retries and not the
> kernel.

I don't understand what your are arguing for. Should we use
PATH_PENDING for ALUA transitioning state, as Ben suggested, or
PATH_GHOST, like in your original patch?

Regards
Martin

> > 
> > The default transitioning timeout is 60s, and in my experience,
> > even if
> > the hardware overrides it, it's rarely more than a few minutes.
> > After
> > that, the kernel will set the state to STANDBY.
> Yes. The case of a target keeping a path in the transitioning state
> Indefinitely is handled by the device handler.
> > 
> > Unless we're both overlooking something, I agree with you that
> > PATH_PENDING is the right thing to do for TRANSITIONING. When a
> > device
> > is in transition between states, we _want_ to check it often to
> > make
> > sure we notice when the target state is reached.
> > 
> > We must then be careful not to overload PATH_PENDING with too many
> > different meanings. But I don't see this as a big issue currently.
> > 
> > Regards
> > Martin
> > 
> > > Thoughts?
> > > 
> > > -Ben
> > > 
> > > > 
> > > > Regards
> > > > Martin
> 
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dm-devel] [PATCH] [PATCH] libmultipath: return 'ghost' state when port is in transition
  2023-03-07 21:15                 ` Martin Wilck
@ 2023-03-09 21:40                   ` Brian Bunker
  2023-03-09 21:50                     ` Martin Wilck
  0 siblings, 1 reply; 16+ messages in thread
From: Brian Bunker @ 2023-03-09 21:40 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel


> On Mar 7, 2023, at 1:15 PM, Martin Wilck <mwilck@suse.com> wrote:
> 
> Brian,
> 
> On Tue, 2023-03-07 at 11:41 -0800, Brian Bunker wrote:
>> 
>>> On Mar 7, 2023, at 2:31 AM, Martin Wilck <mwilck@suse.com> wrote:
>>> 
>>> On Mon, 2023-03-06 at 13:04 -0600, Benjamin Marzinski wrote:
>>>> On Mon, Mar 06, 2023 at 12:46:50PM +0100, Martin Wilck wrote:
>>>>> Hi Brian,
>>>>> 
>>>>> On Sat, 2023-03-04 at 12:49 -0800, Brian Bunker wrote:
>>>>>>> 
>>>>>> The checking for standby is 14 years old, and says that TUR
>>>>>> returns
>>>>>> a unit attention when the path is in standby. I am not sure
>>>>>> why
>>>>>> that
>>>>>> wouldn’t be handled by this code above: I would think there
>>>>>> should be
>>>>>> just one unit attention for each I_T_L when an ALUA state
>>>>>> change
>>>>>> happens.Not sure how it exceeds the retry count.
>>>>>> 
>>>>>> if (key == 0x6) {
>>>>>>     /* Unit Attention, retry */
>>>>>>     if (—retry_tur)
>>>>>>         goto retry;
>>>>>> }
>>>>>> 
>>>>>> From my perspective failing a path for ALUA state standby is
>>>>>> reasonable
>>>>>> since it is not an active state.
>>>>> 
>>>>> I think the historic rationale for using GHOST is that some
>>>>> storage
>>>>> arrays, in particular active-passive configurations, may keep
>>>>> certain
>>>>> port groups in STANDBY state. If STANDBY was classified as
>>>>> FAILED,
>>>>> "multipath -ll" would show all paths of such port groups as
>>>>> FAILED,
>>>>> which would confuse users.
>>>>> 
>>>>> That's what I meant before, multipath's GHOST can mean multiple
>>>>> things
>>>>> depending on the actual hardware in use, explicit/implicit
>>>>> ALUA,
>>>>> etc.
>>>>> Given that today basically every hardware supports ALUA, we
>>>>> could
>>>>> probably do better. But changing the semantics in the current
>>>>> situation
>>>>> is risky and error prone.
>>>> 
>>>> I am sympathetic to Martin's view that GHOST is an ambiguous
>>>> state,
>>>> and
>>>> it's not at all clear that in means "temporarily between states".
>>>> In
>>>> fact, it ususally doesn't.  On the other hand, if we can be
>>>> pretty
>>>> certain that devices won't keep paths in the TRANSITIONING state
>>>> for
>>>> an
>>>> extended time, but we can't be certain what the end state will
>>>> be, I
>>>> do
>>>> see the rationale for not failing them preemtively. 
>>> 
>>> This is an important point, for which I don't see a general
>>> solution.
>>> Unfortunately, if a device is TRANSITIONING, the SCSI spec offers
>>> no
>>> means for us to determine what state it's transitioning to, not
>>> even
>>> whether the transition is "up" or "down" in the state hierarchy. We
>>> can
>>> only guess from the previous state, but it will never be more than
>>> just
>>> that, a guess.
>> If a device is transitioning, it is quite likely that the other paths
>> to this
>> device might also be transitioning. Any response that fails the path
>> could lead to the failing of all of the paths. I think the initiator
>> making
>> any decision about the path state, up or down, for sure is wrong. In
>> general, if ALUA transitioning is returned, retry the same path up to
>> the transition timeout.
>>> 
>>>> I wonder if PATH_PENDING makes more sense.  We would retain the
>>>> existing
>>>> state until the path left the TRANSITIONING state.  The question
>>>> is,
>>>> are
>>>> you trying to make paths that are transitioning out of a failed
>>>> state
>>>> come back sooner, or are you trying to keep paths that were in a
>>>> active
>>>> state from being prevemtively failed.  Using PATH_PENDING won't
>>>> fix
>>>> the
>>>> first case, only the second.
>> More the second. In many cases paths to a device are served by
>> different parts of a system. You might have two controllers
>> performing
>> an HA function or even two arrays performing an HA function where
>> each array is made up of controllers performing an HA function. These
>> relationships will change based on external factors. There could be
>> connection disruptions in upgrade cases or individual controller
>> failures, etc. When this happens, an ALUA state change will happen
>> where the first step of that change could be returning ALUA state
>> transitioning until a more permanent state is achieved. That could be
>> all paths are back online, or some are and some are in some offline
>> state
>> directing initiators to paths which can serve I/O. The initiator
>> can’t
>> know what has happened by considering a single path.
> 
> Hm. Could it "know what has happened" by looking at multiple paths?
> I don't think so.
> 
> Are you arguing for or against TRANSITIONING = PATH_PENDING here?
> 
> Again: if we use PATH_PENDING for a previously active path that is now
> transitioning, it will be used for I/O, and be failed. If we use
> PATH_GHOST, it will be regrouped together with STANDBY paths and thus
> will only be used for I/O if all other good paths fail.
> 
> What is the right behavior, in your opinion?
> 
>>> 
>>> A very interesting suggestion. I like it.
>>> 
>>> I think that it makes little sense to try and make such paths "come
>>> back sooner". TRANSITIONING devices aren't usable, and any attempt
>>> to
>>> try to use them will cause an IO error and switch to FAILED state
>>> immediately by the kernel driver. PATH_PENDING would cause devices
>>> that
>>> are "coming up" to be checked frequently, and thus make them
>>> available
>>> within one checker interval of them becoming actually ACTIVE, which
>>> is
>>> about the best we can do in the "transitioning up" case.
>>> 
>>> When the path is going "down" from PATH_UP state, PATH_PENDING
>>> would
>>> imply that the kernel DM_STATE would remain as-is (probably
>>> "active").
>>> If I/O is happening, the device would sooner or later be used by
>>> the
>>> kernel, and the I/O would most probably fail, setting the path to
>>> FAILED. With PATH_GHOST, the path would get a lower priority and
>>> thus
>>> the likelyhood of it being used would be decreased, at least with
>>> group_by_prio (although this would mean that this path would be
>>> grouped
>>> together with STANDBY paths, see below).
>>> 
>>> Again, I think the behavior with PATH_PENDING would be the best we
>>> can
>>> get. Whether or not the kernel fails the device in the meantime,
>>> multipathd will issue TUR frequently, and eventually see the device
>>> arriving in a new state, which will probably be STANDBY or
>>> UNAVAILABLE.
>>> 
>>>> 
>>>> PATH_PENDING makes sure that if IO to the path does start
>>>> failing,
>>>> the
>>>> checker won't keep setting the path back to an active state
>>>> again. 
>>>> It
>>>> also avoids the another GHOST issue, where the path would end up
>>>> being
>>>> grouped with any actually passive paths, which isn't what we're
>>>> looking
>>>> for.
>>> 
>>> Good point! This causes pointless re-grouping of paths for group-
>>> by-
>>> prio for every ALUA transition. OTOH, we see such regrouping
>>> anyway, in
>>> particular if the paths don't transition simultaneously (or we
>>> don't
>>> detect the transition simultaneously). The only way to avoid this
>>> would
>>> be a path grouping algorithm that directly uses RTPG reported path
>>> groups rather than grouping by prio. We don't have such an
>>> algorithm
>>> currently.
>> Other OS’s do that. Once you have MPIO, both ESX and Windows
>> use RTPG and not TUR for determining individual path health.
> 
> We could do that too, but the code hasn't been written so far. It's
> further complicated on Linux by the fact that dm-multipath is based on
> the generic DM layer, which has no concept of ALUA.
> 
> Patches welcome ;-) 
> 
> IMO the first step towards a solution like this would be to set up path
> groups directly from RTPG data rather than indirectly through
> priorities. This would mean that (marginal path logic aside) path
> groups would stay constant, except if they change on the storage side
> (which happens rarely unless I am mistaken). I don't think we can use
> RTPG for path state because we need to track individual path states
> (see below).
> 
>> Even
>> this is not perfect. The ALUA state will be correct for the TPG
>> containing
>> the index of the I_T nexus that sent it, but the ALUA states of the
>> other
>> TPGs returned is also a best guess. Sending the RTPG down all paths
>> gives the full picture.
> 
> I don't think RTPG will ever give you the full picture. It may provide
> insight into how the storage array has organized its ports. But there
> are still cases where a single path may fail, or be shaky, because of
> fabric issues that have nothing to do with the storage array or ALUA.
> And this is also the reason why the initiator can't just rely on ALUA
> determining path state. It can happen that all ports change state
> simultaneously in a state transition. But it can also happen that just
> one path fails because of a broken cable.
> 
>>> 
>>>> The one problem I can think of off the top of my head would be
>>>> that
>>>> if
>>>> the device was held in the TRANSISTIONING state for a long time,
>>>> multipathd would keep checking it constantly, since PATH_PENDING
>>>> is
>>>> really meant for cases where the checker hasn't completed yet,
>>>> and we
>>>> just want to keep looking for the result. I suppose it would be
>>>> possible
>>>> to add another state that worked just like pending (and could
>>>> even
>>>> get
>>>> converted to PATH_PENDING if there was no other state to be
>>>> converted
>>>> to) but didn't cause us to retigger the checker so quickly.  But
>>>> if
>>>> devices really will only be in TRANSITIONING for a short time, it
>>>> might
>>>> not even be an issue we have to worry about.
>> This will likely depend on the kernel version. ALUA state
>> transitioning
>> has been handled differently in different versions of the kernel.
>> Currently
>> I believe a TUR sent to a path in ALUA state transition will not
>> return to
>> the caller until it either succeeds, fails with a different sense
>> code, or the
>> transition timeout has been exceeded. 
>> This hasn’t always been the case,
> 
> IMO this is wrong. alua_check_sense() returns NEEDS_RETRY for
> transitioning, and pass-through commands from multipathd will not be
> retried by the mid layer. And that has been so for a while.
> 
>> but I think is the right thing to do unless the multipath layer would
>> be the
>> only place to fail the path and could handle the retries and not the
>> kernel.
> 
> I don't understand what your are arguing for. Should we use
> PATH_PENDING for ALUA transitioning state, as Ben suggested, or
> PATH_GHOST, like in your original patch?
> 
> Regards
> Martin
Martin,

Sorry I bounce between kernel versions a lot since most of the
problems which find their way to us are released Linux versions
whose kernels are quite a bit older than upstream.I got a chance
to try the proposed solutions. The PATH_GHOST above does what
I am looking for which is not to have the path checker fail the path.

This also works as your earlier comments suggest. This does seem
clearer as to what is happening on the path:

diff --git a/libmultipath/checkers.c b/libmultipath/checkers.c
index fdb91e17..50f0773e 100644
--- a/libmultipath/checkers.c
+++ b/libmultipath/checkers.c
@@ -343,6 +343,7 @@ static const char *generic_msg[CHECKER_GENERIC_MSGTABLE_SIZE] = {
        [CHECKER_MSGID_UP] = " reports path is up",
        [CHECKER_MSGID_DOWN] = " reports path is down",
        [CHECKER_MSGID_GHOST] = " reports path is ghost",
+       [CHECKER_MSGID_PENDING] = " reports path is transitioning",
        [CHECKER_MSGID_UNSUPPORTED] = " doesn't support this device",
 };

diff --git a/libmultipath/checkers.h b/libmultipath/checkers.h
index ea1e8af6..4fbad621 100644
--- a/libmultipath/checkers.h
+++ b/libmultipath/checkers.h
@@ -111,6 +111,7 @@ enum {
        CHECKER_MSGID_UP,
        CHECKER_MSGID_DOWN,
        CHECKER_MSGID_GHOST,
+       CHECKER_MSGID_PENDING,
        CHECKER_MSGID_UNSUPPORTED,
        CHECKER_GENERIC_MSGTABLE_SIZE,
        CHECKER_FIRST_MSGID = 100,      /* lowest msgid for checkers */

diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index 551dc4f0..e1742c2b 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -179,7 +179,7 @@ retry:
                else if (key == 0x2) {
                        /* Not Ready */
                        /* Note: Other ALUA states are either UP or DOWN */
-                       if( asc == 0x04 && ascq == 0x0b){
+                       if (asc == 0x04 && ascq == 0x0b) {
                                /*
                                 * LOGICAL UNIT NOT ACCESSIBLE,
                                 * TARGET PORT IN STANDBY STATE
@@ -187,6 +187,14 @@ retry:
                                *msgid = CHECKER_MSGID_GHOST;
                                return PATH_GHOST;
                        }
+                       if (asc == 0x04 && ascq == 0x0a) {
+                               /*
+                                * LOGICAL UNIT NOT ACCESSIBLE,
+                                * TARGET PORT IN TRANSITION STATE
+                                */
+                               *msgid = CHECKER_MSGID_PENDING;
+                               return PATH_PENDING;
+                       }
                }
                *msgid = CHECKER_MSGID_DOWN;
                return PATH_DOWN;

This change also keeps the path from being failed by the checker.
It seems to go into and out of transitioning from other states.

Thanks,
Brian 
> 
>>> 
>>> The default transitioning timeout is 60s, and in my experience,
>>> even if
>>> the hardware overrides it, it's rarely more than a few minutes.
>>> After
>>> that, the kernel will set the state to STANDBY.
>> Yes. The case of a target keeping a path in the transitioning state
>> Indefinitely is handled by the device handler.
>>> 
>>> Unless we're both overlooking something, I agree with you that
>>> PATH_PENDING is the right thing to do for TRANSITIONING. When a
>>> device
>>> is in transition between states, we _want_ to check it often to
>>> make
>>> sure we notice when the target state is reached.
>>> 
>>> We must then be careful not to overload PATH_PENDING with too many
>>> different meanings. But I don't see this as a big issue currently.
>>> 
>>> Regards
>>> Martin
>>> 
>>>> Thoughts?
>>>> 
>>>> -Ben
>>>> 
>>>>> 
>>>>> Regards
>>>>> Martin


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [dm-devel] [PATCH] [PATCH] libmultipath: return 'ghost' state when port is in transition
  2023-03-09 21:40                   ` Brian Bunker
@ 2023-03-09 21:50                     ` Martin Wilck
  2023-03-09 22:11                       ` Brian Bunker
  0 siblings, 1 reply; 16+ messages in thread
From: Martin Wilck @ 2023-03-09 21:50 UTC (permalink / raw)
  To: Brian Bunker; +Cc: dm-devel

Brian,

On Thu, 2023-03-09 at 13:40 -0800, Brian Bunker wrote:
> 
> Martin,
> 
> Sorry I bounce between kernel versions a lot since most of the
> problems which find their way to us are released Linux versions
> whose kernels are quite a bit older than upstream.I got a chance
> to try the proposed solutions. The PATH_GHOST above does what
> I am looking for which is not to have the path checker fail the path.
> 
> This also works as your earlier comments suggest. This does seem
> clearer as to what is happening on the path:
> 

I apologize for being slow, but I don't quite understand this response.
Have you tested Ben's patch set? Does it work for you? Is the patch
below meant to be applied on top of Ben's work?

Martin

> diff --git a/libmultipath/checkers.c b/libmultipath/checkers.c
> index fdb91e17..50f0773e 100644
> --- a/libmultipath/checkers.c
> +++ b/libmultipath/checkers.c
> @@ -343,6 +343,7 @@ static const char
> *generic_msg[CHECKER_GENERIC_MSGTABLE_SIZE] = {
>         [CHECKER_MSGID_UP] = " reports path is up",
>         [CHECKER_MSGID_DOWN] = " reports path is down",
>         [CHECKER_MSGID_GHOST] = " reports path is ghost",
> +       [CHECKER_MSGID_PENDING] = " reports path is transitioning",
>         [CHECKER_MSGID_UNSUPPORTED] = " doesn't support this device",
>  };
> 
> diff --git a/libmultipath/checkers.h b/libmultipath/checkers.h
> index ea1e8af6..4fbad621 100644
> --- a/libmultipath/checkers.h
> +++ b/libmultipath/checkers.h
> @@ -111,6 +111,7 @@ enum {
>         CHECKER_MSGID_UP,
>         CHECKER_MSGID_DOWN,
>         CHECKER_MSGID_GHOST,
> +       CHECKER_MSGID_PENDING,
>         CHECKER_MSGID_UNSUPPORTED,
>         CHECKER_GENERIC_MSGTABLE_SIZE,
>         CHECKER_FIRST_MSGID = 100,      /* lowest msgid for checkers
> */
> 
> diff --git a/libmultipath/checkers/tur.c
> b/libmultipath/checkers/tur.c
> index 551dc4f0..e1742c2b 100644
> --- a/libmultipath/checkers/tur.c
> +++ b/libmultipath/checkers/tur.c
> @@ -179,7 +179,7 @@ retry:
>                 else if (key == 0x2) {
>                         /* Not Ready */
>                         /* Note: Other ALUA states are either UP or
> DOWN */
> -                       if( asc == 0x04 && ascq == 0x0b){
> +                       if (asc == 0x04 && ascq == 0x0b) {
>                                 /*
>                                  * LOGICAL UNIT NOT ACCESSIBLE,
>                                  * TARGET PORT IN STANDBY STATE
> @@ -187,6 +187,14 @@ retry:
>                                 *msgid = CHECKER_MSGID_GHOST;
>                                 return PATH_GHOST;
>                         }
> +                       if (asc == 0x04 && ascq == 0x0a) {
> +                               /*
> +                                * LOGICAL UNIT NOT ACCESSIBLE,
> +                                * TARGET PORT IN TRANSITION STATE
> +                                */
> +                               *msgid = CHECKER_MSGID_PENDING;
> +                               return PATH_PENDING;
> +                       }
>                 }
>                 *msgid = CHECKER_MSGID_DOWN;
>                 return PATH_DOWN;
> 
> This change also keeps the path from being failed by the checker.
> It seems to go into and out of transitioning from other states.
> 
> Thanks,
> Brian 
> > 
> > > > 
> > > > The default transitioning timeout is 60s, and in my experience,
> > > > even if
> > > > the hardware overrides it, it's rarely more than a few minutes.
> > > > After
> > > > that, the kernel will set the state to STANDBY.
> > > Yes. The case of a target keeping a path in the transitioning
> > > state
> > > Indefinitely is handled by the device handler.
> > > > 
> > > > Unless we're both overlooking something, I agree with you that
> > > > PATH_PENDING is the right thing to do for TRANSITIONING. When a
> > > > device
> > > > is in transition between states, we _want_ to check it often to
> > > > make
> > > > sure we notice when the target state is reached.
> > > > 
> > > > We must then be careful not to overload PATH_PENDING with too
> > > > many
> > > > different meanings. But I don't see this as a big issue
> > > > currently.
> > > > 
> > > > Regards
> > > > Martin
> > > > 
> > > > > Thoughts?
> > > > > 
> > > > > -Ben
> > > > > 
> > > > > > 
> > > > > > Regards
> > > > > > Martin
> 
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dm-devel] [PATCH] [PATCH] libmultipath: return 'ghost' state when port is in transition
  2023-03-09 21:50                     ` Martin Wilck
@ 2023-03-09 22:11                       ` Brian Bunker
  2023-03-10 15:28                         ` Martin Wilck
  0 siblings, 1 reply; 16+ messages in thread
From: Brian Bunker @ 2023-03-09 22:11 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel


> On Mar 9, 2023, at 1:50 PM, Martin Wilck <mwilck@suse.com> wrote:
> 
> Brian,
> 
> On Thu, 2023-03-09 at 13:40 -0800, Brian Bunker wrote:
>> 
>> Martin,
>> 
>> Sorry I bounce between kernel versions a lot since most of the
>> problems which find their way to us are released Linux versions
>> whose kernels are quite a bit older than upstream.I got a chance
>> to try the proposed solutions. The PATH_GHOST above does what
>> I am looking for which is not to have the path checker fail the path.
>> 
>> This also works as your earlier comments suggest. This does seem
>> clearer as to what is happening on the path:
>> 
> 
> I apologize for being slow, but I don't quite understand this response.
> Have you tested Ben's patch set? Does it work for you? Is the patch
> below meant to be applied on top of Ben's work?
> 
> Martin
Martin,

This is just a patch I made instead of the original patch to verify
using PATH_PENDING would work in our case, and it does. It has
no relationship to anything Ben was doing.

Thanks,
Brian
> 
>> diff --git a/libmultipath/checkers.c b/libmultipath/checkers.c
>> index fdb91e17..50f0773e 100644
>> --- a/libmultipath/checkers.c
>> +++ b/libmultipath/checkers.c
>> @@ -343,6 +343,7 @@ static const char
>> *generic_msg[CHECKER_GENERIC_MSGTABLE_SIZE] = {
>>         [CHECKER_MSGID_UP] = " reports path is up",
>>         [CHECKER_MSGID_DOWN] = " reports path is down",
>>         [CHECKER_MSGID_GHOST] = " reports path is ghost",
>> +       [CHECKER_MSGID_PENDING] = " reports path is transitioning",
>>         [CHECKER_MSGID_UNSUPPORTED] = " doesn't support this device",
>>  };
>> 
>> diff --git a/libmultipath/checkers.h b/libmultipath/checkers.h
>> index ea1e8af6..4fbad621 100644
>> --- a/libmultipath/checkers.h
>> +++ b/libmultipath/checkers.h
>> @@ -111,6 +111,7 @@ enum {
>>         CHECKER_MSGID_UP,
>>         CHECKER_MSGID_DOWN,
>>         CHECKER_MSGID_GHOST,
>> +       CHECKER_MSGID_PENDING,
>>         CHECKER_MSGID_UNSUPPORTED,
>>         CHECKER_GENERIC_MSGTABLE_SIZE,
>>         CHECKER_FIRST_MSGID = 100,      /* lowest msgid for checkers
>> */
>> 
>> diff --git a/libmultipath/checkers/tur.c
>> b/libmultipath/checkers/tur.c
>> index 551dc4f0..e1742c2b 100644
>> --- a/libmultipath/checkers/tur.c
>> +++ b/libmultipath/checkers/tur.c
>> @@ -179,7 +179,7 @@ retry:
>>                 else if (key == 0x2) {
>>                         /* Not Ready */
>>                         /* Note: Other ALUA states are either UP or
>> DOWN */
>> -                       if( asc == 0x04 && ascq == 0x0b){
>> +                       if (asc == 0x04 && ascq == 0x0b) {
>>                                 /*
>>                                  * LOGICAL UNIT NOT ACCESSIBLE,
>>                                  * TARGET PORT IN STANDBY STATE
>> @@ -187,6 +187,14 @@ retry:
>>                                 *msgid = CHECKER_MSGID_GHOST;
>>                                 return PATH_GHOST;
>>                         }
>> +                       if (asc == 0x04 && ascq == 0x0a) {
>> +                               /*
>> +                                * LOGICAL UNIT NOT ACCESSIBLE,
>> +                                * TARGET PORT IN TRANSITION STATE
>> +                                */
>> +                               *msgid = CHECKER_MSGID_PENDING;
>> +                               return PATH_PENDING;
>> +                       }
>>                 }
>>                 *msgid = CHECKER_MSGID_DOWN;
>>                 return PATH_DOWN;
>> 
>> This change also keeps the path from being failed by the checker.
>> It seems to go into and out of transitioning from other states.
>> 
>> Thanks,
>> Brian 
>>> 
>>>>> 
>>>>> The default transitioning timeout is 60s, and in my experience,
>>>>> even if
>>>>> the hardware overrides it, it's rarely more than a few minutes.
>>>>> After
>>>>> that, the kernel will set the state to STANDBY.
>>>> Yes. The case of a target keeping a path in the transitioning
>>>> state
>>>> Indefinitely is handled by the device handler.
>>>>> 
>>>>> Unless we're both overlooking something, I agree with you that
>>>>> PATH_PENDING is the right thing to do for TRANSITIONING. When a
>>>>> device
>>>>> is in transition between states, we _want_ to check it often to
>>>>> make
>>>>> sure we notice when the target state is reached.
>>>>> 
>>>>> We must then be careful not to overload PATH_PENDING with too
>>>>> many
>>>>> different meanings. But I don't see this as a big issue
>>>>> currently.
>>>>> 
>>>>> Regards
>>>>> Martin
>>>>> 
>>>>>> Thoughts?
>>>>>> 
>>>>>> -Ben
>>>>>> 
>>>>>>> 
>>>>>>> Regards
>>>>>>> Martin
>> 
>> 
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dm-devel] [PATCH] [PATCH] libmultipath: return 'ghost' state when port is in transition
  2023-03-09 22:11                       ` Brian Bunker
@ 2023-03-10 15:28                         ` Martin Wilck
  0 siblings, 0 replies; 16+ messages in thread
From: Martin Wilck @ 2023-03-10 15:28 UTC (permalink / raw)
  To: Brian Bunker; +Cc: dm-devel

On Thu, 2023-03-09 at 14:11 -0800, Brian Bunker wrote:
> 
> > On Mar 9, 2023, at 1:50 PM, Martin Wilck <mwilck@suse.com> wrote:
> > 
> > Brian,
> > 
> > On Thu, 2023-03-09 at 13:40 -0800, Brian Bunker wrote:
> > > 
> > > Martin,
> > > 
> > > Sorry I bounce between kernel versions a lot since most of the
> > > problems which find their way to us are released Linux versions
> > > whose kernels are quite a bit older than upstream.I got a chance
> > > to try the proposed solutions. The PATH_GHOST above does what
> > > I am looking for which is not to have the path checker fail the
> > > path.
> > > 
> > > This also works as your earlier comments suggest. This does seem
> > > clearer as to what is happening on the path:
> > > 
> > 
> > I apologize for being slow, but I don't quite understand this
> > response.
> > Have you tested Ben's patch set? Does it work for you? Is the patch
> > below meant to be applied on top of Ben's work?
> > 
> > Martin
> Martin,
> 
> This is just a patch I made instead of the original patch to verify
> using PATH_PENDING would work in our case, and it does. It has
> no relationship to anything Ben was doing.

Thanks for the clarification.
Please, if you have time, give Ben's set a test.

Martin

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-03-10 15:29 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-21 20:56 [dm-devel] [PATCH] [PATCH] libmultipath: return 'ghost' state when port is in transition Brian Bunker
2023-03-03  7:46 ` Martin Wilck
2023-03-03 17:18   ` Brian Bunker
2023-03-03 20:41     ` Martin Wilck
2023-03-04 20:49       ` Brian Bunker
2023-03-06 11:46         ` Martin Wilck
2023-03-06 19:04           ` Benjamin Marzinski
2023-03-07 10:31             ` Martin Wilck
2023-03-07 19:41               ` Brian Bunker
2023-03-07 21:15                 ` Martin Wilck
2023-03-09 21:40                   ` Brian Bunker
2023-03-09 21:50                     ` Martin Wilck
2023-03-09 22:11                       ` Brian Bunker
2023-03-10 15:28                         ` Martin Wilck
2023-03-07 10:05         ` Martin Wilck
2023-03-06 16:54 ` Benjamin Marzinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.