All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/1] NFSD: fix error handling in callbacks
@ 2021-03-09 14:41 Olga Kornievskaia
  2021-03-09 15:37 ` J. Bruce Fields
  0 siblings, 1 reply; 17+ messages in thread
From: Olga Kornievskaia @ 2021-03-09 14:41 UTC (permalink / raw)
  To: bfields, chuck.lever; +Cc: linux-nfs

From: Olga Kornievskaia <kolga@netapp.com>

When the server tries to do a callback and a client fails it due to
authentication problems, we need the server to set callback down
flag in RENEW so that client can recover.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfsd/nfs4callback.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 052be5bf9ef5..7325592b456e 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
 		switch (task->tk_status) {
 		case -EIO:
 		case -ETIMEDOUT:
+		case -EACCES:
 			nfsd4_mark_cb_down(clp, task->tk_status);
 		}
 		break;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] NFSD: fix error handling in callbacks
  2021-03-09 14:41 [PATCH 1/1] NFSD: fix error handling in callbacks Olga Kornievskaia
@ 2021-03-09 15:37 ` J. Bruce Fields
  2021-03-09 16:39   ` Olga Kornievskaia
                     ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: J. Bruce Fields @ 2021-03-09 15:37 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: chuck.lever, linux-nfs

On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> From: Olga Kornievskaia <kolga@netapp.com>
> 
> When the server tries to do a callback and a client fails it due to
> authentication problems, we need the server to set callback down
> flag in RENEW so that client can recover.

I was looking at this.  It looks to me like this should really be just:

	case 1:
		if (task->tk_status)
			nfsd4_mark_cb_down(clp, task->tk_status);

If tk_status showed an error, and the ->done method doesn't return 0 to
tell us it something worth retrying, then the callback failed
permanently, so we should mark the callback path down, regardless of the
exact error.

--b.

> 
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  fs/nfsd/nfs4callback.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 052be5bf9ef5..7325592b456e 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
>  		switch (task->tk_status) {
>  		case -EIO:
>  		case -ETIMEDOUT:
> +		case -EACCES:
>  			nfsd4_mark_cb_down(clp, task->tk_status);
>  		}
>  		break;
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] NFSD: fix error handling in callbacks
  2021-03-09 15:37 ` J. Bruce Fields
@ 2021-03-09 16:39   ` Olga Kornievskaia
  2021-03-09 16:46     ` Olga Kornievskaia
  2021-03-09 19:45   ` Scott Mayhew
  2021-03-09 20:22   ` Trond Myklebust
  2 siblings, 1 reply; 17+ messages in thread
From: Olga Kornievskaia @ 2021-03-09 16:39 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Chuck Lever, linux-nfs

On Tue, Mar 9, 2021 at 10:37 AM J. Bruce Fields <bfields@redhat.com> wrote:
>
> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <kolga@netapp.com>
> >
> > When the server tries to do a callback and a client fails it due to
> > authentication problems, we need the server to set callback down
> > flag in RENEW so that client can recover.
>
> I was looking at this.  It looks to me like this should really be just:
>
>         case 1:
>                 if (task->tk_status)
>                         nfsd4_mark_cb_down(clp, task->tk_status);
>
> If tk_status showed an error, and the ->done method doesn't return 0 to
> tell us it something worth retrying, then the callback failed
> permanently, so we should mark the callback path down, regardless of the
> exact error.

Ok. v2 coming (will change the title to make it 4.0 callback)

>
> --b.
>
> >
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  fs/nfsd/nfs4callback.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index 052be5bf9ef5..7325592b456e 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
> >               switch (task->tk_status) {
> >               case -EIO:
> >               case -ETIMEDOUT:
> > +             case -EACCES:
> >                       nfsd4_mark_cb_down(clp, task->tk_status);
> >               }
> >               break;
> > --
> > 2.27.0
> >
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] NFSD: fix error handling in callbacks
  2021-03-09 16:39   ` Olga Kornievskaia
@ 2021-03-09 16:46     ` Olga Kornievskaia
  2021-03-09 17:42       ` Chuck Lever
  0 siblings, 1 reply; 17+ messages in thread
From: Olga Kornievskaia @ 2021-03-09 16:46 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Chuck Lever, linux-nfs

On Tue, Mar 9, 2021 at 11:39 AM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Tue, Mar 9, 2021 at 10:37 AM J. Bruce Fields <bfields@redhat.com> wrote:
> >
> > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> > > From: Olga Kornievskaia <kolga@netapp.com>
> > >
> > > When the server tries to do a callback and a client fails it due to
> > > authentication problems, we need the server to set callback down
> > > flag in RENEW so that client can recover.
> >
> > I was looking at this.  It looks to me like this should really be just:
> >
> >         case 1:
> >                 if (task->tk_status)
> >                         nfsd4_mark_cb_down(clp, task->tk_status);
> >
> > If tk_status showed an error, and the ->done method doesn't return 0 to
> > tell us it something worth retrying, then the callback failed
> > permanently, so we should mark the callback path down, regardless of the
> > exact error.
>
> Ok. v2 coming (will change the title to make it 4.0 callback)

Sigh, I didn't change the wording of the commit and left the
authentication problem which is not accurate enough for this patch (as
say connection errors are also covered by this patch). Do you need me
to change the wording of the commit and send v3?

>
> >
> > --b.
> >
> > >
> > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > ---
> > >  fs/nfsd/nfs4callback.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > index 052be5bf9ef5..7325592b456e 100644
> > > --- a/fs/nfsd/nfs4callback.c
> > > +++ b/fs/nfsd/nfs4callback.c
> > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
> > >               switch (task->tk_status) {
> > >               case -EIO:
> > >               case -ETIMEDOUT:
> > > +             case -EACCES:
> > >                       nfsd4_mark_cb_down(clp, task->tk_status);
> > >               }
> > >               break;
> > > --
> > > 2.27.0
> > >
> >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] NFSD: fix error handling in callbacks
  2021-03-09 16:46     ` Olga Kornievskaia
@ 2021-03-09 17:42       ` Chuck Lever
  0 siblings, 0 replies; 17+ messages in thread
From: Chuck Lever @ 2021-03-09 17:42 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Bruce Fields, Linux NFS Mailing List



> On Mar 9, 2021, at 11:46 AM, Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote:
> 
> On Tue, Mar 9, 2021 at 11:39 AM Olga Kornievskaia
> <olga.kornievskaia@gmail.com> wrote:
>> 
>> On Tue, Mar 9, 2021 at 10:37 AM J. Bruce Fields <bfields@redhat.com> wrote:
>>> 
>>> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
>>>> From: Olga Kornievskaia <kolga@netapp.com>
>>>> 
>>>> When the server tries to do a callback and a client fails it due to
>>>> authentication problems, we need the server to set callback down
>>>> flag in RENEW so that client can recover.
>>> 
>>> I was looking at this.  It looks to me like this should really be just:
>>> 
>>>        case 1:
>>>                if (task->tk_status)
>>>                        nfsd4_mark_cb_down(clp, task->tk_status);
>>> 
>>> If tk_status showed an error, and the ->done method doesn't return 0 to
>>> tell us it something worth retrying, then the callback failed
>>> permanently, so we should mark the callback path down, regardless of the
>>> exact error.
>> 
>> Ok. v2 coming (will change the title to make it 4.0 callback)
> 
> Sigh, I didn't change the wording of the commit and left the
> authentication problem which is not accurate enough for this patch (as
> say connection errors are also covered by this patch). Do you need me
> to change the wording of the commit and send v3?

Yes, please post a v3. Thanks.


>>> --b.
>>> 
>>>> 
>>>> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
>>>> ---
>>>> fs/nfsd/nfs4callback.c | 1 +
>>>> 1 file changed, 1 insertion(+)
>>>> 
>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>>> index 052be5bf9ef5..7325592b456e 100644
>>>> --- a/fs/nfsd/nfs4callback.c
>>>> +++ b/fs/nfsd/nfs4callback.c
>>>> @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
>>>>              switch (task->tk_status) {
>>>>              case -EIO:
>>>>              case -ETIMEDOUT:
>>>> +             case -EACCES:
>>>>                      nfsd4_mark_cb_down(clp, task->tk_status);
>>>>              }
>>>>              break;
>>>> --
>>>> 2.27.0

--
Chuck Lever




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] NFSD: fix error handling in callbacks
  2021-03-09 15:37 ` J. Bruce Fields
  2021-03-09 16:39   ` Olga Kornievskaia
@ 2021-03-09 19:45   ` Scott Mayhew
  2021-03-09 20:12     ` J. Bruce Fields
  2021-03-09 20:22   ` Trond Myklebust
  2 siblings, 1 reply; 17+ messages in thread
From: Scott Mayhew @ 2021-03-09 19:45 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Olga Kornievskaia, chuck.lever, linux-nfs

On Tue, 09 Mar 2021, J. Bruce Fields wrote:

> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <kolga@netapp.com>
> > 
> > When the server tries to do a callback and a client fails it due to
> > authentication problems, we need the server to set callback down
> > flag in RENEW so that client can recover.
> 
> I was looking at this.  It looks to me like this should really be just:
> 
> 	case 1:
> 		if (task->tk_status)
> 			nfsd4_mark_cb_down(clp, task->tk_status);
> 
> If tk_status showed an error, and the ->done method doesn't return 0 to
> tell us it something worth retrying, then the callback failed
> permanently, so we should mark the callback path down, regardless of the
> exact error.

That switch was added because the server was erroneously setting
SEQ4_STATUS_CB_PATH_DOWN in the event that a client returned an
NFS-level error for a CB_RECALL that the client had already done a
FREE_STATEID for.  Removing the switch will cause the server to go back
to that behaviour, won't it?

-Scott
> 
> --b.
> 
> > 
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  fs/nfsd/nfs4callback.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index 052be5bf9ef5..7325592b456e 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
> >  		switch (task->tk_status) {
> >  		case -EIO:
> >  		case -ETIMEDOUT:
> > +		case -EACCES:
> >  			nfsd4_mark_cb_down(clp, task->tk_status);
> >  		}
> >  		break;
> > -- 
> > 2.27.0
> > 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] NFSD: fix error handling in callbacks
  2021-03-09 19:45   ` Scott Mayhew
@ 2021-03-09 20:12     ` J. Bruce Fields
  2021-03-09 20:40       ` Scott Mayhew
  0 siblings, 1 reply; 17+ messages in thread
From: J. Bruce Fields @ 2021-03-09 20:12 UTC (permalink / raw)
  To: Scott Mayhew; +Cc: Olga Kornievskaia, chuck.lever, linux-nfs

On Tue, Mar 09, 2021 at 02:45:19PM -0500, Scott Mayhew wrote:
> On Tue, 09 Mar 2021, J. Bruce Fields wrote:
> 
> > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> > > From: Olga Kornievskaia <kolga@netapp.com>
> > > 
> > > When the server tries to do a callback and a client fails it due to
> > > authentication problems, we need the server to set callback down
> > > flag in RENEW so that client can recover.
> > 
> > I was looking at this.  It looks to me like this should really be just:
> > 
> > 	case 1:
> > 		if (task->tk_status)
> > 			nfsd4_mark_cb_down(clp, task->tk_status);
> > 
> > If tk_status showed an error, and the ->done method doesn't return 0 to
> > tell us it something worth retrying, then the callback failed
> > permanently, so we should mark the callback path down, regardless of the
> > exact error.
> 
> That switch was added because the server was erroneously setting
> SEQ4_STATUS_CB_PATH_DOWN in the event that a client returned an
> NFS-level error for a CB_RECALL that the client had already done a
> FREE_STATEID for.  Removing the switch will cause the server to go back
> to that behaviour, won't it?

Oh, thanks for the history.

My knee-jerk reaction is: that sounds like a recall-specific issue, so
maybe that logic should be in nfsd4_cb_recall_done?

I guess I'm OK continuing instead to enumerate transport-layer errors in
nfsd4_cb_done, but do we know that EACCES makes it a complete list?

--b.

> 
> -Scott
> > 
> > --b.
> > 
> > > 
> > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > ---
> > >  fs/nfsd/nfs4callback.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > index 052be5bf9ef5..7325592b456e 100644
> > > --- a/fs/nfsd/nfs4callback.c
> > > +++ b/fs/nfsd/nfs4callback.c
> > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
> > >  		switch (task->tk_status) {
> > >  		case -EIO:
> > >  		case -ETIMEDOUT:
> > > +		case -EACCES:
> > >  			nfsd4_mark_cb_down(clp, task->tk_status);
> > >  		}
> > >  		break;
> > > -- 
> > > 2.27.0
> > > 
> > 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] NFSD: fix error handling in callbacks
  2021-03-09 15:37 ` J. Bruce Fields
  2021-03-09 16:39   ` Olga Kornievskaia
  2021-03-09 19:45   ` Scott Mayhew
@ 2021-03-09 20:22   ` Trond Myklebust
  2021-03-09 20:41     ` Olga Kornievskaia
  2 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2021-03-09 20:22 UTC (permalink / raw)
  To: bfields, olga.kornievskaia; +Cc: linux-nfs, chuck.lever

On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <kolga@netapp.com>
> > 
> > When the server tries to do a callback and a client fails it due to
> > authentication problems, we need the server to set callback down
> > flag in RENEW so that client can recover.
> 
> I was looking at this.  It looks to me like this should really be
> just:
> 
>         case 1:
>                 if (task->tk_status)
>                         nfsd4_mark_cb_down(clp, task->tk_status);
> 
> If tk_status showed an error, and the ->done method doesn't return 0
> to
> tell us it something worth retrying, then the callback failed
> permanently, so we should mark the callback path down, regardless of
> the
> exact error.

I disagree. task->tk_status could be an unhandled NFSv4 error (see
nfsd4_cb_recall_done()). The client might, for instance, be in the
process of returning the delegation being recalled. Why should that
result in the callback channel being marked as down?

> 
> --b.
> 
> > 
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  fs/nfsd/nfs4callback.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index 052be5bf9ef5..7325592b456e 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
> > *task, void *calldata)
> >                 switch (task->tk_status) {
> >                 case -EIO:
> >                 case -ETIMEDOUT:
> > +               case -EACCES:
> >                         nfsd4_mark_cb_down(clp, task->tk_status);
> >                 }
> >                 break;
> > -- 
> > 2.27.0
> > 
> 

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] NFSD: fix error handling in callbacks
  2021-03-09 20:12     ` J. Bruce Fields
@ 2021-03-09 20:40       ` Scott Mayhew
  0 siblings, 0 replies; 17+ messages in thread
From: Scott Mayhew @ 2021-03-09 20:40 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Olga Kornievskaia, chuck.lever, linux-nfs

On Tue, 09 Mar 2021, J. Bruce Fields wrote:

> On Tue, Mar 09, 2021 at 02:45:19PM -0500, Scott Mayhew wrote:
> > On Tue, 09 Mar 2021, J. Bruce Fields wrote:
> > 
> > > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > > 
> > > > When the server tries to do a callback and a client fails it due to
> > > > authentication problems, we need the server to set callback down
> > > > flag in RENEW so that client can recover.
> > > 
> > > I was looking at this.  It looks to me like this should really be just:
> > > 
> > > 	case 1:
> > > 		if (task->tk_status)
> > > 			nfsd4_mark_cb_down(clp, task->tk_status);
> > > 
> > > If tk_status showed an error, and the ->done method doesn't return 0 to
> > > tell us it something worth retrying, then the callback failed
> > > permanently, so we should mark the callback path down, regardless of the
> > > exact error.
> > 
> > That switch was added because the server was erroneously setting
> > SEQ4_STATUS_CB_PATH_DOWN in the event that a client returned an
> > NFS-level error for a CB_RECALL that the client had already done a
> > FREE_STATEID for.  Removing the switch will cause the server to go back
> > to that behaviour, won't it?
> 
> Oh, thanks for the history.
> 
> My knee-jerk reaction is: that sounds like a recall-specific issue, so
> maybe that logic should be in nfsd4_cb_recall_done?
> 
> I guess I'm OK continuing instead to enumerate transport-layer errors in
> nfsd4_cb_done, but do we know that EACCES makes it a complete list?

No idea.  I'm pretty sure EIO and ETIMEDOUT were the two errors that I
was seeing at the time.  I don't remember the exact test case but I
think I was hammering the server trying to reproduce seq misordered &
false retry errors.  And EACCES is what Olga's seeing when the server
uses the wrong principal for the callback cred.

-Scott
> 
> --b.
> 
> > 
> > -Scott
> > > 
> > > --b.
> > > 
> > > > 
> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > ---
> > > >  fs/nfsd/nfs4callback.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > > 
> > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > index 052be5bf9ef5..7325592b456e 100644
> > > > --- a/fs/nfsd/nfs4callback.c
> > > > +++ b/fs/nfsd/nfs4callback.c
> > > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
> > > >  		switch (task->tk_status) {
> > > >  		case -EIO:
> > > >  		case -ETIMEDOUT:
> > > > +		case -EACCES:
> > > >  			nfsd4_mark_cb_down(clp, task->tk_status);
> > > >  		}
> > > >  		break;
> > > > -- 
> > > > 2.27.0
> > > > 
> > > 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] NFSD: fix error handling in callbacks
  2021-03-09 20:22   ` Trond Myklebust
@ 2021-03-09 20:41     ` Olga Kornievskaia
  2021-03-09 20:59       ` Trond Myklebust
  0 siblings, 1 reply; 17+ messages in thread
From: Olga Kornievskaia @ 2021-03-09 20:41 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: bfields, linux-nfs, chuck.lever

On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
> > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia wrote:
> > > From: Olga Kornievskaia <kolga@netapp.com>
> > >
> > > When the server tries to do a callback and a client fails it due to
> > > authentication problems, we need the server to set callback down
> > > flag in RENEW so that client can recover.
> >
> > I was looking at this.  It looks to me like this should really be
> > just:
> >
> >         case 1:
> >                 if (task->tk_status)
> >                         nfsd4_mark_cb_down(clp, task->tk_status);
> >
> > If tk_status showed an error, and the ->done method doesn't return 0
> > to
> > tell us it something worth retrying, then the callback failed
> > permanently, so we should mark the callback path down, regardless of
> > the
> > exact error.
>
> I disagree. task->tk_status could be an unhandled NFSv4 error (see
> nfsd4_cb_recall_done()). The client might, for instance, be in the
> process of returning the delegation being recalled. Why should that
> result in the callback channel being marked as down?
>

Are you talking about say the connection going down and server should
just reconnect instead of recovering the callback channel. I assumed
that connection break is something that's not  recoverable by the
callback but perhaps I'm wrong.

> >
> > --b.
> >
> > >
> > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > ---
> > >  fs/nfsd/nfs4callback.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > index 052be5bf9ef5..7325592b456e 100644
> > > --- a/fs/nfsd/nfs4callback.c
> > > +++ b/fs/nfsd/nfs4callback.c
> > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
> > > *task, void *calldata)
> > >                 switch (task->tk_status) {
> > >                 case -EIO:
> > >                 case -ETIMEDOUT:
> > > +               case -EACCES:
> > >                         nfsd4_mark_cb_down(clp, task->tk_status);
> > >                 }
> > >                 break;
> > > --
> > > 2.27.0
> > >
> >
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] NFSD: fix error handling in callbacks
  2021-03-09 20:41     ` Olga Kornievskaia
@ 2021-03-09 20:59       ` Trond Myklebust
  2021-03-10 14:47         ` J. Bruce Fields
  0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2021-03-09 20:59 UTC (permalink / raw)
  To: olga.kornievskaia; +Cc: linux-nfs, bfields, chuck.lever

On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
> On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
> trondmy@hammerspace.com> wrote:
> > 
> > On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
> > > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
> > > wrote:
> > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > > 
> > > > When the server tries to do a callback and a client fails it
> > > > due to
> > > > authentication problems, we need the server to set callback
> > > > down
> > > > flag in RENEW so that client can recover.
> > > 
> > > I was looking at this.  It looks to me like this should really be
> > > just:
> > > 
> > >         case 1:
> > >                 if (task->tk_status)
> > >                         nfsd4_mark_cb_down(clp, task->tk_status);
> > > 
> > > If tk_status showed an error, and the ->done method doesn't
> > > return 0
> > > to
> > > tell us it something worth retrying, then the callback failed
> > > permanently, so we should mark the callback path down, regardless
> > > of
> > > the
> > > exact error.
> > 
> > I disagree. task->tk_status could be an unhandled NFSv4 error (see
> > nfsd4_cb_recall_done()). The client might, for instance, be in the
> > process of returning the delegation being recalled. Why should that
> > result in the callback channel being marked as down?
> > 
> 
> Are you talking about say the connection going down and server should
> just reconnect instead of recovering the callback channel. I assumed
> that connection break is something that's not  recoverable by the
> callback but perhaps I'm wrong.

No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
not seeing why either of those errors should be handled by marking the
callback channel as being down.

Looking further, it seems that the same function will also return '1'
without checking the value of task->tk_status if the delegation has
been revoked or returned. So that would mean that even NFS4ERR_DELAY
could trigger the call to nfsd4_mark_cb_down() with the above change.

> 
> > > 
> > > --b.
> > > 
> > > > 
> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > ---
> > > >  fs/nfsd/nfs4callback.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > > 
> > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > index 052be5bf9ef5..7325592b456e 100644
> > > > --- a/fs/nfsd/nfs4callback.c
> > > > +++ b/fs/nfsd/nfs4callback.c
> > > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
> > > > *task, void *calldata)
> > > >                 switch (task->tk_status) {
> > > >                 case -EIO:
> > > >                 case -ETIMEDOUT:
> > > > +               case -EACCES:
> > > >                         nfsd4_mark_cb_down(clp, task-
> > > > >tk_status);
> > > >                 }
> > > >                 break;
> > > > --
> > > > 2.27.0
> > > > 
> > > 
> > 
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trond.myklebust@hammerspace.com
> > 
> > 

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] NFSD: fix error handling in callbacks
  2021-03-09 20:59       ` Trond Myklebust
@ 2021-03-10 14:47         ` J. Bruce Fields
  2021-03-10 22:09           ` Olga Kornievskaia
  0 siblings, 1 reply; 17+ messages in thread
From: J. Bruce Fields @ 2021-03-10 14:47 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: olga.kornievskaia, linux-nfs, chuck.lever

On Tue, Mar 09, 2021 at 08:59:51PM +0000, Trond Myklebust wrote:
> On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
> > On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
> > trondmy@hammerspace.com> wrote:
> > > 
> > > On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
> > > > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
> > > > wrote:
> > > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > > > 
> > > > > When the server tries to do a callback and a client fails it
> > > > > due to
> > > > > authentication problems, we need the server to set callback
> > > > > down
> > > > > flag in RENEW so that client can recover.
> > > > 
> > > > I was looking at this.  It looks to me like this should really be
> > > > just:
> > > > 
> > > >         case 1:
> > > >                 if (task->tk_status)
> > > >                         nfsd4_mark_cb_down(clp, task->tk_status);
> > > > 
> > > > If tk_status showed an error, and the ->done method doesn't
> > > > return 0
> > > > to
> > > > tell us it something worth retrying, then the callback failed
> > > > permanently, so we should mark the callback path down, regardless
> > > > of
> > > > the
> > > > exact error.
> > > 
> > > I disagree. task->tk_status could be an unhandled NFSv4 error (see
> > > nfsd4_cb_recall_done()). The client might, for instance, be in the
> > > process of returning the delegation being recalled. Why should that
> > > result in the callback channel being marked as down?
> > > 
> > 
> > Are you talking about say the connection going down and server should
> > just reconnect instead of recovering the callback channel. I assumed
> > that connection break is something that's not  recoverable by the
> > callback but perhaps I'm wrong.
> 
> No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
> for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
> not seeing why either of those errors should be handled by marking the
> callback channel as being down.
> 
> Looking further, it seems that the same function will also return '1'
> without checking the value of task->tk_status if the delegation has
> been revoked or returned. So that would mean that even NFS4ERR_DELAY
> could trigger the call to nfsd4_mark_cb_down() with the above change.

Yeah, OK, that's wrong, apologies.

I'm just a little worried about the attempt to enumerate transport level
errors in nfsd4_cb_done().  Are we sure that EIO, ETIMEDOUT, EACCESS is
the right list?

--b.

> 
> > 
> > > > 
> > > > --b.
> > > > 
> > > > > 
> > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > ---
> > > > >  fs/nfsd/nfs4callback.c | 1 +
> > > > >  1 file changed, 1 insertion(+)
> > > > > 
> > > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > > index 052be5bf9ef5..7325592b456e 100644
> > > > > --- a/fs/nfsd/nfs4callback.c
> > > > > +++ b/fs/nfsd/nfs4callback.c
> > > > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
> > > > > *task, void *calldata)
> > > > >                 switch (task->tk_status) {
> > > > >                 case -EIO:
> > > > >                 case -ETIMEDOUT:
> > > > > +               case -EACCES:
> > > > >                         nfsd4_mark_cb_down(clp, task-
> > > > > >tk_status);
> > > > >                 }
> > > > >                 break;
> > > > > --
> > > > > 2.27.0
> > > > > 
> > > > 
> > > 
> > > --
> > > Trond Myklebust
> > > Linux NFS client maintainer, Hammerspace
> > > trond.myklebust@hammerspace.com
> > > 
> > > 
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] NFSD: fix error handling in callbacks
  2021-03-10 14:47         ` J. Bruce Fields
@ 2021-03-10 22:09           ` Olga Kornievskaia
  2021-03-11 14:58             ` J. Bruce Fields
  2021-03-11 15:10             ` Chuck Lever III
  0 siblings, 2 replies; 17+ messages in thread
From: Olga Kornievskaia @ 2021-03-10 22:09 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Trond Myklebust, linux-nfs, chuck.lever

On Wed, Mar 10, 2021 at 9:47 AM J. Bruce Fields <bfields@redhat.com> wrote:
>
> On Tue, Mar 09, 2021 at 08:59:51PM +0000, Trond Myklebust wrote:
> > On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
> > > On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
> > > trondmy@hammerspace.com> wrote:
> > > >
> > > > On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
> > > > > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
> > > > > wrote:
> > > > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > > > >
> > > > > > When the server tries to do a callback and a client fails it
> > > > > > due to
> > > > > > authentication problems, we need the server to set callback
> > > > > > down
> > > > > > flag in RENEW so that client can recover.
> > > > >
> > > > > I was looking at this.  It looks to me like this should really be
> > > > > just:
> > > > >
> > > > >         case 1:
> > > > >                 if (task->tk_status)
> > > > >                         nfsd4_mark_cb_down(clp, task->tk_status);
> > > > >
> > > > > If tk_status showed an error, and the ->done method doesn't
> > > > > return 0
> > > > > to
> > > > > tell us it something worth retrying, then the callback failed
> > > > > permanently, so we should mark the callback path down, regardless
> > > > > of
> > > > > the
> > > > > exact error.
> > > >
> > > > I disagree. task->tk_status could be an unhandled NFSv4 error (see
> > > > nfsd4_cb_recall_done()). The client might, for instance, be in the
> > > > process of returning the delegation being recalled. Why should that
> > > > result in the callback channel being marked as down?
> > > >
> > >
> > > Are you talking about say the connection going down and server should
> > > just reconnect instead of recovering the callback channel. I assumed
> > > that connection break is something that's not  recoverable by the
> > > callback but perhaps I'm wrong.
> >
> > No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
> > for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
> > not seeing why either of those errors should be handled by marking the
> > callback channel as being down.
> >
> > Looking further, it seems that the same function will also return '1'
> > without checking the value of task->tk_status if the delegation has
> > been revoked or returned. So that would mean that even NFS4ERR_DELAY
> > could trigger the call to nfsd4_mark_cb_down() with the above change.
>
> Yeah, OK, that's wrong, apologies.
>
> I'm just a little worried about the attempt to enumerate transport level
> errors in nfsd4_cb_done().  Are we sure that EIO, ETIMEDOUT, EACCESS is
> the right list?

Looking at call_transmit_status error handling, I don't think
connection errors are returned. Instead the code tries to fix the
connection by retrying unless the rpc_timeout is reached and then only
EIO,TIMEDOUT is returned.

Can then my original patch be considered without resubmission?

>
> --b.
>
> >
> > >
> > > > >
> > > > > --b.
> > > > >
> > > > > >
> > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > > ---
> > > > > >  fs/nfsd/nfs4callback.c | 1 +
> > > > > >  1 file changed, 1 insertion(+)
> > > > > >
> > > > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > > > index 052be5bf9ef5..7325592b456e 100644
> > > > > > --- a/fs/nfsd/nfs4callback.c
> > > > > > +++ b/fs/nfsd/nfs4callback.c
> > > > > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
> > > > > > *task, void *calldata)
> > > > > >                 switch (task->tk_status) {
> > > > > >                 case -EIO:
> > > > > >                 case -ETIMEDOUT:
> > > > > > +               case -EACCES:
> > > > > >                         nfsd4_mark_cb_down(clp, task-
> > > > > > >tk_status);
> > > > > >                 }
> > > > > >                 break;
> > > > > > --
> > > > > > 2.27.0
> > > > > >
> > > > >
> > > >
> > > > --
> > > > Trond Myklebust
> > > > Linux NFS client maintainer, Hammerspace
> > > > trond.myklebust@hammerspace.com
> > > >
> > > >
> >
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trond.myklebust@hammerspace.com
> >
> >
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] NFSD: fix error handling in callbacks
  2021-03-10 22:09           ` Olga Kornievskaia
@ 2021-03-11 14:58             ` J. Bruce Fields
  2021-03-11 15:10             ` Chuck Lever III
  1 sibling, 0 replies; 17+ messages in thread
From: J. Bruce Fields @ 2021-03-11 14:58 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Trond Myklebust, linux-nfs, chuck.lever

On Wed, Mar 10, 2021 at 05:09:20PM -0500, Olga Kornievskaia wrote:
> On Wed, Mar 10, 2021 at 9:47 AM J. Bruce Fields <bfields@redhat.com> wrote:
> >
> > On Tue, Mar 09, 2021 at 08:59:51PM +0000, Trond Myklebust wrote:
> > > On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
> > > > On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
> > > > trondmy@hammerspace.com> wrote:
> > > > >
> > > > > On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
> > > > > > On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
> > > > > > wrote:
> > > > > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > > > > >
> > > > > > > When the server tries to do a callback and a client fails it
> > > > > > > due to
> > > > > > > authentication problems, we need the server to set callback
> > > > > > > down
> > > > > > > flag in RENEW so that client can recover.
> > > > > >
> > > > > > I was looking at this.  It looks to me like this should really be
> > > > > > just:
> > > > > >
> > > > > >         case 1:
> > > > > >                 if (task->tk_status)
> > > > > >                         nfsd4_mark_cb_down(clp, task->tk_status);
> > > > > >
> > > > > > If tk_status showed an error, and the ->done method doesn't
> > > > > > return 0
> > > > > > to
> > > > > > tell us it something worth retrying, then the callback failed
> > > > > > permanently, so we should mark the callback path down, regardless
> > > > > > of
> > > > > > the
> > > > > > exact error.
> > > > >
> > > > > I disagree. task->tk_status could be an unhandled NFSv4 error (see
> > > > > nfsd4_cb_recall_done()). The client might, for instance, be in the
> > > > > process of returning the delegation being recalled. Why should that
> > > > > result in the callback channel being marked as down?
> > > > >
> > > >
> > > > Are you talking about say the connection going down and server should
> > > > just reconnect instead of recovering the callback channel. I assumed
> > > > that connection break is something that's not  recoverable by the
> > > > callback but perhaps I'm wrong.
> > >
> > > No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
> > > for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
> > > not seeing why either of those errors should be handled by marking the
> > > callback channel as being down.
> > >
> > > Looking further, it seems that the same function will also return '1'
> > > without checking the value of task->tk_status if the delegation has
> > > been revoked or returned. So that would mean that even NFS4ERR_DELAY
> > > could trigger the call to nfsd4_mark_cb_down() with the above change.
> >
> > Yeah, OK, that's wrong, apologies.
> >
> > I'm just a little worried about the attempt to enumerate transport level
> > errors in nfsd4_cb_done().  Are we sure that EIO, ETIMEDOUT, EACCESS is
> > the right list?
> 
> Looking at call_transmit_status error handling, I don't think
> connection errors are returned. Instead the code tries to fix the
> connection by retrying unless the rpc_timeout is reached and then only
> EIO,TIMEDOUT is returned.
> 
> Can then my original patch be considered without resubmission?

Sure, thanks for checking that.

--b.

> 
> >
> > --b.
> >
> > >
> > > >
> > > > > >
> > > > > > --b.
> > > > > >
> > > > > > >
> > > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > > > ---
> > > > > > >  fs/nfsd/nfs4callback.c | 1 +
> > > > > > >  1 file changed, 1 insertion(+)
> > > > > > >
> > > > > > > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > > > > > > index 052be5bf9ef5..7325592b456e 100644
> > > > > > > --- a/fs/nfsd/nfs4callback.c
> > > > > > > +++ b/fs/nfsd/nfs4callback.c
> > > > > > > @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
> > > > > > > *task, void *calldata)
> > > > > > >                 switch (task->tk_status) {
> > > > > > >                 case -EIO:
> > > > > > >                 case -ETIMEDOUT:
> > > > > > > +               case -EACCES:
> > > > > > >                         nfsd4_mark_cb_down(clp, task-
> > > > > > > >tk_status);
> > > > > > >                 }
> > > > > > >                 break;
> > > > > > > --
> > > > > > > 2.27.0
> > > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > Trond Myklebust
> > > > > Linux NFS client maintainer, Hammerspace
> > > > > trond.myklebust@hammerspace.com
> > > > >
> > > > >
> > >
> > > --
> > > Trond Myklebust
> > > Linux NFS client maintainer, Hammerspace
> > > trond.myklebust@hammerspace.com
> > >
> > >
> >
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] NFSD: fix error handling in callbacks
  2021-03-10 22:09           ` Olga Kornievskaia
  2021-03-11 14:58             ` J. Bruce Fields
@ 2021-03-11 15:10             ` Chuck Lever III
  2021-03-11 15:16               ` Olga Kornievskaia
  1 sibling, 1 reply; 17+ messages in thread
From: Chuck Lever III @ 2021-03-11 15:10 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Bruce Fields, Trond Myklebust, Linux NFS Mailing List



> On Mar 10, 2021, at 5:09 PM, Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote:
> 
> On Wed, Mar 10, 2021 at 9:47 AM J. Bruce Fields <bfields@redhat.com> wrote:
>> 
>> On Tue, Mar 09, 2021 at 08:59:51PM +0000, Trond Myklebust wrote:
>>> On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
>>>> On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
>>>> trondmy@hammerspace.com> wrote:
>>>>> 
>>>>> On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
>>>>>> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
>>>>>> wrote:
>>>>>>> From: Olga Kornievskaia <kolga@netapp.com>
>>>>>>> 
>>>>>>> When the server tries to do a callback and a client fails it
>>>>>>> due to
>>>>>>> authentication problems, we need the server to set callback
>>>>>>> down
>>>>>>> flag in RENEW so that client can recover.
>>>>>> 
>>>>>> I was looking at this.  It looks to me like this should really be
>>>>>> just:
>>>>>> 
>>>>>>        case 1:
>>>>>>                if (task->tk_status)
>>>>>>                        nfsd4_mark_cb_down(clp, task->tk_status);
>>>>>> 
>>>>>> If tk_status showed an error, and the ->done method doesn't
>>>>>> return 0
>>>>>> to
>>>>>> tell us it something worth retrying, then the callback failed
>>>>>> permanently, so we should mark the callback path down, regardless
>>>>>> of
>>>>>> the
>>>>>> exact error.
>>>>> 
>>>>> I disagree. task->tk_status could be an unhandled NFSv4 error (see
>>>>> nfsd4_cb_recall_done()). The client might, for instance, be in the
>>>>> process of returning the delegation being recalled. Why should that
>>>>> result in the callback channel being marked as down?
>>>>> 
>>>> 
>>>> Are you talking about say the connection going down and server should
>>>> just reconnect instead of recovering the callback channel. I assumed
>>>> that connection break is something that's not  recoverable by the
>>>> callback but perhaps I'm wrong.
>>> 
>>> No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
>>> for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
>>> not seeing why either of those errors should be handled by marking the
>>> callback channel as being down.
>>> 
>>> Looking further, it seems that the same function will also return '1'
>>> without checking the value of task->tk_status if the delegation has
>>> been revoked or returned. So that would mean that even NFS4ERR_DELAY
>>> could trigger the call to nfsd4_mark_cb_down() with the above change.
>> 
>> Yeah, OK, that's wrong, apologies.
>> 
>> I'm just a little worried about the attempt to enumerate transport level
>> errors in nfsd4_cb_done().  Are we sure that EIO, ETIMEDOUT, EACCESS is
>> the right list?
> 
> Looking at call_transmit_status error handling, I don't think
> connection errors are returned. Instead the code tries to fix the
> connection by retrying unless the rpc_timeout is reached and then only
> EIO,TIMEDOUT is returned.
> 
> Can then my original patch be considered without resubmission?

Bruce has authorized v1 of this patch, but that one has the
uncorrected patch description. Post a v4?



>> --b.
>> 
>>> 
>>>> 
>>>>>> 
>>>>>> --b.
>>>>>> 
>>>>>>> 
>>>>>>> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
>>>>>>> ---
>>>>>>> fs/nfsd/nfs4callback.c | 1 +
>>>>>>> 1 file changed, 1 insertion(+)
>>>>>>> 
>>>>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>>>>>> index 052be5bf9ef5..7325592b456e 100644
>>>>>>> --- a/fs/nfsd/nfs4callback.c
>>>>>>> +++ b/fs/nfsd/nfs4callback.c
>>>>>>> @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
>>>>>>> *task, void *calldata)
>>>>>>>                switch (task->tk_status) {
>>>>>>>                case -EIO:
>>>>>>>                case -ETIMEDOUT:
>>>>>>> +               case -EACCES:
>>>>>>>                        nfsd4_mark_cb_down(clp, task-
>>>>>>>> tk_status);
>>>>>>>                }
>>>>>>>                break;
>>>>>>> --
>>>>>>> 2.27.0
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> --
>>>>> Trond Myklebust
>>>>> Linux NFS client maintainer, Hammerspace
>>>>> trond.myklebust@hammerspace.com
>>>>> 
>>>>> 
>>> 
>>> --
>>> Trond Myklebust
>>> Linux NFS client maintainer, Hammerspace
>>> trond.myklebust@hammerspace.com

--
Chuck Lever




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] NFSD: fix error handling in callbacks
  2021-03-11 15:10             ` Chuck Lever III
@ 2021-03-11 15:16               ` Olga Kornievskaia
  2021-03-11 15:18                 ` Chuck Lever III
  0 siblings, 1 reply; 17+ messages in thread
From: Olga Kornievskaia @ 2021-03-11 15:16 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Bruce Fields, Trond Myklebust, Linux NFS Mailing List

On Thu, Mar 11, 2021 at 10:10 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Mar 10, 2021, at 5:09 PM, Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote:
> >
> > On Wed, Mar 10, 2021 at 9:47 AM J. Bruce Fields <bfields@redhat.com> wrote:
> >>
> >> On Tue, Mar 09, 2021 at 08:59:51PM +0000, Trond Myklebust wrote:
> >>> On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
> >>>> On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
> >>>> trondmy@hammerspace.com> wrote:
> >>>>>
> >>>>> On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
> >>>>>> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
> >>>>>> wrote:
> >>>>>>> From: Olga Kornievskaia <kolga@netapp.com>
> >>>>>>>
> >>>>>>> When the server tries to do a callback and a client fails it
> >>>>>>> due to
> >>>>>>> authentication problems, we need the server to set callback
> >>>>>>> down
> >>>>>>> flag in RENEW so that client can recover.
> >>>>>>
> >>>>>> I was looking at this.  It looks to me like this should really be
> >>>>>> just:
> >>>>>>
> >>>>>>        case 1:
> >>>>>>                if (task->tk_status)
> >>>>>>                        nfsd4_mark_cb_down(clp, task->tk_status);
> >>>>>>
> >>>>>> If tk_status showed an error, and the ->done method doesn't
> >>>>>> return 0
> >>>>>> to
> >>>>>> tell us it something worth retrying, then the callback failed
> >>>>>> permanently, so we should mark the callback path down, regardless
> >>>>>> of
> >>>>>> the
> >>>>>> exact error.
> >>>>>
> >>>>> I disagree. task->tk_status could be an unhandled NFSv4 error (see
> >>>>> nfsd4_cb_recall_done()). The client might, for instance, be in the
> >>>>> process of returning the delegation being recalled. Why should that
> >>>>> result in the callback channel being marked as down?
> >>>>>
> >>>>
> >>>> Are you talking about say the connection going down and server should
> >>>> just reconnect instead of recovering the callback channel. I assumed
> >>>> that connection break is something that's not  recoverable by the
> >>>> callback but perhaps I'm wrong.
> >>>
> >>> No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
> >>> for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
> >>> not seeing why either of those errors should be handled by marking the
> >>> callback channel as being down.
> >>>
> >>> Looking further, it seems that the same function will also return '1'
> >>> without checking the value of task->tk_status if the delegation has
> >>> been revoked or returned. So that would mean that even NFS4ERR_DELAY
> >>> could trigger the call to nfsd4_mark_cb_down() with the above change.
> >>
> >> Yeah, OK, that's wrong, apologies.
> >>
> >> I'm just a little worried about the attempt to enumerate transport level
> >> errors in nfsd4_cb_done().  Are we sure that EIO, ETIMEDOUT, EACCESS is
> >> the right list?
> >
> > Looking at call_transmit_status error handling, I don't think
> > connection errors are returned. Instead the code tries to fix the
> > connection by retrying unless the rpc_timeout is reached and then only
> > EIO,TIMEDOUT is returned.
> >
> > Can then my original patch be considered without resubmission?
>
> Bruce has authorized v1 of this patch, but that one has the
> uncorrected patch description. Post a v4?

v1's description is accurate. It reflects that only authentication
errors are handled.

>
>
>
> >> --b.
> >>
> >>>
> >>>>
> >>>>>>
> >>>>>> --b.
> >>>>>>
> >>>>>>>
> >>>>>>> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> >>>>>>> ---
> >>>>>>> fs/nfsd/nfs4callback.c | 1 +
> >>>>>>> 1 file changed, 1 insertion(+)
> >>>>>>>
> >>>>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> >>>>>>> index 052be5bf9ef5..7325592b456e 100644
> >>>>>>> --- a/fs/nfsd/nfs4callback.c
> >>>>>>> +++ b/fs/nfsd/nfs4callback.c
> >>>>>>> @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
> >>>>>>> *task, void *calldata)
> >>>>>>>                switch (task->tk_status) {
> >>>>>>>                case -EIO:
> >>>>>>>                case -ETIMEDOUT:
> >>>>>>> +               case -EACCES:
> >>>>>>>                        nfsd4_mark_cb_down(clp, task-
> >>>>>>>> tk_status);
> >>>>>>>                }
> >>>>>>>                break;
> >>>>>>> --
> >>>>>>> 2.27.0
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>> --
> >>>>> Trond Myklebust
> >>>>> Linux NFS client maintainer, Hammerspace
> >>>>> trond.myklebust@hammerspace.com
> >>>>>
> >>>>>
> >>>
> >>> --
> >>> Trond Myklebust
> >>> Linux NFS client maintainer, Hammerspace
> >>> trond.myklebust@hammerspace.com
>
> --
> Chuck Lever
>
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] NFSD: fix error handling in callbacks
  2021-03-11 15:16               ` Olga Kornievskaia
@ 2021-03-11 15:18                 ` Chuck Lever III
  0 siblings, 0 replies; 17+ messages in thread
From: Chuck Lever III @ 2021-03-11 15:18 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Bruce Fields, Trond Myklebust, Linux NFS Mailing List



> On Mar 11, 2021, at 10:16 AM, Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote:
> 
> On Thu, Mar 11, 2021 at 10:10 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>> 
>> 
>> 
>>> On Mar 10, 2021, at 5:09 PM, Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote:
>>> 
>>> On Wed, Mar 10, 2021 at 9:47 AM J. Bruce Fields <bfields@redhat.com> wrote:
>>>> 
>>>> On Tue, Mar 09, 2021 at 08:59:51PM +0000, Trond Myklebust wrote:
>>>>> On Tue, 2021-03-09 at 15:41 -0500, Olga Kornievskaia wrote:
>>>>>> On Tue, Mar 9, 2021 at 3:22 PM Trond Myklebust <
>>>>>> trondmy@hammerspace.com> wrote:
>>>>>>> 
>>>>>>> On Tue, 2021-03-09 at 10:37 -0500, J. Bruce Fields wrote:
>>>>>>>> On Tue, Mar 09, 2021 at 09:41:27AM -0500, Olga Kornievskaia
>>>>>>>> wrote:
>>>>>>>>> From: Olga Kornievskaia <kolga@netapp.com>
>>>>>>>>> 
>>>>>>>>> When the server tries to do a callback and a client fails it
>>>>>>>>> due to
>>>>>>>>> authentication problems, we need the server to set callback
>>>>>>>>> down
>>>>>>>>> flag in RENEW so that client can recover.
>>>>>>>> 
>>>>>>>> I was looking at this.  It looks to me like this should really be
>>>>>>>> just:
>>>>>>>> 
>>>>>>>>       case 1:
>>>>>>>>               if (task->tk_status)
>>>>>>>>                       nfsd4_mark_cb_down(clp, task->tk_status);
>>>>>>>> 
>>>>>>>> If tk_status showed an error, and the ->done method doesn't
>>>>>>>> return 0
>>>>>>>> to
>>>>>>>> tell us it something worth retrying, then the callback failed
>>>>>>>> permanently, so we should mark the callback path down, regardless
>>>>>>>> of
>>>>>>>> the
>>>>>>>> exact error.
>>>>>>> 
>>>>>>> I disagree. task->tk_status could be an unhandled NFSv4 error (see
>>>>>>> nfsd4_cb_recall_done()). The client might, for instance, be in the
>>>>>>> process of returning the delegation being recalled. Why should that
>>>>>>> result in the callback channel being marked as down?
>>>>>>> 
>>>>>> 
>>>>>> Are you talking about say the connection going down and server should
>>>>>> just reconnect instead of recovering the callback channel. I assumed
>>>>>> that connection break is something that's not  recoverable by the
>>>>>> callback but perhaps I'm wrong.
>>>>> 
>>>>> No. I'm saying that nfsd4_cb_recall_done() will return a value of '1'
>>>>> for both task->tk_status == -EBADHANDLE and -NFS4ERR_BAD_STATEID. I'm
>>>>> not seeing why either of those errors should be handled by marking the
>>>>> callback channel as being down.
>>>>> 
>>>>> Looking further, it seems that the same function will also return '1'
>>>>> without checking the value of task->tk_status if the delegation has
>>>>> been revoked or returned. So that would mean that even NFS4ERR_DELAY
>>>>> could trigger the call to nfsd4_mark_cb_down() with the above change.
>>>> 
>>>> Yeah, OK, that's wrong, apologies.
>>>> 
>>>> I'm just a little worried about the attempt to enumerate transport level
>>>> errors in nfsd4_cb_done().  Are we sure that EIO, ETIMEDOUT, EACCESS is
>>>> the right list?
>>> 
>>> Looking at call_transmit_status error handling, I don't think
>>> connection errors are returned. Instead the code tries to fix the
>>> connection by retrying unless the rpc_timeout is reached and then only
>>> EIO,TIMEDOUT is returned.
>>> 
>>> Can then my original patch be considered without resubmission?
>> 
>> Bruce has authorized v1 of this patch, but that one has the
>> uncorrected patch description. Post a v4?
> 
> v1's description is accurate. It reflects that only authentication
> errors are handled.

Nit: The short description isn't specific to NFSv4.0, as the v3
one was.


>>>> --b.
>>>> 
>>>>> 
>>>>>> 
>>>>>>>> 
>>>>>>>> --b.
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
>>>>>>>>> ---
>>>>>>>>> fs/nfsd/nfs4callback.c | 1 +
>>>>>>>>> 1 file changed, 1 insertion(+)
>>>>>>>>> 
>>>>>>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>>>>>>>> index 052be5bf9ef5..7325592b456e 100644
>>>>>>>>> --- a/fs/nfsd/nfs4callback.c
>>>>>>>>> +++ b/fs/nfsd/nfs4callback.c
>>>>>>>>> @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task
>>>>>>>>> *task, void *calldata)
>>>>>>>>>               switch (task->tk_status) {
>>>>>>>>>               case -EIO:
>>>>>>>>>               case -ETIMEDOUT:
>>>>>>>>> +               case -EACCES:
>>>>>>>>>                       nfsd4_mark_cb_down(clp, task-
>>>>>>>>>> tk_status);
>>>>>>>>>               }
>>>>>>>>>               break;
>>>>>>>>> --
>>>>>>>>> 2.27.0
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Trond Myklebust
>>>>>>> Linux NFS client maintainer, Hammerspace
>>>>>>> trond.myklebust@hammerspace.com
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> --
>>>>> Trond Myklebust
>>>>> Linux NFS client maintainer, Hammerspace
>>>>> trond.myklebust@hammerspace.com
>> 
>> --
>> Chuck Lever

--
Chuck Lever




^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-03-11 15:19 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-09 14:41 [PATCH 1/1] NFSD: fix error handling in callbacks Olga Kornievskaia
2021-03-09 15:37 ` J. Bruce Fields
2021-03-09 16:39   ` Olga Kornievskaia
2021-03-09 16:46     ` Olga Kornievskaia
2021-03-09 17:42       ` Chuck Lever
2021-03-09 19:45   ` Scott Mayhew
2021-03-09 20:12     ` J. Bruce Fields
2021-03-09 20:40       ` Scott Mayhew
2021-03-09 20:22   ` Trond Myklebust
2021-03-09 20:41     ` Olga Kornievskaia
2021-03-09 20:59       ` Trond Myklebust
2021-03-10 14:47         ` J. Bruce Fields
2021-03-10 22:09           ` Olga Kornievskaia
2021-03-11 14:58             ` J. Bruce Fields
2021-03-11 15:10             ` Chuck Lever III
2021-03-11 15:16               ` Olga Kornievskaia
2021-03-11 15:18                 ` Chuck Lever III

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.