All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH Version 3 1/1] NFSv4 wait on recovery for async session errors
@ 2013-11-15 21:36 andros
  2013-11-19 21:49 ` Myklebust, Trond
  0 siblings, 1 reply; 7+ messages in thread
From: andros @ 2013-11-15 21:36 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, Andy Adamson

From: Andy Adamson <andros@netapp.com>

When the state manager is processing the NFS4CLNT_DELEGRETURN flag, session
draining is off, but DELEGRETURN can still get a session error.
The async handler calls nfs4_schedule_session_recovery returns -EAGAIN, and
the DELEGRETURN done then restarts the RPC task in the prepare state.
With the state manager still processing the NFS4CLNT_DELEGRETURN flag with
session draining off, these DELEGRETURNs will cycle with errors filling up the
session slots.

This prevents OPEN reclaims (from nfs_delegation_claim_opens) required by the
NFS4CLNT_DELEGRETURN state manager processing from completing, hanging the
state manager in the __rpc_wait_for_completion_task in nfs4_run_open_task
as seen in this kernel thread dump:

kernel: 4.12.32.53-ma D 0000000000000000     0  3393      2 0x00000000
kernel: ffff88013995fb60 0000000000000046 ffff880138cc5400 ffff88013a9df140
kernel: ffff8800000265c0 ffffffff8116eef0 ffff88013fc10080 0000000300000001
kernel: ffff88013a4ad058 ffff88013995ffd8 000000000000fbc8 ffff88013a4ad058
kernel: Call Trace:
kernel: [<ffffffff8116eef0>] ? cache_alloc_refill+0x1c0/0x240
kernel: [<ffffffffa0358110>] ? rpc_wait_bit_killable+0x0/0xa0 [sunrpc]
kernel: [<ffffffffa0358152>] rpc_wait_bit_killable+0x42/0xa0 [sunrpc]
kernel: [<ffffffff8152914f>] __wait_on_bit+0x5f/0x90
kernel: [<ffffffffa0358110>] ? rpc_wait_bit_killable+0x0/0xa0 [sunrpc]
kernel: [<ffffffff815291f8>] out_of_line_wait_on_bit+0x78/0x90
kernel: [<ffffffff8109b520>] ? wake_bit_function+0x0/0x50
kernel: [<ffffffffa035810d>] __rpc_wait_for_completion_task+0x2d/0x30 [sunrpc]
kernel: [<ffffffffa040d44c>] nfs4_run_open_task+0x11c/0x160 [nfs]
kernel: [<ffffffffa04114e7>] nfs4_open_recover_helper+0x87/0x120 [nfs]
kernel: [<ffffffffa0411646>] nfs4_open_recover+0xc6/0x150 [nfs]
kernel: [<ffffffffa040cc6f>] ? nfs4_open_recoverdata_alloc+0x2f/0x60 [nfs]
kernel: [<ffffffffa0414e1a>] nfs4_open_delegation_recall+0x6a/0xa0 [nfs]
kernel: [<ffffffffa0424020>] nfs_end_delegation_return+0x120/0x2e0 [nfs]
kernel: [<ffffffff8109580f>] ? queue_work+0x1f/0x30
kernel: [<ffffffffa0424347>] nfs_client_return_marked_delegations+0xd7/0x110 [nfs]
kernel: [<ffffffffa04225d8>] nfs4_run_state_manager+0x548/0x620 [nfs]
kernel: [<ffffffffa0422090>] ? nfs4_run_state_manager+0x0/0x620 [nfs]
kernel: [<ffffffff8109b0f6>] kthread+0x96/0xa0
kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
kernel: [<ffffffff8109b060>] ? kthread+0x0/0xa0
kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20

The state manager can not therefore process the DELEGRETURN session errors.
Change the async handler to wait for recovery on session errors.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/nfs/nfs4proc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 5ab33c0..1f4edfb 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -4803,7 +4803,7 @@ nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server,
 			dprintk("%s ERROR %d, Reset session\n", __func__,
 				task->tk_status);
 			nfs4_schedule_session_recovery(clp->cl_session, task->tk_status);
-			goto restart_call;
+			goto wait_on_recovery;
 #endif /* CONFIG_NFS_V4_1 */
 		case -NFS4ERR_DELAY:
 			nfs_inc_server_stats(server, NFSIOS_DELAY);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH Version 3 1/1] NFSv4 wait on recovery for async session errors
  2013-11-15 21:36 [PATCH Version 3 1/1] NFSv4 wait on recovery for async session errors andros
@ 2013-11-19 21:49 ` Myklebust, Trond
  2013-11-19 22:34   ` Myklebust, Trond
  2013-11-20  9:29   ` William Dauchy
  0 siblings, 2 replies; 7+ messages in thread
From: Myklebust, Trond @ 2013-11-19 21:49 UTC (permalink / raw)
  To: Adamson, Andy; +Cc: linux-nfs

[-- Attachment #1: Type: text/plain, Size: 1246 bytes --]

On Fri, 2013-11-15 at 16:36 -0500, andros@netapp.com wrote:
> From: Andy Adamson <andros@netapp.com>
> 
> When the state manager is processing the NFS4CLNT_DELEGRETURN flag, session
> draining is off, but DELEGRETURN can still get a session error.
> The async handler calls nfs4_schedule_session_recovery returns -EAGAIN, and
> the DELEGRETURN done then restarts the RPC task in the prepare state.
> With the state manager still processing the NFS4CLNT_DELEGRETURN flag with
> session draining off, these DELEGRETURNs will cycle with errors filling up the
> session slots.
> 
> This prevents OPEN reclaims (from nfs_delegation_claim_opens) required by the
> NFS4CLNT_DELEGRETURN state manager processing from completing, hanging the
> state manager in the __rpc_wait_for_completion_task in nfs4_run_open_task
> as seen in this kernel thread dump:
> 

Hi Andy,

There is a second patch that goes with this problem. Please see the
following attachment.

Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-NFSv4-Update-list-of-irrecoverable-errors-on-DELEGRE.patch --]
[-- Type: text/x-patch; name="0001-NFSv4-Update-list-of-irrecoverable-errors-on-DELEGRE.patch", Size: 1398 bytes --]

From 20a4067243f81c1417bf62ecea7697b79901926f Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Tue, 19 Nov 2013 16:34:14 -0500
Subject: [PATCH] NFSv4: Update list of irrecoverable errors on DELEGRETURN

If the DELEGRETURN errors out with something like NFS4ERR_BAD_STATEID
then there is no recovery possible.

Also, the client must not assume that the NFSv4 lease has been renewed
when it sees an error on DELEGRETURN.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
---
 fs/nfs/nfs4proc.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 1f4edfbb4a70..aa16a22ad349 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -4988,10 +4988,14 @@ static void nfs4_delegreturn_done(struct rpc_task *task, void *calldata)
 
 	trace_nfs4_delegreturn_exit(&data->args, &data->res, task->tk_status);
 	switch (task->tk_status) {
-	case -NFS4ERR_STALE_STATEID:
-	case -NFS4ERR_EXPIRED:
 	case 0:
 		renew_lease(data->res.server, data->timestamp);
+	case -NFS4ERR_ADMIN_REVOKED:
+	case -NFS4ERR_DELEG_REVOKED:
+	case -NFS4ERR_BAD_STATEID:
+	case -NFS4ERR_OLD_STATEID:
+	case -NFS4ERR_STALE_STATEID:
+	case -NFS4ERR_EXPIRED:
 		break;
 	default:
 		if (nfs4_async_handle_error(task, data->res.server, NULL) ==
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH Version 3 1/1] NFSv4 wait on recovery for async session errors
  2013-11-19 21:49 ` Myklebust, Trond
@ 2013-11-19 22:34   ` Myklebust, Trond
  2013-11-19 22:43     ` Adamson, Andy
  2013-11-20  9:29   ` William Dauchy
  1 sibling, 1 reply; 7+ messages in thread
From: Myklebust, Trond @ 2013-11-19 22:34 UTC (permalink / raw)
  To: Adamson, Andy; +Cc: linux-nfs

[-- Attachment #1: Type: text/plain, Size: 354 bytes --]

On Tue, 2013-11-19 at 16:49 -0500, Trond Myklebust wrote:
> There is a second patch that goes with this problem. Please see the
> following attachment.

V2: Don't return an error when we know that the stateid is no longer
valid.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-NFSv4-Update-list-of-irrecoverable-errors-on-DELEGRE.patch --]
[-- Type: text/x-patch; name="0001-NFSv4-Update-list-of-irrecoverable-errors-on-DELEGRE.patch", Size: 1505 bytes --]

From ba64b3649ce323004cc2bd75e5c9f785e6386779 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Tue, 19 Nov 2013 16:34:14 -0500
Subject: [PATCH v2] NFSv4: Update list of irrecoverable errors on DELEGRETURN

If the DELEGRETURN errors out with something like NFS4ERR_BAD_STATEID
then there is no recovery possible. Just quit without returning an error.

Also, note that the client must not assume that the NFSv4 lease has been
renewed when it sees an error on DELEGRETURN.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
---
 fs/nfs/nfs4proc.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 1f4edfbb4a70..ca36d0d50b7d 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -4988,11 +4988,17 @@ static void nfs4_delegreturn_done(struct rpc_task *task, void *calldata)
 
 	trace_nfs4_delegreturn_exit(&data->args, &data->res, task->tk_status);
 	switch (task->tk_status) {
-	case -NFS4ERR_STALE_STATEID:
-	case -NFS4ERR_EXPIRED:
 	case 0:
 		renew_lease(data->res.server, data->timestamp);
 		break;
+	case -NFS4ERR_ADMIN_REVOKED:
+	case -NFS4ERR_DELEG_REVOKED:
+	case -NFS4ERR_BAD_STATEID:
+	case -NFS4ERR_OLD_STATEID:
+	case -NFS4ERR_STALE_STATEID:
+	case -NFS4ERR_EXPIRED:
+		task->tk_status = 0;
+		break;
 	default:
 		if (nfs4_async_handle_error(task, data->res.server, NULL) ==
 				-EAGAIN) {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH Version 3 1/1] NFSv4 wait on recovery for async session errors
  2013-11-19 22:34   ` Myklebust, Trond
@ 2013-11-19 22:43     ` Adamson, Andy
  2013-11-19 22:47       ` Myklebust, Trond
  0 siblings, 1 reply; 7+ messages in thread
From: Adamson, Andy @ 2013-11-19 22:43 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Adamson, Andy, linux-nfs

Why not recover the lease? Shouldn't NFS4ERR_EXPIRED be left for the async handler?

-->Andy

On Nov 19, 2013, at 5:34 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com>
 wrote:

> On Tue, 2013-11-19 at 16:49 -0500, Trond Myklebust wrote:
>> There is a second patch that goes with this problem. Please see the
>> following attachment.
> 
> V2: Don't return an error when we know that the stateid is no longer
> valid.
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer
> 
> NetApp
> Trond.Myklebust@netapp.com
> www.netapp.com
> <0001-NFSv4-Update-list-of-irrecoverable-errors-on-DELEGRE.patch>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH Version 3 1/1] NFSv4 wait on recovery for async session errors
  2013-11-19 22:43     ` Adamson, Andy
@ 2013-11-19 22:47       ` Myklebust, Trond
  0 siblings, 0 replies; 7+ messages in thread
From: Myklebust, Trond @ 2013-11-19 22:47 UTC (permalink / raw)
  To: Adamson, Andy; +Cc: linux-nfs

On Tue, 2013-11-19 at 22:43 +0000, Adamson, Andy wrote:
> Why not recover the lease? Shouldn't NFS4ERR_EXPIRED be left for the async handler?

No, because then it will retry the operation. Since it is impossible to
recover the delegation at this point, then that will result in another
failure.

> -->Andy
> 
> On Nov 19, 2013, at 5:34 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com>
>  wrote:
> 
> > On Tue, 2013-11-19 at 16:49 -0500, Trond Myklebust wrote:
> >> There is a second patch that goes with this problem. Please see the
> >> following attachment.
> > 
> > V2: Don't return an error when we know that the stateid is no longer
> > valid.
> > 
> > -- 
> > Trond Myklebust
> > Linux NFS client maintainer
> > 
> > NetApp
> > Trond.Myklebust@netapp.com
> > www.netapp.com
> > <0001-NFSv4-Update-list-of-irrecoverable-errors-on-DELEGRE.patch>
> 

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH Version 3 1/1] NFSv4 wait on recovery for async session errors
  2013-11-19 21:49 ` Myklebust, Trond
  2013-11-19 22:34   ` Myklebust, Trond
@ 2013-11-20  9:29   ` William Dauchy
  2013-11-21 23:48     ` William Dauchy
  1 sibling, 1 reply; 7+ messages in thread
From: William Dauchy @ 2013-11-20  9:29 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Adamson, Andy, linux-nfs

Hi Trond,

On Tue, Nov 19, 2013 at 10:49 PM, Myklebust, Trond
<Trond.Myklebust@netapp.com> wrote:
> There is a second patch that goes with this problem. Please see the
> following attachment.

ba64b36 NFSv4: Update list of irrecoverable errors on DELEGRETURN
is cc'ed for stable

If these two patches are connected, why the first one:
d3b173a NFSv4 wait on recovery for async session errors
is not cc'ed for stable as well?

For example in stable v3.10.x we currently have:

        case -NFS4ERR_SEQ_MISORDERED:
            dprintk("%s ERROR %d, Reset session\n", __func__,
                task->tk_status);
            nfs4_schedule_session_recovery(clp->cl_session, task->tk_status);
            task->tk_status = 0;
            return -EAGAIN;

but with the last pacthes we now have in mainline:

        case -NFS4ERR_SEQ_MISORDERED:
            dprintk("%s ERROR: %d Reset session\n", __func__,
                errorcode);
            nfs4_schedule_session_recovery(clp->cl_session, errorcode);
            goto wait_on_recovery;


and this was orignally change because of commit
f1478c1 NFS: Re-use exit code in nfs4_async_handle_error()

The result in stable v3.10.x doesn't really make sense to me. Did I
missed something?

Regards,
-- 
William

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH Version 3 1/1] NFSv4 wait on recovery for async session errors
  2013-11-20  9:29   ` William Dauchy
@ 2013-11-21 23:48     ` William Dauchy
  0 siblings, 0 replies; 7+ messages in thread
From: William Dauchy @ 2013-11-21 23:48 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Adamson, Andy, linux-nfs

On Wed, Nov 20, 2013 at 10:29 AM, William Dauchy <wdauchy@gmail.com> wrote:
> ba64b36 NFSv4: Update list of irrecoverable errors on DELEGRETURN
> is cc'ed for stable
>
> If these two patches are connected, why the first one:
> d3b173a NFSv4 wait on recovery for async session errors
> is not cc'ed for stable as well?

you last push
4a82fd7 NFSv4 wait on recovery for async session errors
is now cc'ed for stable

Thanks,
-- 
William

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2013-11-21 23:48 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-15 21:36 [PATCH Version 3 1/1] NFSv4 wait on recovery for async session errors andros
2013-11-19 21:49 ` Myklebust, Trond
2013-11-19 22:34   ` Myklebust, Trond
2013-11-19 22:43     ` Adamson, Andy
2013-11-19 22:47       ` Myklebust, Trond
2013-11-20  9:29   ` William Dauchy
2013-11-21 23:48     ` William Dauchy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.