All of lore.kernel.org
 help / color / mirror / Atom feed
* NMI/soft lockup in nfs_delegation_need_return()
@ 2014-09-25 13:04 David Howells
  2014-09-25 13:07 ` David Howells
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: David Howells @ 2014-09-25 13:04 UTC (permalink / raw)
  To: trond.myklebust; +Cc: dhowells, linux-nfs, steved

[Let me put a subject in this time]

Hi Trond,

I'm seeing the soft lockup below occasionally.  I'm using the head of Linus's
master branch (005f800508eb391480f463dad3d54e5b4ec67d57).  Plus a pair of
cachefiles patches that should have no bearing on the matter.  I'm seeing the
lockup both with and without a cache.  I did the following:

	insmod /tmp/sunrpc.ko
	mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs
	insmod /tmp/lockd.ko
	insmod /tmp/nfs_acl.ko
	insmod /tmp/auth_rpcgss.ko
	insmod /tmp/fscache.ko
	insmod /tmp/nfs.ko
	insmod /tmp/nfsv2.ko
	insmod /tmp/nfsv3.ko
	insmod /tmp/nfsv4.ko
	mount warthog:/warthog /warthog -o async,intr
	tar cf - /warthog/fs/linux-2.6-fscache | cat >/dev/null

where the directory being tar'd is a kernel tree with a couple of builds in
there plus a load of git data (about 3GB worth).

The RIP points here:

   0x0000000000010e81 <+141>:   lea    0x48(%r12),%rdx
   0x0000000000010e86 <+146>:   lock btrl $0x1,0x48(%r12)
   0x0000000000010e8e <+154>:   jb     0x10e95 <nfs_client_return_marked_delegations+161>

and it appears to be the first test_and_clear_bit() in
nfs_delegation_need_return() which seems to have been inlined.

I got at least three barks from the watchdog the last time I ran it before I
killed it.

David
---
NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [90.155.74.18-ma:2776]
Modules linked in: nfsv4 nfsv3 nfsv2 nfs fscache auth_rpcgss nfs_acl lockd sunrpc
irq event stamp: 134630
hardirqs last  enabled at (134629): [<ffffffff81088a54>] __call_rcu+0x203/0x212
hardirqs last disabled at (134630): [<ffffffff8151133a>] apic_timer_interrupt+0x6a/0x80
softirqs last  enabled at (134618): [<ffffffff8103de29>] __do_softirq+0x243/0x305
softirqs last disabled at (134613): [<ffffffff8103e106>] irq_exit+0x61/0xda
CPU: 1 PID: 2776 Comm: 90.155.74.18-ma Tainted: G        W      3.17.0-rc6-fsdevel+ #794
Hardware name:                  /DG965RY, BIOS MQ96510J.86A.0816.2006.0716.2308 07/16/2006
task: ffff880012ab6250 ti: ffff8800129e8000 task.ti: ffff8800129e8000
RIP: 0010:[<ffffffffa00fde8e>]  [<ffffffffa00fde8e>] nfs_client_return_marked_delegations+0x9a/0x1aa [nfsv4]
RSP: 0018:ffff8800129ebde8  EFLAGS: 00000202
RAX: ffff880020bfd4c0 RBX: ffff8800129ebdd0 RCX: 0000000000000000
RDX: ffff880020bfd508 RSI: ffff880012ab6250 RDI: ffff880012ab6250
RBP: ffff8800129ebe28 R08: 0000000000000000 R09: 0000000000000000
R10: ffff8800129ebd78 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffffffff81a43340 R15: ffff8800129ebd78
FS:  0000000000000000(0000) GS:ffff88003db00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000a62418 CR3: 00000000060f1000 CR4: 00000000000007e0
Stack:
 ffff8800065e58c8 ffff88000610f7f0 ffff880020bfd4c0 ffff8800065e5800
 ffff8800065e5800 ffff8800065e5930 0000000000000000 0000000000000000
 ffff8800129ebe68 ffffffffa00fc180 000000000000000b ffff8800062f8e80
Call Trace:
 [<ffffffffa00fc180>] nfs4_run_state_manager+0x614/0x619 [nfsv4]
 [<ffffffffa00fbb6c>] ? nfs4_do_reclaim+0x527/0x527 [nfsv4]
 [<ffffffff81053a97>] kthread+0xfa/0x102
 [<ffffffff8105399d>] ? kthread_create_on_node+0x1ca/0x1ca
 [<ffffffff8151036c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8105399d>] ? kthread_create_on_node+0x1ca/0x1ca
Code: 00 48 89 45 d0 4c 8b 65 d0 48 8d 83 f0 07 00 00 48 89 45 c8 4c 3b 65 c8 0f 84 d7 00 00 00 49 8d 54 24 48 f0 41 0f ba 74 24 48 01 <72> 05 45 31 f6 eb 03 41 b6 01 f0 0f ba 32 02 72 02 eb 36 45 84 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: NMI/soft lockup in nfs_delegation_need_return()
  2014-09-25 13:04 NMI/soft lockup in nfs_delegation_need_return() David Howells
@ 2014-09-25 13:07 ` David Howells
  2014-09-25 14:15 ` Trond Myklebust
  2015-02-26 10:45 ` David Howells
  2 siblings, 0 replies; 11+ messages in thread
From: David Howells @ 2014-09-25 13:07 UTC (permalink / raw)
  To: trond.myklebust; +Cc: dhowells, linux-nfs, steved

David Howells <dhowells@redhat.com> wrote:

> I'm seeing the soft lockup below occasionally.  I'm using the head of Linus's
> master branch (005f800508eb391480f463dad3d54e5b4ec67d57).  Plus a pair of
> cachefiles patches that should have no bearing on the matter.  I'm seeing the
> lockup both with and without a cache.  I did the following:

I should mention that the server is running the most recent Fedora 20 kernel:

	3.16.2-201.fc20.x86_64

and that and the client are connected by gigabit ethernet with not much other
traffic on the network.

David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: NMI/soft lockup in nfs_delegation_need_return()
  2014-09-25 13:04 NMI/soft lockup in nfs_delegation_need_return() David Howells
  2014-09-25 13:07 ` David Howells
@ 2014-09-25 14:15 ` Trond Myklebust
  2015-02-26 10:45 ` David Howells
  2 siblings, 0 replies; 11+ messages in thread
From: Trond Myklebust @ 2014-09-25 14:15 UTC (permalink / raw)
  To: David Howells; +Cc: linux-nfs, steved

Hi David,

On Thu, 2014-09-25 at 14:04 +0100, David Howells wrote:
> [Let me put a subject in this time]
> 
> Hi Trond,
> 
> I'm seeing the soft lockup below occasionally.  I'm using the head of Linus's
> master branch (005f800508eb391480f463dad3d54e5b4ec67d57).  Plus a pair of
> cachefiles patches that should have no bearing on the matter.  I'm seeing the
> lockup both with and without a cache.  I did the following:
> 
> 	insmod /tmp/sunrpc.ko
> 	mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs
> 	insmod /tmp/lockd.ko
> 	insmod /tmp/nfs_acl.ko
> 	insmod /tmp/auth_rpcgss.ko
> 	insmod /tmp/fscache.ko
> 	insmod /tmp/nfs.ko
> 	insmod /tmp/nfsv2.ko
> 	insmod /tmp/nfsv3.ko
> 	insmod /tmp/nfsv4.ko
> 	mount warthog:/warthog /warthog -o async,intr
> 	tar cf - /warthog/fs/linux-2.6-fscache | cat >/dev/null
> 
> where the directory being tar'd is a kernel tree with a couple of builds in
> there plus a load of git data (about 3GB worth).
> 
> The RIP points here:
> 
>    0x0000000000010e81 <+141>:   lea    0x48(%r12),%rdx
>    0x0000000000010e86 <+146>:   lock btrl $0x1,0x48(%r12)
>    0x0000000000010e8e <+154>:   jb     0x10e95 <nfs_client_return_marked_delegations+161>
> 
> and it appears to be the first test_and_clear_bit() in
> nfs_delegation_need_return() which seems to have been inlined.
> 
> I got at least three barks from the watchdog the last time I ran it before I
> killed it.
> 
> David
> ---
> NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [90.155.74.18-ma:2776]
> Modules linked in: nfsv4 nfsv3 nfsv2 nfs fscache auth_rpcgss nfs_acl lockd sunrpc
> irq event stamp: 134630
> hardirqs last  enabled at (134629): [<ffffffff81088a54>] __call_rcu+0x203/0x212
> hardirqs last disabled at (134630): [<ffffffff8151133a>] apic_timer_interrupt+0x6a/0x80
> softirqs last  enabled at (134618): [<ffffffff8103de29>] __do_softirq+0x243/0x305
> softirqs last disabled at (134613): [<ffffffff8103e106>] irq_exit+0x61/0xda
> CPU: 1 PID: 2776 Comm: 90.155.74.18-ma Tainted: G        W      3.17.0-rc6-fsdevel+ #794
> Hardware name:                  /DG965RY, BIOS MQ96510J.86A.0816.2006.0716.2308 07/16/2006
> task: ffff880012ab6250 ti: ffff8800129e8000 task.ti: ffff8800129e8000
> RIP: 0010:[<ffffffffa00fde8e>]  [<ffffffffa00fde8e>] nfs_client_return_marked_delegations+0x9a/0x1aa [nfsv4]
> RSP: 0018:ffff8800129ebde8  EFLAGS: 00000202
> RAX: ffff880020bfd4c0 RBX: ffff8800129ebdd0 RCX: 0000000000000000
> RDX: ffff880020bfd508 RSI: ffff880012ab6250 RDI: ffff880012ab6250
> RBP: ffff8800129ebe28 R08: 0000000000000000 R09: 0000000000000000
> R10: ffff8800129ebd78 R11: 0000000000000000 R12: 0000000000000000
> R13: 0000000000000000 R14: ffffffff81a43340 R15: ffff8800129ebd78
> FS:  0000000000000000(0000) GS:ffff88003db00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000a62418 CR3: 00000000060f1000 CR4: 00000000000007e0
> Stack:
>  ffff8800065e58c8 ffff88000610f7f0 ffff880020bfd4c0 ffff8800065e5800
>  ffff8800065e5800 ffff8800065e5930 0000000000000000 0000000000000000
>  ffff8800129ebe68 ffffffffa00fc180 000000000000000b ffff8800062f8e80
> Call Trace:
>  [<ffffffffa00fc180>] nfs4_run_state_manager+0x614/0x619 [nfsv4]
>  [<ffffffffa00fbb6c>] ? nfs4_do_reclaim+0x527/0x527 [nfsv4]
>  [<ffffffff81053a97>] kthread+0xfa/0x102
>  [<ffffffff8105399d>] ? kthread_create_on_node+0x1ca/0x1ca
>  [<ffffffff8151036c>] ret_from_fork+0x7c/0xb0
>  [<ffffffff8105399d>] ? kthread_create_on_node+0x1ca/0x1ca
> Code: 00 48 89 45 d0 4c 8b 65 d0 48 8d 83 f0 07 00 00 48 89 45 c8 4c 3b 65 c8 0f 84 d7 00 00 00 49 8d 54 24 48 f0 41 0f ba 74 24 48 01 <72> 05 45 31 f6 eb 03 41 b6 01 f0 0f ba 32 02 72 02 eb 36 45 84 

The 22s hang above appears to occur in a timer interrupt. Could that be
triggering the Oops by delaying the state manager thread for long enough
that the delegation is getting freed?

I can't see anything obviously violating the RCU rules in the delegation
code:
- All delegations are freed via kfree_rcu(), so they should not normally
be able to disappear from underneath us while we're in a rcu_read_lock()
protected section such as above.
- All additions/removals from the server->delegations list are done
using list_add/del_rcu(), and I don't see any way for a delegation to be
put on the list twice or to be put back on the list once it has been
removed.
- The nfs_server code calls synchronize_rcu() after removing the server
from the clp->cl_superblocks list (which it does using list_del_rcu()).

Cheers
  Trond


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: NMI/soft lockup in nfs_delegation_need_return()
  2014-09-25 13:04 NMI/soft lockup in nfs_delegation_need_return() David Howells
  2014-09-25 13:07 ` David Howells
  2014-09-25 14:15 ` Trond Myklebust
@ 2015-02-26 10:45 ` David Howells
  2015-02-26 15:36   ` Trond Myklebust
                     ` (2 more replies)
  2 siblings, 3 replies; 11+ messages in thread
From: David Howells @ 2015-02-26 10:45 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: dhowells, linux-nfs, steved

Seems I can still reproduce this (see below).  I don't suppose you've had any
further insights since September?

David
---
NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [90.155.74.18-ma:2834]
Modules linked in: cachefiles nfsv4 nfsv3 nfsv2 nfs fscache auth_rpcgss nfs_acl lockd grace sunrpc
irq event stamp: 135774
hardirqs last  enabled at (135773): [<ffffffff81091701>] __call_rcu+0x241/0x253
hardirqs last disabled at (135774): [<ffffffff8155895a>] apic_timer_interrupt+0x6a/0x80
softirqs last  enabled at (135762): [<ffffffff810439a8>] __do_softirq+0x25a/0x319
softirqs last disabled at (135757): [<ffffffff81043c73>] irq_exit+0x5e/0xd6
CPU: 1 PID: 2834 Comm: 90.155.74.18-ma Tainted: G        W       3.19.0-fsdevel+ #1143
Hardware name:                  /DG965RY, BIOS MQ96510J.86A.0816.2006.0716.2308 07/16/2006
task: ffff88000ab6a410 ti: ffff88001cab8000 task.ti: ffff88001cab8000
RIP: 0010:[<ffffffffa010c418>]  [<ffffffffa010c418>] nfs_client_return_marked_delegations+0x9b/0x1a9 [nfsv4]
RSP: 0018:ffff88001cabbdb8  EFLAGS: 00000292
RAX: ffff880000e4b7c0 RBX: ffffffffa010c39c RCX: 0000000000000000
RDX: ffff880000e4b808 RSI: ffff88000ab6a410 RDI: ffff88000ab6a410
RBP: ffff88001cabbe08 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000296
R13: ffff88001cabbd98 R14: 0000000000000002 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88003db00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f419ab7a000 CR3: 000000003a7fa000 CR4: 00000000000007e0
Stack:
 ffff880038dc9cc8 ffff880010e8d7f8 ffff88000ab6a410 ffff880000e4b7c0
 ffff88001cabbe08 ffff880038dc9c00 ffff880038dc9c00 ffff880038dc9d30
 0000000000000000 0000000000000000 ffff88001cabbe58 ffffffffa010a54f
Call Trace:
 [<ffffffffa010a54f>] nfs4_run_state_manager+0x5da/0x5df [nfsv4]
 [<ffffffffa0109f75>] ? nfs4_do_reclaim+0x55d/0x55d [nfsv4]
 [<ffffffffa0109f75>] ? nfs4_do_reclaim+0x55d/0x55d [nfsv4]
 [<ffffffff81059b9d>] kthread+0x10e/0x116
 [<ffffffff81059a8f>] ? kthread_create_on_node+0x1bb/0x1bb
 [<ffffffff815579ac>] ret_from_fork+0x7c/0xb0
 [<ffffffff81059a8f>] ? kthread_create_on_node+0x1bb/0x1bb
Code: 00 48 89 45 c8 4c 8b 65 c8 48 8d 83 f8 07 00 00 48 89 45 b8 4c 3b 65 b8 0f 84 d6 00 00 00 49 8d 54 24 48 f0 41 0f ba 74 24 48 01 <72> 05 45 31 f6 eb 03 41 b6 01 f0 0f ba 32 02 72 02 eb 36 45 84 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: NMI/soft lockup in nfs_delegation_need_return()
  2015-02-26 10:45 ` David Howells
@ 2015-02-26 15:36   ` Trond Myklebust
  2015-03-04 14:06     ` Andrew W Elble
  2015-02-26 16:51   ` David Howells
  2015-02-26 16:59   ` David Howells
  2 siblings, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2015-02-26 15:36 UTC (permalink / raw)
  To: David Howells; +Cc: linux-nfs, steved

On Thu, 2015-02-26 at 10:45 +0000, David Howells wrote:
> Seems I can still reproduce this (see below).  I don't suppose you've had any
> further insights since September?
> 
> David
> ---
> NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [90.155.74.18-ma:2834]
> Modules linked in: cachefiles nfsv4 nfsv3 nfsv2 nfs fscache auth_rpcgss nfs_acl lockd grace sunrpc
> irq event stamp: 135774
> hardirqs last  enabled at (135773): [<ffffffff81091701>] __call_rcu+0x241/0x253
> hardirqs last disabled at (135774): [<ffffffff8155895a>] apic_timer_interrupt+0x6a/0x80
> softirqs last  enabled at (135762): [<ffffffff810439a8>] __do_softirq+0x25a/0x319
> softirqs last disabled at (135757): [<ffffffff81043c73>] irq_exit+0x5e/0xd6
> CPU: 1 PID: 2834 Comm: 90.155.74.18-ma Tainted: G        W       3.19.0-fsdevel+ #1143
> Hardware name:                  /DG965RY, BIOS MQ96510J.86A.0816.2006.0716.2308 07/16/2006
> task: ffff88000ab6a410 ti: ffff88001cab8000 task.ti: ffff88001cab8000
> RIP: 0010:[<ffffffffa010c418>]  [<ffffffffa010c418>] nfs_client_return_marked_delegations+0x9b/0x1a9 [nfsv4]
> RSP: 0018:ffff88001cabbdb8  EFLAGS: 00000292
> RAX: ffff880000e4b7c0 RBX: ffffffffa010c39c RCX: 0000000000000000
> RDX: ffff880000e4b808 RSI: ffff88000ab6a410 RDI: ffff88000ab6a410
> RBP: ffff88001cabbe08 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000296
> R13: ffff88001cabbd98 R14: 0000000000000002 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff88003db00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f419ab7a000 CR3: 000000003a7fa000 CR4: 00000000000007e0
> Stack:
>  ffff880038dc9cc8 ffff880010e8d7f8 ffff88000ab6a410 ffff880000e4b7c0
>  ffff88001cabbe08 ffff880038dc9c00 ffff880038dc9c00 ffff880038dc9d30
>  0000000000000000 0000000000000000 ffff88001cabbe58 ffffffffa010a54f
> Call Trace:
>  [<ffffffffa010a54f>] nfs4_run_state_manager+0x5da/0x5df [nfsv4]
>  [<ffffffffa0109f75>] ? nfs4_do_reclaim+0x55d/0x55d [nfsv4]
>  [<ffffffffa0109f75>] ? nfs4_do_reclaim+0x55d/0x55d [nfsv4]
>  [<ffffffff81059b9d>] kthread+0x10e/0x116
>  [<ffffffff81059a8f>] ? kthread_create_on_node+0x1bb/0x1bb
>  [<ffffffff815579ac>] ret_from_fork+0x7c/0xb0
>  [<ffffffff81059a8f>] ? kthread_create_on_node+0x1bb/0x1bb
> Code: 00 48 89 45 c8 4c 8b 65 c8 48 8d 83 f8 07 00 00 48 89 45 b8 4c 3b 65 b8 0f 84 d6 00 00 00 49 8d 54 24 48 f0 41 0f ba 74 24 48 01 <72> 05 45 31 f6 eb 03 41 b6 01 f0 0f ba 32 02 72 02 eb 36 45 84 

Maybe. Does the following patch help?

Cheers
  Trond
8<--------------------------------------------------------
>From 3a6839513e9ef00a5bd519b9965a301d8d156a7d Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@primarydata.com>
Date: Thu, 26 Feb 2015 09:57:34 -0500
Subject: [PATCH] NFSv4: Pin the superblock while we're returning the
 delegation

This patch ensures that the superblock doesn't go ahead and disappear
underneath us while the state manager thread is returning delegations.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/delegation.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index a1f0685b42ff..dcc5af078d48 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -471,14 +471,20 @@ restart:
 								super_list) {
 			if (!nfs_delegation_need_return(delegation))
 				continue;
-			inode = nfs_delegation_grab_inode(delegation);
-			if (inode == NULL)
+			if (!nfs_sb_active(server->super))
 				continue;
+			inode = nfs_delegation_grab_inode(delegation);
+			if (inode == NULL) {
+				rcu_read_unlock();
+				nfs_sb_deactive(server->super);
+				goto restart;
+			}
 			delegation = nfs_start_delegation_return_locked(NFS_I(inode));
 			rcu_read_unlock();
 
 			err = nfs_end_delegation_return(inode, delegation, 0);
 			iput(inode);
+			nfs_sb_deactive(server->super);
 			if (!err)
 				goto restart;
 			set_bit(NFS4CLNT_DELEGRETURN, &clp->cl_state);
@@ -812,9 +818,14 @@ restart:
 			if (test_bit(NFS_DELEGATION_NEED_RECLAIM,
 						&delegation->flags) == 0)
 				continue;
-			inode = nfs_delegation_grab_inode(delegation);
-			if (inode == NULL)
+			if (!nfs_sb_active(server->super))
 				continue;
+			inode = nfs_delegation_grab_inode(delegation);
+			if (inode == NULL) {
+				rcu_read_unlock();
+				nfs_sb_deactive(server->super);
+				goto restart;
+			}
 			delegation = nfs_detach_delegation(NFS_I(inode),
 					delegation, server);
 			rcu_read_unlock();
@@ -822,6 +833,7 @@ restart:
 			if (delegation != NULL)
 				nfs_free_delegation(delegation);
 			iput(inode);
+			nfs_sb_deactive(server->super);
 			goto restart;
 		}
 	}
-- 
2.1.0



-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com





^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: NMI/soft lockup in nfs_delegation_need_return()
  2015-02-26 10:45 ` David Howells
  2015-02-26 15:36   ` Trond Myklebust
@ 2015-02-26 16:51   ` David Howells
  2015-02-26 16:59   ` David Howells
  2 siblings, 0 replies; 11+ messages in thread
From: David Howells @ 2015-02-26 16:51 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: dhowells, linux-nfs, steved

Trond Myklebust <trond.myklebust@primarydata.com> wrote:

> This patch ensures that the superblock doesn't go ahead and disappear
> underneath us while the state manager thread is returning delegations.

It doesn't help.  It seem likely that it would, though.  The superblock is
unlikely to disappear since there's an active tar running against it.

David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: NMI/soft lockup in nfs_delegation_need_return()
  2015-02-26 10:45 ` David Howells
  2015-02-26 15:36   ` Trond Myklebust
  2015-02-26 16:51   ` David Howells
@ 2015-02-26 16:59   ` David Howells
  2015-02-28 15:21     ` Trond Myklebust
                       ` (2 more replies)
  2 siblings, 3 replies; 11+ messages in thread
From: David Howells @ 2015-02-26 16:59 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: dhowells, linux-nfs, steved

David Howells <dhowells@redhat.com> wrote:

> Trond Myklebust <trond.myklebust@primarydata.com> wrote:
> 
> > This patch ensures that the superblock doesn't go ahead and disappear
> > underneath us while the state manager thread is returning delegations.
> 
> It doesn't help.  It seem likely that it would, though.  The superblock is

It doesn't seem likely, that is...

> unlikely to disappear since there's an active tar running against it.

David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: NMI/soft lockup in nfs_delegation_need_return()
  2015-02-26 16:59   ` David Howells
@ 2015-02-28 15:21     ` Trond Myklebust
  2015-03-04 13:21     ` David Howells
  2015-03-04 14:45     ` David Howells
  2 siblings, 0 replies; 11+ messages in thread
From: Trond Myklebust @ 2015-02-28 15:21 UTC (permalink / raw)
  To: David Howells; +Cc: Linux NFS Mailing List, Steve Dickson

On Thu, Feb 26, 2015 at 11:59 AM, David Howells <dhowells@redhat.com> wrote:
> David Howells <dhowells@redhat.com> wrote:
>
>> Trond Myklebust <trond.myklebust@primarydata.com> wrote:
>>
>> > This patch ensures that the superblock doesn't go ahead and disappear
>> > underneath us while the state manager thread is returning delegations.
>>
>> It doesn't help.  It seem likely that it would, though.  The superblock is
>
> It doesn't seem likely, that is...
>
>> unlikely to disappear since there's an active tar running against it.
>
> David

Hi David,

Could you please retest using the patches that are currently in my
'devel' git branch?

git pull git://git.linux-nfs.org/projects/trondmy/linux-nfs.git devel

They fix up a number of minor issues with the delegation code,
specifically ones that might cause a delegation to be removed
prematurely.

Thanks!
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: NMI/soft lockup in nfs_delegation_need_return()
  2015-02-26 16:59   ` David Howells
  2015-02-28 15:21     ` Trond Myklebust
@ 2015-03-04 13:21     ` David Howells
  2015-03-04 14:45     ` David Howells
  2 siblings, 0 replies; 11+ messages in thread
From: David Howells @ 2015-03-04 13:21 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: dhowells, Linux NFS Mailing List, Steve Dickson

Trond Myklebust <trond.myklebust@primarydata.com> wrote:

> Could you please retest using the patches that are currently in my
> 'devel' git branch?
> 
> git pull git://git.linux-nfs.org/projects/trondmy/linux-nfs.git devel
> 
> They fix up a number of minor issues with the delegation code,
> specifically ones that might cause a delegation to be removed
> prematurely.

The bug still happens and the backtrace looks much the same.  It might be a
bit less likely, however, but it's hard to say.

David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: NMI/soft lockup in nfs_delegation_need_return()
  2015-02-26 15:36   ` Trond Myklebust
@ 2015-03-04 14:06     ` Andrew W Elble
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew W Elble @ 2015-03-04 14:06 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

Trond,

   Some feedback on patches you have in your devel branch. We've been
  seeing a problem where the server/client conversation after a few
  hours (usually after we leave the environment overnight) becomes
  nothing but sequence operations back and forth with the server
  continually asserting SEQ4_STATUS_RECALLABLE_STATE_REVOKED. The
  client is in an unusable state which can rapidly degrade to a
  lock or crash. (We've seen this with both 3.18.8 and RHEL7
  3.10.0-123.20.1.el7.x86_64)

  Cherry-picking these:

  9f0f8e12c48e4bb89192a0de876c77dc1fbfaa75
    NFSv4: Pin the superblock while we're returning the delegation

  ade04647dd56881e285983af3db702d56ee97e86
    NFSv4: Ensure we honour NFS_DELEGATION_RETURNING in nfs_inode_set_delegation()

  b04b22f4ca691280f0ab3f77954f5a21500881e7
    NFSv4: Ensure that we don't reap a delegation that is being returned

  ec3ca4e57e00d52ff724b0ae49f4489667a9c311
    NFSv4: Ensure we skip delegations that are already being returned

  Plus this:

  ea7c38fef0b774a5dc16fb0ca5935f0ae8568176
    NFSv4: Ensure we reference the inode for return-on-close in delegreturn

  And applying to 3.18.8 has eliminated this from manifesting for at
  least last night.

  Thanks,

Andy

-- 
Andrew W. Elble
aweits@discipline.rit.edu
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: NMI/soft lockup in nfs_delegation_need_return()
  2015-02-26 16:59   ` David Howells
  2015-02-28 15:21     ` Trond Myklebust
  2015-03-04 13:21     ` David Howells
@ 2015-03-04 14:45     ` David Howells
  2 siblings, 0 replies; 11+ messages in thread
From: David Howells @ 2015-03-04 14:45 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: dhowells, Linux NFS Mailing List, Steve Dickson


Here's tshark output from around the dodgy point.  There's a huge gap from
23.440722 to 60.461998 which is devoid of NFS ops.  I've inserted a couple of
blank lines to delineate the section.  At this point, the NFS client starts
churning out a huge slew of DELEGRETURN ops.

My test case is just tar'ing up a compiled kernel tree over NFS.  The output
of tar goes through a throughtput monitor and thence to /dev/null:

  nice -n 30 tar cf - /warthog/fs/linux-2.6-fscache | /tmp/progress >/dev/null

Note that the tshark command excluded ssh traffic from the client.

I wonder, is it spending this time scanning the delegations to decide what to
return?

David
---
103565  23.392234 90.155.74.21 -> 90.155.74.18 NFS 330 V4 Call OPEN DH: 0xe7d8aba6/bottom
103566  23.392318 90.155.74.18 -> 90.155.74.21 NFS 414 V4 Reply (Call In 103565) OPEN StateID: 0x8d1c
103567  23.392862 90.155.74.21 -> 90.155.74.18 NFS 262 V4 Call READ StateID: 0x8d1c Offset: 0 Len: 41
103568  23.392908 90.155.74.18 -> 90.155.74.21 NFS 174 V4 Reply (Call In 103567) READ
103569  23.400370 90.155.74.21 -> 90.155.74.18 NFS 178 V4 Call RENEW CID: 0x21ff
103570  23.400402 90.155.74.18 -> 90.155.74.21 NFS 114 V4 Reply (Call In 103569) RENEW
103571  23.400610 90.155.74.21 -> 90.155.74.18 NFS 246 V4 Call GETATTR FH: 0xe7d8aba6
103572  23.400644 90.155.74.18 -> 90.155.74.21 NFS 258 V4 Reply (Call In 103571) GETATTR
103573  23.400859 90.155.74.21 -> 90.155.74.18 TCP 66 803→2049 [ACK] Seq=8062089 Ack=47346145 Win=11449 Len=0 TSval=2028290 TSecr=781547240
103574  23.401176 90.155.74.21 -> 90.155.74.18 NFS 270 V4 Call CLOSE StateID: 0x8d1c
103575  23.401212 90.155.74.18 -> 90.155.74.21 NFS 202 V4 Reply (Call In 103574) CLOSE
103576  23.401852 90.155.74.21 -> 90.155.74.18 NFS 254 V4 Call ACCESS FH: 0xe7d8aba6, [Check: RD MD XT XE]
103577  23.401888 90.155.74.18 -> 90.155.74.21 NFS 194 V4 Reply (Call In 103576) ACCESS, [Access Denied: XE], [Allowed: RD MD XT]
103578  23.440722 90.155.74.21 -> 90.155.74.18 TCP 66 803→2049 [ACK] Seq=8062481 Ack=47346409 Win=11449 Len=0 TSval=2028300 TSecr=781547242

103579  33.775771 fe80::96de:80ff:feb4:ecc3 -> 2001:8b0:194:0:216:76ff:fece:3a3c ICMPv6 86 Neighbor Solicitation for 2001:8b0:194:0:216:76ff:fece:3a3c from 94:de:80:b4:ec:c3
103580  33.776256 2001:8b0:194:0:216:76ff:fece:3a3c -> fe80::96de:80ff:feb4:ecc3 ICMPv6 78 Neighbor Advertisement 2001:8b0:194:0:216:76ff:fece:3a3c (sol)
103581  59.679778 fe80::96de:80ff:feb4:ecc3 -> 2001:8b0:194:0:216:76ff:fece:3a3c ICMPv6 86 Neighbor Solicitation for 2001:8b0:194:0:216:76ff:fece:3a3c from 94:de:80:b4:ec:c3
103582  59.680047 2001:8b0:194:0:216:76ff:fece:3a3c -> fe80::96de:80ff:feb4:ecc3 ICMPv6 78 Neighbor Advertisement 2001:8b0:194:0:216:76ff:fece:3a3c (sol)

103583  60.461998 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x5e92
103584  60.462196 90.155.74.18 -> 90.155.74.21 NFS 186 V4 Reply (Call In 103583) DELEGRETURN
103585  60.462415 90.155.74.21 -> 90.155.74.18 TCP 66 803→2049 [ACK] Seq=8062681 Ack=47346529 Win=11449 Len=0 TSval=2037555 TSecr=781584302
103586  60.529223 90.155.74.21 -> 90.155.74.18 NFS 334 V4 Call OPEN DH: 0xe7d8aba6/authemail
103587  60.529266 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x0a42
103588  60.529276 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x39f2
103589  60.529395 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x1540
103590  60.529418 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x26f0
103591  60.529442 90.155.74.18 -> 90.155.74.21 TCP 66 2049→803 [ACK] Seq=47346529 Ack=8063349 Win=32885 Len=0 TSval=781584369 TSecr=2037572
103592  60.529500 90.155.74.18 -> 90.155.74.21 NFS 186 V4 Reply (Call In 103587) DELEGRETURN
103593  60.529593 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x7220
103594  60.529614 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x4190
103595  60.529621 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0xdb80
103596  60.529650 90.155.74.18 -> 90.155.74.21 NFS 654 V4 Reply (Call In 103588) DELEGRETURN  ; V4 Reply (Call In 103586) OPEN StateID: 0x8d1c  ; V4 Reply (Call In 103589) DELEGRETURN
103597  60.529676 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0xe830
103598  60.529685 90.155.74.21 -> 90.155.74.18 TCP 66 803→2049 [ACK] Seq=8064549 Ack=47346649 Win=11449 Len=0 TSval=2037572 TSecr=781584369
103599  60.529700 90.155.74.18 -> 90.155.74.21 NFS 306 V4 Reply (Call In 103593) DELEGRETURN  ; V4 Reply (Call In 103590) DELEGRETURN
103600  60.529839 90.155.74.18 -> 90.155.74.21 NFS 426 V4 Reply (Call In 103594) DELEGRETURN  ; V4 Reply (Call In 103595) DELEGRETURN  ; V4 Reply (Call In 103597) DELEGRETURN
103601  60.529993 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0xbce0
103602  60.530022 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x8f50
103603  60.530065 90.155.74.21 -> 90.155.74.18 TCP 66 803→2049 [ACK] Seq=8064949 Ack=47347837 Win=11449 Len=0 TSval=2037572 TSecr=781584369
103604  60.530109 90.155.74.18 -> 90.155.74.21 TCP 66 2049→803 [ACK] Seq=47347837 Ack=8064949 Win=32885 Len=0 TSval=781584370 TSecr=2037572
103605  60.530146 90.155.74.18 -> 90.155.74.21 NFS 186 V4 Reply (Call In 103601) DELEGRETURN
103606  60.530256 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x80d1
103607  60.530271 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0xb361
103608  60.530277 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0xe7b1
103609  60.530305 90.155.74.18 -> 90.155.74.21 NFS 186 V4 Reply (Call In 103602) DELEGRETURN
103610  60.530442 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0xd401
103611  60.530453 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x4e11
103612  60.530463 90.155.74.18 -> 90.155.74.21 NFS 426 V4 Reply (Call In 103607) DELEGRETURN  ; V4 Reply (Call In 103606) DELEGRETURN  ; V4 Reply (Call In 103608) DELEGRETURN
103613  60.530509 90.155.74.18 -> 90.155.74.21 NFS 186 V4 Reply (Call In 103610) DELEGRETURN
103614  60.530639 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x7da1
103615  60.530651 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x2971
103616  60.530664 90.155.74.18 -> 90.155.74.21 NFS 186 V4 Reply (Call In 103611) DELEGRETURN
103617  60.530669 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x1ac1
103618  60.530717 90.155.74.21 -> 90.155.74.18 TCP 66 803→2049 [ACK] Seq=8066549 Ack=47348557 Win=11449 Len=0 TSval=2037572 TSecr=781584370
103619  60.530731 90.155.74.18 -> 90.155.74.21 NFS 186 V4 Reply (Call In 103615) DELEGRETURN
103620  60.530910 90.155.74.18 -> 90.155.74.21 NFS 306 V4 Reply (Call In 103617) DELEGRETURN  ; V4 Reply (Call In 103614) DELEGRETURN
103621  60.531059 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x7015
103622  60.531092 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x43a5
103623  60.531100 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x1775
103624  60.531159 90.155.74.18 -> 90.155.74.21 NFS 186 V4 Reply (Call In 103622) DELEGRETURN
103625  60.531314 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x24c5
103626  60.531342 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0xbed5
103627  60.531362 90.155.74.18 -> 90.155.74.21 NFS 306 V4 Reply (Call In 103621) DELEGRETURN  ; V4 Reply (Call In 103623) DELEGRETURN
103628  60.531368 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x8d65
103629  60.531408 90.155.74.18 -> 90.155.74.21 NFS 186 V4 Reply (Call In 103625) DELEGRETURN
103630  60.531512 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0xd9b5
103631  60.531543 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0xea05
103632  60.531549 90.155.74.21 -> 90.155.74.18 NFS 262 V4 Call READ StateID: 0x8d1c Offset: 0 Len: 19
103633  60.531570 90.155.74.21 -> 90.155.74.18 TCP 66 803→2049 [ACK] Seq=8068345 Ack=47349517 Win=11449 Len=0 TSval=2037572 TSecr=781584371
103634  60.531589 90.155.74.18 -> 90.155.74.21 NFS 426 V4 Reply (Call In 103626) DELEGRETURN  ; V4 Reply (Call In 103628) DELEGRETURN  ; V4 Reply (Call In 103630) DELEGRETURN
103635  60.531636 90.155.74.18 -> 90.155.74.21 NFS 150 V4 Reply (Call In 103632) READ
103636  60.531832 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0xe584
103637  60.531862 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0xd634
103638  60.531868 90.155.74.21 -> 90.155.74.18 TCP 66 803→2049 [ACK] Seq=8068745 Ack=47349961 Win=11449 Len=0 TSval=2037572 TSecr=781584371
103639  60.531886 90.155.74.18 -> 90.155.74.21 NFS 186 V4 Reply (Call In 103631) DELEGRETURN
103640  60.532083 90.155.74.18 -> 90.155.74.21 NFS 306 V4 Reply (Call In 103637) DELEGRETURN  ; V4 Reply (Call In 103636) DELEGRETURN
103641  60.532098 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x82e4
103642  60.532108 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0xb154
103643  60.532114 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x2b44
103644  60.532174 90.155.74.18 -> 90.155.74.21 TCP 66 2049→803 [ACK] Seq=47350321 Ack=8069345 Win=32885 Len=0 TSval=781584372 TSecr=2037572
103645  60.532259 90.155.74.18 -> 90.155.74.21 NFS 426 V4 Reply (Call In 103641) DELEGRETURN  ; V4 Reply (Call In 103642) DELEGRETURN  ; V4 Reply (Call In 103643) DELEGRETURN
103646  60.532343 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x18f4
103647  60.532358 90.155.74.21 -> 90.155.74.18 NFS 266 V4 Call DELEGRETURN StateID: 0x4c24
103648  60.532400 90.155.74.18 -> 90.155.74.21 TCP 66 2049→803 [ACK] Seq=47350681 Ack=8069745 Win=32885 Len=0 TSval=781584372 TSecr=2037572

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-03-04 14:45 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-25 13:04 NMI/soft lockup in nfs_delegation_need_return() David Howells
2014-09-25 13:07 ` David Howells
2014-09-25 14:15 ` Trond Myklebust
2015-02-26 10:45 ` David Howells
2015-02-26 15:36   ` Trond Myklebust
2015-03-04 14:06     ` Andrew W Elble
2015-02-26 16:51   ` David Howells
2015-02-26 16:59   ` David Howells
2015-02-28 15:21     ` Trond Myklebust
2015-03-04 13:21     ` David Howells
2015-03-04 14:45     ` David Howells

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.