All of lore.kernel.org
* [2.6.31] NFS4ERR_GRACE unhandled...
@ 2009-09-26 18:14 Daniel J Blueman
       [not found] ` <6278d2220909261114g2e1529dfn4961d450460b00dc-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel J Blueman @ 2009-09-26 18:14 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

Hi Trond,

After rebooting my 2.6.31 NFS4 server, I see a list of NFS kernel
errors [1] on the 2.6.31 client corresponding to NFS4ERR_GRACE, so
lock or file state recovery failed. Is this expected? Note that I
have an internal firewall allowing incoming TCP port 2049 on the
server, and no firewall on the client; I can't see how this could
be callback-related.

Many thanks,
  Daniel

--- [1]

[52664.396008] nfs: server x1 not responding, still trying
[53305.376017] nfs: server x1 not responding, timed out
[53305.376024] nfs: server x1 not responding, still trying
[53305.376026] nfs: server x1 not responding, still trying
[53305.376038] nfs: server x1 not responding, still trying
[53305.376041] nfs: server x1 not responding, still trying
[53305.376044] nfs: server x1 not responding, still trying
[53499.312009] nfs: server x1 not responding, still trying
[53551.332276] nfs: server x1 OK
[53551.332679] nfs: server x1 OK
[53551.366774] nfs: server x1 OK
[53551.366820] nfs: server x1 OK
[53551.382566] nfs: server x1 OK
[53551.608916] nfs4_reclaim_open_state: unhandled error -10013. Zeroing state
[53551.609117] nfs4_reclaim_open_state: unhandled error -10013. Zeroing state
[53551.609322] nfs4_reclaim_open_state: unhandled error -10013. Zeroing state
[53551.609528] nfs4_reclaim_open_state: unhandled error -10013. Zeroing state
[53551.609732] nfs4_reclaim_open_state: unhandled error -10013. Zeroing state
[53551.609937] nfs4_reclaim_open_state: unhandled error -10013. Zeroing state
[53551.610142] nfs4_reclaim_open_state: unhandled error -10013. Zeroing state
[53551.610346] nfs4_reclaim_open_state: unhandled error -10013. Zeroing state
[53551.610551] nfs4_reclaim_open_state: unhandled error -10013. Zeroing state
[53551.610757] nfs4_reclaim_open_state: unhandled error -10013. Zeroing state
[53551.610960] nfs4_reclaim_open_state: unhandled error -10013. Zeroing state
[53551.611369] nfs4_reclaim_open_state: unhandled error -10013. Zeroing state
[53551.611779] nfs4_reclaim_open_state: unhandled error -10013. Zeroing state
[53551.612393] nfs4_reclaim_open_state: unhandled error -10013. Zeroing state
[53551.619820] nfs4_reclaim_open_state: unhandled error -10013. Zeroing state
[53551.620070] nfs4_reclaim_open_state: unhandled error -10013. Zeroing state
[53551.620431] nfs4_reclaim_open_state: unhandled error -10013. Zeroing state
[53551.633668] nfs4_reclaim_open_state: unhandled error -10013. Zeroing state
[53551.633868] nfs4_reclaim_open_state: unhandled error -10013. Zeroing state
[53611.317298] nfs: server x1 OK
[53611.324266] nfs: server x1 OK
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [2.6.31] NFS4ERR_GRACE unhandled...
       [not found] ` <6278d2220909261114g2e1529dfn4961d450460b00dc-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-09-28 18:16   ` Trond Myklebust
       [not found]     ` <1254161800.3308.1.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2009-09-28 18:16 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: linux-nfs

On Sat, 2009-09-26 at 19:14 +0100, Daniel J Blueman wrote:
> Hi Trond,
> 
> After rebooting my 2.6.31 NFS4 server, I see a list of NFS kernel
> errors [1] on the 2.6.31 client corresponding to NFS4ERR_GRACE, so
> lock or file state recovery failed. Is this expected? Note that I
> have an internal firewall allowing incoming TCP port 2049 on the
> server, and no firewall on the client; I can't see how this could
> be callback-related.

No. It looks as if your server rebooted while the client was recovering
an expired lease.

The following patch should prevent future occurrences of this bug...

Cheers
  Trond
------------------------------------------------------------------
NFSv4: Handle NFS4ERR_GRACE when recovering an expired lease.

From: Trond Myklebust <Trond.Myklebust@netapp.com>

If our lease expires and the server subsequently reboots, we need to be
able to handle the case where the server refuses to let us recover state,
because it is in the grace period.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/nfs4proc.c |   22 ++++++++++++++++++----
 1 files changed, 18 insertions(+), 4 deletions(-)


diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index ed7c269..56182eb 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1491,10 +1491,17 @@ static inline int nfs4_do_open_expired(struct nfs_open_context *ctx, struct nfs4
 
 	do {
 		err = _nfs4_open_expired(ctx, state);
-		if (err != -NFS4ERR_DELAY)
+		switch (err) {
+		case -NFS4ERR_FILE_OPEN:
+		case -NFS4ERR_GRACE:
+		case -NFS4ERR_DELAY:
+			nfs4_handle_exception(server, err, &exception);
 			break;
-		nfs4_handle_exception(server, err, &exception);
+		default:
+			goto out;
+		}
 	} while (exception.retry);
+out:
 	return err;
 }
 
@@ -4047,10 +4054,17 @@ static int nfs4_lock_expired(struct nfs4_state *state, struct file_lock *request
 		if (test_bit(NFS_DELEGATED_STATE, &state->flags) != 0)
 			return 0;
 		err = _nfs4_do_setlk(state, F_SETLK, request, 0);
-		if (err != -NFS4ERR_DELAY)
+		switch (err) {
+		case -NFS4ERR_FILE_OPEN:
+		case -NFS4ERR_GRACE:
+		case -NFS4ERR_DELAY:
+			nfs4_handle_exception(server, err, &exception);
 			break;
-		nfs4_handle_exception(server, err, &exception);
+		default:
+			goto out;
+		}
 	} while (exception.retry);
+out:
 	return err;
 }
 


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [2.6.31] NFS4ERR_GRACE unhandled...
       [not found]     ` <1254161800.3308.1.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2009-10-03 15:59       ` Daniel J Blueman
       [not found]         ` <6278d2220910030859s1fb8d200n7d032e2c1b235ce-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel J Blueman @ 2009-10-03 15:59 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

Hi Trond,

On Mon, Sep 28, 2009 at 7:16 PM, Trond Myklebust
<Trond.Myklebust@netapp.com> wrote:
> On Sat, 2009-09-26 at 19:14 +0100, Daniel J Blueman wrote:
>> Hi Trond,
>>
>> After rebooting my 2.6.31 NFS4 server, I see a list of NFS kernel
>> errors [1] on the 2.6.31 client corresponding to NFS4ERR_GRACE, so
>> lock or file state recovery failed. Is this expected? Note that I
>> have an internal firewall allowing incoming TCP port 2049 on the
>> server, and no firewall on the client; I can't see how this could
>> be callback-related.
>
> No. It looks as if your server rebooted while the client was recovering
> an expired lease.
>
> The following patch should prevent future occurrences of this bug...
>
> Cheers
>  Trond
> ------------------------------------------------------------------
> NFSv4: Handle NFS4ERR_GRACE when recovering an expired lease.
>
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
>
> If our lease expires and the server subsequently reboots, we need to be
> able to handle the case where the server refuses to let us recover state,
> because it is in the grace period.

On the client, I didn't see the error messages with this patch,
however I did see firefox (via sqlite) continue to hang [1] (after
other processes continued), and an unusual level of activity with
rpciod/0 and rpciod/1 kernel threads. Other NFS-related kernel thread
state is given.

I'm able to reproduce [2] this behaviour with 2.6.31-rc1 and your
patch; let me know if you're unable to.

Thanks,
  Daniel

--- [1]

# echo t >/proc/sysrq-trigger
# dmesg
<aargh, missing rpciod/0 and rpciod/1 which are in an unexpected state,
since CONFIG_LOG_BUF_SHIFT is only 18 by default and crashdump doesn't
work without the vmcoreinfo file>
nfsv4.0-svc   S ffff880126e3ed70  4600  2628      2 0x00000000
 ffff88011fe25d20 0000000000000046 000000000000031a ffff88011fd5a5d8
 ffff88011fd5a5a0 ffffffff825cb770 ffff88011fd5af68 ffffffff00000320
 0000000000000001 ffffffff8220e0a0 ffff88011fe24000 0000000000017768
Call Trace:
 [<ffffffff8112e7ea>] ? kfree+0xea/0x1c0
 [<ffffffff816394e5>] schedule_timeout+0x275/0x360
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff81067158>] ? local_bh_enable_ip+0x98/0x130
 [<ffffffff81095abd>] ? trace_hardirqs_on_caller+0x14d/0x1a0
 [<ffffffff815ff433>] ? svc_recv+0x793/0x8a0
 [<ffffffff81095b1d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff815ff43c>] svc_recv+0x79c/0x8a0
 [<ffffffff81054c00>] ? default_wake_function+0x0/0x10
 [<ffffffff8121b803>] nfs4_callback_svc+0x43/0xb0
 [<ffffffff8121b7c0>] ? nfs4_callback_svc+0x0/0xb0
 [<ffffffff8107eb3e>] kthread+0x8e/0xa0
 [<ffffffff8100d15a>] child_rip+0xa/0x20
 [<ffffffff8100ca94>] ? restore_args+0x0/0x30
 [<ffffffff8107eab0>] ? kthread+0x0/0xa0
 [<ffffffff8100d150>] ? child_rip+0x0/0x20
firefox       S 0000000000000000  2512  2695      1 0x00000000
 ffff880117229c38 0000000000000046 0000000000000000 00000055b199802a
 ffff8801171daf30 ffffffff821f0210 0000000000019400 ffff88002efe0bd8
 0000000000000000 00000000ffffe926 ffff880117228000 0000000000017768
Call Trace:
 [<ffffffff8109aa0d>] futex_wait_queue_me+0xbd/0x100
 [<ffffffff8109b068>] futex_wait+0x168/0x2f0
 [<ffffffff8109caa9>] do_futex+0x109/0xac0
 [<ffffffff8107ef60>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff8109d526>] sys_futex+0xc6/0x170
 [<ffffffff81095abd>] ? trace_hardirqs_on_caller+0x14d/0x1a0
 [<ffffffff8163b30e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8100bfc2>] system_call_fastpath+0x16/0x1b
firefox       S 0000000000000000  4824  2698      1 0x00000000
 ffff880129d71a58 0000000000000046 0000000000000000 ffffffff81096872
 0000000000000002 ffffffff82204790 0000000000000042 ffff88002efe0bd8
 0000000000000000 00000000ffffe731 ffff880129d70000 0000000000017768
Call Trace:
 [<ffffffff81096872>] ? __lock_acquire+0x482/0x1440
 [<ffffffff8163a445>] schedule_hrtimeout_range+0x145/0x170
 [<ffffffff8163bf35>] ? _spin_unlock_irqrestore+0x45/0x80
 [<ffffffff8107f2a1>] ? add_wait_queue+0x41/0x50
 [<ffffffff8114dbdf>] ? __pollwait+0x6f/0xf0
 [<ffffffff8114da94>] poll_schedule_timeout+0x34/0x50
 [<ffffffff8114e89f>] do_sys_poll+0x40f/0x520
 [<ffffffff815536a2>] ? dev_queue_xmit+0x52/0x520
 [<ffffffff8114db70>] ? __pollwait+0x0/0xf0
 [<ffffffff8114dc60>] ? pollwake+0x0/0x60
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff810925de>] ? put_lock_stats+0xe/0x30
 [<ffffffff8163bf55>] ? _spin_unlock_irqrestore+0x65/0x80
 [<ffffffff810645d2>] ? current_fs_time+0x22/0x30
 [<ffffffff81157118>] ? mnt_drop_write+0x38/0x60
 [<ffffffff81152bf5>] ? touch_atime+0x135/0x180
 [<ffffffff81145925>] ? pipe_read+0x2d5/0x4e0
 [<ffffffff8113ce12>] ? do_sync_read+0xf2/0x130
 [<ffffffff8107ef60>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff8100bffa>] ? sysret_check+0x2e/0x69
 [<ffffffff8114eb97>] sys_poll+0x77/0x110
 [<ffffffff8100bfc2>] system_call_fastpath+0x16/0x1b
firefox       S ffff880126c28000  5944  2699      1 0x00000000
 ffff88011729bc38 0000000000000046 ffff88011729bb88 ffffffff8163bf35
 ffff88011729bd28 0000024240c36955 ffff88011729bc28 ffffffff81082b30
 ffff88011729bbb8 ffffffff8109044a ffff88011729a000 0000000000017768
Call Trace:
 [<ffffffff8163bf35>] ? _spin_unlock_irqrestore+0x45/0x80
 [<ffffffff81082b30>] ? __hrtimer_start_range_ns+0x170/0x480
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff8109aa0d>] futex_wait_queue_me+0xbd/0x100
 [<ffffffff8109b068>] futex_wait+0x168/0x2f0
 [<ffffffff81082380>] ? hrtimer_wakeup+0x0/0x30
 [<ffffffff8109caa9>] do_futex+0x109/0xac0
 [<ffffffff8163bf9b>] ? _spin_unlock_irq+0x2b/0x60
 [<ffffffff81095abd>] ? trace_hardirqs_on_caller+0x14d/0x1a0
 [<ffffffff81095b1d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8163bfac>] ? _spin_unlock_irq+0x3c/0x60
 [<ffffffff81049baf>] ? finish_task_switch+0x6f/0xf0
 [<ffffffff81049b40>] ? finish_task_switch+0x0/0xf0
 [<ffffffff816388bb>] ? thread_return+0x98/0x82d
 [<ffffffff8163b30e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8109d526>] sys_futex+0xc6/0x170
 [<ffffffff81095a01>] ? trace_hardirqs_on_caller+0x91/0x1a0
 [<ffffffff8100bfc2>] system_call_fastpath+0x16/0x1b
firefox       S 0000000000000000  5944  2700      1 0x00000000
 ffff8801170b9c38 0000000000000046 0000000000000000 ffffffff8163bf35
 ffff8801170b9d28 0000035df665233f ffff8801170b9c28 ffff88002efe0bd8
 0000000000000000 0000000100016139 ffff8801170b8000 0000000000017768
Call Trace:
 [<ffffffff8163bf35>] ? _spin_unlock_irqrestore+0x45/0x80
 [<ffffffff8109aa0d>] futex_wait_queue_me+0xbd/0x100
 [<ffffffff8109b068>] futex_wait+0x168/0x2f0
 [<ffffffff81082380>] ? hrtimer_wakeup+0x0/0x30
 [<ffffffff8109caa9>] do_futex+0x109/0xac0
 [<ffffffff8107ef60>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff8109d526>] sys_futex+0xc6/0x170
 [<ffffffff81095a01>] ? trace_hardirqs_on_caller+0x91/0x1a0
 [<ffffffff8100bfc2>] system_call_fastpath+0x16/0x1b
firefox       D 0000000000000002  3832  2710      1 0x00000000
 ffff880117283b58 0000000000000046 ffff88011fd72fa0 ffffffff821e0250
 0000000000019400 ffff88011fd725a0 0000000000000001 0000000000000007
 0000000000000006 0000000000000002 ffff880117282000 0000000000017768
Call Trace:
 [<ffffffff81095b1d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff810ef7e0>] ? sync_page+0x0/0x60
 [<ffffffff81639092>] io_schedule+0x42/0x60
 [<ffffffff810ef825>] sync_page+0x45/0x60
 [<ffffffff816397c7>] __wait_on_bit+0x57/0x80
 [<ffffffff810f1e00>] ? find_get_pages_tag+0x130/0x1d0
 [<ffffffff810f1cd0>] ? find_get_pages_tag+0x0/0x1d0
 [<ffffffff810ef95e>] wait_on_page_bit+0x6e/0x80
 [<ffffffff8107efa0>] ? wake_bit_function+0x0/0x40
 [<ffffffff810fb150>] ? pagevec_lookup_tag+0x20/0x30
 [<ffffffff810efad4>] wait_on_page_writeback_range+0xf4/0x1a0
 [<ffffffff81096872>] ? __lock_acquire+0x482/0x1440
 [<ffffffff810fa1bf>] ? do_writepages+0x1f/0x40
 [<ffffffff810efc2b>] ? __filemap_fdatawrite_range+0x5b/0x60
 [<ffffffff810efba6>] filemap_fdatawait+0x26/0x30
 [<ffffffff810efed4>] filemap_write_and_wait+0x44/0x50
 [<ffffffff811f2794>] nfs_setattr+0x174/0x190
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff810925de>] ? put_lock_stats+0xe/0x30
 [<ffffffff81154d12>] ? notify_change+0x1b2/0x2e0
 [<ffffffff81154c64>] notify_change+0x104/0x2e0
 [<ffffffff8113bf91>] do_truncate+0x61/0x90
 [<ffffffff8113c620>] sys_ftruncate+0xd0/0xe0
 [<ffffffff8100bfc2>] system_call_fastpath+0x16/0x1b
firefox       S 0000000000000000  5048  2713      1 0x00000000
 ffff88011727fc38 0000000000000046 0000000000000000 000000209bf0709a
 ffff88011febd4d0 ffffffff821f0210 0000000000019400 ffff88002efe0bd8
 0000000000000000 00000000ffffbe3c ffff88011727e000 0000000000017768
Call Trace:
 [<ffffffff8109aa0d>] futex_wait_queue_me+0xbd/0x100
 [<ffffffff8109b068>] futex_wait+0x168/0x2f0
 [<ffffffff8109caa9>] do_futex+0x109/0xac0
 [<ffffffff8153d795>] ? sockfd_lookup_light+0x25/0x80
 [<ffffffff8107ef60>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff8109d526>] sys_futex+0xc6/0x170
 [<ffffffff81095abd>] ? trace_hardirqs_on_caller+0x14d/0x1a0
 [<ffffffff8163b30e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8100bfc2>] system_call_fastpath+0x16/0x1b
firefox       S ffff880126c2a5a0  6216  2714      1 0x00000000
 ffff880117275c38 0000000000000046 ffff880117275b88 00000012ae79dbd1
 ffff880126c2af30 ffffffff821f0210 0000000000019400 ffff880126c2a5a0
 ffff880117275bb8 ffffffff8109044a ffff880117274000 0000000000017768
Call Trace:
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff812d4f7e>] ? _raw_spin_unlock+0x5e/0xb0
 [<ffffffff8109aa0d>] futex_wait_queue_me+0xbd/0x100
 [<ffffffff8109b068>] futex_wait+0x168/0x2f0
 [<ffffffff8109caa9>] do_futex+0x109/0xac0
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff810925de>] ? put_lock_stats+0xe/0x30
 [<ffffffff8163bf9b>] ? _spin_unlock_irq+0x2b/0x60
 [<ffffffff81095abd>] ? trace_hardirqs_on_caller+0x14d/0x1a0
 [<ffffffff81095b1d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8109d526>] sys_futex+0xc6/0x170
 [<ffffffff81095abd>] ? trace_hardirqs_on_caller+0x14d/0x1a0
 [<ffffffff8163b30e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8100bfc2>] system_call_fastpath+0x16/0x1b
firefox       S 0000000000000000  2864  2715      1 0x00000000
 ffff880117297c38 0000000000000046 0000000000000000 000000209b22fd4e
 ffff8801347ad4d0 ffffffff821f0210 0000000000019400 ffff88002efe0bd8
 0000000000000000 00000000ffffbe3c ffff880117296000 0000000000017768
Call Trace:
 [<ffffffff8109aa0d>] futex_wait_queue_me+0xbd/0x100
 [<ffffffff8109b068>] futex_wait+0x168/0x2f0
 [<ffffffff8109caa9>] do_futex+0x109/0xac0
 [<ffffffff8107ef60>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff8109d526>] sys_futex+0xc6/0x170
 [<ffffffff81095abd>] ? trace_hardirqs_on_caller+0x14d/0x1a0
 [<ffffffff8163b30e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8100bfc2>] system_call_fastpath+0x16/0x1b
kthreadd      D 00000000ffffffff  6816 23477      2 0x00000000
 ffff8801344b9eb0 0000000000000046 ffff8801344b9e78 ffff8801344b9e74
 ffff8801344b9e00 ffff88002ee17cc0 ffff88002f1e0bc0 ffff88002efe0bd8
 0000000000000000 00000001000354b3 ffff8801344b8000 0000000000017768
Call Trace:
 [<ffffffff812181f0>] ? nfs4_run_state_manager+0x0/0x4c0
 [<ffffffff8107eb22>] kthread+0x72/0xa0
 [<ffffffff8100d15a>] child_rip+0xa/0x20
 [<ffffffff8100ca94>] ? restore_args+0x0/0x30
 [<ffffffff8107eab0>] ? kthread+0x0/0xa0
 [<ffffffff8100d150>] ? child_rip+0x0/0x20

--- [2]

1. ensure NFS4 client processes are updating databases/files (eg run
firefox or renderer)
2. reboot server (perhaps unneeded)
3. on server, stop nfs-kernel service for some minutes
4. start nfs-kernel service again
5. verify client applications resume without data loss
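The client-side half of the recipe can be sketched as a script. Everything
here is an assumption about the test environment: the writable directory
(defaulting to a local temp dir so the sketch runs anywhere), and the
Debian-style nfs-kernel-server init script named in the comments, which
must be run on the server by hand:

```shell
#!/bin/sh
# Step 1: keep a client process writing into the (NFS) directory.
CLIENT_DIR="${CLIENT_DIR:-$(mktemp -d)}"

( while :; do date >> "$CLIENT_DIR/repro.log"; sleep 1; done ) &
WRITER=$!

# Steps 2-4: on the *server*, by hand (not scripted here):
#   reboot                                        # perhaps unneeded
#   /etc/init.d/nfs-kernel-server stop; sleep 300
#   /etc/init.d/nfs-kernel-server start

# Step 5: verify the writer survived and the file kept growing.
sleep 3
kill -0 "$WRITER" 2>/dev/null && echo "writer still running"
lines=$(wc -l < "$CLIENT_DIR/repro.log")
[ "$lines" -ge 2 ] && echo "file kept growing ($lines lines)"
kill "$WRITER"
```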
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [2.6.31] NFS4ERR_GRACE unhandled...
       [not found]         ` <6278d2220910030859s1fb8d200n7d032e2c1b235ce-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-10-04 22:10           ` Trond Myklebust
       [not found]             ` <1254694252.30515.63.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2009-10-04 22:10 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: linux-nfs

On Sat, 2009-10-03 at 16:59 +0100, Daniel J Blueman wrote:
> Hi Trond,
> 
> On Mon, Sep 28, 2009 at 7:16 PM, Trond Myklebust
> <Trond.Myklebust@netapp.com> wrote:
> > On Sat, 2009-09-26 at 19:14 +0100, Daniel J Blueman wrote:
> >> Hi Trond,
> >>
> >> After rebooting my 2.6.31 NFS4 server, I see a list of NFS kernel
> >> errors [1] on the 2.6.31 client corresponding to NFS4ERR_GRACE, so
> >> lock or file state recovery failed. Is this expected? Note that I
> >> have an internal firewall allowing incoming TCP port 2049 on the
> >> server, and no firewall on the client; I can't see how this could
> >> be callback-related.
> >
> > No. It looks as if your server rebooted while the client was recovering
> > an expired lease.
> >
> > The following patch should prevent future occurrences of this bug...
> >
> > Cheers
> >  Trond
> > ------------------------------------------------------------------
> > NFSv4: Handle NFS4ERR_GRACE when recovering an expired lease.
> >
> > From: Trond Myklebust <Trond.Myklebust@netapp.com>
> >
> > If our lease expires and the server subsequently reboots, we need to be
> > able to handle the case where the server refuses to let us recover state,
> > because it is in the grace period.
> 
> On the client, I didn't see the error messages with this patch,
> however I did see firefox (via sqlite) continue to hang [1] (after
> other processes continued), and an unusual level of activity with
> rpciod/0 and rpciod/1 kernel threads. Other NFS-related kernel thread
> state is given.

What are your mount options?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [2.6.31] NFS4ERR_GRACE unhandled...
       [not found]             ` <1254694252.30515.63.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2009-10-04 22:22               ` Daniel J Blueman
       [not found]                 ` <6278d2220910041522g47a96070nd948f3b61ff9dc7b-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel J Blueman @ 2009-10-04 22:22 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Sun, Oct 4, 2009 at 11:10 PM, Trond Myklebust
<Trond.Myklebust@netapp.com> wrote:
> On Sat, 2009-10-03 at 16:59 +0100, Daniel J Blueman wrote:
>> Hi Trond,
>>
>> On Mon, Sep 28, 2009 at 7:16 PM, Trond Myklebust
>> <Trond.Myklebust@netapp.com> wrote:
>> > On Sat, 2009-09-26 at 19:14 +0100, Daniel J Blueman wrote:
>> >> Hi Trond,
>> >>
>> >> After rebooting my 2.6.31 NFS4 server, I see a list of NFS kernel
>> >> errors [1] on the 2.6.31 client corresponding to NFS4ERR_GRACE, so
>> >> lock or file state recovery failed. Is this expected? Note that I
>> >> have an internal firewall allowing incoming TCP port 2049 on the
>> >> server, and no firewall on the client; I can't see how this could
>> >> be callback-related.
>> >
>> > No. It looks as if your server rebooted while the client was recovering
>> > an expired lease.
>> >
>> > The following patch should prevent future occurrences of this bug...
>> >
>> > Cheers
>> >  Trond
>> > ------------------------------------------------------------------
>> > NFSv4: Handle NFS4ERR_GRACE when recovering an expired lease.
>> >
>> > From: Trond Myklebust <Trond.Myklebust@netapp.com>
>> >
>> > If our lease expires and the server subsequently reboots, we need to be
>> > able to handle the case where the server refuses to let us recover state,
>> > because it is in the grace period.
>>
>> On the client, I didn't see the error messages with this patch,
>> however I did see firefox (via sqlite) continue to hang [1] (after
>> other processes continued), and an unusual level of activity with
>> rpciod/0 and rpciod/1 kernel threads. Other NFS-related kernel thread
>> state is given.
>
> What are your mount options?

$ grep nfs /proc/mounts
x1:/ /net nfs4 rw,relatime,vers=4,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.10.2,addr=192.168.10.250
0 0
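For reference, the retry-related options in that line decode as follows (a
quick sketch; option semantics per nfs(5), with timeo in tenths of a
second):

```python
# Parse the retry-related NFS mount options from the /proc/mounts line above.
opts = "rw,relatime,vers=4,hard,proto=tcp,timeo=600,retrans=2"

parsed = {}
for field in opts.split(","):
    key, _, value = field.partition("=")
    parsed[key] = value or True       # bare flags become True

assert parsed["hard"] is True          # hard mount: RPCs retry indefinitely
timeout_s = int(parsed["timeo"]) / 10  # timeo is in tenths of a second
print(f"per-RPC timeout: {timeout_s:.0f}s, retrans={parsed['retrans']}")
# → per-RPC timeout: 60s, retrans=2
```

With `hard` set, the hang reported above is the expected client behaviour
while the server is unreachable; the question is only why it persisted
after the server returned.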

All procfs settings are default; let me know if anything else will
help and thanks for taking a look!

Daniel
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [2.6.31] NFS4ERR_GRACE unhandled...
       [not found]                 ` <6278d2220910041522g47a96070nd948f3b61ff9dc7b-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-10-29 19:33                   ` Daniel J Blueman
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel J Blueman @ 2009-10-29 19:33 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Sun, Oct 4, 2009 at 10:22 PM, Daniel J Blueman
<daniel.blueman@gmail.com> wrote:
> On Sun, Oct 4, 2009 at 11:10 PM, Trond Myklebust
> <Trond.Myklebust@netapp.com> wrote:
>> On Sat, 2009-10-03 at 16:59 +0100, Daniel J Blueman wrote:
>>> Hi Trond,
>>>
>>> On Mon, Sep 28, 2009 at 7:16 PM, Trond Myklebust
>>> <Trond.Myklebust@netapp.com> wrote:
>>> > On Sat, 2009-09-26 at 19:14 +0100, Daniel J Blueman wrote:
>>> >> Hi Trond,
>>> >>
>>> >> After rebooting my 2.6.31 NFS4 server, I see a list of NFS kernel
>>> >> errors [1] on the 2.6.31 client corresponding to NFS4ERR_GRACE, so
>>> >> lock or file state recovery failed. Is this expected? Note that I
>>> >> have an internal firewall allowing incoming TCP port 2049 on the
>>> >> server, and no firewall on the client; I can't see how this could
>>> >> be callback-related.
>>> >
>>> > No. It looks as if your server rebooted while the client was recovering
>>> > an expired lease.
>>> >
>>> > The following patch should prevent future occurrences of this bug...
>>> >
>>> > Cheers
>>> >  Trond
>>> > ------------------------------------------------------------------
>>> > NFSv4: Handle NFS4ERR_GRACE when recovering an expired lease.
>>> >
>>> > From: Trond Myklebust <Trond.Myklebust@netapp.com>
>>> >
>>> > If our lease expires and the server subsequently reboots, we need to be
>>> > able to handle the case where the server refuses to let us recover state,
>>> > because it is in the grace period.
>>>
>>> On the client, I didn't see the error messages with this patch,
>>> however I did see firefox (via sqlite) continue to hang [1] (after
>>> other processes continued), and an unusual level of activity with
>>> rpciod/0 and rpciod/1 kernel threads. Other NFS-related kernel thread
>>> state is given.
>>
>> What are your mount options?
>
> $ grep nfs /proc/mounts
> x1:/ /net nfs4 rw,relatime,vers=4,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.10.2,addr=192.168.10.250
> 0 0
>
> All procfs settings are default; let me know if anything else will
> help and thanks for taking a look!

In the same situation but with 2.6.32-rc5 on the server and 2.6.31.4
on the client, I see on the client's kernel log
"nfs4_reclaim_open_state: Lock reclaim failed", and the application
(reproducible with firefox) shows a failure mode (e.g. empty lists in
live bookmarks).

Is this expected behaviour, i.e. is there a finite state recovery window?

Many thanks,
  Daniel
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2009-10-29 19:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-26 18:14 [2.6.31] NFS4ERR_GRACE unhandled Daniel J Blueman
     [not found] ` <6278d2220909261114g2e1529dfn4961d450460b00dc-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-09-28 18:16   ` Trond Myklebust
     [not found]     ` <1254161800.3308.1.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2009-10-03 15:59       ` Daniel J Blueman
     [not found]         ` <6278d2220910030859s1fb8d200n7d032e2c1b235ce-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-10-04 22:10           ` Trond Myklebust
     [not found]             ` <1254694252.30515.63.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2009-10-04 22:22               ` Daniel J Blueman
     [not found]                 ` <6278d2220910041522g47a96070nd948f3b61ff9dc7b-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-10-29 19:33                   ` Daniel J Blueman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.