* NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044!
@ 2012-06-10  9:03 Jamie Heilman
  2012-06-11  3:41 ` Jamie Heilman
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Jamie Heilman @ 2012-06-10  9:03 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, linux-kernel

Upgrading my NFSv4 server from 3.3.8 to 3.4.2 I've managed to
reproduce this reliably:

------------[ cut here ]------------
kernel BUG at fs/nfsd/nfs4state.c:1044!
invalid opcode: 0000 [#1] 
Modules linked in: md5 cpufreq_conservative cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 quota_v2 quota_tree nfsd nfs_acl exportfs nfs lockd fscache auth_rpcgss sunrpc xt_mark xt_dscp cls_fw sch_htb iptable_nat nf_nat ipt_REJECT xt_multiport xt_mac xt_tcpudp iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_LOG xt_limit iptable_filter ip_tables x_tables dm_crypt dm_mod snd_hda_codec_via tpm_tis tpm via_rhine snd_hda_intel snd_hda_codec tpm_bios mii via_velocity snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc evdev crc_ccitt via_agp agpgart button

Pid: 1804, comm: nfsd Not tainted 3.4.2 #3 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
EIP: 0060:[<f86ac683>] EFLAGS: 00010246 CPU: 0
EIP is at free_client.isra.47+0x3/0x5 [nfsd]
EAX: 00000000 EBX: e9d48800 ECX: e9d48030 EDX: f86b20f8
ESI: e9d48810 EDI: e9cf9ec0 EBP: e9cf9eb4 ESP: e9cf9eb4
 DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
CR0: 8005003b CR2: b760bee0 CR3: 34d04000 CR4: 000006b0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process nfsd (pid: 1804, ti=e9cf8000 task=e9cf4000 task.ti=e9cf8000)
Stack:
 e9cf9ed4 f86a6c00 e9d48828 e9cf9ec0 e9cf9ec0 e9d48800 e9d48000 00000000
 e9cf9f00 f86a6d36 00000000 000555c0 f53c0001 4fd43f69 00000001 7ad0b2a0
 f4c45000 f4c44060 00000360 e9cf9f3c f869f63c f86aff20 e9cb9000 00000000
Call Trace:
 [<f86a6c00>] expire_client+0xb1/0xb9 [nfsd]
 [<f86a6d36>] nfsd4_setclientid_confirm+0x12e/0x1c7 [nfsd]
 [<f869f63c>] nfsd4_proc_compound+0x1fd/0x354 [nfsd]
 [<f86946e3>] nfsd_dispatch+0x89/0x134 [nfsd]
 [<f86c6d64>] svc_process+0x327/0x4e3 [sunrpc]
 [<f86940d2>] nfsd+0xd2/0x10b [nfsd]
 [<f8694000>] ? 0xf8693fff
 [<c102d53a>] kthread+0x66/0x6b
 [<c102d4d4>] ? flush_kthread_worker+0x74/0x74
 [<c123193e>] kernel_thread_helper+0x6/0xd
Code: 01 80 3e 00 74 04 89 f2 eb a2 8b 4d e8 8b 55 ec 8b 45 f0 0f c8 89 19 89 02 31 c0 83 c4 18 5b 5e 5f 5d c3 55 89 e5 0f 0b 55 89 e5 <0f> 0b 55 89 e5 57 56 89 d6 53 89 c3 8b 78 34 89 d0 e8 b4 8f ff 
EIP: [<f86ac683>] free_client.isra.47+0x3/0x5 [nfsd] SS:ESP 0068:e9cf9eb4
---[ end trace bf1613e00e6f3bbe ]---

System is a 32-bit Via C7; to reproduce I need only establish an NFSv4
mount from a client, then reboot the client.  The BUG happens when the
client attempts to reconnect.  At this point nfs service on the server
becomes something of a lost cause.  Userspace is Debian stable
(nfs-utils 1.2.2 based).  I haven't had a chance to bisect the issue
yet; I'll give that a shot in the coming week if nobody knows of any
obvious fixes.  Let me know if there's any other info I can provide
that's useful.

-- 
Jamie Heilman                     http://audible.transient.net/~jamie/


* Re: NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044!
  2012-06-10  9:03 NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044! Jamie Heilman
@ 2012-06-11  3:41 ` Jamie Heilman
  2012-06-11 16:14 ` J. Bruce Fields
  2012-06-11 18:07 ` Sergey Bolshakov
  2 siblings, 0 replies; 11+ messages in thread
From: Jamie Heilman @ 2012-06-11  3:41 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, linux-kernel

Jamie Heilman wrote:
> Upgrading my NFSv4 server from 3.3.8 to 3.4.2 I've managed to
> reproduce this reliably:
> 
> ------------[ cut here ]------------
> kernel BUG at fs/nfsd/nfs4state.c:1044!
> invalid opcode: 0000 [#1] 
> Modules linked in: md5 cpufreq_conservative cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 quota_v2 quota_tree nfsd nfs_acl exportfs nfs lockd fscache auth_rpcgss sunrpc xt_mark xt_dscp cls_fw sch_htb iptable_nat nf_nat ipt_REJECT xt_multiport xt_mac xt_tcpudp iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_LOG xt_limit iptable_filter ip_tables x_tables dm_crypt dm_mod snd_hda_codec_via tpm_tis tpm via_rhine snd_hda_intel snd_hda_codec tpm_bios mii via_velocity snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc evdev crc_ccitt via_agp agpgart button
> 
> Pid: 1804, comm: nfsd Not tainted 3.4.2 #3 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
> EIP: 0060:[<f86ac683>] EFLAGS: 00010246 CPU: 0
> EIP is at free_client.isra.47+0x3/0x5 [nfsd]
> EAX: 00000000 EBX: e9d48800 ECX: e9d48030 EDX: f86b20f8
> ESI: e9d48810 EDI: e9cf9ec0 EBP: e9cf9eb4 ESP: e9cf9eb4
>  DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
> CR0: 8005003b CR2: b760bee0 CR3: 34d04000 CR4: 000006b0
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff0ff0 DR7: 00000400
> Process nfsd (pid: 1804, ti=e9cf8000 task=e9cf4000 task.ti=e9cf8000)
> Stack:
>  e9cf9ed4 f86a6c00 e9d48828 e9cf9ec0 e9cf9ec0 e9d48800 e9d48000 00000000
>  e9cf9f00 f86a6d36 00000000 000555c0 f53c0001 4fd43f69 00000001 7ad0b2a0
>  f4c45000 f4c44060 00000360 e9cf9f3c f869f63c f86aff20 e9cb9000 00000000
> Call Trace:
>  [<f86a6c00>] expire_client+0xb1/0xb9 [nfsd]
>  [<f86a6d36>] nfsd4_setclientid_confirm+0x12e/0x1c7 [nfsd]
>  [<f869f63c>] nfsd4_proc_compound+0x1fd/0x354 [nfsd]
>  [<f86946e3>] nfsd_dispatch+0x89/0x134 [nfsd]
>  [<f86c6d64>] svc_process+0x327/0x4e3 [sunrpc]
>  [<f86940d2>] nfsd+0xd2/0x10b [nfsd]
>  [<f8694000>] ? 0xf8693fff
>  [<c102d53a>] kthread+0x66/0x6b
>  [<c102d4d4>] ? flush_kthread_worker+0x74/0x74
>  [<c123193e>] kernel_thread_helper+0x6/0xd
> Code: 01 80 3e 00 74 04 89 f2 eb a2 8b 4d e8 8b 55 ec 8b 45 f0 0f c8 89 19 89 02 31 c0 83 c4 18 5b 5e 5f 5d c3 55 89 e5 0f 0b 55 89 e5 <0f> 0b 55 89 e5 57 56 89 d6 53 89 c3 8b 78 34 89 d0 e8 b4 8f ff 
> EIP: [<f86ac683>] free_client.isra.47+0x3/0x5 [nfsd] SS:ESP 0068:e9cf9eb4
> ---[ end trace bf1613e00e6f3bbe ]---
> 
> System is a 32-bit Via C7; to reproduce I need only establish an NFSv4
> mount from a client, then reboot the client.  The BUG happens when the
> client attempts to reconnect.  At this point nfs service on the server
> becomes something of a lost cause.  Userspace is Debian stable
> (nfs-utils 1.2.2 based).  I haven't had a chance to bisect the issue
> yet; I'll give that a shot in the coming week if nobody knows of any
> obvious fixes.  Let me know if there's any other info I can provide
> that's useful.

Turns out bisection isn't going to be straightforward, as between v3.3
and v3.4-rc1 some other issue was introduced (and since fixed) that
prevents my test system from booting at all; it freezes up after the
MSR00000000: 0000000000000000 ... spew, and I haven't figured out what
it was.  I tried 3.5-rc2 to see if the issue had been fixed by further
nfs tree merges, and no, it hasn't been.  I was also able to trigger
this a couple of times, just by attempting to mv a ~4M file onto a
recovered nfs volume after an NFS server reboot:

RPC: fragment too large: 0x000800cc
    [the above line repeated 30 times in all]
------------[ cut here ]------------
kernel BUG at fs/nfsd/nfs4state.c:1083!
invalid opcode: 0000 [#1]
Modules linked in: md5 cpufreq_conservative cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 quota_v2 quota_tree nfsd nfs_acl exportfs nfs fscache auth_rpcgss lockd sunrpc xt_mark xt_dscp cls_fw sch_htb iptable_nat nf_nat ipt_REJECT xt_multiport xt_mac xt_tcpudp iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_LOG xt_limit iptable_filter ip_tables x_tables dm_crypt dm_mod snd_hda_codec_via snd_hda_intel snd_hda_codec snd_hwdep tpm_tis tpm tpm_bios snd_pcm snd_timer snd soundcore via_rhine evdev snd_page_alloc mii via_agp via_velocity crc_ccitt agpgart button

Pid: 515, comm: kworker/u:5 Not tainted 3.5.0-rc2 #7 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
EIP: 0060:[<f8740eb5>] EFLAGS: 00010246 CPU: 0
EIP is at free_client.isra.46+0x3/0x5 [nfsd]
EAX: 00000000 EBX: f5be9000 ECX: f5ba5f58 EDX: f5ba5f58
ESI: f5be9010 EDI: f5ba5f28 EBP: f5ba5f1c ESP: f5ba5f1c
 DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
CR0: 8005003b CR2: b7732000 CR3: 3513b000 CR4: 000006b0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process kworker/u:5 (pid: 515, ti=f5ba4000 task=f59e5d00 task.ti=f5ba4000)
Stack:
 f5ba5f3c f873a72b f5be9028 f5ba5f28 f5ba5f28 00000029 f5ba5f58 f5be9000
 f5ba5f6c f873b434 f59e5d00 00000000 f5be9030 4fd5563d f5ba5f58 f5ba5f58
 f5ba5f58 f59ffb00 f8746368 00000000 f5ba5fa0 c102b950 0000007b 00000000
Call Trace:
 [<f873a72b>] expire_client+0xb1/0xb9 [nfsd]
 [<f873b434>] laundromat_main+0x132/0x1fa [nfsd]
 [<c102b950>] process_one_work+0xe0/0x1aa
 [<f873b302>] ? nfsd4_exchange_id+0x292/0x292 [nfsd]
 [<c102bbd3>] worker_thread+0xa4/0x12c
 [<c102bb2f>] ? rescuer_thread+0xf4/0xf4
 [<c102edda>] kthread+0x66/0x6b
 [<c102ed74>] ? flush_kthread_worker+0x74/0x74
 [<c12392fe>] kernel_thread_helper+0x6/0xd
Code: 03 8d 73 01 8a 06 84 c0 75 84 8b 45 e4 89 10 8b 55 ec 8b 45 f0 0f c8 89 02 31 c0 83 c4 20 5b 5e 5f 5d c3 55 89 e5 0f 0b 55 89 e5 <0f> 0b 55 89 e5 57 56 89 d6 53 53 89 c3 8b 78 34 89 d0 e8 cb 82
EIP: [<f8740eb5>] free_client.isra.46+0x3/0x5 [nfsd] SS:ESP 0068:f5ba5f1c
---[ end trace 59132c98eb4d8731 ]---
RPC: fragment too large: 0x000800cc
BUG: unable to handle kernel paging request at fffffffc
IP: [<c102ede9>] kthread_data+0xa/0xe
*pdpt = 0000000001397001 *pde = 000000000139b067 *pte = 0000000000000000
Oops: 0000 [#2]
Modules linked in: md5 cpufreq_conservative cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 quota_v2 quota_tree nfsd nfs_acl exportfs nfs fscache auth_rpcgss lockd sunrpc xt_mark xt_dscp cls_fw sch_htb iptable_nat nf_nat ipt_REJECT xt_multiport xt_mac xt_tcpudp iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_LOG xt_limit iptable_filter ip_tables x_tables dm_crypt dm_mod snd_hda_codec_via snd_hda_intel snd_hda_codec snd_hwdep tpm_tis tpm tpm_bios snd_pcm snd_timer snd soundcore via_rhine evdev snd_page_alloc mii via_agp via_velocity crc_ccitt agpgart button

Pid: 515, comm: kworker/u:5 Tainted: G      D      3.5.0-rc2 #7 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
EIP: 0060:[<c102ede9>] EFLAGS: 00010046 CPU: 0
EIP is at kthread_data+0xa/0xe
EAX: 00000000 EBX: c13b49a4 ECX: ffd23940 EDX: 00000000
ESI: 00000000 EDI: f59e5e70 EBP: f5ba5d20 ESP: f5ba5d14
 DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
CR0: 8005003b CR2: fffffffc CR3: 34da6000 CR4: 000006b0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400   
Process kworker/u:5 (pid: 515, ti=f5ba4000 task=f59e5d00 task.ti=f5ba4000)
Stack:
 c102c884 f59e5d00 f5ba5ce4 f5ba5d9c c1237e5d f5fe9600 00000046 f5ba5d58
 c102c587 00000000 f59e5d00 f514b2f8 f5bee600 f59e5d00 00000000 00000000
 00000202 f5ba5d64 c102c5d2 f4a07c80 f5ba5d6c c102c5fe f5ba5d74 00000202
Call Trace:
 [<c102c884>] ? wq_worker_sleeping+0x11/0x5b
 [<c1237e5d>] __schedule+0xa8/0x3bf
 [<c102c587>] ? __queue_work+0x163/0x181
 [<c102c5d2>] ? queue_work_on+0x13/0x1f
 [<c102c5fe>] ? queue_work+0xe/0x10
 [<c1100bc7>] ? put_io_context+0x3e/0x55
 [<c1100c55>] ? put_io_context_active+0x40/0x45
 [<c1238219>] schedule+0x51/0x53
 [<c10208c7>] do_exit+0x5c6/0x5c8
 [<c100430d>] oops_end+0x6b/0x70
 [<c100444c>] die+0x54/0x5c   
 [<c10025c6>] do_trap+0x8a/0xa3
 [<c1002814>] ? do_bounds+0x69/0x69
 [<c100289c>] do_invalid_op+0x88/0x92
 [<f8740eb5>] ? free_client.isra.46+0x3/0x5 [nfsd]
 [<c1238219>] ? schedule+0x51/0x53
 [<c1237540>] ? schedule_timeout+0x15/0x96
 [<c10346a8>] ? check_preempt_curr+0x27/0x62
 [<c1237cea>] ? wait_for_common+0x5c/0xa4
 [<c1034753>] ? try_to_wake_up+0x70/0x70
 [<c1237db3>] ? wait_for_completion+0x12/0x14
 [<c1238b45>] error_code+0x65/0x6c
 [<f8740eb5>] ? free_client.isra.46+0x3/0x5 [nfsd]
 [<f873a72b>] expire_client+0xb1/0xb9 [nfsd]
 [<f873b434>] laundromat_main+0x132/0x1fa [nfsd]
 [<c102b950>] process_one_work+0xe0/0x1aa
 [<f873b302>] ? nfsd4_exchange_id+0x292/0x292 [nfsd]
 [<c102bbd3>] worker_thread+0xa4/0x12c
 [<c102bb2f>] ? rescuer_thread+0xf4/0xf4
 [<c102edda>] kthread+0x66/0x6b
 [<c102ed74>] ? flush_kthread_worker+0x74/0x74
 [<c12392fe>] kernel_thread_helper+0x6/0xd
Code: 8d 43 10 e8 fe 53 00 00 e8 fd 93 20 00 b8 fc ff ff ff 83 7d e0 00 75 04 89 f0 ff d7 e8 22 15 ff ff 55 8b 80 44 01 00 00 89 e5 5d <8b> 40 fc c3 55 31 c0 89 e5 5d c3 55 ba fa 11 2b c1 89 e5 57 56
EIP: [<c102ede9>] kthread_data+0xa/0xe SS:ESP 0068:f5ba5d14
CR2: 00000000fffffffc
---[ end trace 59132c98eb4d8732 ]---
Fixing recursive fault but reboot is needed!
BUG: unable to handle kernel paging request at fffffffc
IP: [<c102ede9>] kthread_data+0xa/0xe
*pdpt = 0000000001397001 *pde = 000000000139b067 *pte = 0000000000000000
Oops: 0000 [#3]

and at this point the system gets thrown into a loop and spews errors
until it eventually reboots, which may be due to my watchdog going
off or something; I'm not sure, as I couldn't capture all of the output.


-- 
Jamie Heilman                     http://audible.transient.net/~jamie/


* Re: NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044!
  2012-06-10  9:03 NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044! Jamie Heilman
  2012-06-11  3:41 ` Jamie Heilman
@ 2012-06-11 16:14 ` J. Bruce Fields
  2012-06-12 10:13   ` Jamie Heilman
  2012-06-11 18:07 ` Sergey Bolshakov
  2 siblings, 1 reply; 11+ messages in thread
From: J. Bruce Fields @ 2012-06-11 16:14 UTC (permalink / raw)
  To: linux-nfs, linux-kernel

On Sun, Jun 10, 2012 at 09:03:42AM +0000, Jamie Heilman wrote:
> Upgrading my NFSv4 server from 3.3.8 to 3.4.2 I've managed to
> reproduce this reliably:
> 
> ------------[ cut here ]------------
> kernel BUG at fs/nfsd/nfs4state.c:1044!

That's 

	free_client(struct nfs4_client *clp)
	{
--->        	BUG_ON(!spin_is_locked(&client_lock));


> invalid opcode: 0000 [#1] 
> Modules linked in: md5 cpufreq_conservative cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 quota_v2 quota_tree nfsd nfs_acl exportfs nfs lockd fscache auth_rpcgss sunrpc xt_mark xt_dscp cls_fw sch_htb iptable_nat nf_nat ipt_REJECT xt_multiport xt_mac xt_tcpudp iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_LOG xt_limit iptable_filter ip_tables x_tables dm_crypt dm_mod snd_hda_codec_via tpm_tis tpm via_rhine snd_hda_intel snd_hda_codec tpm_bios mii via_velocity snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc evdev crc_ccitt via_agp agpgart button
> 
> Pid: 1804, comm: nfsd Not tainted 3.4.2 #3 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
> EIP: 0060:[<f86ac683>] EFLAGS: 00010246 CPU: 0
> EIP is at free_client.isra.47+0x3/0x5 [nfsd]
> EAX: 00000000 EBX: e9d48800 ECX: e9d48030 EDX: f86b20f8
> ESI: e9d48810 EDI: e9cf9ec0 EBP: e9cf9eb4 ESP: e9cf9eb4
>  DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
> CR0: 8005003b CR2: b760bee0 CR3: 34d04000 CR4: 000006b0
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff0ff0 DR7: 00000400
> Process nfsd (pid: 1804, ti=e9cf8000 task=e9cf4000 task.ti=e9cf8000)
> Stack:
>  e9cf9ed4 f86a6c00 e9d48828 e9cf9ec0 e9cf9ec0 e9d48800 e9d48000 00000000
>  e9cf9f00 f86a6d36 00000000 000555c0 f53c0001 4fd43f69 00000001 7ad0b2a0
>  f4c45000 f4c44060 00000360 e9cf9f3c f869f63c f86aff20 e9cb9000 00000000
> Call Trace:
>  [<f86a6c00>] expire_client+0xb1/0xb9 [nfsd]

And the only free_client call there is:

	spin_lock(&client_lock);
	unhash_client_locked(clp);
	if (atomic_read(&clp->cl_refcount) == 0)
--->		free_client(clp);
	spin_unlock(&client_lock);

So, that's strange.
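
(For what it's worth, the decode is at least self-consistent: on x86,
BUG() is implemented as the two-byte undefined instruction ud2, 0f 0b,
and the "<0f> 0b" bracketed under EIP in the Code: line quoted above is
exactly that trap, hence the "invalid opcode: 0000" header.)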

--b.


>  [<f86a6d36>] nfsd4_setclientid_confirm+0x12e/0x1c7 [nfsd]
>  [<f869f63c>] nfsd4_proc_compound+0x1fd/0x354 [nfsd]
>  [<f86946e3>] nfsd_dispatch+0x89/0x134 [nfsd]
>  [<f86c6d64>] svc_process+0x327/0x4e3 [sunrpc]
>  [<f86940d2>] nfsd+0xd2/0x10b [nfsd]
>  [<f8694000>] ? 0xf8693fff
>  [<c102d53a>] kthread+0x66/0x6b
>  [<c102d4d4>] ? flush_kthread_worker+0x74/0x74
>  [<c123193e>] kernel_thread_helper+0x6/0xd
> Code: 01 80 3e 00 74 04 89 f2 eb a2 8b 4d e8 8b 55 ec 8b 45 f0 0f c8 89 19 89 02 31 c0 83 c4 18 5b 5e 5f 5d c3 55 89 e5 0f 0b 55 89 e5 <0f> 0b 55 89 e5 57 56 89 d6 53 89 c3 8b 78 34 89 d0 e8 b4 8f ff 
> EIP: [<f86ac683>] free_client.isra.47+0x3/0x5 [nfsd] SS:ESP 0068:e9cf9eb4
> ---[ end trace bf1613e00e6f3bbe ]---
> 
> System is a 32-bit Via C7; to reproduce I need only establish an NFSv4
> mount from a client, then reboot the client.  The BUG happens when the
> client attempts to reconnect.  At this point nfs service on the server
> becomes something of a lost cause.  Userspace is Debian stable
> (nfs-utils 1.2.2 based).  I haven't had a chance to bisect the issue
> yet; I'll give that a shot in the coming week if nobody knows of any
> obvious fixes.  Let me know if there's any other info I can provide
> that's useful.
> 
> -- 
> Jamie Heilman                     http://audible.transient.net/~jamie/


* Re: NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044!
  2012-06-10  9:03 NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044! Jamie Heilman
  2012-06-11  3:41 ` Jamie Heilman
  2012-06-11 16:14 ` J. Bruce Fields
@ 2012-06-11 18:07 ` Sergey Bolshakov
  2 siblings, 0 replies; 11+ messages in thread
From: Sergey Bolshakov @ 2012-06-11 18:07 UTC (permalink / raw)
  To: linux-nfs

Same here, although I do not even need to shut down the client;
the oops on the server takes place within minutes of mounting
via v4 on the sole client:


------------[ cut here ]------------
kernel BUG at fs/nfsd/nfs4state.c:1044!
Internal error: Oops - BUG: 0 [#1] ARM
Modules linked in: nfsd nfs lockd auth_rpcgss nfs_acl autofs4 sunrpc iptable_filter ip_tables x_tables dm_mod ehci_hcd usbcore f75111 usb_common e1000 ext4 mbcache jbd2 sata_vsc libata sd_mod scsi_mod
CPU: 0    Not tainted  (3.4.0-iop32x-alt1 #1)
PC is at expire_client+0x15c/0x16c [nfsd]
LR is at flush_workqueue+0x320/0x330
pc : [<bf2e2c7c>]    lr : [<c002bcd8>]    psr: 60000013
sp : df26bf30  ip : df268044  fp : dc473e05
r10: 0000009a  r9 : c03f3d00  r8 : dfacc000
r7 : df26bf48  r6 : df26bf48  r5 : dfacc010  r4 : dfacc000
r3 : 00000000  r2 : dfacc280  r1 : dfacc280  r0 : df26bf48
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 0000397f  Table: bfabc000  DAC: 00000017
Process kworker/u:3 (pid: 497, stack limit = 0xdf26a278)
Stack: (0xdf26bf30 to 0xdf26c000)
bf20:                                     df26bf30 df26bf30 df073400 0000005a
bf40: 4fd35d85 bf2e372c df26bf48 df26bf48 df229000 dc473e00 bf2e35e8 00000000
bf60: bf2eefb0 c03f3d00 0000009a c002b0e8 df229000 df229000 c03f3b80 df26a000
bf80: df229010 c03ed007 c03f3d00 00000001 c03d321c c002cd20 df229000 df07bb00
bfa0: a0000013 df269f18 df229000 df26bfd4 c002cb3c 00000000 00000000 00000000
bfc0: 00000000 c00304e4 c00096a4 00000000 df229000 00000000 df26bfd8 df26bfd8
bfe0: 00000000 df269f18 c003045c c00096a4 00000013 c00096a4 00000000 00000000
[<bf2e2c7c>] (expire_client+0x15c/0x16c [nfsd]) from [<bf2e372c>] (laundromat_main+0x144/0x26c [nfsd])
[<bf2e372c>] (laundromat_main+0x144/0x26c [nfsd]) from [<c002b0e8>] (process_one_work+0x244/0x3f8)
[<c002b0e8>] (process_one_work+0x244/0x3f8) from [<c002cd20>] (worker_thread+0x1e4/0x314)
[<c002cd20>] (worker_thread+0x1e4/0x314) from [<c00304e4>] (kthread+0x88/0x94)
[<c00304e4>] (kthread+0x88/0x94) from [<c00096a4>] (kernel_thread_exit+0x0/0x8)
Code: 1afffff3 e5943300 e3530000 1a000000 (e7f001f2)
---[ end trace df220aa7c4aab192 ]---
Unable to handle kernel paging request at virtual address fffffffc
pgd = c0004000
[fffffffc] *pgd=bfffe821, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#2] ARM
Modules linked in: nfsd nfs lockd auth_rpcgss nfs_acl autofs4 sunrpc iptable_filter ip_tables x_tables dm_mod ehci_hcd usbcore f75111 usb_common e1000 ext4 mbcache jbd2 sata_vsc libata sd_mod scsi_mod
CPU: 0    Tainted: G      D       (3.4.0-iop32x-alt1 #1)
PC is at kthread_data+0x4/0xc
LR is at wq_worker_sleeping+0xc/0xb4
pc : [<c00306d4>]    lr : [<c002ceb0>]    psr: 20000093
sp : df26bd30  ip : 00000000  fp : df26bdb4
r10: c028a1e8  r9 : df07bbf8  r8 : df07bc74
r7 : 00000001  r6 : df26a000  r5 : c03d38a8  r4 : 00000000
r3 : 00000000  r2 : df800160  r1 : 00000000  r0 : df07bb00
Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0000397f  Table: bfabc000  DAC: 00000015
Process kworker/u:3 (pid: 497, stack limit = 0xdf26a278)
Stack: (0xdf26bd30 to 0xdf26c000)
bd20:                                     df07bb00 c028144c 00000000 00373934
bd40: df07bb30 df10a3e0 c001848c 00001374 df10a3e0 c001848c 00001374 c03d0ca0
bd60: c03ee054 c00ac270 c03ee04c df07bb00 df145540 00000001 df07bb00 c00184bc
bd80: 04208064 df10a3e0 00000000 c03f7c60 df26bd90 df07bb00 00000001 df26bdbc
bda0: 00000001 00000001 df07bbf8 00000001 df07baf8 c0019bc0 c0282168 df26bdbc
bdc0: df26bdbc df07bc40 df26bddc df26bea8 df26a000 df07bb00 00000001 00000000
bde0: c0282168 0000009a dc473e05 c000bea4 df26a278 0000000b df26bea8 bf2e2c7c
be00: 070001f0 df26bedc dfacc000 c0008364 00000006 00000000 00000004 00000000
be20: 00030001 bf2e2c7c 4328ac7d 00000153 00000001 20000013 df34a000 df263400
be40: 00000000 00000000 ded1455c 7fffffff 7fffffff df26a000 00000002 00000000
be60: c03f3d00 00000000 df26bee4 c0280398 df07bb30 df07af30 df07bb00 df07af00
be80: c03d3d10 c03d38a8 df07bb30 c003e890 c03d38a8 7fffffff bf2e2c80 60000013
bea0: ffffffff c0282168 df26bf48 dfacc280 dfacc280 00000000 dfacc000 dfacc010
bec0: df26bf48 df26bf48 dfacc000 c03f3d00 0000009a dc473e05 df268044 df26bf30
bee0: c002bcd8 bf2e2c7c 60000013 ffffffff df26beec 00000000 00000000 df26befc
bf00: df26befc 00000000 df26befc df26befc dc473e05 dfacc000 dfacc010 df26bf48
bf20: df26bf48 dfacc000 0000009a bf2e2bd4 df26bf30 df26bf30 df073400 0000005a
bf40: 4fd35d85 bf2e372c df26bf48 df26bf48 df229000 dc473e00 bf2e35e8 00000000
bf60: bf2eefb0 c03f3d00 0000009a c002b0e8 df229000 df229000 c03f3b80 df26a000
bf80: df229010 c03ed007 c03f3d00 00000001 c03d321c c002cd20 df229000 df07bb00
bfa0: a0000013 df269f18 df229000 df26bfd4 c002cb3c 00000000 00000000 00000000
bfc0: 00000000 c00304e4 c00096a4 00000000 df229000 00000001 df26bfd8 df26bfd8
bfe0: 00000000 df269f18 c003045c c00096a4 00000013 c00096a4 00000000 00000000
[<c00306d4>] (kthread_data+0x4/0xc) from [<c002ceb0>] (wq_worker_sleeping+0xc/0xb4)
[<c002ceb0>] (wq_worker_sleeping+0xc/0xb4) from [<c028144c>] (__schedule+0xd0/0x3d0)
[<c028144c>] (__schedule+0xd0/0x3d0) from [<c0019bc0>] (do_exit+0x6f4/0x72c)
[<c0019bc0>] (do_exit+0x6f4/0x72c) from [<c000bea4>] (die+0x1d8/0x208)
[<c000bea4>] (die+0x1d8/0x208) from [<c0008364>] (do_undefinstr+0x160/0x180)
[<c0008364>] (do_undefinstr+0x160/0x180) from [<c0282168>] (__und_svc+0x48/0x60)
Exception stack(0xdf26bea8 to 0xdf26bef0)
bea0:                   df26bf48 dfacc280 dfacc280 00000000 dfacc000 dfacc010
bec0: df26bf48 df26bf48 dfacc000 c03f3d00 0000009a dc473e05 df268044 df26bf30
bee0: c002bcd8 bf2e2c7c 60000013 ffffffff
[<c0282168>] (__und_svc+0x48/0x60) from [<bf2e2c7c>] (expire_client+0x15c/0x16c [nfsd])
[<bf2e2c7c>] (expire_client+0x15c/0x16c [nfsd]) from [<bf2e372c>] (laundromat_main+0x144/0x26c [nfsd])
[<bf2e372c>] (laundromat_main+0x144/0x26c [nfsd]) from [<c002b0e8>] (process_one_work+0x244/0x3f8)
[<c002b0e8>] (process_one_work+0x244/0x3f8) from [<c002cd20>] (worker_thread+0x1e4/0x314)
[<c002cd20>] (worker_thread+0x1e4/0x314) from [<c00304e4>] (kthread+0x88/0x94)
[<c00304e4>] (kthread+0x88/0x94) from [<c00096a4>] (kernel_thread_exit+0x0/0x8)
Code: c03f3fa8 c02899c4 c0289018 e5903148 (e5130004)
---[ end trace df220aa7c4aab193 ]---
Fixing recursive fault but reboot is needed!
Unable to handle kernel paging request at virtual address fffffffc
pgd = c0004000
[fffffffc] *pgd=bfffe821, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#3] ARM
Modules linked in: nfsd nfs lockd auth_rpcgss nfs_acl autofs4 sunrpc iptable_filter ip_tables x_tables dm_mod ehci_hcd usbcore f75111 usb_common e1000 ext4 mbcache jbd2 sata_vsc libata sd_mod scsi_mod
CPU: 0    Tainted: G      D       (3.4.0-iop32x-alt1 #1)
PC is at kthread_data+0x4/0xc
LR is at wq_worker_sleeping+0xc/0xb4
pc : [<c00306d4>]    lr : [<c002ceb0>]    psr: 20000093
sp : df26bac8  ip : c03ee554  fp : df26bb4c
r10: c028a1e8  r9 : 00000028  r8 : df07bc74
r7 : 00000001  r6 : df26a000  r5 : c03d38a8  r4 : 00000000
r3 : 00000000  r2 : df800160  r1 : 00000000  r0 : df07bb00
Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0000397f  Table: bfabc000  DAC: 00000015
Process kworker/u:3 (pid: 497, stack limit = 0xdf26a278)
Stack: (0xdf26bac8 to 0xdf26c000)
bac0:                   df07bb00 c028144c 60000013 c00171f4 00000001 60000013
bae0: 00000001 c03f29e4 00000001 c001bca0 00000000 c03d5d54 00000000 ffffffff
bb00: 00000001 00ffffff 00000000 df07bb00 0000000b df07bb00 00000001 00000017
bb20: 00000028 00000017 df26bdb4 df07bb00 0000000b df07bb00 00000001 00000017
bb40: 00000028 00000017 df26bdb4 c00195e4 00000028 00000017 00000001 c033d206
bb60: df26bb74 df26bce8 df26a000 df07bb00 00000001 00000017 00000028 00000017
bb80: df26bdb4 c000bea4 df26a278 0000000b fffffffc 00000017 00000000 df26bce8
bba0: 00000000 c0010074 df26bce8 00000007 fffffffc df07bb00 00000000 c02838b8
bbc0: 00000000 00000000 0a91b739 00000000 14303a9d c03d38a8 df07bb30 df366720
bbe0: df07bb00 df3b6f00 c03d3d10 c03d38a8 df26bc24 c003e8b0 c03d38a8 c028a1e8
bc00: 00000000 00000000 00000001 00000000 00000000 c003aa2c df3b6f00 df3b6f00
bc20: df26bc3c 00000017 00000007 fffffffc df26bce8 c03ce104 df07bbf8 c028a1e8
bc40: df26bdb4 c00083b4 00000002 df26bc6c df26bd49 c0176344 00000000 00000000
bc60: c03d38a8 dc60e630 00000000 00343937 dc60e630 ffffffff ffffffff df580a80
bc80: df580a80 000057da df26a000 00000000 df580a80 c00885c4 df26bc98 df26bc98
bca0: df581364 df5812e0 df5812e0 df580a80 df580a80 df26bd30 df26bd49 c03d38f0
bcc0: df07bb30 00000001 c03d38a8 00000001 c00306d4 20000093 ffffffff df26bd1c
bce0: df07bc74 c0282078 df07bb00 00000000 df800160 00000000 00000000 c03d38a8
bd00: df26a000 00000001 df07bc74 df07bbf8 c028a1e8 df26bdb4 00000000 df26bd30
bd20: c002ceb0 c00306d4 20000093 ffffffff df07bb00 c028144c 00000000 00373934
bd40: df07bb30 df10a3e0 c001848c 00001374 df10a3e0 c001848c 00001374 c03d0ca0
bd60: c03ee054 c00ac270 c03ee04c df07bb00 df145540 00000001 df07bb00 c00184bc
bd80: 04208064 df10a3e0 00000000 c03f7c60 df26bd90 df07bb00 00000001 df26bdbc
bda0: 00000001 00000001 df07bbf8 00000001 df07baf8 c0019bc0 c0282168 df26bdbc
bdc0: df26bdbc df07bc40 df26bddc df26bea8 df26a000 df07bb00 00000001 00000000
bde0: c0282168 0000009a dc473e05 c000bea4 df26a278 0000000b df26bea8 bf2e2c7c
be00: 070001f0 df26bedc dfacc000 c0008364 00000006 00000000 00000004 00000000
be20: 00030001 bf2e2c7c 4328ac7d 00000153 00000001 20000013 df34a000 df263400
be40: 00000000 00000000 ded1455c 7fffffff 7fffffff df26a000 00000002 00000000
be60: c03f3d00 00000000 df26bee4 c0280398 df07bb30 df07af30 df07bb00 df07af00
be80: c03d3d10 c03d38a8 df07bb30 c003e890 c03d38a8 7fffffff bf2e2c80 60000013
bea0: ffffffff c0282168 df26bf48 dfacc280 dfacc280 00000000 dfacc000 dfacc010
bec0: df26bf48 df26bf48 dfacc000 c03f3d00 0000009a dc473e05 df268044 df26bf30
bee0: c002bcd8 bf2e2c7c 60000013 ffffffff df26beec 00000000 00000000 df26befc
bf00: df26befc 00000000 df26befc df26befc dc473e05 dfacc000 dfacc010 df26bf48
bf20: df26bf48 dfacc000 0000009a bf2e2bd4 df26bf30 df26bf30 df073400 0000005a
bf40: 4fd35d85 bf2e372c df26bf48 df26bf48 df229000 dc473e00 bf2e35e8 00000000
bf60: bf2eefb0 c03f3d00 0000009a c002b0e8 df229000 df229000 c03f3b80 df26a000
bf80: df229010 c03ed007 c03f3d00 00000001 c03d321c c002cd20 df229000 df07bb00
bfa0: a0000013 df269f18 df229000 df26bfd4 c002cb3c 00000000 00000000 00000000
bfc0: 00000000 c00304e4 c00096a4 00000000 df229000 00000001 df26bfd8 df26bfd8
bfe0: 00000000 df269f18 c003045c c00096a4 00000013 c00096a4 00000000 00000000
[<c00306d4>] (kthread_data+0x4/0xc) from [<c002ceb0>] (wq_worker_sleeping+0xc/0xb4)
[<c002ceb0>] (wq_worker_sleeping+0xc/0xb4) from [<c028144c>] (__schedule+0xd0/0x3d0)
[<c028144c>] (__schedule+0xd0/0x3d0) from [<c00195e4>] (do_exit+0x118/0x72c)
[<c00195e4>] (do_exit+0x118/0x72c) from [<c000bea4>] (die+0x1d8/0x208)
[<c000bea4>] (die+0x1d8/0x208) from [<c0010074>] (__do_kernel_fault+0x64/0x88)
[<c0010074>] (__do_kernel_fault+0x64/0x88) from [<c02838b8>] (do_page_fault+0x35c/0x384)
[<c02838b8>] (do_page_fault+0x35c/0x384) from [<c00083b4>] (do_DataAbort+0x30/0x98)
[<c00083b4>] (do_DataAbort+0x30/0x98) from [<c0282078>] (__dabt_svc+0x38/0x60)
Exception stack(0xdf26bce8 to 0xdf26bd30)
bce0:                   df07bb00 00000000 df800160 00000000 00000000 c03d38a8
bd00: df26a000 00000001 df07bc74 df07bbf8 c028a1e8 df26bdb4 00000000 df26bd30
bd20: c002ceb0 c00306d4 20000093 ffffffff
[<c0282078>] (__dabt_svc+0x38/0x60) from [<c00306d4>] (kthread_data+0x4/0xc)
[<c00306d4>] (kthread_data+0x4/0xc) from [<c002ceb0>] (wq_worker_sleeping+0xc/0xb4)
[<c002ceb0>] (wq_worker_sleeping+0xc/0xb4) from [<c028144c>] (__schedule+0xd0/0x3d0)
[<c028144c>] (__schedule+0xd0/0x3d0) from [<c0019bc0>] (do_exit+0x6f4/0x72c)
[<c0019bc0>] (do_exit+0x6f4/0x72c) from [<c000bea4>] (die+0x1d8/0x208)
[<c000bea4>] (die+0x1d8/0x208) from [<c0008364>] (do_undefinstr+0x160/0x180)
[<c0008364>] (do_undefinstr+0x160/0x180) from [<c0282168>] (__und_svc+0x48/0x60)
Exception stack(0xdf26bea8 to 0xdf26bef0)
bea0:                   df26bf48 dfacc280 dfacc280 00000000 dfacc000 dfacc010
bec0: df26bf48 df26bf48 dfacc000 c03f3d00 0000009a dc473e05 df268044 df26bf30
bee0: c002bcd8 bf2e2c7c 60000013 ffffffff
[<c0282168>] (__und_svc+0x48/0x60) from [<bf2e2c7c>] (expire_client+0x15c/0x16c [nfsd])
[<bf2e2c7c>] (expire_client+0x15c/0x16c [nfsd]) from [<bf2e372c>] (laundromat_main+0x144/0x26c [nfsd])
[<bf2e372c>] (laundromat_main+0x144/0x26c [nfsd]) from [<c002b0e8>] (process_one_work+0x244/0x3f8)
[<c002b0e8>] (process_one_work+0x244/0x3f8) from [<c002cd20>] (worker_thread+0x1e4/0x314)
[<c002cd20>] (worker_thread+0x1e4/0x314) from [<c00304e4>] (kthread+0x88/0x94)
[<c00304e4>] (kthread+0x88/0x94) from [<c00096a4>] (kernel_thread_exit+0x0/0x8)
Code: c03f3fa8 c02899c4 c0289018 e5903148 (e5130004)
---[ end trace df220aa7c4aab194 ]---



* Re: NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044!
  2012-06-11 16:14 ` J. Bruce Fields
@ 2012-06-12 10:13   ` Jamie Heilman
  2012-06-12 11:18     ` J. Bruce Fields
  0 siblings, 1 reply; 11+ messages in thread
From: Jamie Heilman @ 2012-06-12 10:13 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, linux-kernel

J. Bruce Fields wrote:
> On Sun, Jun 10, 2012 at 09:03:42AM +0000, Jamie Heilman wrote:
> > Upgrading my NFSv4 server from 3.3.8 to 3.4.2 I've managed to
> > reproduce this reliably:
> > 
> > ------------[ cut here ]------------
> > kernel BUG at fs/nfsd/nfs4state.c:1044!
> 
> That's 
> 
> 	free_client(struct nfs4_client *clp)
> 	{
> --->        	BUG_ON(!spin_is_locked(&client_lock));
> 
> 
> > invalid opcode: 0000 [#1] 
> > Modules linked in: md5 cpufreq_conservative cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 quota_v2 quota_tree nfsd nfs_acl exportfs nfs lockd fscache auth_rpcgss sunrpc xt_mark xt_dscp cls_fw sch_htb iptable_nat nf_nat ipt_REJECT xt_multiport xt_mac xt_tcpudp iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_LOG xt_limit iptable_filter ip_tables x_tables dm_crypt dm_mod snd_hda_codec_via tpm_tis tpm via_rhine snd_hda_intel snd_hda_codec tpm_bios mii via_velocity snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc evdev crc_ccitt via_agp agpgart button
> > 
> > Pid: 1804, comm: nfsd Not tainted 3.4.2 #3 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
> > EIP: 0060:[<f86ac683>] EFLAGS: 00010246 CPU: 0
> > EIP is at free_client.isra.47+0x3/0x5 [nfsd]
> > EAX: 00000000 EBX: e9d48800 ECX: e9d48030 EDX: f86b20f8
> > ESI: e9d48810 EDI: e9cf9ec0 EBP: e9cf9eb4 ESP: e9cf9eb4
> >  DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
> > CR0: 8005003b CR2: b760bee0 CR3: 34d04000 CR4: 000006b0
> > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > DR6: ffff0ff0 DR7: 00000400
> > Process nfsd (pid: 1804, ti=e9cf8000 task=e9cf4000 task.ti=e9cf8000)
> > Stack:
> >  e9cf9ed4 f86a6c00 e9d48828 e9cf9ec0 e9cf9ec0 e9d48800 e9d48000 00000000
> >  e9cf9f00 f86a6d36 00000000 000555c0 f53c0001 4fd43f69 00000001 7ad0b2a0
> >  f4c45000 f4c44060 00000360 e9cf9f3c f869f63c f86aff20 e9cb9000 00000000
> > Call Trace:
> >  [<f86a6c00>] expire_client+0xb1/0xb9 [nfsd]
> 
> And the only free_client call there is:
> 
> 	spin_lock(&client_lock);
> 	unhash_client_locked(clp);
> 	if (atomic_read(&clp->cl_refcount) == 0)
> --->		free_client(clp);
> 	spin_unlock(&client_lock);
> 
> So, that's strange.

OK, other observations which may or may not have any value:
- I can't trigger the BUG if I build with CONFIG_SMP (which obviously
  changes the RCU as well)
- I can't trigger the BUG if I build with CONFIG_DEBUG_SPINLOCK
- the "RPC: fragment too large" errors I mentioned appear to only occur
  with a 3.5-rc client; I can't repro them with a 3.4 client

-- 
Jamie Heilman                     http://audible.transient.net/~jamie/


* Re: NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044!
  2012-06-12 10:13   ` Jamie Heilman
@ 2012-06-12 11:18     ` J. Bruce Fields
  2012-06-12 12:36       ` J. Bruce Fields
  0 siblings, 1 reply; 11+ messages in thread
From: J. Bruce Fields @ 2012-06-12 11:18 UTC (permalink / raw)
  To: linux-nfs, linux-kernel

On Tue, Jun 12, 2012 at 10:13:39AM +0000, Jamie Heilman wrote:
> J. Bruce Fields wrote:
> > On Sun, Jun 10, 2012 at 09:03:42AM +0000, Jamie Heilman wrote:
> > > Upgrading my NFSv4 server from 3.3.8 to 3.4.2 I've managed to
> > > reproduce this reliably:
> > > 
> > > ------------[ cut here ]------------
> > > kernel BUG at fs/nfsd/nfs4state.c:1044!
> > 
> > That's 
> > 
> > 	free_client(struct nfs4_client *clp)
> > 	{
> > --->        	BUG_ON(!spin_is_locked(&client_lock));
> > 
> > 
> > > invalid opcode: 0000 [#1] 
> > > Modules linked in: md5 cpufreq_conservative cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 quota_v2 quota_tree nfsd nfs_acl exportfs nfs lockd fscache auth_rpcgss sunrpc xt_mark xt_dscp cls_fw sch_htb iptable_nat nf_nat ipt_REJECT xt_multiport xt_mac xt_tcpudp iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_LOG xt_limit iptable_filter ip_tables x_tables dm_crypt dm_mod snd_hda_codec_via tpm_tis tpm via_rhine snd_hda_intel snd_hda_codec tpm_bios mii via_velocity snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc evdev crc_ccitt via_agp agpgart button
> > > 
> > > Pid: 1804, comm: nfsd Not tainted 3.4.2 #3 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
> > > EIP: 0060:[<f86ac683>] EFLAGS: 00010246 CPU: 0
> > > EIP is at free_client.isra.47+0x3/0x5 [nfsd]
> > > EAX: 00000000 EBX: e9d48800 ECX: e9d48030 EDX: f86b20f8
> > > ESI: e9d48810 EDI: e9cf9ec0 EBP: e9cf9eb4 ESP: e9cf9eb4
> > >  DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
> > > CR0: 8005003b CR2: b760bee0 CR3: 34d04000 CR4: 000006b0
> > > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > > DR6: ffff0ff0 DR7: 00000400
> > > Process nfsd (pid: 1804, ti=e9cf8000 task=e9cf4000 task.ti=e9cf8000)
> > > Stack:
> > >  e9cf9ed4 f86a6c00 e9d48828 e9cf9ec0 e9cf9ec0 e9d48800 e9d48000 00000000
> > >  e9cf9f00 f86a6d36 00000000 000555c0 f53c0001 4fd43f69 00000001 7ad0b2a0
> > >  f4c45000 f4c44060 00000360 e9cf9f3c f869f63c f86aff20 e9cb9000 00000000
> > > Call Trace:
> > >  [<f86a6c00>] expire_client+0xb1/0xb9 [nfsd]
> > 
> > And the only free_client call there is:
> > 
> > 	spin_lock(&client_lock);
> > 	unhash_client_locked(clp);
> > 	if (atomic_read(&clp->cl_refcount) == 0)
> > --->		free_client(clp);
> > 	spin_unlock(&client_lock);
> > 
> > So, that's strange.
> 
> OK, other observations which may or may not have any value:
> - I can't trigger the BUG if I build with CONFIG_SMP (which obviously
>   changes the RCU as well)
> - I can't trigger the BUG if I build with CONFIG_DEBUG_SPINLOCK

Oh, OK, so probably spin_is_locked() just isn't reliable in the
!CONFIG_SMP case.  That would make sense.
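
That squares with the UP spinlock definitions; a minimal sketch
(paraphrased from include/linux/spinlock_up.h of that era; approximate,
not verbatim):

	/* With !CONFIG_SMP and !CONFIG_DEBUG_SPINLOCK a spinlock carries no
	 * state, and spin_is_locked() unconditionally reports "not locked": */
	#define arch_spin_is_locked(lock)	((void)(lock), 0)

	/* So on a UP kernel
	 *	BUG_ON(!spin_is_locked(&client_lock));
	 * reduces to BUG_ON(1) and fires on every call, even though the
	 * caller really does hold client_lock.  CONFIG_DEBUG_SPINLOCK gives
	 * the lock real state back, which is why the BUG can't be hit with
	 * that config. */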

So at a minimum we should change that BUG_ON to a WARN_ON_ONCE.

--b.

> - the "RPC: fragment too large" errors I mentioned appear to only occur
>   with a 3.5-rc client; I can't repro them with a 3.4 client
> 
> -- 
> Jamie Heilman                     http://audible.transient.net/~jamie/


* Re: NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044!
  2012-06-12 11:18     ` J. Bruce Fields
@ 2012-06-12 12:36       ` J. Bruce Fields
  2012-06-13  6:06         ` Jamie Heilman
  2012-06-19  6:15         ` NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044! Benny Halevy
  0 siblings, 2 replies; 11+ messages in thread
From: J. Bruce Fields @ 2012-06-12 12:36 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, linux-kernel

On Tue, Jun 12, 2012 at 07:18:03AM -0400, J. Bruce Fields wrote:
> On Tue, Jun 12, 2012 at 10:13:39AM +0000, Jamie Heilman wrote:
> > J. Bruce Fields wrote:
> > > On Sun, Jun 10, 2012 at 09:03:42AM +0000, Jamie Heilman wrote:
> > > > Upgrading my NFSv4 server from 3.3.8 to 3.4.2 I've managed to
> > > > reproduce this reliably:
> > > > 
> > > > ------------[ cut here ]------------
> > > > kernel BUG at fs/nfsd/nfs4state.c:1044!
> > > 
> > > That's 
> > > 
> > > 	free_client(struct nfs4_client *clp)
> > > 	{
> > > --->        	BUG_ON(!spin_is_locked(&client_lock));
> > > 
> > > 
> > > > invalid opcode: 0000 [#1] 
> > > > Modules linked in: md5 cpufreq_conservative cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 quota_v2 quota_tree nfsd nfs_acl exportfs nfs lockd fscache auth_rpcgss sunrpc xt_mark xt_dscp cls_fw sch_htb iptable_nat nf_nat ipt_REJECT xt_multiport xt_mac xt_tcpudp iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_LOG xt_limit iptable_filter ip_tables x_tables dm_crypt dm_mod snd_hda_codec_via tpm_tis tpm via_rhine snd_hda_intel snd_hda_codec tpm_bios mii via_velocity snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc evdev crc_ccitt via_agp agpgart button
> > > > 
> > > > Pid: 1804, comm: nfsd Not tainted 3.4.2 #3 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
> > > > EIP: 0060:[<f86ac683>] EFLAGS: 00010246 CPU: 0
> > > > EIP is at free_client.isra.47+0x3/0x5 [nfsd]
> > > > EAX: 00000000 EBX: e9d48800 ECX: e9d48030 EDX: f86b20f8
> > > > ESI: e9d48810 EDI: e9cf9ec0 EBP: e9cf9eb4 ESP: e9cf9eb4
> > > >  DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
> > > > CR0: 8005003b CR2: b760bee0 CR3: 34d04000 CR4: 000006b0
> > > > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > > > DR6: ffff0ff0 DR7: 00000400
> > > > Process nfsd (pid: 1804, ti=e9cf8000 task=e9cf4000 task.ti=e9cf8000)
> > > > Stack:
> > > >  e9cf9ed4 f86a6c00 e9d48828 e9cf9ec0 e9cf9ec0 e9d48800 e9d48000 00000000
> > > >  e9cf9f00 f86a6d36 00000000 000555c0 f53c0001 4fd43f69 00000001 7ad0b2a0
> > > >  f4c45000 f4c44060 00000360 e9cf9f3c f869f63c f86aff20 e9cb9000 00000000
> > > > Call Trace:
> > > >  [<f86a6c00>] expire_client+0xb1/0xb9 [nfsd]
> > > 
> > > And the only free_client call there is:
> > > 
> > > 	spin_lock(&client_lock);
> > > 	unhash_client_locked(clp);
> > > 	if (atomic_read(&clp->cl_refcount) == 0)
> > > --->		free_client(clp);
> > > 	spin_unlock(&client_lock);
> > > 
> > > So, that's strange.
> > 
> > OK, other observations which may or may not have any value:
> > - I can't trigger the BUG if I build with CONFIG_SMP (which obviously
> >   changes the RCU as well)
> > - I can't trigger the BUG if I build with CONFIG_DEBUG_SPINLOCK
> 
> Oh, OK, so probably spin_is_locked() just isn't reliable in the
> !CONFIG_SMP case.  That would make sense.
> 
> So at a minimum we should change that BUG_ON to a WARN_ON_ONCE.

Doh.  Googling around--looks like we want something like this.

I've got a couple other bugfixes to collect and then I'll try to get
this to Linus by the end of the week.

--b.


commit 4cdc12ca178dfa74a9619451be41419617424a07
Author: J. Bruce Fields <bfields@redhat.com>
Date:   Tue Jun 12 08:28:48 2012 -0400

    nfsd4: BUG_ON(!is_spin_locked()) no good on UP kernels
    
    Most frequent symptom was a BUG triggering in expire_client, with the
    server locking up shortly thereafter.
    
    Introduced by 508dc6e110c6dbdc0bbe84298ccfe22de7538486 "nfsd41:
    free_session/free_client must be called under the client_lock".
    
    Cc: stable@kernel.org
    Cc: Benny Halevy <bhalevy@tonian.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 8fdc9ec..94effd5 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -900,7 +900,7 @@ static void free_session(struct kref *kref)
 	struct nfsd4_session *ses;
 	int mem;
 
-	BUG_ON(!spin_is_locked(&client_lock));
+	lockdep_assert_held(&client_lock);
 	ses = container_of(kref, struct nfsd4_session, se_ref);
 	nfsd4_del_conns(ses);
 	spin_lock(&nfsd_drc_lock);
@@ -1080,7 +1080,7 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name)
 static inline void
 free_client(struct nfs4_client *clp)
 {
-	BUG_ON(!spin_is_locked(&client_lock));
+	lockdep_assert_held(&client_lock);
 	while (!list_empty(&clp->cl_sessions)) {
 		struct nfsd4_session *ses;
 		ses = list_entry(clp->cl_sessions.next, struct nfsd4_session,
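
A note on why the replacement is safe everywhere: lockdep_assert_held()
only performs a check when lockdep is configured in, and compiles away
to nothing otherwise, so it can't misfire on UP builds the way
spin_is_locked() did.  Roughly (an approximation of the
include/linux/lockdep.h definitions, not a verbatim quote):

	#ifdef CONFIG_LOCKDEP
	#define lockdep_assert_held(l)	WARN_ON(debug_locks && !lockdep_is_held(l))
	#else
	#define lockdep_assert_held(l)	do { } while (0)
	#endif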


* Re: NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044!
  2012-06-12 12:36       ` J. Bruce Fields
@ 2012-06-13  6:06         ` Jamie Heilman
  2012-06-13  9:27           ` RPC: fragment too large with transition to 3.4 Jamie Heilman
  2012-06-19  6:15         ` NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044! Benny Halevy
  1 sibling, 1 reply; 11+ messages in thread
From: Jamie Heilman @ 2012-06-13  6:06 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs, linux-kernel

J. Bruce Fields wrote:
> On Tue, Jun 12, 2012 at 07:18:03AM -0400, J. Bruce Fields wrote:
> > On Tue, Jun 12, 2012 at 10:13:39AM +0000, Jamie Heilman wrote:
> > > J. Bruce Fields wrote:
> > > > On Sun, Jun 10, 2012 at 09:03:42AM +0000, Jamie Heilman wrote:
> > > > > Upgrading my NFSv4 server from 3.3.8 to 3.4.2 I've managed to
> > > > > reproduce this reliably:
> > > > > 
> > > > > ------------[ cut here ]------------
> > > > > kernel BUG at fs/nfsd/nfs4state.c:1044!
> > > > 
> > > > That's 
> > > > 
> > > > 	free_client(struct nfs4_client *clp)
> > > > 	{
> > > > --->        	BUG_ON(!spin_is_locked(&client_lock));
> > > > 
> > > > 
> > > > > invalid opcode: 0000 [#1] 
> > > > > Modules linked in: md5 cpufreq_conservative cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 quota_v2 quota_tree nfsd nfs_acl exportfs nfs lockd fscache auth_rpcgss sunrpc xt_mark xt_dscp cls_fw sch_htb iptable_nat nf_nat ipt_REJECT xt_multiport xt_mac xt_tcpudp iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_LOG xt_limit iptable_filter ip_tables x_tables dm_crypt dm_mod snd_hda_codec_via tpm_tis tpm via_rhine snd_hda_intel snd_hda_codec tpm_bios mii via_velocity snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc evdev crc_ccitt via_agp agpgart button
> > > > > 
> > > > > Pid: 1804, comm: nfsd Not tainted 3.4.2 #3 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
> > > > > EIP: 0060:[<f86ac683>] EFLAGS: 00010246 CPU: 0
> > > > > EIP is at free_client.isra.47+0x3/0x5 [nfsd]
> > > > > EAX: 00000000 EBX: e9d48800 ECX: e9d48030 EDX: f86b20f8
> > > > > ESI: e9d48810 EDI: e9cf9ec0 EBP: e9cf9eb4 ESP: e9cf9eb4
> > > > >  DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
> > > > > CR0: 8005003b CR2: b760bee0 CR3: 34d04000 CR4: 000006b0
> > > > > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > > > > DR6: ffff0ff0 DR7: 00000400
> > > > > Process nfsd (pid: 1804, ti=e9cf8000 task=e9cf4000 task.ti=e9cf8000)
> > > > > Stack:
> > > > >  e9cf9ed4 f86a6c00 e9d48828 e9cf9ec0 e9cf9ec0 e9d48800 e9d48000 00000000
> > > > >  e9cf9f00 f86a6d36 00000000 000555c0 f53c0001 4fd43f69 00000001 7ad0b2a0
> > > > >  f4c45000 f4c44060 00000360 e9cf9f3c f869f63c f86aff20 e9cb9000 00000000
> > > > > Call Trace:
> > > > >  [<f86a6c00>] expire_client+0xb1/0xb9 [nfsd]
> > > > 
> > > > And the only free_client call there is:
> > > > 
> > > > 	spin_lock(&client_lock);
> > > > 	unhash_client_locked(clp);
> > > > 	if (atomic_read(&clp->cl_refcount) == 0)
> > > > --->		free_client(clp);
> > > > 	spin_unlock(&client_lock);
> > > > 
> > > > So, that's strange.
> > > 
> > > OK, other observations which may or may not have any value:
> > > - I can't trigger the BUG if I build with CONFIG_SMP (which obviously
> > >   changes the RCU as well)
> > > - I can't trigger the BUG if I build with CONFIG_DEBUG_SPINLOCK
> > 
> > Oh, OK, so probably spin_is_locked() just isn't reliable in the
> > !CONFIG_SMP case.  That would make sense.
> > 
> > So at a minimum we should change that BUG_ON to a WARN_ON_ONCE.
> 
> Doh.  Googling around--looks like we want something like this.
> 
> I've got a couple other bugfixes to collect and then I'll try to get
> this to Linus by the end of the week.
> 
> --b.
> 
> 
> commit 4cdc12ca178dfa74a9619451be41419617424a07
> Author: J. Bruce Fields <bfields@redhat.com>
> Date:   Tue Jun 12 08:28:48 2012 -0400
> 
>     nfsd4: BUG_ON(!is_spin_locked()) no good on UP kernels
>     
>     Most frequent symptom was a BUG triggering in expire_client, with the
>     server locking up shortly thereafter.
>     
>     Introduced by 508dc6e110c6dbdc0bbe84298ccfe22de7538486 "nfsd41:
>     free_session/free_client must be called under the client_lock".
>     
>     Cc: stable@kernel.org
>     Cc: Benny Halevy <bhalevy@tonian.com>
>     Signed-off-by: J. Bruce Fields <bfields@redhat.com>

Yeah, this fixes the BUG for me.  It's looking like my issues with
"RPC: fragment too large" may be something else entirely at this
point; I've noticed other weird network behavior that I'm gonna have
to try to isolate before I keep blaming nfs changes.  Though for some
reason my /proc/fs/nfsd/max_block_size ends up only 128KiB w/3.4 where
it was 512KiB w/3.3.  But all that aside, this patch fixes the BUG I was
seeing, so thanks!

-- 
Jamie Heilman                     http://audible.transient.net/~jamie/


* RPC: fragment too large with transition to 3.4
  2012-06-13  6:06         ` Jamie Heilman
@ 2012-06-13  9:27           ` Jamie Heilman
  2012-06-13 15:56             ` J. Bruce Fields
  0 siblings, 1 reply; 11+ messages in thread
From: Jamie Heilman @ 2012-06-13  9:27 UTC (permalink / raw)
  To: J. Bruce Fields, J. Bruce Fields, linux-nfs, linux-kernel

Jamie Heilman wrote:
> It's looking like my issues with "RPC: fragment too large" may be
> something else entirely at this point; I've noticed other weird
> network behavior that I'm gonna have to try to isolate before I keep
> blaming nfs changes.  Though for some reason my
> /proc/fs/nfsd/max_block_size ends up only 128KiB w/3.4 where it was
> 512KiB w/3.3.

OK, I get it now.  32-bit PAE system w/4G of RAM (minus a chunk for
the IGP video etc.) for my NFS server, and the max_block_size
calculation changed significantly in commit
508f92275624fc755104b17945bdc822936f1918 to account for rpc buffers
only being in low memory.  That means whereas in 3.3 the math came out
to a target size of roughly 843241, my new target size in 3.4 is
only 219959-ish, so choosing 128KiB is understandable.  The problem
was that all my clients had negotiated their nfs mounts against the
v3.3 value of 512KiB, and when I rebooted into 3.4... they hit the
wall attempting larger transfers and became uselessly stuck at that
point.  If I remount everything before doing any large transfers, then
it negotiates a lower wsize and things work fine.  So everything is
working as planned, I suppose... the transition between 3.3 and 3.4 is
just a bit rough.
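
For reference, the numbers are consistent: 0x000800cc is 524,492
bytes, i.e. a full 512KiB write plus a couple hundred bytes of RPC and
NFS header, which is the stale negotiated wsize running into the new,
smaller server limit.  A self-contained sketch of the arithmetic (the
halving loop approximates fs/nfsd/nfssvc.c and the limit check
approximates net/sunrpc/svcsock.c; neither is verbatim kernel code, and
the sv_max_mesg slack is an assumption):

	#include <stdio.h>

	#define NFSSVC_MAXBLKSIZE	(1024 * 1024)

	/* Approximation of nfsd's default block size rounding: halve from
	 * the 1MiB ceiling until the result fits the memory-derived target. */
	static unsigned long default_max_blksize(unsigned long long target)
	{
		unsigned long ret = NFSSVC_MAXBLKSIZE;

		while (ret > target && ret >= 8 * 1024)
			ret /= 2;
		return ret;
	}

	int main(void)
	{
		/* targets from this report: 3.3 counted all RAM, 3.4 low memory only */
		printf("3.3: %lu\n", default_max_blksize(843241));  /* 524288 = 512KiB */
		printf("3.4: %lu\n", default_max_blksize(219959));  /* 131072 = 128KiB */

		/* the rejected fragment is the old wsize plus header overhead;
		 * sv_max_mesg is roughly max_block_size plus one page */
		unsigned long frag = 0x000800ccUL;	/* 524492 bytes */
		unsigned long sv_max_mesg = 131072 + 4096;

		if (frag > sv_max_mesg)
			printf("RPC: fragment too large: 0x%08lx\n", frag);
		return 0;
	}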

-- 
Jamie Heilman                     http://audible.transient.net/~jamie/


* Re: RPC: fragment too large with transition to 3.4
  2012-06-13  9:27           ` RPC: fragment too large with transition to 3.4 Jamie Heilman
@ 2012-06-13 15:56             ` J. Bruce Fields
  0 siblings, 0 replies; 11+ messages in thread
From: J. Bruce Fields @ 2012-06-13 15:56 UTC (permalink / raw)
  To: J. Bruce Fields, linux-nfs, linux-kernel

On Wed, Jun 13, 2012 at 09:27:03AM +0000, Jamie Heilman wrote:
> Jamie Heilman wrote:
> > It's looking like my issues with "RPC: fragment too large" may be
> > something else entirely at this point; I've noticed other weird
> > network behavior that I'm gonna have to try to isolate before I keep
> > blaming nfs changes.  Though for some reason my
> > /proc/fs/nfsd/max_block_size ends up only 128KiB w/3.4 where it was
> > 512KiB w/3.3.
> 
> OK, I get it now.  32-bit PAE system w/4G of RAM (minus a chunk for
> the IGP video etc.) for my NFS server, and the max_block_size
> calculation changed significantly in commit
> 508f92275624fc755104b17945bdc822936f1918 to account for rpc buffers
> only being in low memory.  That means whereas in 3.3 the math came out
> to a target size of roughly 843241, my new target size in 3.4 is
> only 219959-ish, so choosing 128KiB is understandable.  The problem
> was that all my clients had negotiated their nfs mounts against the
> v3.3 value of 512KiB, and when I rebooted into 3.4... they hit the
> wall attempting larger transfers and became uselessly stuck at that
> point.  If I remount everything before doing any large transfers, then
> it negotiates a lower wsize and things work fine.  So everything is
> working as planned, I suppose... the transition between 3.3 and 3.4 is
> just a bit rough.

Oh, got it, thanks.  Yes, now I remember I've seen that problem before.

Perhaps we should be more careful about tweaks to that calculation that
may result in a decreased r/wsize.  You could also get into the same
situation if you took the server down to change the amount of RAM, but
only if you were *removing* memory, which is probably unusual.

Best might be if distributions set max_block_size--it should be easier
for userspace to remember the value across reboots.

While we're at it we also want to create an /etc/nfsd.conf that rpc.nfsd
could read, for setting this, and number of threads, and a few other
things.  The systemd people would prefer that to the current practice of
sourcing a shell script in /etc/sysconfig or /etc/default.

We could warn about this problem ("don't decrease max_block_size on a
server without unmounting clients first") next to that variable in the
config file.
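
Something like this, say (a purely hypothetical sketch; no such config
file or option names existed at the time, so everything below is
illustrative rather than an actual rpc.nfsd interface):

	# /etc/nfsd.conf (hypothetical)
	# Don't decrease max_block_size on a server without unmounting
	# clients first: clients keep the r/wsize they negotiated at mount
	# time, and a smaller limit after reboot gets their larger requests
	# rejected with "RPC: fragment too large".
	max_block_size = 524288	# bytes; written to /proc/fs/nfsd/max_block_size
	threads = 8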

I think we can do the same calculation as nfsd_create_serv() does from
userspace to set an initial default.  I don't know if that should happen
on package install or on first run of rpc.nfsd.

For now that's a project looking for a volunteer, though.

--b.


* Re: NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044!
  2012-06-12 12:36       ` J. Bruce Fields
  2012-06-13  6:06         ` Jamie Heilman
@ 2012-06-19  6:15         ` Benny Halevy
  1 sibling, 0 replies; 11+ messages in thread
From: Benny Halevy @ 2012-06-19  6:15 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs, linux-kernel

On 2012-06-12 15:36, J. Bruce Fields wrote:
> On Tue, Jun 12, 2012 at 07:18:03AM -0400, J. Bruce Fields wrote:
>> On Tue, Jun 12, 2012 at 10:13:39AM +0000, Jamie Heilman wrote:
>>> J. Bruce Fields wrote:
>>>> On Sun, Jun 10, 2012 at 09:03:42AM +0000, Jamie Heilman wrote:
>>>>> Upgrading my NFSv4 server from 3.3.8 to 3.4.2 I've managed to
>>>>> reproduce this reliably:
>>>>>
>>>>> ------------[ cut here ]------------
>>>>> kernel BUG at fs/nfsd/nfs4state.c:1044!
>>>>
>>>> That's 
>>>>
>>>> 	free_client(struct nfs4_client *clp)
>>>> 	{
>>>> --->        	BUG_ON(!spin_is_locked(&client_lock));
>>>>
>>>>
>>>>> invalid opcode: 0000 [#1] 
>>>>> Modules linked in: md5 cpufreq_conservative cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 quota_v2 quota_tree nfsd nfs_acl exportfs nfs lockd fscache auth_rpcgss sunrpc xt_mark xt_dscp cls_fw sch_htb iptable_nat nf_nat ipt_REJECT xt_multiport xt_mac xt_tcpudp iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_LOG xt_limit iptable_filter ip_tables x_tables dm_crypt dm_mod snd_hda_codec_via tpm_tis tpm via_rhine snd_hda_intel snd_hda_codec tpm_bios mii via_velocity snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc evdev crc_ccitt via_agp agpgart button
>>>>>
>>>>> Pid: 1804, comm: nfsd Not tainted 3.4.2 #3 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
>>>>> EIP: 0060:[<f86ac683>] EFLAGS: 00010246 CPU: 0
>>>>> EIP is at free_client.isra.47+0x3/0x5 [nfsd]
>>>>> EAX: 00000000 EBX: e9d48800 ECX: e9d48030 EDX: f86b20f8
>>>>> ESI: e9d48810 EDI: e9cf9ec0 EBP: e9cf9eb4 ESP: e9cf9eb4
>>>>>  DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
>>>>> CR0: 8005003b CR2: b760bee0 CR3: 34d04000 CR4: 000006b0
>>>>> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
>>>>> DR6: ffff0ff0 DR7: 00000400
>>>>> Process nfsd (pid: 1804, ti=e9cf8000 task=e9cf4000 task.ti=e9cf8000)
>>>>> Stack:
>>>>>  e9cf9ed4 f86a6c00 e9d48828 e9cf9ec0 e9cf9ec0 e9d48800 e9d48000 00000000
>>>>>  e9cf9f00 f86a6d36 00000000 000555c0 f53c0001 4fd43f69 00000001 7ad0b2a0
>>>>>  f4c45000 f4c44060 00000360 e9cf9f3c f869f63c f86aff20 e9cb9000 00000000
>>>>> Call Trace:
>>>>>  [<f86a6c00>] expire_client+0xb1/0xb9 [nfsd]
>>>>
>>>> And the only free_client call there is:
>>>>
>>>> 	spin_lock(&client_lock);
>>>> 	unhash_client_locked(clp);
>>>> 	if (atomic_read(&clp->cl_refcount) == 0)
>>>> --->		free_client(clp);
>>>> 	spin_unlock(&client_lock);
>>>>
>>>> So, that's strange.
>>>
>>> OK, other observations which may or may not have any value:
>>> - I can't trigger the BUG if I build with CONFIG_SMP (which obviously
>>>   changes the RCU as well)
>>> - I can't trigger the BUG if I build with CONFIG_DEBUG_SPINLOCK
>>
>> Oh, OK, so probably spin_is_locked() just isn't reliable in the
>> !CONFIG_SMP case.  That would make sense.
>>
>> So at a minimum we should change that BUG_ON to a WARN_ON_ONCE.
> 
> Doh.  Googling around--looks like we want something like this.
> 
> I've got a couple other bugfixes to collect and then I'll try to get
> this to Linus by the end of the week.
> 
> --b.
> 
> 
> commit 4cdc12ca178dfa74a9619451be41419617424a07
> Author: J. Bruce Fields <bfields@redhat.com>
> Date:   Tue Jun 12 08:28:48 2012 -0400
> 
>     nfsd4: BUG_ON(!is_spin_locked()) no good on UP kernels
>     
>     Most frequent symptom was a BUG triggering in expire_client, with the
>     server locking up shortly thereafter.
>     
>     Introduced by 508dc6e110c6dbdc0bbe84298ccfe22de7538486 "nfsd41:
>     free_session/free_client must be called under the client_lock".
>     
>     Cc: stable@kernel.org
>     Cc: Benny Halevy <bhalevy@tonian.com>

Ack.  Thanks!

Benny

>     Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 8fdc9ec..94effd5 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -900,7 +900,7 @@ static void free_session(struct kref *kref)
>  	struct nfsd4_session *ses;
>  	int mem;
>  
> -	BUG_ON(!spin_is_locked(&client_lock));
> +	lockdep_assert_held(&client_lock);
>  	ses = container_of(kref, struct nfsd4_session, se_ref);
>  	nfsd4_del_conns(ses);
>  	spin_lock(&nfsd_drc_lock);
> @@ -1080,7 +1080,7 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name)
>  static inline void
>  free_client(struct nfs4_client *clp)
>  {
> -	BUG_ON(!spin_is_locked(&client_lock));
> +	lockdep_assert_held(&client_lock);
>  	while (!list_empty(&clp->cl_sessions)) {
>  		struct nfsd4_session *ses;
>  		ses = list_entry(clp->cl_sessions.next, struct nfsd4_session,

