linux-nfs.vger.kernel.org archive mirror
* CPU lockup in or near new filecache code
@ 2019-12-10 16:27 Chuck Lever
  2019-12-10 18:49 ` Bruce Fields
  2019-12-10 20:45 ` Trond Myklebust
  0 siblings, 2 replies; 11+ messages in thread
From: Chuck Lever @ 2019-12-10 16:27 UTC (permalink / raw)
  To: Bruce Fields, Jeff Layton; +Cc: Linux NFS Mailing List

Under stress, I'm seeing BUGs similar to this quite a bit on my v5.5-rc1 NFS server.
As near as I can tell, the nfsd thread is looping under nfsd_file_acquire somewhere.

Dec  9 13:22:52 klimt kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [nfsd:2002]
Dec  9 13:22:52 klimt kernel: Modules linked in: rpcsec_gss_krb5 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue ib_umad ib_ipoib mlx4_ib sb_edac x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass ext4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mbcache jbd2 iTCO_wdt iTCO_vendor_support aesni_intel glue_helper crypto_simd cryptd pcspkr rpcrdma i2c_i801 lpc_ich rdma_ucm mfd_core ib_iser rdma_cm mei_me mei iw_cm raid0 ib_cm libiscsi sg scsi_transport_iscsi ioatdma wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs libcrc32c mlx4_en sr_mod cdrom sd_mod qedr ast drm_vram_helper drm_ttm_helper ttm drm_kms_helper crc32c_intel syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci libata igb mlx4_core dca i2c_algo_bit i2c_core nvme nvme_core qede qed dm_mirror dm_region_hash dm_log dm_mod crc8 ib_uverbs dax ib_core
Dec  9 13:22:52 klimt kernel: CPU: 0 PID: 2002 Comm: nfsd Tainted: G             L    5.5.0-rc1-00002-gc56d8b96a170 #1400
Dec  9 13:22:52 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
Dec  9 13:22:52 klimt kernel: RIP: 0010:put_task_struct+0xc/0x28
Dec  9 13:22:52 klimt kernel: Code: 05 11 01 00 75 17 48 c7 c7 b1 b4 eb 81 31 c0 c6 05 4f 05 11 01 01 e8 a7 ad fd ff 0f 0b c3 48 8d 57 20 83 c8 ff f0 0f c1 47 20 <83> f8 01 75 05 e9 cf 7e fd ff 85 c0 7f 0d be 03 00 00 00 48 89 d7
Dec  9 13:22:52 klimt kernel: RSP: 0018:ffffc90000f7bb88 EFLAGS: 00000213 ORIG_RAX: ffffffffffffff13
Dec  9 13:22:52 klimt kernel: RAX: 0000000000000002 RBX: ffff888844120000 RCX: 0000000000000000
Dec  9 13:22:52 klimt kernel: RDX: ffff888844120020 RSI: 0000000000000000 RDI: ffff888844120000
Dec  9 13:22:52 klimt kernel: RBP: 0000000000000001 R08: ffff888817527b00 R09: ffffffff8121d707
Dec  9 13:22:52 klimt kernel: R10: ffffc90000f7bbc8 R11: 000000008e6571d9 R12: ffff888855055750
Dec  9 13:22:52 klimt kernel: R13: 0000000000000000 R14: ffff88882dcd9320 R15: ffff88881741a8c0
Dec  9 13:22:52 klimt kernel: FS:  0000000000000000(0000) GS:ffff88885fc00000(0000) knlGS:0000000000000000
Dec  9 13:22:52 klimt kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  9 13:22:52 klimt kernel: CR2: 00007f816a386000 CR3: 0000000855686003 CR4: 00000000001606f0
Dec  9 13:22:52 klimt kernel: Call Trace:
Dec  9 13:22:52 klimt kernel: wake_up_q+0x34/0x40
Dec  9 13:22:52 klimt kernel: __mutex_unlock_slowpath.isra.14+0x9d/0xeb
Dec  9 13:22:52 klimt kernel: fsnotify_add_mark+0x53/0x5d
Dec  9 13:22:52 klimt kernel: nfsd_file_acquire+0x423/0x576 [nfsd]
Dec  9 13:22:52 klimt kernel: nfs4_get_vfs_file+0x14c/0x20f [nfsd]
Dec  9 13:22:52 klimt kernel: nfsd4_process_open2+0xcd6/0xd98 [nfsd]
Dec  9 13:22:52 klimt kernel: ? fh_verify+0x42e/0x4ef [nfsd]
Dec  9 13:22:52 klimt kernel: ? nfsd4_process_open1+0x233/0x29d [nfsd]
Dec  9 13:22:52 klimt kernel: nfsd4_open+0x500/0x5cb [nfsd]
Dec  9 13:22:52 klimt kernel: nfsd4_proc_compound+0x32a/0x5c7 [nfsd]
Dec  9 13:22:52 klimt kernel: nfsd_dispatch+0x102/0x1e2 [nfsd]
Dec  9 13:22:52 klimt kernel: svc_process_common+0x3b3/0x65d [sunrpc]
Dec  9 13:22:52 klimt kernel: ? svc_xprt_put+0x12/0x21 [sunrpc]
Dec  9 13:22:52 klimt kernel: ? nfsd_svc+0x2be/0x2be [nfsd]
Dec  9 13:22:52 klimt kernel: ? nfsd_destroy+0x51/0x51 [nfsd]
Dec  9 13:22:52 klimt kernel: svc_process+0xf6/0x115 [sunrpc]
Dec  9 13:22:52 klimt kernel: nfsd+0xf2/0x149 [nfsd]
Dec  9 13:22:52 klimt kernel: kthread+0xf6/0xfb
Dec  9 13:22:52 klimt kernel: ? kthread_queue_delayed_work+0x74/0x74
Dec  9 13:22:52 klimt kernel: ret_from_fork+0x3a/0x50


--
Chuck Lever




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: CPU lockup in or near new filecache code
  2019-12-10 16:27 CPU lockup in or near new filecache code Chuck Lever
@ 2019-12-10 18:49 ` Bruce Fields
  2019-12-10 20:45 ` Trond Myklebust
  1 sibling, 0 replies; 11+ messages in thread
From: Bruce Fields @ 2019-12-10 18:49 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Jeff Layton, Linux NFS Mailing List

On Tue, Dec 10, 2019 at 11:27:15AM -0500, Chuck Lever wrote:
> Under stress, I'm seeing BUGs similar to this quite a bit on my v5.5-rc1 NFS server.
> As near as I can tell, the nfsd thread is looping under nfsd_file_acquire somewhere.

Thanks.  Olga also reported a soft lockup in the new file caching code,
though with a different stack.

--b.

> 
> [... quoted soft-lockup trace and signature trimmed; full trace in the original message above ...]


* Re: CPU lockup in or near new filecache code
  2019-12-10 16:27 CPU lockup in or near new filecache code Chuck Lever
  2019-12-10 18:49 ` Bruce Fields
@ 2019-12-10 20:45 ` Trond Myklebust
  2019-12-11 18:14   ` Chuck Lever
  1 sibling, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2019-12-10 20:45 UTC (permalink / raw)
  To: bfields, jlayton, chuck.lever; +Cc: linux-nfs

On Tue, 2019-12-10 at 11:27 -0500, Chuck Lever wrote:
> Under stress, I'm seeing BUGs similar to this quite a bit on my v5.5-
> rc1 NFS server.
> As near as I can tell, the nfsd thread is looping under
> nfsd_file_acquire somewhere.
> 
> [... quoted soft-lockup trace trimmed; full trace in the original message above ...]
> 

Does the very first patch in the series I just sent out help?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




* Re: CPU lockup in or near new filecache code
  2019-12-10 20:45 ` Trond Myklebust
@ 2019-12-11 18:14   ` Chuck Lever
  2019-12-11 20:01     ` Chuck Lever
  0 siblings, 1 reply; 11+ messages in thread
From: Chuck Lever @ 2019-12-11 18:14 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Bruce Fields, Jeff Layton, Linux NFS Mailing List



> On Dec 10, 2019, at 3:45 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Tue, 2019-12-10 at 11:27 -0500, Chuck Lever wrote:
>> Under stress, I'm seeing BUGs similar to this quite a bit on my v5.5-
>> rc1 NFS server.
>> As near as I can tell, the nfsd thread is looping under
>> nfsd_file_acquire somewhere.
>> 
>> [... quoted soft-lockup trace trimmed; full trace in the original message above ...]
>> 
> 
> Does the very first patch in the series I just sent out help?

The test is to run (cd git; make -j12 test) on a 12-core client
over NFSv4.1 on RDMA with disconnect injection enabled.

I run the test with sys, krb5, krb5i, and krb5p.

The server used to crash 0-2 times per security flavor.

With your patch applied the server was stable enough to complete
all four security flavors without a kworker lockup. A definite
improvement. For the first patch:

Tested-by: Chuck Lever <chuck.lever@oracle.com>


--
Chuck Lever





* Re: CPU lockup in or near new filecache code
  2019-12-11 18:14   ` Chuck Lever
@ 2019-12-11 20:01     ` Chuck Lever
  2019-12-13 20:12       ` Trond Myklebust
  0 siblings, 1 reply; 11+ messages in thread
From: Chuck Lever @ 2019-12-11 20:01 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Bruce Fields, Jeff Layton, Linux NFS Mailing List



> On Dec 11, 2019, at 1:14 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> 
> 
>> On Dec 10, 2019, at 3:45 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>> 
>> On Tue, 2019-12-10 at 11:27 -0500, Chuck Lever wrote:
>>> Under stress, I'm seeing BUGs similar to this quite a bit on my v5.5-
>>> rc1 NFS server.
>>> As near as I can tell, the nfsd thread is looping under
>>> nfsd_file_acquire somewhere.
>>> 
>>> [... quoted soft-lockup trace trimmed; full trace in the original message above ...]
>>> 
>> 
>> Does the very first patch in the series I just sent out help?
> 
> The test is to run (cd git; make -j12 test) on a 12-core client
> over NFSv4.1 on RDMA with disconnect injection enabled.
> 
> I run the test with sys, krb5, krb5i, and krb5p.
> 
> The server used to crash 0-2 times per security flavor.
> 
> With your patch applied the server was stable enough to complete
> all four security flavors without a kworker lockup. A definite
> improvement. For the first patch:
> 
> Tested-by: Chuck Lever <chuck.lever@oracle.com>

OK, I finally got a hit. It took a long time. I've seen this particular
stack trace before, several times.

Dec 11 14:58:34 klimt kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [nfsd:2005]
Dec 11 14:58:34 klimt kernel: Modules linked in: rpcsec_gss_krb5 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue ib_umad ib_ipoib mlx4_ib sb_edac x86_pkg_temp_thermal kvm_intel coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt ext4 iTCO_vendor_support aesni_intel mbcache jbd2 glue_helper rpcrdma crypto_simd cryptd rdma_ucm ib_iser rdma_cm pcspkr iw_cm ib_cm mei_me raid0 libiscsi lpc_ich mei sg scsi_transport_iscsi i2c_i801 mfd_core wmi ipmi_si ipmi_devintf ipmi_msghandler ioatdma acpi_power_meter nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs libcrc32c mlx4_en sr_mod sd_mod cdrom qedr ast drm_vram_helper drm_ttm_helper ttm crc32c_intel drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm igb dca i2c_algo_bit i2c_core mlx4_core ahci libahci libata nvme nvme_core qede qed dm_mirror dm_region_hash dm_log dm_mod crc8 ib_uverbs dax ib_core
Dec 11 14:58:34 klimt kernel: CPU: 0 PID: 2005 Comm: nfsd Tainted: G        W         5.5.0-rc1-00003-g170e7adc2317 #1401
Dec 11 14:58:34 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
Dec 11 14:58:34 klimt kernel: RIP: 0010:__srcu_read_lock+0x23/0x24
Dec 11 14:58:34 klimt kernel: Code: 07 00 0f 1f 40 00 c3 0f 1f 44 00 00 8b 87 c8 c3 00 00 48 8b 97 f0 c3 00 00 83 e0 01 48 63 c8 65 48 ff 04 ca f0 83 44 24 fc 00 <c3> 0f 1f 44 00 00 f0 83 44 24 fc 00 48 63 f6 48 8b 87 f0 c3 00 00
Dec 11 14:58:34 klimt kernel: RSP: 0018:ffffc90001d97bd0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Dec 11 14:58:34 klimt kernel: RAX: 0000000000000001 RBX: ffff888830d0eb78 RCX: 0000000000000001
Dec 11 14:58:34 klimt kernel: RDX: 0000000000030f00 RSI: ffff888853f4da00 RDI: ffffffff82815a40
Dec 11 14:58:34 klimt kernel: RBP: ffff88883112d828 R08: ffff888843540000 R09: ffffffff8121d707
Dec 11 14:58:34 klimt kernel: R10: ffffc90001d97bf0 R11: 0000000000001b84 R12: ffff888853f4da00
Dec 11 14:58:34 klimt kernel: R13: ffff8888132a1410 R14: ffff88883112d7e0 R15: 00000000ffffffef
Dec 11 14:58:34 klimt kernel: FS:  0000000000000000(0000) GS:ffff88885fc00000(0000) knlGS:0000000000000000
Dec 11 14:58:34 klimt kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 11 14:58:34 klimt kernel: CR2: 00007f2d6a2d8000 CR3: 0000000859b38004 CR4: 00000000001606f0
Dec 11 14:58:34 klimt kernel: Call Trace:
Dec 11 14:58:34 klimt kernel: fsnotify_grab_connector+0x16/0x4f
Dec 11 14:58:34 klimt kernel: fsnotify_find_mark+0x11/0x6a
Dec 11 14:58:34 klimt kernel: nfsd_file_acquire+0x3a9/0x5b2 [nfsd]
Dec 11 14:58:34 klimt kernel: nfs4_get_vfs_file+0x14c/0x20f [nfsd]
Dec 11 14:58:34 klimt kernel: nfsd4_process_open2+0xcd6/0xd98 [nfsd]
Dec 11 14:58:34 klimt kernel: ? fh_verify+0x42e/0x4ef [nfsd]
Dec 11 14:58:34 klimt kernel: ? nfsd4_process_open1+0x233/0x29d [nfsd]
Dec 11 14:58:34 klimt kernel: nfsd4_open+0x500/0x5cb [nfsd]
Dec 11 14:58:34 klimt kernel: nfsd4_proc_compound+0x32a/0x5c7 [nfsd]
Dec 11 14:58:34 klimt kernel: nfsd_dispatch+0x102/0x1e2 [nfsd]
Dec 11 14:58:34 klimt kernel: svc_process_common+0x3b3/0x65d [sunrpc]
Dec 11 14:58:34 klimt kernel: ? svc_xprt_put+0x12/0x21 [sunrpc]
Dec 11 14:58:34 klimt kernel: ? nfsd_svc+0x2be/0x2be [nfsd]
Dec 11 14:58:34 klimt kernel: ? nfsd_destroy+0x51/0x51 [nfsd]
Dec 11 14:58:34 klimt kernel: svc_process+0xf6/0x115 [sunrpc]
Dec 11 14:58:34 klimt kernel: nfsd+0xf2/0x149 [nfsd]
Dec 11 14:58:34 klimt kernel: kthread+0xf6/0xfb
Dec 11 14:58:34 klimt kernel: ? kthread_queue_delayed_work+0x74/0x74
Dec 11 14:58:34 klimt kernel: ret_from_fork+0x3a/0x50


--
Chuck Lever





* Re: CPU lockup in or near new filecache code
  2019-12-11 20:01     ` Chuck Lever
@ 2019-12-13 20:12       ` Trond Myklebust
  2019-12-13 20:26         ` Chuck Lever
  2019-12-18 23:20         ` Chuck Lever
  0 siblings, 2 replies; 11+ messages in thread
From: Trond Myklebust @ 2019-12-13 20:12 UTC (permalink / raw)
  To: chuck.lever; +Cc: bfields, jlayton, linux-nfs

On Wed, 2019-12-11 at 15:01 -0500, Chuck Lever wrote:
> OK, I finally got a hit. It took a long time. I've seen this
> particular
> stack trace before, several times.
> 
> [... quoted soft-lockup trace trimmed; full trace in the original message above ...]
> 

Does something like the following help?

8<---------------------------------------------------
From caf515c82ed572e4f92ac8293e5da4818da0c6ce Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@hammerspace.com>
Date: Fri, 13 Dec 2019 15:07:33 -0500
Subject: [PATCH] nfsd: Fix a soft lockup race in
 nfsd_file_mark_find_or_create()

If nfsd_file_mark_find_or_create() keeps winning the race for the
nfsd_file_fsnotify_group->mark_mutex against nfsd_file_mark_put()
then it can soft lock up, since fsnotify_add_inode_mark() ends
up always finding an existing entry.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfsd/filecache.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 9c2b29e07975..f275c11c4e28 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -132,9 +132,13 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf)
 						 struct nfsd_file_mark,
 						 nfm_mark));
 			mutex_unlock(&nfsd_file_fsnotify_group->mark_mutex);
-			fsnotify_put_mark(mark);
-			if (likely(nfm))
+			if (nfm) {
+				fsnotify_put_mark(mark);
 				break;
+			}
+			/* Avoid soft lockup race with nfsd_file_mark_put() */
+			fsnotify_destroy_mark(mark, nfsd_file_fsnotify_group);
+			fsnotify_put_mark(mark);
 		} else
 			mutex_unlock(&nfsd_file_fsnotify_group->mark_mutex);
 
-- 
2.23.0


-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: CPU lockup in or near new filecache code
  2019-12-13 20:12       ` Trond Myklebust
@ 2019-12-13 20:26         ` Chuck Lever
  2019-12-18 23:20         ` Chuck Lever
  1 sibling, 0 replies; 11+ messages in thread
From: Chuck Lever @ 2019-12-13 20:26 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Bruce Fields, Jeff Layton, Linux NFS Mailing List



> On Dec 13, 2019, at 3:12 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Wed, 2019-12-11 at 15:01 -0500, Chuck Lever wrote:
>> OK, I finally got a hit. It took a long time. I've seen this
>> particular
>> stack trace before, several times.
>> 
>> Dec 11 14:58:34 klimt kernel: watchdog: BUG: soft lockup - CPU#0
>> stuck for 22s! [nfsd:2005]
>> [... module list, registers, and backtrace trimmed; identical to the trace quoted above ...]
>> 
> 
> Does something like the following help?
> 
> [... patch trimmed; identical to the patch posted earlier in the thread ...]
> 

Received. I'll have some time over the weekend or on Monday to test it.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: CPU lockup in or near new filecache code
  2019-12-13 20:12       ` Trond Myklebust
  2019-12-13 20:26         ` Chuck Lever
@ 2019-12-18 23:20         ` Chuck Lever
  2020-01-03 16:47           ` Bruce Fields
  1 sibling, 1 reply; 11+ messages in thread
From: Chuck Lever @ 2019-12-18 23:20 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Bruce Fields, Jeff Layton, Linux NFS Mailing List



> On Dec 13, 2019, at 3:12 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Wed, 2019-12-11 at 15:01 -0500, Chuck Lever wrote:
>> OK, I finally got a hit. It took a long time. I've seen this
>> particular
>> stack trace before, several times.
>> 
>> Dec 11 14:58:34 klimt kernel: watchdog: BUG: soft lockup - CPU#0
>> stuck for 22s! [nfsd:2005]
>> [... module list, registers, and backtrace trimmed; identical to the trace quoted above ...]
>> 
> 
> Does something like the following help?
> 
> [... patch trimmed; identical to the patch posted earlier in the thread ...]
> 

I've tried to reproduce the lockup for three days with this patch
applied to my server. No lockup.

Tested-by: Chuck Lever <chuck.lever@oracle.com>


--
Chuck Lever




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: CPU lockup in or near new filecache code
  2019-12-18 23:20         ` Chuck Lever
@ 2020-01-03 16:47           ` Bruce Fields
  2020-01-03 18:01             ` Trond Myklebust
  0 siblings, 1 reply; 11+ messages in thread
From: Bruce Fields @ 2020-01-03 16:47 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond Myklebust, Jeff Layton, Linux NFS Mailing List

On Wed, Dec 18, 2019 at 06:20:56PM -0500, Chuck Lever wrote:
> > On Dec 13, 2019, at 3:12 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> > Does something like the following help?
> > 
> > [... patch trimmed; identical to the patch posted earlier in the thread ...]
> > 
> 
> I've tried to reproduce the lockup for three days with this patch
> applied to my server. No lockup.
> 
> Tested-by: Chuck Lever <chuck.lever@oracle.com>

I'm applying this for 5.5 with Chuck's tested-by and:

    Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd")

--b.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: CPU lockup in or near new filecache code
  2020-01-03 16:47           ` Bruce Fields
@ 2020-01-03 18:01             ` Trond Myklebust
  2020-01-03 18:40               ` bfields
  0 siblings, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2020-01-03 18:01 UTC (permalink / raw)
  To: bfields, chuck.lever; +Cc: jlayton, linux-nfs

On Fri, 2020-01-03 at 11:47 -0500, Bruce Fields wrote:
> On Wed, Dec 18, 2019 at 06:20:56PM -0500, Chuck Lever wrote:
> > > On Dec 13, 2019, at 3:12 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> > > Does something like the following help?
> > > 
> > > [... patch trimmed; identical to the patch posted earlier in the thread ...]
> > > 
> > 
> > I've tried to reproduce the lockup for three days with this patch
> > applied to my server. No lockup.
> > 
> > Tested-by: Chuck Lever <chuck.lever@oracle.com>
> 
> I'm applying this for 5.5 with Chuck's tested-by and:
> 
>     Fixes: 65294c1f2c5e "nfsd: add a new struct file caching facility
> to nfsd"
> 

I've got more coming. We've been doing data integrity tests and have
hit some issues around error reporting that need to be fixed in knfsd
in order to avoid silent corruption of data.

I'll be sending a bulk patchset for 5.5 soon.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: CPU lockup in or near new filecache code
  2020-01-03 18:01             ` Trond Myklebust
@ 2020-01-03 18:40               ` bfields
  0 siblings, 0 replies; 11+ messages in thread
From: bfields @ 2020-01-03 18:40 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: chuck.lever, jlayton, linux-nfs

On Fri, Jan 03, 2020 at 06:01:47PM +0000, Trond Myklebust wrote:
> I've got more coming. We've been doing data integrity tests and have
> hit some issues around error reporting that need to be fixed in knfsd
> in order to avoid silent corruption of data.
> 
> I'll be sending a bulk patchset for 5.5 soon.

OK, thanks.

--b.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-01-03 18:40 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-10 16:27 CPU lockup in or near new filecache code Chuck Lever
2019-12-10 18:49 ` Bruce Fields
2019-12-10 20:45 ` Trond Myklebust
2019-12-11 18:14   ` Chuck Lever
2019-12-11 20:01     ` Chuck Lever
2019-12-13 20:12       ` Trond Myklebust
2019-12-13 20:26         ` Chuck Lever
2019-12-18 23:20         ` Chuck Lever
2020-01-03 16:47           ` Bruce Fields
2020-01-03 18:01             ` Trond Myklebust
2020-01-03 18:40               ` bfields
