linux-nfs.vger.kernel.org archive mirror
* an nfsd_file_free panic at shutdown
From: Wang Yugui @ 2022-12-22  6:15 UTC
  To: linux-nfs

[-- Attachment #1: Type: text/plain, Size: 1718 bytes --]

Hi,

An nfsd_file_free panic at shutdown.

This is kernel 6.1.1 with some nfsd patches from 6.2 applied.
It happened when the OS was shutting down.

There is no clear reproducer yet;
we have only gathered the information in the attached file.

The list of patches applied to kernel 6.1.1:
Subject: nfsd: don't call nfsd_file_put from client states seqfile display
Subject: NFSD: Pass the target nfsd_file to nfsd_commit()
Subject: NFSD: Revert "NFSD: NFSv4 CLOSE should release an nfsd_file
Subject: NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection
Subject: nfsd: fix up the filecache laundrette scheduling
Subject: NFSD: Flesh out a documenting comment for filecache.c
Subject: NFSD: Clean up nfs4_preprocess_stateid_op() call sites
Subject: NFSD: Trace stateids returned via DELEGRETURN
Subject: NFSD: Trace delegation revocations
Subject: NFSD: Use const pointers as parameters to fh_ helpers
Subject: NFSD: Update file_hashtbl() helpers
Subject: NFSD: Clean up nfsd4_init_file()
Subject: NFSD: Add a nfsd4_file_hash_remove() helper
Subject: NFSD: Clean up find_or_add_file()
Subject: NFSD: Refactor find_file()
Subject: NFSD: Use rhashtable for managing nfs4_file objects
Subject: NFSD: Fix licensing header in filecache.c
Subject: nfsd: remove the pages_flushed statistic from filecache
Subject: nfsd: reorganize filecache.c
Subject: NFSD: Add an nfsd_file_fsync tracepoint
Subject: nfsd: rework refcounting in filecache
Subject: nfsd: under NFSv4.1, fix double svc_xprt_put on rpc_create failure
Subject: NFSD: fix use-after-free in __nfs42_ssc_open()


It started happening just after 'nfsd: rework refcounting in filecache'
was applied, so that patch may be related.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/12/22

[-- Attachment #2: rpviewer.png --]
[-- Type: image/png, Size: 65438 bytes --]


* Re: an nfsd_file_free panic at shutdown
From: Wang Yugui @ 2022-12-22  8:15 UTC
  To: linux-nfs

[-- Attachment #1: Type: text/plain, Size: 2087 bytes --]

Hi,

> An nfsd_file_free panic at shutdown.
> 
> This is kernel 6.1.1 with some nfsd patches from 6.2 applied.
> It happened when the OS was shutting down.
> 
> There is no clear reproducer yet;
> we have only gathered the information in the attached file.

Now I can reproduce it.
1) On an NFSv4 client, run 'tail -f xxx' to keep a file open.
2) Run 'shutdown -r now' on the NFS server.

More panic info is in the attached files (panic-2.txt/panic-3.txt).

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/12/22

-- 
Wang Yugui <wangyugui@e16-tech.com>

[-- Attachment #2: panic-2.txt --]
[-- Type: application/octet-stream, Size: 11016 bytes --]

[root@T640 ~]# shutdown -r now
[  114.660887] ------------[ cut here ]------------
[  114.660892] refcount_t: underflow; use-after-free.
[  114.660922] WARNING: CPU: 54 PID: 3642 at lib/refcount.c:28 refcount_warn_saturate+0xc7/0x110
[  114.660942] Modules linked in: xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common skx_edac nfit rfkill libnvdimm x86_pkg_temp_thermal intel_powerclamp kvm_intel iTCO_wdt iTCO_vendor_support kvm vfat irqbypass acpi_power_meter fat rapl dell_smbios intel_cstate mei_me dcdbas acpi_ipmi wmi_bmof pcspkr intel_uncore dell_wmi_descriptor i2c_i801 mei lpc_ich i2c_smbus intel_pch_thermal ipmi_si nfsd ipmi_devintf ipmi_msghandler auth_rpcgss nfs_acl lockd grace ip_tables x_tables rpcrdma rdma_ucm ib_umad ib_srpt ib_isert ib_ipoib iscsi_target_mod target_core_mod btrfs ib_iser rdma_cm iw_cm ib_cm mlx5_ib blake2b_generic xor raid6_pq zstd_compress bnxt_re irdma sd_mod t10_pi ice sg ib_uverbs ib_core mlx5_core ahci
[  114.661068]  crct10dif_pclmul libahci pci_hyperv_intf crc32_pclmul bnx2x i40e bnxt_en crc32c_intel mlxfw libata psample megaraid_sas ghash_clmulni_intel mgag200 mdio wmi dm_multipath sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi [last unloaded: coretemp]
[  114.661117] CPU: 54 PID: 3642 Comm: rpc.nfsd Not tainted 6.1.1-2.el7.x86_64 #1
[  114.661124] Hardware name: Dell Inc. PowerEdge T640/0TWW5Y, BIOS 2.16.1 08/19/2022
[  114.661127] RIP: 0010:refcount_warn_saturate+0xc7/0x110
[  114.661136] Code: ac 5a 00 0f 0b c3 cc cc cc cc 80 3d f5 f3 60 02 00 0f 85 78 ff ff ff 48 c7 c7 48 6f a9 b0 c6 05 e1 f3 60 02 01 e8 b5 ac 5a 00 <0f> 0b c3 cc cc cc cc 80 3d cc f3 60 02 00 0f 85 51 ff ff ff 48 c7
[  114.661141] RSP: 0018:ffffaaa79f507bb8 EFLAGS: 00010286
[  114.661146] RAX: 0000000000000000 RBX: ffff8e702a207618 RCX: 0000000000000027
[  114.661149] RDX: 0000000000000027 RSI: ffff8ecd00edf860 RDI: ffff8ecd00edf868
[  114.661153] RBP: ffff8e702a210eb0 R08: ffffffffb12676a0 R09: ffffaaa79f507b50
[  114.661155] R10: 00000000fffc0768 R11: 0000000000000001 R12: ffff8e6f637c8000
[  114.661158] R13: ffffaaa79f507c10 R14: ffffaaa79f507c10 R15: ffff8e6f637c8058
[  114.661161] FS:  00007fcb3d84c840(0000) GS:ffff8ecd00ec0000(0000) knlGS:0000000000000000
[  114.661165] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  114.894573] CR2: 000055b0404ee008 CR3: 0000000129e68001 CR4: 00000000007706e0
[  114.894575] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  114.894576] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  114.894578] PKRU: 55555554
[  114.894579] Call Trace:
[  114.894581]  <TASK>
[  114.894583]  nfsd_file_free+0x20b/0x210 [nfsd]
[  114.894627]  destroy_unhashed_deleg+0x60/0x90 [nfsd]
[  114.932817]  __destroy_client+0x10a/0x2a0 [nfsd]
[  114.937483]  nfs4_state_shutdown_net+0x16c/0x310 [nfsd]
[  114.937513]  nfsd_shutdown_net+0x35/0x70 [nfsd]
[  114.937536]  nfsd_put+0x127/0x140 [nfsd]
[  114.951276]  nfsd_svc+0x285/0x350 [nfsd]
[  114.955240]  write_threads+0x97/0x100 [nfsd]
[  114.959546]  ? simple_transaction_get+0xc3/0xf0
[  114.959551]  ? write_pool_threads+0x230/0x230 [nfsd]
[  114.969083]  nfsctl_transaction_write+0x43/0x80 [nfsd]
[  114.974263]  vfs_write+0xe8/0x3f0
[  114.974267]  ? do_sys_openat2+0x1d5/0x310
[  114.981625]  ksys_write+0x59/0xd0
[  114.981627]  do_syscall_64+0x58/0x80
[  114.989478]  ? do_syscall_64+0x67/0x80
[  114.989482]  ? syscall_exit_to_user_mode+0x12/0x30
[  114.999132]  ? do_syscall_64+0x67/0x80
[  114.999134]  ? exc_page_fault+0x64/0x140
[  114.999137]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  115.013345] RIP: 0033:0x7fcb3caefba0
[  115.013348] Code: 73 01 c3 48 8b 0d d0 72 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 1d d4 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 7e cc 01 00 48 89 04 24
[  115.013349] RSP: 002b:00007ffe619f9df8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  115.013351] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fcb3caefba0
[  115.013352] RDX: 0000000000000002 RSI: 000000000060a680 RDI: 0000000000000003
[  115.013353] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007fcb3ca4d2cd
[  OK  ] Started Show Plymouth Reboot Screen.
[  115.013354] R10: 00007ffe619f91e0 R11: 0000000000000246 R12: 0000000000000000
[  115.013355] R13: 0000000000000000 R14: 000000000000001f R15: 0000000000020000
[  115.013357]  </TASK>
[  115.013358] ---[ end trace 0000000000000000 ]---
[  115.042956] nfsd: last server has exited, flushing export cache
[  OK  ] Stopped Virtual Machine and Container Registration Service.
[  115.108725] ------------[ cut here ]------------
[  115.113863] kernel BUG at mm/slub.c:419!
[  115.119346] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  115.124842] CPU: 50 PID: 0 Comm: swapper/50 Tainted: G        W          6.1.1-2.el7.x86_64 #1
[  115.133738] Hardware name: Dell Inc. PowerEdge T640/0TWW5Y, BIOS 2.16.1 08/19/2022
[  115.141567] RIP: 0010:__slab_free+0x24d/0x4e0
[  115.146195] Code: 48 8b 45 08 a8 01 75 11 0f 1f 44 00 00 48 89 e8 48 39 c5 0f 84 eb 00 00 00 48 c7 c6 d8 97 a5 b0 48 89 ef e8 a5 12 f9 ff 0f 0b <0f> 0b 48 8b 45 00 a9 00 00 01 00 74 af 48 8b 45 48 a8 01 74 a7 48
[  115.165349] RSP: 0018:ffffaaa799880e48 EFLAGS: 00010246
[  115.170850] RAX: ffff8e6f63645770 RBX: 0000000080200019 RCX: ffff8e6f63645700
[  115.178284] RDX: ffff8e6f63645700 RSI: fffff471448d9100 RDI: ffff8ece8000a800
[  115.185720] RBP: fffff471448d9100 R08: 0000000000000001 R09: ffffffffaf7899a9
[  115.193150] R10: ffff8e6f63645700 R11: 0000000000000000 R12: ffff8ece8000a800
[  115.200666] R13: ffff8ecd00e71980 R14: ffff8e6f63645700 R15: fffff471448d9120
[  115.208081] FS:  0000000000000000(0000) GS:ffff8ecd00e40000(0000) knlGS:0000000000000000
[  115.216453] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  115.222499] CR2: 00007fe8afc12288 CR3: 000000553f010002 CR4: 00000000007706e0
[  115.229923] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  115.237350] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  115.244770] PKRU: 55555554
[  115.247792] Call Trace:
[  115.250537]  <IRQ>
[  115.252852]  ? free_percpu+0x20e/0x3a0
[  115.256879]  ? rcu_do_batch+0x178/0x510
[  115.261026]  rcu_do_batch+0x169/0x510
[  115.264980]  rcu_core+0x252/0x3c0
[  115.268585]  ? sched_clock_cpu+0x9/0xb0
[  115.272773]  __do_softirq+0xc6/0x2c8
[  115.276673]  irq_exit_rcu+0xa8/0xd0
[  115.280470]  sysvec_apic_timer_interrupt+0x6e/0x90
[  115.285565]  </IRQ>
[  115.287961]  <TASK>
[  115.290353]  asm_sysvec_apic_timer_interrupt+0x16/0x20
[  115.295784] RIP: 0010:cpuidle_enter_state+0xd6/0x420
[  115.301091] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 a5 da 89 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 01 03 00 00 31 ff e8 ee 50 90 ff fb 45 85 f6 <0f> 88 75 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49
[  115.320360] RSP: 0018:ffffaaa780717e88 EFLAGS: 00000206
[  115.325889] RAX: ffff8ecd00e70b40 RBX: 0000000000000003 RCX: 000000000000001f
[  115.333359] RDX: 0000001acd038c26 RSI: 000000003158b35b RDI: 0000000000000000
[  115.340795] RBP: ffffca4780e40568 R08: 0000000000000002 R09: 00000000000303c0
[  115.348248] R10: 00002c8904d93bce R11: ffff8ecd00e6f884 R12: 0000001acd038c26
[  115.355671] R13: ffffffffb2137ea0 R14: 0000000000000003 R15: 0000000000000000
[  115.363135]  ? cpuidle_enter_state+0xbb/0x420
[  115.367838]  cpuidle_enter+0x29/0x40
[  115.371782]  do_idle+0x243/0x290
[  115.375345]  cpu_startup_entry+0x19/0x20
[  115.379585]  start_secondary+0x10f/0x130
[  115.383829]  secondary_startup_64_no_verify+0xe5/0xeb
[  115.389189]  </TASK>
[  115.391666] Modules linked in: xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common skx_edac nfit rfkill libnvdimm x86_pkg_temp_thermal intel_powerclamp kvm_intel iTCO_wdt iTCO_vendor_support kvm vfat irqbypass acpi_power_meter fat rapl dell_smbios intel_cstate mei_me dcdbas acpi_ipmi wmi_bmof pcspkr intel_uncore dell_wmi_descriptor i2c_i801 mei lpc_ich i2c_smbus intel_pch_thermal ipmi_si nfsd ipmi_devintf ipmi_msghandler auth_rpcgss nfs_acl lockd grace ip_tables x_tables rpcrdma rdma_ucm ib_umad ib_srpt ib_isert ib_ipoib iscsi_target_mod target_core_mod btrfs ib_iser rdma_cm iw_cm ib_cm mlx5_ib blake2b_generic xor raid6_pq zstd_compress bnxt_re irdma sd_mod t10_pi ice sg ib_uverbs ib_core mlx5_core ahci
[  115.391717]  crct10dif_pclmul libahci pci_hyperv_intf crc32_pclmul bnx2x i40e bnxt_en crc32c_intel mlxfw libata psample megaraid_sas ghash_clmulni_intel mgag200 mdio wmi dm_multipath sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi [last unloaded: coretemp]
[  115.515756] ---[ end trace 0000000000000000 ]---
[  115.551718] RIP: 0010:__slab_free+0x24d/0x4e0
[  115.556431] Code: 48 8b 45 08 a8 01 75 11 0f 1f 44 00 00 48 89 e8 48 39 c5 0f 84 eb 00 00 00 48 c7 c6 d8 97 a5 b0 48 89 ef e8 a5 12 f9 ff 0f 0b <0f> 0b 48 8b 45 00 a9 00 00 01 00 74 af 48 8b 45 48 a8 01 74 a7 48
[  115.575680] RSP: 0018:ffffaaa799880e48 EFLAGS: 00010246
[  115.581173] RAX: ffff8e6f63645770 RBX: 0000000080200019 RCX: ffff8e6f63645700
[  115.588576] RDX: ffff8e6f63645700 RSI: fffff471448d9100 RDI: ffff8ece8000a800
[  115.595967] RBP: fffff471448d9100 R08: 0000000000000001 R09: ffffffffaf7899a9
[  115.603360] R10: ffff8e6f63645700 R11: 0000000000000000 R12: ffff8ece8000a800
[  115.610772] R13: ffff8ecd00e71980 R14: ffff8e6f63645700 R15: fffff471448d9120
[  115.618164] FS:  0000000000000000(0000) GS:ffff8ecd00e40000(0000) knlGS:0000000000000000
[  115.626518] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  115.632547] CR2: 00007fe8afc12288 CR3: 000000553f010002 CR4: 00000000007706e0
[  115.639975] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  115.647404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  115.654805] PKRU: 55555554
[  115.657788] Kernel panic - not syncing: Fatal exception in interrupt
[  115.984472] Kernel Offset: 0x2e600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  116.027185] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

[-- Attachment #3: panic-3.txt --]
[-- Type: application/octet-stream, Size: 10206 bytes --]

[root@T640 ~]# shutdown -r now
[  100.394735] ------------[ cut here ]------------
[  100.394739] refcount_t: underflow; use-after-free.
[  100.394774] WARNING: CPU: 33 PID: 3645 at lib/refcount.c:28 refcount_warn_saturate+0xc7/0x110
[  100.394794] Modules linked in: xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp rfkill kvm_intel kvm iTCO_wdt irqbypass iTCO_vendor_support vfat acpi_power_meter rapl fat intel_cstate dell_smbios mei_me i2c_i801 dcdbas acpi_ipmi intel_uncore wmi_bmof dell_wmi_descriptor pcspkr mei lpc_ich i2c_smbus ipmi_si intel_pch_thermal ipmi_devintf ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace ip_tables x_tables rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser ib_umad btrfs rdma_cm ib_ipoib iw_cm ib_cm irdma blake2b_generic xor raid6_pq zstd_compress ice mlx5_ib bnxt_re sd_mod t10_pi ib_uverbs ib_core sg mlx5_core ahci
[  100.394916]  crct10dif_pclmul libahci pci_hyperv_intf crc32_pclmul bnx2x crc32c_intel i40e bnxt_en mlxfw libata psample megaraid_sas ghash_clmulni_intel mdio mgag200 wmi dm_multipath sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi [last unloaded: coretemp]
[  100.394965] CPU: 33 PID: 3645 Comm: rpc.nfsd Not tainted 6.1.1-2.el7.x86_64 #1
[  100.394971] Hardware name: Dell Inc. PowerEdge T640/0TWW5Y, BIOS 2.16.1 08/19/2022
[  100.394974] RIP: 0010:refcount_warn_saturate+0xc7/0x110
[  100.394983] Code: ac 5a 00 0f 0b c3 cc cc cc cc 80 3d f5 f3 60 02 00 0f 85 78 ff ff ff 48 c7 c7 48 6f 29 9c c6 05 e1 f3 60 02 01 e8 b5 ac 5a 00 <0f> 0b c3 cc cc cc cc 80 3d cc f3 60 02 00 0f 85 51 ff ff ff 48 c7
[  100.394988] RSP: 0018:ffffabec9f797be8 EFLAGS: 00010282
[  100.394993] RAX: 0000000000000000 RBX: ffff8ae65997c6e8 RCX: 0000000000000027
[  100.394997] RDX: 0000000000000027 RSI: ffff8b44bf81f860 RDI: ffff8b44bf81f868
[  100.395001] RBP: ffff8ae659979cd0 R08: ffffffff9ca676a0 R09: ffffabec9f797b80
[  100.395004] R10: 00000000fffc0768 R11: 0000000000000001 R12: ffff8ae68d0cc2a0
[  100.395007] R13: ffffabec9f797c40 R14: ffffabec9f797c40 R15: ffff8ae68d0cc2f8
[  100.395010] FS:  00007fba69472840(0000) GS:ffff8b44bf800000(0000) knlGS:0000000000000000
[  100.395014] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  100.395017] CR2: 0000000001f7f5f8 CR3: 000000013bba8002 CR4: 00000000007706e0
[  100.395020] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  100.395022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  100.395025] PKRU: 55555554
[  100.395027] Call Trace:
[  100.395030]  <TASK>
[  100.395033]  nfsd_file_free+0x20b/0x210 [nfsd]
[  100.395135]  destroy_unhashed_deleg+0x60/0x90 [nfsd]
[  100.395238]  __destroy_client+0x10a/0x2a0 [nfsd]
[  100.395334]  nfs4_state_shutdown_net+0x16c/0x310 [nfsd]
[  100.395432]  nfsd_shutdown_net+0x35/0x70 [nfsd]
[  100.395515]  nfsd_put+0x127/0x140 [nfsd]
[  100.395588]  nfsd_svc+0x285/0x350 [nfsd]
[  100.395660]  write_threads+0x97/0x100 [nfsd]
[  100.395732]  ? simple_transaction_get+0xc3/0xf0
[  100.395741]  ? write_pool_threads+0x230/0x230 [nfsd]
[  100.395813]  nfsctl_transaction_write+0x43/0x80 [nfsd]
[  100.395885]  vfs_write+0xe8/0x3f0
[  100.395894]  ksys_write+0x59/0xd0
[  100.395900]  do_syscall_64+0x58/0x80
[  100.395911]  ? do_syscall_64+0x67/0x80
[  100.395916]  ? syscall_exit_to_user_mode+0x12/0x30
[  100.395925]  ? do_syscall_64+0x67/0x80
[  100.395930]  ? do_syscall_64+0x67/0x80
[  100.395935]  ? exc_page_fault+0x64/0x140
[  100.395943]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  100.395951] RIP: 0033:0x7fba686efba0
[  100.395956] Code: 73 01 c3 48 8b 0d d0 72 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 1d d4 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 7e cc 01 00 48 89 04 24
[  100.395961] RSP: 002b:00007ffd1cad8b68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  100.395966] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fba686efba0
[  100.395969] RDX: 0000000000000002 RSI: 000000000060a680 RDI: 0000000000000003
[  100.395971] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007fba6864d2cd
[  100.395973] R10: 00007ffd1cad7f60 R11: 0000000000000246 R12: 0000000000000000
[  100.395976] R13: 0000000000000000 R14: 000000000000001f R15: 0000000000020000
[  100.395981]  </TASK>
[  100.395983] ---[ end trace 0000000000000000 ]---
[  100.429671] nfsd: last server has exited, flushing export cache
[  100.854515] ------------[ cut here ]------------
[  100.859911] kernel BUG at mm/slub.c:419!
[  100.864590] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  100.870249] CPU: 35 PID: 195 Comm: ksoftirqd/35 Tainted: G        W          6.1.1-2.el7.x86_64 #1
[  100.879629] Hardware name: Dell Inc. PowerEdge T640/0TWW5Y, BIOS 2.16.1 08/19/2022
[  100.887461] RIP: 0010:__slab_free+0x24d/0x4e0
[  100.892119] Code: 48 8b 45 08 a8 01 75 11 0f 1f 44 00 00 48 89 e8 48 39 c5 0f 84 eb 00 00 00 48 c7 c6 d8 97 25 9c 48 89 ef e8 a5 12 f9 ff 0f 0b <0f> 0b 48 8b 45 00 a9 00 00 01 00 74 af 48 8b 45 48 a8 01 74 a7 48
[  100.892121] RSP: 0018:ffffabec991d3d40 EFLAGS: 00010246
[  100.892123] RAX: ffff8ae68cca0b70 RBX: 000000008020001a RCX: ffff8ae68cca0b00
[  100.892124] RDX: ffff8ae68cca0b00 RSI: ffffefd2c2332800 RDI: ffff8ae64000b700
[  100.892125] RBP: ffffefd2c2332800 R08: 0000000000000001 R09: ffffffff9af899a9
[  100.892126] R10: ffff8ae68cca0b00 R11: 0000000000000000 R12: ffff8ae64000b700
[  100.892127] R13: ffff8b44bf871980 R14: ffff8ae68cca0b00 R15: ffffefd2c2332820
[  100.892128] FS:  0000000000000000(0000) GS:ffff8b44bf840000(0000) knlGS:0000000000000000
[  100.892129] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  100.892130] CR2: 00007f86487e7000 CR3: 00000039d5e10006 CR4: 00000000007706e0
[  100.892131] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  100.892131] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  100.892132] PKRU: 55555554
[  100.892133] Call Trace:
[  100.892134]  <TASK>
[  100.892135]  ? update_load_avg+0x7e/0x730
[  100.892142]  ? __queue_work+0x141/0x410
[  100.892146]  ? set_next_entity+0xdf/0x190
[  100.892148]  ? rcu_do_batch+0x178/0x510
[  100.892154]  rcu_do_batch+0x169/0x510
[  100.892156]  rcu_core+0x252/0x3c0
[  100.892158]  __do_softirq+0xc6/0x2c8
[  100.892163]  ? smpboot_thread_fn+0x2b/0x220
[  101.032784]  ? smpboot_thread_fn+0x61/0x220
[  101.032787]  ? smpboot_thread_fn+0x111/0x220
[  101.032790]  run_ksoftirqd+0x1e/0x40
[  101.032793]  smpboot_thread_fn+0x109/0x220
[  101.032796]  ? find_next_bit+0x10/0x10
[  101.032799]  kthread+0xe3/0x110
[  101.058431]  ? kthread_complete_and_exit+0x20/0x20
[  101.058433]  ret_from_fork+0x1f/0x30
[  101.058440]  </TASK>
[  101.058441] Modules linked in: xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp rfkill kvm_intel kvm iTCO_wdt irqbypass iTCO_vendor_support vfat acpi_power_meter rapl fat intel_cstate dell_smbios mei_me i2c_i801 dcdbas acpi_ipmi intel_uncore wmi_bmof dell_wmi_descriptor pcspkr mei lpc_ich i2c_smbus ipmi_si intel_pch_thermal ipmi_devintf ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace ip_tables x_tables rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser ib_umad btrfs rdma_cm ib_ipoib iw_cm ib_cm irdma blake2b_generic xor raid6_pq zstd_compress ice mlx5_ib bnxt_re sd_mod t10_pi ib_uverbs ib_core sg mlx5_core ahci
[  101.058488]  crct10dif_pclmul libahci pci_hyperv_intf crc32_pclmul bnx2x crc32c_intel i40e bnxt_en mlxfw libata psample megaraid_sas ghash_clmulni_intel mdio mgag200 wmi dm_multipath sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi [last unloaded: coretemp]
[  101.058515] ---[ end trace 0000000000000000 ]---
[  OK  ] Started Show Plymouth Reboot Screen.
[  101.230443] RIP: 0010:__slab_free+0x24d/0x4e0
[  101.236200] Code: 48 8b 45 08 a8 01 75 11 0f 1f 44 00 00 48 89 e8 48 39 c5 0f 84 eb 00 00 00 48 c7 c6 d8 97 25 9c 48 89 ef e8 a5 12 f9 ff 0f 0b <0f> 0b 48 8b 45 00 a9 00 00 01 00 74 af 48 8b 45 48 a8 01 74 a7 48
[  101.255776] RSP: 0018:ffffabec991d3d40 EFLAGS: 00010246
[  101.261290] RAX: ffff8ae68cca0b70 RBX: 000000008020001a RCX: ffff8ae68cca0b00
[  101.268708] RDX: ffff8ae68cca0b00 RSI: ffffefd2c2332800 RDI: ffff8ae64000b700
[  101.276127] RBP: ffffefd2c2332800 R08: 0000000000000001 R09: ffffffff9af899a9
[  101.283545] R10: ffff8ae68cca0b00 R11: 0000000000000000 R12: ffff8ae64000b700
[  101.290956] R13: ffff8b44bf871980 R14: ffff8ae68cca0b00 R15: ffffefd2c2332820
[  101.298365] FS:  0000000000000000(0000) GS:ffff8b44bf840000(0000) knlGS:0000000000000000
[  101.306734] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  101.312751] CR2: 00007f86487e7000 CR3: 00000039d5e10006 CR4: 00000000007706e0
[  101.320160] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  101.327571] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  101.334980] PKRU: 55555554
[  101.337971] Kernel panic - not syncing: Fatal exception in interrupt
[  101.744765] Kernel Offset: 0x19e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  101.761270] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

* Re: an nfsd_file_free panic at shutdown
From: Wang Yugui @ 2022-12-22  9:38 UTC
  To: linux-nfs, Jeff Layton

Hi, Jeff Layton 

> Hi,
> > An nfsd_file_free panic at shutdown.
> > 
> > This is kernel 6.1.1 with some nfsd patches from 6.2 applied.
> > It happened when the OS was shutting down.
> > 
> > There is no clear reproducer yet;
> > we have only gathered the information in the attached file.
> 
> Now I can reproduce it.
> 1) On an NFSv4 client, run 'tail -f xxx' to keep a file open.
> 2) Run 'shutdown -r now' on the NFS server.
> 
> More panic info is in the attached files (panic-2.txt/panic-3.txt).
> > It started happening just after 'nfsd: rework refcounting in filecache'
> > was applied, so that patch may be related.

I have confirmed that this panic is caused by the patch
'nfsd: rework refcounting in filecache'.

The problem reproduces 100% of the time when this patch is applied,
and has not reproduced so far without it.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/12/22

* Re: an nfsd_file_free panic at shutdown
From: Jeff Layton @ 2022-12-22 11:18 UTC
  To: Wang Yugui, linux-nfs; +Cc: Chuck Lever

On Thu, 2022-12-22 at 17:38 +0800, Wang Yugui wrote:
> Hi, Jeff Layton 
> 
> > Hi,
> > > An nfsd_file_free panic at shutdown.
> > > 
> > > This is kernel 6.1.1 with some nfsd patches from 6.2 applied.
> > > It happened when the OS was shutting down.
> > > 
> > > There is no clear reproducer yet;
> > > we have only gathered the information in the attached file.
> > 
> > Now I can reproduce it.
> > 1) On an NFSv4 client, run 'tail -f xxx' to keep a file open.
> > 2) Run 'shutdown -r now' on the NFS server.
> > 
> > More panic info is in the attached files (panic-2.txt/panic-3.txt).
> > > It started happening just after 'nfsd: rework refcounting in filecache'
> > > was applied, so that patch may be related.
> 
> I have confirmed that this panic is caused by the patch
> 'nfsd: rework refcounting in filecache'.
> 
> The problem reproduces 100% of the time when this patch is applied,
> and has not reproduced so far without it.
> 
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2022/12/22
> 

Thanks for the bug report! This patch seems to fix the problem for me.
Can you test it and let me know whether it also fixes it for you? If so,
I'll send this to Chuck and the list for inclusion soon.

Thanks!
-- 
Jeff Layton <jlayton@kernel.org>

--------------------8<--------------------------

[PATCH] nfsd: shut down the NFSv4 state objects before the filecache

Currently, we shut down the filecache before trying to clean up the
stateids that depend on it, which can lead to an oops at shutdown time
due to a refcount imbalance. Change the shutdown procedures to bring
down the state handling infrastructure prior to shutting down the
filecache.

Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfssvc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 56fba1cba3af..325d3d3f1211 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -453,8 +453,8 @@ static void nfsd_shutdown_net(struct net *net)
 {
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 
-	nfsd_file_cache_shutdown_net(net);
 	nfs4_state_shutdown_net(net);
+	nfsd_file_cache_shutdown_net(net);
 	if (nn->lockd_up) {
 		lockd_down(net);
 		nn->lockd_up = false;
-- 
2.38.1
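The ordering argument in the commit message can be illustrated with a small userspace model. This is a simplified sketch, not the kernel code: struct cfile, struct stateid, cfile_put(), state_shutdown(), cache_shutdown() and run_shutdown() are hypothetical stand-ins for struct nfsd_file, the NFSv4 stateids, nfsd_file_put(), nfs4_state_shutdown_net() and nfsd_file_cache_shutdown_net(). The model also assumes that cache teardown destroys its entries even while other holders still reference them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached open file (cf. struct nfsd_file). */
struct cfile {
	int ref;     /* holders, including the cache's own reference */
	bool freed;  /* set instead of calling free() so misuse is observable */
};

/* Hypothetical stand-in for an NFSv4 stateid that pins a cached file. */
struct stateid {
	struct cfile *file;
};

/* Drop one reference; count puts that hit an already-dead object. */
static void cfile_put(struct cfile *f, int *bad_puts)
{
	if (f->freed || f->ref <= 0) {
		(*bad_puts)++;
		return;
	}
	if (--f->ref == 0)
		f->freed = true;
}

/* Model of state teardown: every stateid releases the file it pins. */
static void state_shutdown(struct stateid *ids, size_t n, int *bad_puts)
{
	for (size_t i = 0; i < n; i++) {
		if (ids[i].file) {
			cfile_put(ids[i].file, bad_puts);
			ids[i].file = NULL;
		}
	}
}

/*
 * Model of cache teardown. ASSUMPTION (simplification): the cache drops
 * its own reference and then destroys the entry even if other holders
 * remain, because its backing storage goes away.
 */
static void cache_shutdown(struct cfile *files, size_t n, int *bad_puts)
{
	for (size_t i = 0; i < n; i++) {
		cfile_put(&files[i], bad_puts);
		files[i].freed = true;
	}
}

/* Run one shutdown order on a fresh model; returns the number of bad puts. */
int run_shutdown(bool state_first)
{
	struct cfile file = { .ref = 1, .freed = false }; /* cache's reference */
	struct stateid st = { .file = &file };
	int bad_puts = 0;

	file.ref++; /* the stateid's reference, e.g. a file held open by a client */

	if (state_first) {          /* order after the patch */
		state_shutdown(&st, 1, &bad_puts);
		cache_shutdown(&file, 1, &bad_puts);
	} else {                    /* order before the patch */
		cache_shutdown(&file, 1, &bad_puts);
		state_shutdown(&st, 1, &bad_puts);
	}
	assert(file.freed); /* either way nothing survives shutdown */
	return bad_puts;
}
```

In both orders the file ends up freed; the difference is whether a stateid drops its reference after the cache has already destroyed the object. That late put is the kind of refcount imbalance the patch avoids by calling nfs4_state_shutdown_net() before nfsd_file_cache_shutdown_net().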




* Re: a nfsd_file_free panic when shudown
  2022-12-22 11:18     ` Jeff Layton
@ 2022-12-22 13:06       ` Wang Yugui
  0 siblings, 0 replies; 5+ messages in thread
From: Wang Yugui @ 2022-12-22 13:06 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs, Chuck Lever

Hi,


> On Thu, 2022-12-22 at 17:38 +0800, Wang Yugui wrote:
> > Hi, Jeff Layton 
> > 
> > > Hi,
> > > > An nfsd_file_free panic at shutdown.
> > > > 
> > > > This is kernel 6.1.1 with some nfsd 6.2 patches applied.
> > > > It happened when the OS was shutting down.
> > > > 
> > > > The reproducer is not yet clear;
> > > > we have only gathered the information in the attached file.
> > > 
> > > Now I can reproduce it:
> > > 1) run 'tail -f xxx' on an NFSv4 client to keep a file open;
> > > 2) run 'shutdown -r now' on the NFS server.
> > > 
> > > More panic info is in the attached files (panic-2.txt/panic-3.txt).
> > > 
> > > > 
> > > > It started happening right after 'Subject: nfsd: rework refcounting in filecache'
> > > > was added, so that patch may be related.
> > 
> > I have confirmed that this panic is caused by the patch
> > 'nfsd: rework refcounting in filecache'.
> > 
> > The problem reproduces 100% of the time when this patch is applied.
> > It has not been reproduced so far when this patch is not applied.
> > 
> > Best Regards
> > Wang Yugui (wangyugui@e16-tech.com)
> > 2022/12/22
> > 
> 
> Thanks for the bug report! This patch seems to fix the problem for me.
> Can you test it and let me know whether it also fixes it for you? If so,
> I'll send this to Chuck and the list for inclusion soon.
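Step 1 of the reproducer only needs some process on the client that keeps an NFSv4 open alive on the server. As a hedged alternative to tail, here is a minimal C sketch that does the same. hold_open() is a hypothetical helper (not part of any tool mentioned in this thread), it has not been tested against this bug, and it assumes that open(2) on the NFSv4 mount makes the client send an OPEN, which creates the server-side open state that nfsd then has to tear down at shutdown.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` read-only, read from it once (as tail
 * does), then keep the descriptor open for `seconds` seconds so the open
 * state stays alive on the server. Returns 0 on success, -1 on error.
 */
int hold_open(const char *path, unsigned int seconds)
{
	char buf[64];
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0) {
		perror("open");
		return -1;
	}
	if (read(fd, buf, sizeof(buf)) < 0) {  /* a short read or EOF is fine */
		perror("read");
		close(fd);
		return -1;
	}
	/* sleep() may return early if interrupted by a signal; keep waiting. */
	while (seconds > 0)
		seconds = sleep(seconds);
	close(fd);
	return 0;
}
```

For example, a small main() on the client calling hold_open("/mnt/nfs/xxx", 3600), with /mnt/nfs as an assumed mount point, would play the role of 'tail -f xxx' while 'shutdown -r now' is run on the server.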

This new patch fixes the problem here too. Thanks a lot.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/12/22


> 
> Thanks!
> -- 
> Jeff Layton <jlayton@kernel.org>
> 
> --------------------8<--------------------------
> 
> [PATCH] nfsd: shut down the NFSv4 state objects before the filecache
> 
> Currently, we shut down the filecache before trying to clean up the
> stateids that depend on it, which can lead to an oops at shutdown time
> due to a refcount imbalance. Change the shutdown procedures to bring
> down the state handling infrastructure prior to shutting down the
> filecache.
> 
> Reported-by: Wang Yugui <wangyugui@e16-tech.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/nfsd/nfssvc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 56fba1cba3af..325d3d3f1211 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -453,8 +453,8 @@ static void nfsd_shutdown_net(struct net *net)
>  {
>  	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>  
> -	nfsd_file_cache_shutdown_net(net);
>  	nfs4_state_shutdown_net(net);
> +	nfsd_file_cache_shutdown_net(net);
>  	if (nn->lockd_up) {
>  		lockd_down(net);
>  		nn->lockd_up = false;
> -- 
> 2.38.1
> 





Thread overview: 5+ messages
2022-12-22  6:15 a nfsd_file_free panic when shudown Wang Yugui
2022-12-22  8:15 ` Wang Yugui
2022-12-22  9:38   ` Wang Yugui
2022-12-22 11:18     ` Jeff Layton
2022-12-22 13:06       ` Wang Yugui
