linux-kernel.vger.kernel.org archive mirror
* [PATCH] nfsd: don't free files unconditionally in __nfsd_file_cache_purge
@ 2023-01-19 19:20 Jeff Layton
  2023-01-19 22:59 ` Chuck Lever III
  2023-01-20 18:36 ` Jeff Layton
  0 siblings, 2 replies; 12+ messages in thread
From: Jeff Layton @ 2023-01-19 19:20 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Ruben Vestergaard, Torkil Svensgaard, Shachar Kagan, linux-nfs,
	linux-kernel

nfsd_file_cache_purge is called when the server is shutting down, in
which case, tearing things down is generally fine, but it also gets
called when the exports cache is flushed.

Instead of walking the cache and freeing everything unconditionally,
attempt to unhash each entry and get a reference to it, and then put
that reference. Only queue it to the dispose list if the refcount goes
to 0.

Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache")
Reported-by: Ruben Vestergaard <rubenv@drcmr.dk>
Reported-by: Torkil Svensgaard <torkil@drcmr.dk>
Reported-by: Shachar Kagan <skagan@nvidia.com>
Signed-off-by: Jeffrey Layton <jlayton@kernel.org>
---
 fs/nfsd/filecache.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

I was able to reproduce this today by running pynfs against the server
while running "exportfs -ra" in a loop. This one is a bit different in
that it happened in the open codepath, but that's probably just because
delegations are more likely to be long-lived. With this patch, the
server survives the run:

[  337.962027] ------------[ cut here ]------------
[  337.963823] refcount_t: underflow; use-after-free.
[  337.965502] WARNING: CPU: 6 PID: 3401 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
[  337.967999] Modules linked in: nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) ext4(E) crc16(E) cirrus(E) kvm_intel(E) 9p(E) mbcache(E) joydev(E) virtio_net(E) drm_shmem_helper(E) net_failover(E) kvm(E) jbd2(E) netfs(E) psmouse(E) evdev(E) pcspkr(E) failover(E) irqbypass(E) virtio_balloon(E) drm_kms_helper(E) 9pnet_virtio(E) button(E) drm(E) configfs(E) zram(E) zsmalloc(E) crct10dif_pclmul(E) crc32_pclmul(E) nvme(E) ghash_clmulni_intel(E) virtio_blk(E) sha512_ssse3(E) sha512_generic(E) nvme_core(E) t10_pi(E) virtio_pci(E) virtio(E) crc64_rocksoft_generic(E) aesni_intel(E) crypto_simd(E) crc64_rocksoft(E) virtio_pci_legacy_dev(E) i6300esb(E) cryptd(E) serio_raw(E) crc64(E) virtio_pci_modern_dev(E) virtio_ring(E) btrfs(E) blake2b_generic(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) crc32c_intel(E) autofs4(E)
[  337.992040] CPU: 6 PID: 3401 Comm: nfsd Tainted: G            E      6.2.0-rc3+ #11
[  337.994701] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014
[  337.998046] RIP: 0010:refcount_warn_saturate+0xba/0x110
[  337.999852] Code: 01 01 e8 83 e5 4f 00 0f 0b c3 cc cc cc cc 80 3d 60 f4 05 01 00 75 85 48 c7 c7 30 b5 e1 9d c6 05 50 f4 05 01 01 e8 60 e5 4f 00 <0f> 0b c3 cc cc cc cc 80 3d 3b f4 05 01 00 0f 85 5e ff ff ff 48 c7
[  338.005245] RSP: 0018:ffffa36802e4bd50 EFLAGS: 00010282
[  338.006621] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000000
[  338.008273] RDX: 0000000000000001 RSI: ffffffff9de03ef5 RDI: 00000000ffffffff
[  338.009804] RBP: 0000000000000003 R08: 0000000000000000 R09: ffffa36802e4bc00
[  338.011719] R10: 0000000000000003 R11: ffffffff9e0bfdc8 R12: ffff9578da461b80
[  338.013533] R13: 0000000000000001 R14: ffff9578da422280 R15: ffff9578da461b80
[  338.015238] FS:  0000000000000000(0000) GS:ffff957a37d00000(0000) knlGS:0000000000000000
[  338.017179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  338.018680] CR2: 00007f324c1e1c08 CR3: 000000020360a004 CR4: 0000000000060ee0
[  338.020377] Call Trace:
[  338.021190]  <TASK>
[  338.021956]  release_all_access+0x96/0x120 [nfsd]
[  338.023192]  ? _raw_spin_unlock+0x15/0x30
[  338.024192]  nfsd4_close+0x275/0x3d0 [nfsd]
[  338.025468]  ? fh_verify+0x45e/0x780 [nfsd]
[  338.027535]  ? __pfx_nfsd4_encode_noop+0x10/0x10 [nfsd]
[  338.028775]  ? nfsd4_encode_operation+0xae/0x280 [nfsd]
[  338.030593]  nfsd4_proc_compound+0x3ae/0x6f0 [nfsd]
[  338.032341]  nfsd_dispatch+0x16a/0x270 [nfsd]
[  338.034667]  svc_process_common+0x2eb/0x660 [sunrpc]
[  338.036614]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[  338.038827]  ? __pfx_nfsd+0x10/0x10 [nfsd]
[  338.040267]  svc_process+0xad/0x100 [sunrpc]
[  338.041981]  nfsd+0xd5/0x190 [nfsd]
[  338.043362]  kthread+0xe9/0x110
[  338.044680]  ? __pfx_kthread+0x10/0x10
[  338.046376]  ret_from_fork+0x2c/0x50
[  338.047892]  </TASK>
[  338.049067] ---[ end trace 0000000000000000 ]---
[  760.792789] BUG: kernel NULL pointer dereference, address: 0000000000000078
[  760.795933] #PF: supervisor read access in kernel mode
[  760.797477] #PF: error_code(0x0000) - not-present page
[  760.799120] PGD 0 P4D 0
[  760.800140] Oops: 0000 [#1] PREEMPT SMP PTI
[  760.801383] CPU: 2 PID: 3401 Comm: nfsd Tainted: G        W   E      6.2.0-rc3+ #11
[  760.803120] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014
[  760.805018] RIP: 0010:filp_close+0x23/0x70
[  760.806099] Code: 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 53 48 8b 47 38 48 85 c0 0f 84 41 e1 6d 00 48 8b 47 28 48 89 fb 48 89 f5 45 31 e4 <48> 8b 40 78 48 85 c0 74 08 e8 6f 70 72 00 41 89 c4 f6 43 45 40 75
[  760.809737] RSP: 0018:ffffa36802e4bc78 EFLAGS: 00010246
[  760.811084] RAX: 0000000000000000 RBX: ffff9578c7d4d600 RCX: 0000000000000000
[  760.812540] RDX: 000000000000098d RSI: 0000000000000000 RDI: ffff9578c7d4d600
[  760.814433] RBP: 0000000000000000 R08: 0000011335048e60 R09: ffff9578f82f1540
[  760.816089] R10: ffffa36802e4bcd0 R11: ffffa36802e4bcd8 R12: 0000000000000000
[  760.817529] R13: 0000000000000001 R14: dead000000000100 R15: ffff9578f82f1558
[  760.818982] FS:  0000000000000000(0000) GS:ffff957a37c80000(0000) knlGS:0000000000000000
[  760.820544] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  760.821734] CR2: 0000000000000078 CR3: 00000001565ce002 CR4: 0000000000060ee0
[  760.823141] Call Trace:
[  760.823808]  <TASK>
[  760.824419]  nfsd_file_free+0xe9/0x210 [nfsd]
[  760.825610]  release_all_access+0x96/0x120 [nfsd]
[  760.826680]  nfs4_free_ol_stateid+0x22/0x60 [nfsd]
[  760.827747]  free_ol_stateid_reaplist+0x61/0x90 [nfsd]
[  760.828858]  release_openowner+0x258/0x2a0 [nfsd]
[  760.829792]  __destroy_client+0x183/0x290 [nfsd]
[  760.830694]  nfsd4_setclientid_confirm+0x1a3/0x4f0 [nfsd]
[  760.831763]  nfsd4_proc_compound+0x3ae/0x6f0 [nfsd]
[  760.832717]  nfsd_dispatch+0x16a/0x270 [nfsd]
[  760.833576]  svc_process_common+0x2eb/0x660 [sunrpc]
[  760.834587]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[  760.835576]  ? __pfx_nfsd+0x10/0x10 [nfsd]
[  760.836462]  svc_process+0xad/0x100 [sunrpc]
[  760.837317]  nfsd+0xd5/0x190 [nfsd]
[  760.838133]  kthread+0xe9/0x110
[  760.838862]  ? __pfx_kthread+0x10/0x10
[  760.839755]  ret_from_fork+0x2c/0x50
[  760.840534]  </TASK>
[  760.841167] Modules linked in: nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) ext4(E) crc16(E) cirrus(E) kvm_intel(E) 9p(E) mbcache(E) joydev(E) virtio_net(E) drm_shmem_helper(E) net_failover(E) kvm(E) jbd2(E) netfs(E) psmouse(E) evdev(E) pcspkr(E) failover(E) irqbypass(E) virtio_balloon(E) drm_kms_helper(E) 9pnet_virtio(E) button(E) drm(E) configfs(E) zram(E) zsmalloc(E) crct10dif_pclmul(E) crc32_pclmul(E) nvme(E) ghash_clmulni_intel(E) virtio_blk(E) sha512_ssse3(E) sha512_generic(E) nvme_core(E) t10_pi(E) virtio_pci(E) virtio(E) crc64_rocksoft_generic(E) aesni_intel(E) crypto_simd(E) crc64_rocksoft(E) virtio_pci_legacy_dev(E) i6300esb(E) cryptd(E) serio_raw(E) crc64(E) virtio_pci_modern_dev(E) virtio_ring(E) btrfs(E) blake2b_generic(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) crc32c_intel(E) autofs4(E)
[  760.853527] CR2: 0000000000000078
[  760.854340] ---[ end trace 0000000000000000 ]---
[  760.855261] RIP: 0010:filp_close+0x23/0x70
[  760.856185] Code: 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 53 48 8b 47 38 48 85 c0 0f 84 41 e1 6d 00 48 8b 47 28 48 89 fb 48 89 f5 45 31 e4 <48> 8b 40 78 48 85 c0 74 08 e8 6f 70 72 00 41 89 c4 f6 43 45 40 75
[  760.859350] RSP: 0018:ffffa36802e4bc78 EFLAGS: 00010246
[  760.860356] RAX: 0000000000000000 RBX: ffff9578c7d4d600 RCX: 0000000000000000
[  760.861628] RDX: 000000000000098d RSI: 0000000000000000 RDI: ffff9578c7d4d600
[  760.862898] RBP: 0000000000000000 R08: 0000011335048e60 R09: ffff9578f82f1540
[  760.864172] R10: ffffa36802e4bcd0 R11: ffffa36802e4bcd8 R12: 0000000000000000
[  760.865438] R13: 0000000000000001 R14: dead000000000100 R15: ffff9578f82f1558
[  760.866692] FS:  0000000000000000(0000) GS:ffff957a37c80000(0000) knlGS:0000000000000000
[  760.868053] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  760.869102] CR2: 0000000000000078 CR3: 00000001565ce002 CR4: 0000000000060ee0

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 66449c41b19c..b9100707d347 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -908,9 +908,24 @@ __nfsd_file_cache_purge(struct net *net)
 		nf = rhashtable_walk_next(&iter);
 		while (!IS_ERR_OR_NULL(nf)) {
 			if (!net || nf->nf_net == net) {
-				nfsd_file_unhash(nf);
-				nfsd_file_lru_remove(nf);
-				list_add(&nf->nf_lru, &dispose);
+				/* Ignore it if it's no longer hashed */
+				if (!nfsd_file_unhash(nf))
+					continue;
+
+				/*
+				 * Try to claim the LRU ref (if any). If it's
+				 * not on the LRU, then try to take a ref. If that
+				 * fails, then we'll ignore it.
+				 */
+				if (!nfsd_file_lru_remove(nf))
+					nf = nfsd_file_get(nf);
+
+				/*
+				 * Now try to put the ref we hold. If that works
+				 * and it goes to zero, queue it to be freed.
+				 */
+				if (nf && refcount_dec_and_test(&nf->nf_ref))
+					list_add(&nf->nf_lru, &dispose);
 			}
 			nf = rhashtable_walk_next(&iter);
 		}
-- 
2.39.0



* Re: [PATCH] nfsd: don't free files unconditionally in __nfsd_file_cache_purge
  2023-01-19 19:20 [PATCH] nfsd: don't free files unconditionally in __nfsd_file_cache_purge Jeff Layton
@ 2023-01-19 22:59 ` Chuck Lever III
  2023-01-19 23:29   ` Jeff Layton
  2023-01-20 18:36 ` Jeff Layton
  1 sibling, 1 reply; 12+ messages in thread
From: Chuck Lever III @ 2023-01-19 22:59 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Ruben Vestergaard, Torkil Svensgaard, Shachar Kagan,
	Linux NFS Mailing List, linux-kernel



> On Jan 19, 2023, at 2:20 PM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> nfsd_file_cache_purge is called when the server is shutting down, in
> which case, tearing things down is generally fine, but it also gets
> called when the exports cache is flushed.
> 
> Instead of walking the cache and freeing everything unconditionally,
> attempt to unhash each entry and get a reference to it, and then put
> that reference. Only queue it to the dispose list if the refcount goes
> to 0.
> 
> Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache")
> Reported-by: Ruben Vestergaard <rubenv@drcmr.dk>
> Reported-by: Torkil Svensgaard <torkil@drcmr.dk>
> Reported-by: Shachar Kagan <skagan@nvidia.com>
> Signed-off-by: Jeffrey Layton <jlayton@kernel.org>

FYI, checkpatch complained that the Signed-off-by does not
exactly match the From: address. I haven't seen "Jeffrey"
before, but the DKIM check passed, so....

.... I've applied this to nfsd-fixes (formerly for-rc).


> ---
> fs/nfsd/filecache.c | 21 ++++++++++++++++++---
> 1 file changed, 18 insertions(+), 3 deletions(-)
> 
> I was able to reproduce this today by running pynfs against the server
> while running "exportfs -ra" in a loop. This one is a bit different in
> that it happened in the open codepath, but that's probably just because
> delegations are more likely to be long-lived. With this patch, the
> server survives the run:
> 
> [  337.962027] ------------[ cut here ]------------
> [  337.963823] refcount_t: underflow; use-after-free.
> [  337.965502] WARNING: CPU: 6 PID: 3401 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
> [  337.967999] Modules linked in: nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) ext4(E) crc16(E) cirrus(E) kvm_intel(E) 9p(E) mbcache(E) joydev(E) virtio_net(E) drm_shmem_helper(E) net_failover(E) kvm(E) jbd2(E) netfs(E) psmouse(E) evdev(E) pcspkr(E) failover(E) irqbypass(E) virtio_balloon(E) drm_kms_helper(E) 9pnet_virtio(E) button(E) drm(E) configfs(E) zram(E) zsmalloc(E) crct10dif_pclmul(E) crc32_pclmul(E) nvme(E) ghash_clmulni_intel(E) virtio_blk(E) sha512_ssse3(E) sha512_generic(E) nvme_core(E) t10_pi(E) virtio_pci(E) virtio(E) crc64_rocksoft_generic(E) aesni_intel(E) crypto_simd(E) crc64_rocksoft(E) virtio_pci_legacy_dev(E) i6300esb(E) cryptd(E) serio_raw(E) crc64(E) virtio_pci_modern_dev(E) virtio_ring(E) btrfs(E) blake2b_generic(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) crc32c_intel(E) autofs4(E)
> [  337.992040] CPU: 6 PID: 3401 Comm: nfsd Tainted: G            E      6.2.0-rc3+ #11
> [  337.994701] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014
> [  337.998046] RIP: 0010:refcount_warn_saturate+0xba/0x110
> [  337.999852] Code: 01 01 e8 83 e5 4f 00 0f 0b c3 cc cc cc cc 80 3d 60 f4 05 01 00 75 85 48 c7 c7 30 b5 e1 9d c6 05 50 f4 05 01 01 e8 60 e5 4f 00 <0f> 0b c3 cc cc cc cc 80 3d 3b f4 05 01 00 0f 85 5e ff ff ff 48 c7
> [  338.005245] RSP: 0018:ffffa36802e4bd50 EFLAGS: 00010282
> [  338.006621] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000000
> [  338.008273] RDX: 0000000000000001 RSI: ffffffff9de03ef5 RDI: 00000000ffffffff
> [  338.009804] RBP: 0000000000000003 R08: 0000000000000000 R09: ffffa36802e4bc00
> [  338.011719] R10: 0000000000000003 R11: ffffffff9e0bfdc8 R12: ffff9578da461b80
> [  338.013533] R13: 0000000000000001 R14: ffff9578da422280 R15: ffff9578da461b80
> [  338.015238] FS:  0000000000000000(0000) GS:ffff957a37d00000(0000) knlGS:0000000000000000
> [  338.017179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  338.018680] CR2: 00007f324c1e1c08 CR3: 000000020360a004 CR4: 0000000000060ee0
> [  338.020377] Call Trace:
> [  338.021190]  <TASK>
> [  338.021956]  release_all_access+0x96/0x120 [nfsd]
> [  338.023192]  ? _raw_spin_unlock+0x15/0x30
> [  338.024192]  nfsd4_close+0x275/0x3d0 [nfsd]
> [  338.025468]  ? fh_verify+0x45e/0x780 [nfsd]
> [  338.027535]  ? __pfx_nfsd4_encode_noop+0x10/0x10 [nfsd]
> [  338.028775]  ? nfsd4_encode_operation+0xae/0x280 [nfsd]
> [  338.030593]  nfsd4_proc_compound+0x3ae/0x6f0 [nfsd]
> [  338.032341]  nfsd_dispatch+0x16a/0x270 [nfsd]
> [  338.034667]  svc_process_common+0x2eb/0x660 [sunrpc]
> [  338.036614]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> [  338.038827]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> [  338.040267]  svc_process+0xad/0x100 [sunrpc]
> [  338.041981]  nfsd+0xd5/0x190 [nfsd]
> [  338.043362]  kthread+0xe9/0x110
> [  338.044680]  ? __pfx_kthread+0x10/0x10
> [  338.046376]  ret_from_fork+0x2c/0x50
> [  338.047892]  </TASK>
> [  338.049067] ---[ end trace 0000000000000000 ]---
> [  760.792789] BUG: kernel NULL pointer dereference, address: 0000000000000078
> [  760.795933] #PF: supervisor read access in kernel mode
> [  760.797477] #PF: error_code(0x0000) - not-present page
> [  760.799120] PGD 0 P4D 0
> [  760.800140] Oops: 0000 [#1] PREEMPT SMP PTI
> [  760.801383] CPU: 2 PID: 3401 Comm: nfsd Tainted: G        W   E      6.2.0-rc3+ #11
> [  760.803120] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014
> [  760.805018] RIP: 0010:filp_close+0x23/0x70
> [  760.806099] Code: 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 53 48 8b 47 38 48 85 c0 0f 84 41 e1 6d 00 48 8b 47 28 48 89 fb 48 89 f5 45 31 e4 <48> 8b 40 78 48 85 c0 74 08 e8 6f 70 72 00 41 89 c4 f6 43 45 40 75
> [  760.809737] RSP: 0018:ffffa36802e4bc78 EFLAGS: 00010246
> [  760.811084] RAX: 0000000000000000 RBX: ffff9578c7d4d600 RCX: 0000000000000000
> [  760.812540] RDX: 000000000000098d RSI: 0000000000000000 RDI: ffff9578c7d4d600
> [  760.814433] RBP: 0000000000000000 R08: 0000011335048e60 R09: ffff9578f82f1540
> [  760.816089] R10: ffffa36802e4bcd0 R11: ffffa36802e4bcd8 R12: 0000000000000000
> [  760.817529] R13: 0000000000000001 R14: dead000000000100 R15: ffff9578f82f1558
> [  760.818982] FS:  0000000000000000(0000) GS:ffff957a37c80000(0000) knlGS:0000000000000000
> [  760.820544] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  760.821734] CR2: 0000000000000078 CR3: 00000001565ce002 CR4: 0000000000060ee0
> [  760.823141] Call Trace:
> [  760.823808]  <TASK>
> [  760.824419]  nfsd_file_free+0xe9/0x210 [nfsd]
> [  760.825610]  release_all_access+0x96/0x120 [nfsd]
> [  760.826680]  nfs4_free_ol_stateid+0x22/0x60 [nfsd]
> [  760.827747]  free_ol_stateid_reaplist+0x61/0x90 [nfsd]
> [  760.828858]  release_openowner+0x258/0x2a0 [nfsd]
> [  760.829792]  __destroy_client+0x183/0x290 [nfsd]
> [  760.830694]  nfsd4_setclientid_confirm+0x1a3/0x4f0 [nfsd]
> [  760.831763]  nfsd4_proc_compound+0x3ae/0x6f0 [nfsd]
> [  760.832717]  nfsd_dispatch+0x16a/0x270 [nfsd]
> [  760.833576]  svc_process_common+0x2eb/0x660 [sunrpc]
> [  760.834587]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> [  760.835576]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> [  760.836462]  svc_process+0xad/0x100 [sunrpc]
> [  760.837317]  nfsd+0xd5/0x190 [nfsd]
> [  760.838133]  kthread+0xe9/0x110
> [  760.838862]  ? __pfx_kthread+0x10/0x10
> [  760.839755]  ret_from_fork+0x2c/0x50
> [  760.840534]  </TASK>
> [  760.841167] Modules linked in: nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) ext4(E) crc16(E) cirrus(E) kvm_intel(E) 9p(E) mbcache(E) joydev(E) virtio_net(E) drm_shmem_helper(E) net_failover(E) kvm(E) jbd2(E) netfs(E) psmouse(E) evdev(E) pcspkr(E) failover(E) irqbypass(E) virtio_balloon(E) drm_kms_helper(E) 9pnet_virtio(E) button(E) drm(E) configfs(E) zram(E) zsmalloc(E) crct10dif_pclmul(E) crc32_pclmul(E) nvme(E) ghash_clmulni_intel(E) virtio_blk(E) sha512_ssse3(E) sha512_generic(E) nvme_core(E) t10_pi(E) virtio_pci(E) virtio(E) crc64_rocksoft_generic(E) aesni_intel(E) crypto_simd(E) crc64_rocksoft(E) virtio_pci_legacy_dev(E) i6300esb(E) cryptd(E) serio_raw(E) crc64(E) virtio_pci_modern_dev(E) virtio_ring(E) btrfs(E) blake2b_generic(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) crc32c_intel(E) autofs4(E)
> [  760.853527] CR2: 0000000000000078
> [  760.854340] ---[ end trace 0000000000000000 ]---
> [  760.855261] RIP: 0010:filp_close+0x23/0x70
> [  760.856185] Code: 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 53 48 8b 47 38 48 85 c0 0f 84 41 e1 6d 00 48 8b 47 28 48 89 fb 48 89 f5 45 31 e4 <48> 8b 40 78 48 85 c0 74 08 e8 6f 70 72 00 41 89 c4 f6 43 45 40 75
> [  760.859350] RSP: 0018:ffffa36802e4bc78 EFLAGS: 00010246
> [  760.860356] RAX: 0000000000000000 RBX: ffff9578c7d4d600 RCX: 0000000000000000
> [  760.861628] RDX: 000000000000098d RSI: 0000000000000000 RDI: ffff9578c7d4d600
> [  760.862898] RBP: 0000000000000000 R08: 0000011335048e60 R09: ffff9578f82f1540
> [  760.864172] R10: ffffa36802e4bcd0 R11: ffffa36802e4bcd8 R12: 0000000000000000
> [  760.865438] R13: 0000000000000001 R14: dead000000000100 R15: ffff9578f82f1558
> [  760.866692] FS:  0000000000000000(0000) GS:ffff957a37c80000(0000) knlGS:0000000000000000
> [  760.868053] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  760.869102] CR2: 0000000000000078 CR3: 00000001565ce002 CR4: 0000000000060ee0
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 66449c41b19c..b9100707d347 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -908,9 +908,24 @@ __nfsd_file_cache_purge(struct net *net)
> 		nf = rhashtable_walk_next(&iter);
> 		while (!IS_ERR_OR_NULL(nf)) {
> 			if (!net || nf->nf_net == net) {
> -				nfsd_file_unhash(nf);
> -				nfsd_file_lru_remove(nf);
> -				list_add(&nf->nf_lru, &dispose);
> +				/* Ignore it if it's no longer hashed */
> +				if (!nfsd_file_unhash(nf))
> +					continue;
> +
> +				/*
> +				 * Try to claim the LRU ref (if any). If it's
> +				 * not on the LRU, then try to take a ref. If that
> +				 * fails, then we'll ignore it.
> +				 */
> +				if (!nfsd_file_lru_remove(nf))
> +					nf = nfsd_file_get(nf);
> +
> +				/*
> +				 * Now try to put the ref we hold. If that works
> +				 * and it goes to zero, queue it to be freed.
> +				 */
> +				if (nf && refcount_dec_and_test(&nf->nf_ref))
> +					list_add(&nf->nf_lru, &dispose);
> 			}
> 			nf = rhashtable_walk_next(&iter);
> 		}
> -- 
> 2.39.0
> 

--
Chuck Lever





* Re: [PATCH] nfsd: don't free files unconditionally in __nfsd_file_cache_purge
  2023-01-19 22:59 ` Chuck Lever III
@ 2023-01-19 23:29   ` Jeff Layton
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff Layton @ 2023-01-19 23:29 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Ruben Vestergaard, Torkil Svensgaard, Shachar Kagan,
	Linux NFS Mailing List, linux-kernel

On Thu, 2023-01-19 at 22:59 +0000, Chuck Lever III wrote:
> 
> > On Jan 19, 2023, at 2:20 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > nfsd_file_cache_purge is called when the server is shutting down, in
> > which case, tearing things down is generally fine, but it also gets
> > called when the exports cache is flushed.
> > 
> > Instead of walking the cache and freeing everything unconditionally,
> > attempt to unhash each entry and get a reference to it, and then put
> > that reference. Only queue it to the dispose list if the refcount goes
> > to 0.
> > 
> > Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache")
> > Reported-by: Ruben Vestergaard <rubenv@drcmr.dk>
> > Reported-by: Torkil Svensgaard <torkil@drcmr.dk>
> > Reported-by: Shachar Kagan <skagan@nvidia.com>
> > Signed-off-by: Jeffrey Layton <jlayton@kernel.org>
> 
> FYI, checkpatch complained that the Signed-off-by does not
> exactly match the From: address. I haven't seen "Jeffrey"
> before, but the DKIM check passed, so....
> 
> .... I've applied this to nfsd-fixes (formerly for-rc).
> 

Whoops! I originally did this in a different tree and had to fix up the
Sob line. Let's change it to "Jeff" if it makes checkpatch happy. Thanks
for pulling it in quickly!

> 
> > ---
> > fs/nfsd/filecache.c | 21 ++++++++++++++++++---
> > 1 file changed, 18 insertions(+), 3 deletions(-)
> > 
> > I was able to reproduce this today by running pynfs against the server
> > while running "exportfs -ra" in a loop. This one is a bit different in
> > that it happened in the open codepath, but that's probably just because
> > delegations are more likely to be long-lived. With this patch, the
> > server survives the run:
> > 
> > [  337.962027] ------------[ cut here ]------------
> > [  337.963823] refcount_t: underflow; use-after-free.
> > [  337.965502] WARNING: CPU: 6 PID: 3401 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
> > [  337.967999] Modules linked in: nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) ext4(E) crc16(E) cirrus(E) kvm_intel(E) 9p(E) mbcache(E) joydev(E) virtio_net(E) drm_shmem_helper(E) net_failover(E) kvm(E) jbd2(E) netfs(E) psmouse(E) evdev(E) pcspkr(E) failover(E) irqbypass(E) virtio_balloon(E) drm_kms_helper(E) 9pnet_virtio(E) button(E) drm(E) configfs(E) zram(E) zsmalloc(E) crct10dif_pclmul(E) crc32_pclmul(E) nvme(E) ghash_clmulni_intel(E) virtio_blk(E) sha512_ssse3(E) sha512_generic(E) nvme_core(E) t10_pi(E) virtio_pci(E) virtio(E) crc64_rocksoft_generic(E) aesni_intel(E) crypto_simd(E) crc64_rocksoft(E) virtio_pci_legacy_dev(E) i6300esb(E) cryptd(E) serio_raw(E) crc64(E) virtio_pci_modern_dev(E) virtio_ring(E) btrfs(E) blake2b_generic(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) crc32c_intel(E) autofs4(E)
> > [  337.992040] CPU: 6 PID: 3401 Comm: nfsd Tainted: G            E      6.2.0-rc3+ #11
> > [  337.994701] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014
> > [  337.998046] RIP: 0010:refcount_warn_saturate+0xba/0x110
> > [  337.999852] Code: 01 01 e8 83 e5 4f 00 0f 0b c3 cc cc cc cc 80 3d 60 f4 05 01 00 75 85 48 c7 c7 30 b5 e1 9d c6 05 50 f4 05 01 01 e8 60 e5 4f 00 <0f> 0b c3 cc cc cc cc 80 3d 3b f4 05 01 00 0f 85 5e ff ff ff 48 c7
> > [  338.005245] RSP: 0018:ffffa36802e4bd50 EFLAGS: 00010282
> > [  338.006621] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000000
> > [  338.008273] RDX: 0000000000000001 RSI: ffffffff9de03ef5 RDI: 00000000ffffffff
> > [  338.009804] RBP: 0000000000000003 R08: 0000000000000000 R09: ffffa36802e4bc00
> > [  338.011719] R10: 0000000000000003 R11: ffffffff9e0bfdc8 R12: ffff9578da461b80
> > [  338.013533] R13: 0000000000000001 R14: ffff9578da422280 R15: ffff9578da461b80
> > [  338.015238] FS:  0000000000000000(0000) GS:ffff957a37d00000(0000) knlGS:0000000000000000
> > [  338.017179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  338.018680] CR2: 00007f324c1e1c08 CR3: 000000020360a004 CR4: 0000000000060ee0
> > [  338.020377] Call Trace:
> > [  338.021190]  <TASK>
> > [  338.021956]  release_all_access+0x96/0x120 [nfsd]
> > [  338.023192]  ? _raw_spin_unlock+0x15/0x30
> > [  338.024192]  nfsd4_close+0x275/0x3d0 [nfsd]
> > [  338.025468]  ? fh_verify+0x45e/0x780 [nfsd]
> > [  338.027535]  ? __pfx_nfsd4_encode_noop+0x10/0x10 [nfsd]
> > [  338.028775]  ? nfsd4_encode_operation+0xae/0x280 [nfsd]
> > [  338.030593]  nfsd4_proc_compound+0x3ae/0x6f0 [nfsd]
> > [  338.032341]  nfsd_dispatch+0x16a/0x270 [nfsd]
> > [  338.034667]  svc_process_common+0x2eb/0x660 [sunrpc]
> > [  338.036614]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> > [  338.038827]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> > [  338.040267]  svc_process+0xad/0x100 [sunrpc]
> > [  338.041981]  nfsd+0xd5/0x190 [nfsd]
> > [  338.043362]  kthread+0xe9/0x110
> > [  338.044680]  ? __pfx_kthread+0x10/0x10
> > [  338.046376]  ret_from_fork+0x2c/0x50
> > [  338.047892]  </TASK>
> > [  338.049067] ---[ end trace 0000000000000000 ]---
> > [  760.792789] BUG: kernel NULL pointer dereference, address: 0000000000000078
> > [  760.795933] #PF: supervisor read access in kernel mode
> > [  760.797477] #PF: error_code(0x0000) - not-present page
> > [  760.799120] PGD 0 P4D 0
> > [  760.800140] Oops: 0000 [#1] PREEMPT SMP PTI
> > [  760.801383] CPU: 2 PID: 3401 Comm: nfsd Tainted: G        W   E      6.2.0-rc3+ #11
> > [  760.803120] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014
> > [  760.805018] RIP: 0010:filp_close+0x23/0x70
> > [  760.806099] Code: 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 53 48 8b 47 38 48 85 c0 0f 84 41 e1 6d 00 48 8b 47 28 48 89 fb 48 89 f5 45 31 e4 <48> 8b 40 78 48 85 c0 74 08 e8 6f 70 72 00 41 89 c4 f6 43 45 40 75
> > [  760.809737] RSP: 0018:ffffa36802e4bc78 EFLAGS: 00010246
> > [  760.811084] RAX: 0000000000000000 RBX: ffff9578c7d4d600 RCX: 0000000000000000
> > [  760.812540] RDX: 000000000000098d RSI: 0000000000000000 RDI: ffff9578c7d4d600
> > [  760.814433] RBP: 0000000000000000 R08: 0000011335048e60 R09: ffff9578f82f1540
> > [  760.816089] R10: ffffa36802e4bcd0 R11: ffffa36802e4bcd8 R12: 0000000000000000
> > [  760.817529] R13: 0000000000000001 R14: dead000000000100 R15: ffff9578f82f1558
> > [  760.818982] FS:  0000000000000000(0000) GS:ffff957a37c80000(0000) knlGS:0000000000000000
> > [  760.820544] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  760.821734] CR2: 0000000000000078 CR3: 00000001565ce002 CR4: 0000000000060ee0
> > [  760.823141] Call Trace:
> > [  760.823808]  <TASK>
> > [  760.824419]  nfsd_file_free+0xe9/0x210 [nfsd]
> > [  760.825610]  release_all_access+0x96/0x120 [nfsd]
> > [  760.826680]  nfs4_free_ol_stateid+0x22/0x60 [nfsd]
> > [  760.827747]  free_ol_stateid_reaplist+0x61/0x90 [nfsd]
> > [  760.828858]  release_openowner+0x258/0x2a0 [nfsd]
> > [  760.829792]  __destroy_client+0x183/0x290 [nfsd]
> > [  760.830694]  nfsd4_setclientid_confirm+0x1a3/0x4f0 [nfsd]
> > [  760.831763]  nfsd4_proc_compound+0x3ae/0x6f0 [nfsd]
> > [  760.832717]  nfsd_dispatch+0x16a/0x270 [nfsd]
> > [  760.833576]  svc_process_common+0x2eb/0x660 [sunrpc]
> > [  760.834587]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> > [  760.835576]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> > [  760.836462]  svc_process+0xad/0x100 [sunrpc]
> > [  760.837317]  nfsd+0xd5/0x190 [nfsd]
> > [  760.838133]  kthread+0xe9/0x110
> > [  760.838862]  ? __pfx_kthread+0x10/0x10
> > [  760.839755]  ret_from_fork+0x2c/0x50
> > [  760.840534]  </TASK>
> > [  760.841167] Modules linked in: nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) ext4(E) crc16(E) cirrus(E) kvm_intel(E) 9p(E) mbcache(E) joydev(E) virtio_net(E) drm_shmem_helper(E) net_failover(E) kvm(E) jbd2(E) netfs(E) psmouse(E) evdev(E) pcspkr(E) failover(E) irqbypass(E) virtio_balloon(E) drm_kms_helper(E) 9pnet_virtio(E) button(E) drm(E) configfs(E) zram(E) zsmalloc(E) crct10dif_pclmul(E) crc32_pclmul(E) nvme(E) ghash_clmulni_intel(E) virtio_blk(E) sha512_ssse3(E) sha512_generic(E) nvme_core(E) t10_pi(E) virtio_pci(E) virtio(E) crc64_rocksoft_generic(E) aesni_intel(E) crypto_simd(E) crc64_rocksoft(E) virtio_pci_legacy_dev(E) i6300esb(E) cryptd(E) serio_raw(E) crc64(E) virtio_pci_modern_dev(E) virtio_ring(E) btrfs(E) blake2b_generic(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) crc32c_intel(E) autofs4(E)
> > [  760.853527] CR2: 0000000000000078
> > [  760.854340] ---[ end trace 0000000000000000 ]---
> > [  760.855261] RIP: 0010:filp_close+0x23/0x70
> > [  760.856185] Code: 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 53 48 8b 47 38 48 85 c0 0f 84 41 e1 6d 00 48 8b 47 28 48 89 fb 48 89 f5 45 31 e4 <48> 8b 40 78 48 85 c0 74 08 e8 6f 70 72 00 41 89 c4 f6 43 45 40 75
> > [  760.859350] RSP: 0018:ffffa36802e4bc78 EFLAGS: 00010246
> > [  760.860356] RAX: 0000000000000000 RBX: ffff9578c7d4d600 RCX: 0000000000000000
> > [  760.861628] RDX: 000000000000098d RSI: 0000000000000000 RDI: ffff9578c7d4d600
> > [  760.862898] RBP: 0000000000000000 R08: 0000011335048e60 R09: ffff9578f82f1540
> > [  760.864172] R10: ffffa36802e4bcd0 R11: ffffa36802e4bcd8 R12: 0000000000000000
> > [  760.865438] R13: 0000000000000001 R14: dead000000000100 R15: ffff9578f82f1558
> > [  760.866692] FS:  0000000000000000(0000) GS:ffff957a37c80000(0000) knlGS:0000000000000000
> > [  760.868053] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  760.869102] CR2: 0000000000000078 CR3: 00000001565ce002 CR4: 0000000000060ee0
> > 
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index 66449c41b19c..b9100707d347 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -908,9 +908,24 @@ __nfsd_file_cache_purge(struct net *net)
> > 		nf = rhashtable_walk_next(&iter);
> > 		while (!IS_ERR_OR_NULL(nf)) {
> > 			if (!net || nf->nf_net == net) {
> > -				nfsd_file_unhash(nf);
> > -				nfsd_file_lru_remove(nf);
> > -				list_add(&nf->nf_lru, &dispose);
> > +				/* Ignore it if it's no longer hashed */
> > +				if (!nfsd_file_unhash(nf))
> > +					continue;
> > +
> > +				/*
> > +				 * Try to claim the LRU ref (if any). If it's
> > +				 * not on the LRU, then try to take a ref. If that
> > +				 * fails, then we'll ignore it.
> > +				 */
> > +				if (!nfsd_file_lru_remove(nf))
> > +					nf = nfsd_file_get(nf);
> > +
> > +				/*
> > +				 * Now try to put the ref we hold. If that works
> > +				 * and it goes to zero, queue it to be freed.
> > +				 */
> > +				if (nf && refcount_dec_and_test(&nf->nf_ref))
> > +					list_add(&nf->nf_lru, &dispose);
> > 			}
> > 			nf = rhashtable_walk_next(&iter);
> > 		}
> > -- 
> > 2.39.0
> > 
> 
> --
> Chuck Lever
> 
> 
> 

-- 
Jeff Layton <jlayton@kernel.org>


* Re: [PATCH] nfsd: don't free files unconditionally in __nfsd_file_cache_purge
  2023-01-19 19:20 [PATCH] nfsd: don't free files unconditionally in __nfsd_file_cache_purge Jeff Layton
  2023-01-19 22:59 ` Chuck Lever III
@ 2023-01-20 18:36 ` Jeff Layton
  1 sibling, 0 replies; 12+ messages in thread
From: Jeff Layton @ 2023-01-20 18:36 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Ruben Vestergaard, Torkil Svensgaard, Shachar Kagan, linux-nfs,
	linux-kernel

On Thu, 2023-01-19 at 14:20 -0500, Jeff Layton wrote:
> nfsd_file_cache_purge is called when the server is shutting down, in
> which case, tearing things down is generally fine, but it also gets
> called when the exports cache is flushed.
> 
> Instead of walking the cache and freeing everything unconditionally,
> attempt to unhash each entry and get a reference to it, and then put
> that reference. Only queue it to the dispose list if the refcount goes
> to 0.
> 
> Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache")
> Reported-by: Ruben Vestergaard <rubenv@drcmr.dk>
> Reported-by: Torkil Svensgaard <torkil@drcmr.dk>
> Reported-by: Shachar Kagan <skagan@nvidia.com>
> Signed-off-by: Jeffrey Layton <jlayton@kernel.org>
> ---
>  fs/nfsd/filecache.c | 21 ++++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
> 
> I was able to reproduce this today by running pynfs against the server
> while running "exportfs -ra" in a loop. This one is a bit different in
> that it happened in the open codepath, but that's probably just because
> delegations are more likely to be long-lived. With this patch, the
> server survives the run:
> 
> [  337.962027] ------------[ cut here ]------------
> [  337.963823] refcount_t: underflow; use-after-free.
> [  337.965502] WARNING: CPU: 6 PID: 3401 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
> [  337.967999] Modules linked in: nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) ext4(E) crc16(E) cirrus(E) kvm_intel(E) 9p(E) mbcache(E) joydev(E) virtio_net(E) drm_shmem_helper(E) net_failover(E) kvm(E) jbd2(E) netfs(E) psmouse(E) evdev(E) pcspkr(E) failover(E) irqbypass(E) virtio_balloon(E) drm_kms_helper(E) 9pnet_virtio(E) button(E) drm(E) configfs(E) zram(E) zsmalloc(E) crct10dif_pclmul(E) crc32_pclmul(E) nvme(E) ghash_clmulni_intel(E) virtio_blk(E) sha512_ssse3(E) sha512_generic(E) nvme_core(E) t10_pi(E) virtio_pci(E) virtio(E) crc64_rocksoft_generic(E) aesni_intel(E) crypto_simd(E) crc64_rocksoft(E) virtio_pci_legacy_dev(E) i6300esb(E) cryptd(E) serio_raw(E) crc64(E) virtio_pci_modern_dev(E) virtio_ring(E) btrfs(E) blake2b_generic(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) crc32c_intel(E) autofs4(E)
> [  337.992040] CPU: 6 PID: 3401 Comm: nfsd Tainted: G            E      6.2.0-rc3+ #11
> [  337.994701] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014
> [  337.998046] RIP: 0010:refcount_warn_saturate+0xba/0x110
> [  337.999852] Code: 01 01 e8 83 e5 4f 00 0f 0b c3 cc cc cc cc 80 3d 60 f4 05 01 00 75 85 48 c7 c7 30 b5 e1 9d c6 05 50 f4 05 01 01 e8 60 e5 4f 00 <0f> 0b c3 cc cc cc cc 80 3d 3b f4 05 01 00 0f 85 5e ff ff ff 48 c7
> [  338.005245] RSP: 0018:ffffa36802e4bd50 EFLAGS: 00010282
> [  338.006621] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000000
> [  338.008273] RDX: 0000000000000001 RSI: ffffffff9de03ef5 RDI: 00000000ffffffff
> [  338.009804] RBP: 0000000000000003 R08: 0000000000000000 R09: ffffa36802e4bc00
> [  338.011719] R10: 0000000000000003 R11: ffffffff9e0bfdc8 R12: ffff9578da461b80
> [  338.013533] R13: 0000000000000001 R14: ffff9578da422280 R15: ffff9578da461b80
> [  338.015238] FS:  0000000000000000(0000) GS:ffff957a37d00000(0000) knlGS:0000000000000000
> [  338.017179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  338.018680] CR2: 00007f324c1e1c08 CR3: 000000020360a004 CR4: 0000000000060ee0
> [  338.020377] Call Trace:
> [  338.021190]  <TASK>
> [  338.021956]  release_all_access+0x96/0x120 [nfsd]
> [  338.023192]  ? _raw_spin_unlock+0x15/0x30
> [  338.024192]  nfsd4_close+0x275/0x3d0 [nfsd]
> [  338.025468]  ? fh_verify+0x45e/0x780 [nfsd]
> [  338.027535]  ? __pfx_nfsd4_encode_noop+0x10/0x10 [nfsd]
> [  338.028775]  ? nfsd4_encode_operation+0xae/0x280 [nfsd]
> [  338.030593]  nfsd4_proc_compound+0x3ae/0x6f0 [nfsd]
> [  338.032341]  nfsd_dispatch+0x16a/0x270 [nfsd]
> [  338.034667]  svc_process_common+0x2eb/0x660 [sunrpc]
> [  338.036614]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> [  338.038827]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> [  338.040267]  svc_process+0xad/0x100 [sunrpc]
> [  338.041981]  nfsd+0xd5/0x190 [nfsd]
> [  338.043362]  kthread+0xe9/0x110
> [  338.044680]  ? __pfx_kthread+0x10/0x10
> [  338.046376]  ret_from_fork+0x2c/0x50
> [  338.047892]  </TASK>
> [  338.049067] ---[ end trace 0000000000000000 ]---
> [  760.792789] BUG: kernel NULL pointer dereference, address: 0000000000000078
> [  760.795933] #PF: supervisor read access in kernel mode
> [  760.797477] #PF: error_code(0x0000) - not-present page
> [  760.799120] PGD 0 P4D 0
> [  760.800140] Oops: 0000 [#1] PREEMPT SMP PTI
> [  760.801383] CPU: 2 PID: 3401 Comm: nfsd Tainted: G        W   E      6.2.0-rc3+ #11
> [  760.803120] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014
> [  760.805018] RIP: 0010:filp_close+0x23/0x70
> [  760.806099] Code: 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 53 48 8b 47 38 48 85 c0 0f 84 41 e1 6d 00 48 8b 47 28 48 89 fb 48 89 f5 45 31 e4 <48> 8b 40 78 48 85 c0 74 08 e8 6f 70 72 00 41 89 c4 f6 43 45 40 75
> [  760.809737] RSP: 0018:ffffa36802e4bc78 EFLAGS: 00010246
> [  760.811084] RAX: 0000000000000000 RBX: ffff9578c7d4d600 RCX: 0000000000000000
> [  760.812540] RDX: 000000000000098d RSI: 0000000000000000 RDI: ffff9578c7d4d600
> [  760.814433] RBP: 0000000000000000 R08: 0000011335048e60 R09: ffff9578f82f1540
> [  760.816089] R10: ffffa36802e4bcd0 R11: ffffa36802e4bcd8 R12: 0000000000000000
> [  760.817529] R13: 0000000000000001 R14: dead000000000100 R15: ffff9578f82f1558
> [  760.818982] FS:  0000000000000000(0000) GS:ffff957a37c80000(0000) knlGS:0000000000000000
> [  760.820544] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  760.821734] CR2: 0000000000000078 CR3: 00000001565ce002 CR4: 0000000000060ee0
> [  760.823141] Call Trace:
> [  760.823808]  <TASK>
> [  760.824419]  nfsd_file_free+0xe9/0x210 [nfsd]
> [  760.825610]  release_all_access+0x96/0x120 [nfsd]
> [  760.826680]  nfs4_free_ol_stateid+0x22/0x60 [nfsd]
> [  760.827747]  free_ol_stateid_reaplist+0x61/0x90 [nfsd]
> [  760.828858]  release_openowner+0x258/0x2a0 [nfsd]
> [  760.829792]  __destroy_client+0x183/0x290 [nfsd]
> [  760.830694]  nfsd4_setclientid_confirm+0x1a3/0x4f0 [nfsd]
> [  760.831763]  nfsd4_proc_compound+0x3ae/0x6f0 [nfsd]
> [  760.832717]  nfsd_dispatch+0x16a/0x270 [nfsd]
> [  760.833576]  svc_process_common+0x2eb/0x660 [sunrpc]
> [  760.834587]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> [  760.835576]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> [  760.836462]  svc_process+0xad/0x100 [sunrpc]
> [  760.837317]  nfsd+0xd5/0x190 [nfsd]
> [  760.838133]  kthread+0xe9/0x110
> [  760.838862]  ? __pfx_kthread+0x10/0x10
> [  760.839755]  ret_from_fork+0x2c/0x50
> [  760.840534]  </TASK>
> [  760.841167] Modules linked in: nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) ext4(E) crc16(E) cirrus(E) kvm_intel(E) 9p(E) mbcache(E) joydev(E) virtio_net(E) drm_shmem_helper(E) net_failover(E) kvm(E) jbd2(E) netfs(E) psmouse(E) evdev(E) pcspkr(E) failover(E) irqbypass(E) virtio_balloon(E) drm_kms_helper(E) 9pnet_virtio(E) button(E) drm(E) configfs(E) zram(E) zsmalloc(E) crct10dif_pclmul(E) crc32_pclmul(E) nvme(E) ghash_clmulni_intel(E) virtio_blk(E) sha512_ssse3(E) sha512_generic(E) nvme_core(E) t10_pi(E) virtio_pci(E) virtio(E) crc64_rocksoft_generic(E) aesni_intel(E) crypto_simd(E) crc64_rocksoft(E) virtio_pci_legacy_dev(E) i6300esb(E) cryptd(E) serio_raw(E) crc64(E) virtio_pci_modern_dev(E) virtio_ring(E) btrfs(E) blake2b_generic(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) crc32c_intel(E) autofs4(E)
> [  760.853527] CR2: 0000000000000078
> [  760.854340] ---[ end trace 0000000000000000 ]---
> [  760.855261] RIP: 0010:filp_close+0x23/0x70
> [  760.856185] Code: 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 53 48 8b 47 38 48 85 c0 0f 84 41 e1 6d 00 48 8b 47 28 48 89 fb 48 89 f5 45 31 e4 <48> 8b 40 78 48 85 c0 74 08 e8 6f 70 72 00 41 89 c4 f6 43 45 40 75
> [  760.859350] RSP: 0018:ffffa36802e4bc78 EFLAGS: 00010246
> [  760.860356] RAX: 0000000000000000 RBX: ffff9578c7d4d600 RCX: 0000000000000000
> [  760.861628] RDX: 000000000000098d RSI: 0000000000000000 RDI: ffff9578c7d4d600
> [  760.862898] RBP: 0000000000000000 R08: 0000011335048e60 R09: ffff9578f82f1540
> [  760.864172] R10: ffffa36802e4bcd0 R11: ffffa36802e4bcd8 R12: 0000000000000000
> [  760.865438] R13: 0000000000000001 R14: dead000000000100 R15: ffff9578f82f1558
> [  760.866692] FS:  0000000000000000(0000) GS:ffff957a37c80000(0000) knlGS:0000000000000000
> [  760.868053] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  760.869102] CR2: 0000000000000078 CR3: 00000001565ce002 CR4: 0000000000060ee0
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 66449c41b19c..b9100707d347 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -908,9 +908,24 @@ __nfsd_file_cache_purge(struct net *net)
>  		nf = rhashtable_walk_next(&iter);
>  		while (!IS_ERR_OR_NULL(nf)) {
>  			if (!net || nf->nf_net == net) {
> -				nfsd_file_unhash(nf);
> -				nfsd_file_lru_remove(nf);
> -				list_add(&nf->nf_lru, &dispose);
> +				/* Ignore it if it's no longer hashed */
> +				if (!nfsd_file_unhash(nf))
> +					continue;
> +
> +				/*
> +				 * Try to claim the LRU ref (if any). If it's
> +				 * not on the LRU, then try to take a ref. If that
> +				 * fails, then we'll ignore it.
> +				 */
> +				if (!nfsd_file_lru_remove(nf))
> +					nf = nfsd_file_get(nf);
> +

Now that I've looked over this again today, I think this patch is not
quite right. We don't want to try to dequeue it from the LRU until we
hold a reference, or we could corrupt the list_head while it's already
sitting on another user's private dispose list.

I think we want to do the same thing we do when we get a notifier pop in
this case. I'm working on a v2 now and will send it once I've tested it.
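The notification path already does the steps in the right order. Roughly,
an untested sketch of what I have in mind (v2 may end up differing in the
details):

static void
maybe_queue_nfsd_file(struct nfsd_file *nf, struct list_head *dispose)
{
	int decrement = 1;

	/* If we raced with someone else unhashing, ignore it */
	if (!nfsd_file_unhash(nf))
		return;

	/* Take a reference before touching the LRU linkage */
	if (!nfsd_file_get(nf))
		return;

	/* Extra decrement if we also removed it from the LRU */
	if (nfsd_file_lru_remove(nf))
		++decrement;

	/* If the refcount goes to 0, queue it to be freed */
	if (refcount_sub_and_test(decrement, &nf->nf_ref)) {
		list_add(&nf->nf_lru, dispose);
		trace_nfsd_file_closing(nf);
	}
}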

Cheers,

> +				/*
> +				 * Now try to put the ref we hold. If that works
> +				 * and it goes to zero, queue it to be freed.
> +				 */
> +				if (nf && refcount_dec_and_test(&nf->nf_ref))
> +					list_add(&nf->nf_lru, &dispose);
>  			}
>  			nf = rhashtable_walk_next(&iter);
>  		}

-- 
Jeff Layton <jlayton@kernel.org>


* RE: [PATCH] nfsd: don't free files unconditionally in __nfsd_file_cache_purge
  2023-01-22 23:40           ` Chuck Lever III
@ 2023-01-23 10:57             ` Shachar Kagan
  0 siblings, 0 replies; 12+ messages in thread
From: Shachar Kagan @ 2023-01-23 10:57 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Jeff Layton, Ruben Vestergaard, Torkil Svensgaard,
	Linux NFS Mailing List, linux-kernel


>> On Jan 22, 2023, at 9:58 AM, Shachar Kagan <skagan@nvidia.com> wrote:
>>
>>> On Sat, 2023-01-21 at 17:04 +0000, Chuck Lever III wrote:
>>>>
>>>>> On Jan 20, 2023, at 3:54 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>>
>>>>> On Fri, 2023-01-20 at 20:21 +0000, Chuck Lever III wrote:
>>>>>>
>>>>>>> On Jan 20, 2023, at 2:52 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>>>>
>>>>>>> nfsd_file_cache_purge is called when the server is shutting down, 
>>>>>>> in which case, tearing things down is generally fine, but it also 
>>>>>>> gets called when the exports cache is flushed.
>>>>>>
>>>>>> Yeah... cache flush is probably the case we've been missing.
>>>>>>
>>>>>>
>>>>>>> Instead of walking the cache and freeing everything 
>>>>>>> unconditionally, handle it the same as when we have a notification of conflicting access.
>>>>>>>
>>>>>>> Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache")
>>>>>>> Reported-by: Ruben Vestergaard <rubenv@drcmr.dk>
>>>>>>> Reported-by: Torkil Svensgaard <torkil@drcmr.dk>
>>>>>>> Reported-by: Shachar Kagan <skagan@nvidia.com>
>>>>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>
>> I ran our tests with the commit, the issue didn't occur.
>> Thanks for the quick response.
>
> May I add Tested-by: Shachar Kagan <skagan@nvidia.com> ?
>

Yes

> --
> Chuck Lever

--
Shachar Kagan




* Re: [PATCH] nfsd: don't free files unconditionally in __nfsd_file_cache_purge
  2023-01-22 14:58         ` Shachar Kagan
@ 2023-01-22 23:40           ` Chuck Lever III
  2023-01-23 10:57             ` Shachar Kagan
  0 siblings, 1 reply; 12+ messages in thread
From: Chuck Lever III @ 2023-01-22 23:40 UTC (permalink / raw)
  To: Shachar Kagan
  Cc: Jeff Layton, Ruben Vestergaard, Torkil Svensgaard,
	Linux NFS Mailing List, linux-kernel



> On Jan 22, 2023, at 9:58 AM, Shachar Kagan <skagan@nvidia.com> wrote:
> 
> 
> On Sun, 2023-01-22 at 16:58 +0000, Shachar Kagan wrote:
> 
>> On Sat, 2023-01-21 at 17:04 +0000, Chuck Lever III wrote:
>>> 
>>>> On Jan 20, 2023, at 3:54 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>> 
>>>> On Fri, 2023-01-20 at 20:21 +0000, Chuck Lever III wrote:
>>>>> 
>>>>>> On Jan 20, 2023, at 2:52 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>>> 
>>>>>> nfsd_file_cache_purge is called when the server is shutting 
>>>>>> down, in which case, tearing things down is generally fine, but 
>>>>>> it also gets called when the exports cache is flushed.
>>>>> 
>>>>> Yeah... cache flush is probably the case we've been missing.
>>>>> 
>>>>> 
>>>>>> Instead of walking the cache and freeing everything 
>>>>>> unconditionally, handle it the same as when we have a notification of conflicting access.
>>>>>> 
>>>>>> Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache")
>>>>>> Reported-by: Ruben Vestergaard <rubenv@drcmr.dk>
>>>>>> Reported-by: Torkil Svensgaard <torkil@drcmr.dk>
>>>>>> Reported-by: Shachar Kagan <skagan@nvidia.com>
>>>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> 
> I ran our tests with the commit, the issue didn't occur.
> Thanks for the quick response.

May I add Tested-by: Shachar Kagan <skagan@nvidia.com> ?

--
Chuck Lever





* RE: [PATCH] nfsd: don't free files unconditionally in __nfsd_file_cache_purge
  2023-01-21 17:20       ` Jeff Layton
@ 2023-01-22 14:58         ` Shachar Kagan
  2023-01-22 23:40           ` Chuck Lever III
  0 siblings, 1 reply; 12+ messages in thread
From: Shachar Kagan @ 2023-01-22 14:58 UTC (permalink / raw)
  To: Jeff Layton, Chuck Lever III
  Cc: Ruben Vestergaard, Torkil Svensgaard, Linux NFS Mailing List,
	linux-kernel


On Sun, 2023-01-22 at 16:58 +0000, Shachar Kagan wrote:

> On Sat, 2023-01-21 at 17:04 +0000, Chuck Lever III wrote:
> >
> > > On Jan 20, 2023, at 3:54 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > >
> > > On Fri, 2023-01-20 at 20:21 +0000, Chuck Lever III wrote:
> > > >
> > > > > On Jan 20, 2023, at 2:52 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > >
> > > > > nfsd_file_cache_purge is called when the server is shutting 
> > > > > down, in which case, tearing things down is generally fine, but 
> > > > > it also gets called when the exports cache is flushed.
> > > >
> > > > Yeah... cache flush is probably the case we've been missing.
> > > >
> > > >
> > > > > Instead of walking the cache and freeing everything 
> > > > > unconditionally, handle it the same as when we have a notification of conflicting access.
> > > > >
> > > > > Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache")
> > > > > Reported-by: Ruben Vestergaard <rubenv@drcmr.dk>
> > > > > Reported-by: Torkil Svensgaard <torkil@drcmr.dk>
> > > > > Reported-by: Shachar Kagan <skagan@nvidia.com>
> > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>

I ran our tests with the commit, the issue didn't occur.
Thanks for the quick response.

> > > > > ---
> > > > > fs/nfsd/filecache.c | 61 
> > > > > +++++++++++++++++++++++++++------------------
> > > > > 1 file changed, 37 insertions(+), 24 deletions(-)
> > > > >
> > > > > v2: use the same method to purge entries from the cache as we do when
> > > > >   there is a notification of conflicting access.
> > > > >
> > > > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 
> > > > > 58ac93e7e680..397ae212b98d 100644
> > > > > --- a/fs/nfsd/filecache.c
> > > > > +++ b/fs/nfsd/filecache.c
> > > > > @@ -661,6 +661,40 @@ static struct shrinker     nfsd_file_shrinker = {
> > > > >         .seeks = 1,
> > > > > };
> > > > >
> > > > > +/**
> > > > > + * maybe_queue_nfsd_file - try to unhash and queue a nfsd_file 
> > > > > +to be freed
> > > > > + * @nf: nfsd_file to attempt to queue
> > > > > + * @dispose: private list to queue successfully-put objects
> > > > > + *
> > > > > + * Unhash an nfsd_file, try to get a reference to it, and then 
> > > > > +put that
> > > > > + * reference. If it's the last reference, queue it to the dispose list.
> > > > > + *
> > > > > + * The caller MUST hold the rcu_read_lock() !
> > > >
> > > > __nfsd_file_cache_purge() isn't holding rcu_read_lock(), it's 
> > > > holding the nfsd_mutex. Is this comment incorrect, or is it just 
> > > > too specific? Or did I miss something obvious?
> > > >
> > >
> > > It's implicitly taken by rhashtable_walk_start and released by 
> > > rhashtable_walk_stop.
> > >
> > > FWIW, it'd be nice if there were a lockdep_assert_held equivalent 
> > > for the rcu_read_lock() here, but I didn't see one of those.
> >
> > Would
> >
> > static void
> > nfsd_file_cond_queue(struct nfsd_file *nf, struct list_head *dispose)
> >       __must_hold(RCU)
> > {
> >
> > do the trick?
> >
> 
> Seems like a reasonable thing to add.
> 
> >
> > > There is a
> > > rcu_read_lock_held(), but I didn't see a good way to get that to 
> > > compile out when lockdep was disabled.
> > >
> > > > > + */
> > > > > +static void
> > > > > +maybe_queue_nfsd_file(struct nfsd_file *nf, struct list_head 
> > > > > +*dispose)
> > > >
> > > > I prefer the name nfsd_file_try_to_queue() or nfsd_file_try_to_dispose().
> > > > nfsd_file_ should be the prefix where possible. Unless you're 
> > > > redriving, I can fix that.
> > > >
> > > >
> > >
> > > Rename at will.
> >
> > Done and applied to nfsd-fixes, replacing v1 of this patch.
> >
> 
> LGTM. Thanks!
> 
> >
> > > > > +{
> > > > > +       int decrement = 1;
> > > > > +
> > > > > +       /* If we raced with someone else unhashing, ignore it */
> > > > > +       if (!nfsd_file_unhash(nf))
> > > > > +               return;
> > > > > +
> > > > > +       /* If we can't get a reference, ignore it */
> > > > > +       if (!nfsd_file_get(nf))
> > > > > +               return;
> > > > > +
> > > > > +       /* Extra decrement if we remove from the LRU */
> > > > > +       if (nfsd_file_lru_remove(nf))
> > > > > +               ++decrement;
> > > > > +
> > > > > +       /* If refcount goes to 0, then put on the dispose list */
> > > > > +       if (refcount_sub_and_test(decrement, &nf->nf_ref)) {
> > > > > +               list_add(&nf->nf_lru, dispose);
> > > > > +               trace_nfsd_file_closing(nf);
> > > > > +       }
> > > > > +}
> > > > > +
> > > > > /**
> > > > > * nfsd_file_queue_for_close: try to close out any open nfsd_files for an inode
> > > > > * @inode:   inode on which to close out nfsd_files
> > > > > @@ -688,30 +722,12 @@ nfsd_file_queue_for_close(struct inode 
> > > > > *inode, struct list_head *dispose)
> > > > >
> > > > >         rcu_read_lock();
> > > > >         do {
> > > > > -               int decrement = 1;
> > > > > -
> > > > >                 nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key,
> > > > >                                        nfsd_file_rhash_params);
> > > > >                 if (!nf)
> > > > >                         break;
> > > > >
> > > > > -               /* If we raced with someone else unhashing, ignore it */
> > > > > -               if (!nfsd_file_unhash(nf))
> > > > > -                       continue;
> > > > > -
> > > > > -               /* If we can't get a reference, ignore it */
> > > > > -               if (!nfsd_file_get(nf))
> > > > > -                       continue;
> > > > > -
> > > > > -               /* Extra decrement if we remove from the LRU */
> > > > > -               if (nfsd_file_lru_remove(nf))
> > > > > -                       ++decrement;
> > > > > -
> > > > > -               /* If refcount goes to 0, then put on the dispose list */
> > > > > -               if (refcount_sub_and_test(decrement, &nf->nf_ref)) {
> > > > > -                       list_add(&nf->nf_lru, dispose);
> > > > > -                       trace_nfsd_file_closing(nf);
> > > > > -               }
> > > > > +               maybe_queue_nfsd_file(nf, dispose);
> > > > >         } while (1);
> > > > >         rcu_read_unlock();
> > > > > }
> > > > > @@ -928,11 +944,8 @@ __nfsd_file_cache_purge(struct net *net)
> > > > >
> > > > >                 nf = rhashtable_walk_next(&iter);
> > > > >                 while (!IS_ERR_OR_NULL(nf)) {
> > > > > -                       if (!net || nf->nf_net == net) {
> > > > > -                               nfsd_file_unhash(nf);
> > > > > -                               nfsd_file_lru_remove(nf);
> > > > > -                               list_add(&nf->nf_lru, &dispose);
> > > > > -                       }
> > > > > +                       if (!net || nf->nf_net == net)
> > > > > +                               maybe_queue_nfsd_file(nf, 
> > > > > + &dispose);
> > > > >                         nf = rhashtable_walk_next(&iter);
> > > > >                 }
> > > > >
> > > > > --
> > > > > 2.39.0
> > > > >
> > > >
> > > > --
> > > > Chuck Lever
> > > >
> > > >
> > > >
> > >
> > > --
> > > Jeff Layton <jlayton@kernel.org>
> >
> > --
> > Chuck Lever
> >
> >
> >
> 
> --
> Jeff Layton <jlayton@kernel.org>



* Re: [PATCH] nfsd: don't free files unconditionally in __nfsd_file_cache_purge
  2023-01-21 17:04     ` Chuck Lever III
@ 2023-01-21 17:20       ` Jeff Layton
  2023-01-22 14:58         ` Shachar Kagan
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Layton @ 2023-01-21 17:20 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Ruben Vestergaard, Torkil Svensgaard, Shachar Kagan,
	Linux NFS Mailing List, linux-kernel

On Sat, 2023-01-21 at 17:04 +0000, Chuck Lever III wrote:
> 
> > On Jan 20, 2023, at 3:54 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > On Fri, 2023-01-20 at 20:21 +0000, Chuck Lever III wrote:
> > > 
> > > > On Jan 20, 2023, at 2:52 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > 
> > > > nfsd_file_cache_purge is called when the server is shutting down, in
> > > > which case, tearing things down is generally fine, but it also gets
> > > > called when the exports cache is flushed.
> > > 
> > > Yeah... cache flush is probably the case we've been missing.
> > > 
> > > 
> > > > Instead of walking the cache and freeing everything unconditionally,
> > > > handle it the same as when we have a notification of conflicting access.
> > > > 
> > > > Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache")
> > > > Reported-by: Ruben Vestergaard <rubenv@drcmr.dk>
> > > > Reported-by: Torkil Svensgaard <torkil@drcmr.dk>
> > > > Reported-by: Shachar Kagan <skagan@nvidia.com>
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > > fs/nfsd/filecache.c | 61 +++++++++++++++++++++++++++------------------
> > > > 1 file changed, 37 insertions(+), 24 deletions(-)
> > > > 
> > > > v2: use the same method to purge entries from the cache as we do when
> > > >   there is a notification of conflicting access.
> > > > 
> > > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > > index 58ac93e7e680..397ae212b98d 100644
> > > > --- a/fs/nfsd/filecache.c
> > > > +++ b/fs/nfsd/filecache.c
> > > > @@ -661,6 +661,40 @@ static struct shrinker	nfsd_file_shrinker = {
> > > > 	.seeks = 1,
> > > > };
> > > > 
> > > > +/**
> > > > + * maybe_queue_nfsd_file - try to unhash and queue a nfsd_file to be freed
> > > > + * @nf: nfsd_file to attempt to queue
> > > > + * @dispose: private list to queue successfully-put objects
> > > > + *
> > > > + * Unhash an nfsd_file, try to get a reference to it, and then put that
> > > > + * reference. If it's the last reference, queue it to the dispose list.
> > > > + *
> > > > + * The caller MUST hold the rcu_read_lock() !
> > > 
> > > __nfsd_file_cache_purge() isn't holding rcu_read_lock(), it's
> > > holding the nfsd_mutex. Is this comment incorrect, or is it just
> > > too specific? Or did I miss something obvious?
> > > 
> > 
> > It's implicitly taken by rhashtable_walk_start and released by
> > rhashtable_walk_stop.
> > 
> > FWIW, it'd be nice if there were a lockdep_assert_held equivalent for
> > the rcu_read_lock() here, but I didn't see one of those.
> 
> Would
> 
> static void
> nfsd_file_cond_queue(struct nfsd_file *nf, struct list_head *dispose)   
> 	__must_hold(RCU)
> {
> 
> do the trick?
> 

Seems like a reasonable thing to add.
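
For anyone not steeped in sparse: __must_hold() compiles away in a normal
build and only matters for "make C=1", where it asserts that the named
context (here the RCU read-side critical section) is held on function entry
and exit. A minimal sketch of the annotated helper, using the
nfsd_file_cond_queue name from your snippet, purely illustrative; whether
sparse can actually follow the RCU section through the rhashtable walk
helpers depends on their own annotations:

static void
nfsd_file_cond_queue(struct nfsd_file *nf, struct list_head *dispose)
	__must_hold(RCU)
{
	/* body unchanged from maybe_queue_nfsd_file() quoted below */
}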

> 
> > There is a
> > rcu_read_lock_held(), but I didn't see a good way to get that to compile
> > out when lockdep was disabled.
> > 
> > > > + */
> > > > +static void
> > > > +maybe_queue_nfsd_file(struct nfsd_file *nf, struct list_head *dispose)
> > > 
> > > I prefer the name nfsd_file_try_to_queue() or nfsd_file_try_to_dispose().
> > > nfsd_file_ should be the prefix where possible. Unless you're
> > > redriving, I can fix that.
> > > 
> > > 
> > 
> > Rename at will.
> 
> Done and applied to nfsd-fixes, replacing v1 of this patch.
> 

LGTM. Thanks!

> 
> > > > +{
> > > > +	int decrement = 1;
> > > > +
> > > > +	/* If we raced with someone else unhashing, ignore it */
> > > > +	if (!nfsd_file_unhash(nf))
> > > > +		return;
> > > > +
> > > > +	/* If we can't get a reference, ignore it */
> > > > +	if (!nfsd_file_get(nf))
> > > > +		return;
> > > > +
> > > > +	/* Extra decrement if we remove from the LRU */
> > > > +	if (nfsd_file_lru_remove(nf))
> > > > +		++decrement;
> > > > +
> > > > +	/* If refcount goes to 0, then put on the dispose list */
> > > > +	if (refcount_sub_and_test(decrement, &nf->nf_ref)) {
> > > > +		list_add(&nf->nf_lru, dispose);
> > > > +		trace_nfsd_file_closing(nf);
> > > > +	}
> > > > +}
> > > > +
> > > > /**
> > > > * nfsd_file_queue_for_close: try to close out any open nfsd_files for an inode
> > > > * @inode:   inode on which to close out nfsd_files
> > > > @@ -688,30 +722,12 @@ nfsd_file_queue_for_close(struct inode *inode, struct list_head *dispose)
> > > > 
> > > > 	rcu_read_lock();
> > > > 	do {
> > > > -		int decrement = 1;
> > > > -
> > > > 		nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key,
> > > > 				       nfsd_file_rhash_params);
> > > > 		if (!nf)
> > > > 			break;
> > > > 
> > > > -		/* If we raced with someone else unhashing, ignore it */
> > > > -		if (!nfsd_file_unhash(nf))
> > > > -			continue;
> > > > -
> > > > -		/* If we can't get a reference, ignore it */
> > > > -		if (!nfsd_file_get(nf))
> > > > -			continue;
> > > > -
> > > > -		/* Extra decrement if we remove from the LRU */
> > > > -		if (nfsd_file_lru_remove(nf))
> > > > -			++decrement;
> > > > -
> > > > -		/* If refcount goes to 0, then put on the dispose list */
> > > > -		if (refcount_sub_and_test(decrement, &nf->nf_ref)) {
> > > > -			list_add(&nf->nf_lru, dispose);
> > > > -			trace_nfsd_file_closing(nf);
> > > > -		}
> > > > +		maybe_queue_nfsd_file(nf, dispose);
> > > > 	} while (1);
> > > > 	rcu_read_unlock();
> > > > }
> > > > @@ -928,11 +944,8 @@ __nfsd_file_cache_purge(struct net *net)
> > > > 
> > > > 		nf = rhashtable_walk_next(&iter);
> > > > 		while (!IS_ERR_OR_NULL(nf)) {
> > > > -			if (!net || nf->nf_net == net) {
> > > > -				nfsd_file_unhash(nf);
> > > > -				nfsd_file_lru_remove(nf);
> > > > -				list_add(&nf->nf_lru, &dispose);
> > > > -			}
> > > > +			if (!net || nf->nf_net == net)
> > > > +				maybe_queue_nfsd_file(nf, &dispose);
> > > > 			nf = rhashtable_walk_next(&iter);
> > > > 		}
> > > > 
> > > > -- 
> > > > 2.39.0
> > > > 
> > > 
> > > --
> > > Chuck Lever
> > > 
> > > 
> > > 
> > 
> > -- 
> > Jeff Layton <jlayton@kernel.org>
> 
> --
> Chuck Lever
> 
> 
> 

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nfsd: don't free files unconditionally in __nfsd_file_cache_purge
  2023-01-20 20:54   ` Jeff Layton
@ 2023-01-21 17:04     ` Chuck Lever III
  2023-01-21 17:20       ` Jeff Layton
  0 siblings, 1 reply; 12+ messages in thread
From: Chuck Lever III @ 2023-01-21 17:04 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Ruben Vestergaard, Torkil Svensgaard, Shachar Kagan,
	Linux NFS Mailing List, linux-kernel



> On Jan 20, 2023, at 3:54 PM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> On Fri, 2023-01-20 at 20:21 +0000, Chuck Lever III wrote:
>> 
>>> On Jan 20, 2023, at 2:52 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>> 
>>> nfsd_file_cache_purge is called when the server is shutting down, in
>>> which case, tearing things down is generally fine, but it also gets
>>> called when the exports cache is flushed.
>> 
>> Yeah... cache flush is probably the case we've been missing.
>> 
>> 
>>> Instead of walking the cache and freeing everything unconditionally,
>>> handle it the same as when we have a notification of conflicting access.
>>> 
>>> Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache")
>>> Reported-by: Ruben Vestergaard <rubenv@drcmr.dk>
>>> Reported-by: Torkil Svensgaard <torkil@drcmr.dk>
>>> Reported-by: Shachar Kagan <skagan@nvidia.com>
>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>> ---
>>> fs/nfsd/filecache.c | 61 +++++++++++++++++++++++++++------------------
>>> 1 file changed, 37 insertions(+), 24 deletions(-)
>>> 
>>> v2: use the same method to purge entries from the cache as we do when
>>>   there is a notification of conflicting access.
>>> 
>>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>>> index 58ac93e7e680..397ae212b98d 100644
>>> --- a/fs/nfsd/filecache.c
>>> +++ b/fs/nfsd/filecache.c
>>> @@ -661,6 +661,40 @@ static struct shrinker	nfsd_file_shrinker = {
>>> 	.seeks = 1,
>>> };
>>> 
>>> +/**
>>> + * maybe_queue_nfsd_file - try to unhash and queue a nfsd_file to be freed
>>> + * @nf: nfsd_file to attempt to queue
>>> + * @dispose: private list to queue successfully-put objects
>>> + *
>>> + * Unhash an nfsd_file, try to get a reference to it, and then put that
>>> + * reference. If it's the last reference, queue it to the dispose list.
>>> + *
>>> + * The caller MUST hold the rcu_read_lock() !
>> 
>> __nfsd_file_cache_purge() isn't holding rcu_read_lock(), it's
>> holding the nfsd_mutex. Is this comment incorrect, or is it just
>> too specific? Or did I miss something obvious?
>> 
> 
> It's implicitly taken by rhashtable_walk_start and released by
> rhashtable_walk_stop.
> 
> FWIW, it'd be nice if there were a lockdep_assert_held equivalent for
> the rcu_read_lock() here, but I didn't see one of those.

Would

static void
nfsd_file_cond_queue(struct nfsd_file *nf, struct list_head *dispose)   
	__must_hold(RCU)
{

do the trick?


> There is a
> rcu_read_lock_held(), but I didn't see a good way to get that to compile
> out when lockdep was disabled.
> 
>>> + */
>>> +static void
>>> +maybe_queue_nfsd_file(struct nfsd_file *nf, struct list_head *dispose)
>> 
>> I prefer the name nfsd_file_try_to_queue() or nfsd_file_try_to_dispose().
>> nfsd_file_ should be the prefix where possible. Unless you're
>> redriving, I can fix that.
>> 
>> 
> 
> Rename at will.

Done and applied to nfsd-fixes, replacing v1 of this patch.


>>> +{
>>> +	int decrement = 1;
>>> +
>>> +	/* If we raced with someone else unhashing, ignore it */
>>> +	if (!nfsd_file_unhash(nf))
>>> +		return;
>>> +
>>> +	/* If we can't get a reference, ignore it */
>>> +	if (!nfsd_file_get(nf))
>>> +		return;
>>> +
>>> +	/* Extra decrement if we remove from the LRU */
>>> +	if (nfsd_file_lru_remove(nf))
>>> +		++decrement;
>>> +
>>> +	/* If refcount goes to 0, then put on the dispose list */
>>> +	if (refcount_sub_and_test(decrement, &nf->nf_ref)) {
>>> +		list_add(&nf->nf_lru, dispose);
>>> +		trace_nfsd_file_closing(nf);
>>> +	}
>>> +}
>>> +
>>> /**
>>> * nfsd_file_queue_for_close: try to close out any open nfsd_files for an inode
>>> * @inode:   inode on which to close out nfsd_files
>>> @@ -688,30 +722,12 @@ nfsd_file_queue_for_close(struct inode *inode, struct list_head *dispose)
>>> 
>>> 	rcu_read_lock();
>>> 	do {
>>> -		int decrement = 1;
>>> -
>>> 		nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key,
>>> 				       nfsd_file_rhash_params);
>>> 		if (!nf)
>>> 			break;
>>> 
>>> -		/* If we raced with someone else unhashing, ignore it */
>>> -		if (!nfsd_file_unhash(nf))
>>> -			continue;
>>> -
>>> -		/* If we can't get a reference, ignore it */
>>> -		if (!nfsd_file_get(nf))
>>> -			continue;
>>> -
>>> -		/* Extra decrement if we remove from the LRU */
>>> -		if (nfsd_file_lru_remove(nf))
>>> -			++decrement;
>>> -
>>> -		/* If refcount goes to 0, then put on the dispose list */
>>> -		if (refcount_sub_and_test(decrement, &nf->nf_ref)) {
>>> -			list_add(&nf->nf_lru, dispose);
>>> -			trace_nfsd_file_closing(nf);
>>> -		}
>>> +		maybe_queue_nfsd_file(nf, dispose);
>>> 	} while (1);
>>> 	rcu_read_unlock();
>>> }
>>> @@ -928,11 +944,8 @@ __nfsd_file_cache_purge(struct net *net)
>>> 
>>> 		nf = rhashtable_walk_next(&iter);
>>> 		while (!IS_ERR_OR_NULL(nf)) {
>>> -			if (!net || nf->nf_net == net) {
>>> -				nfsd_file_unhash(nf);
>>> -				nfsd_file_lru_remove(nf);
>>> -				list_add(&nf->nf_lru, &dispose);
>>> -			}
>>> +			if (!net || nf->nf_net == net)
>>> +				maybe_queue_nfsd_file(nf, &dispose);
>>> 			nf = rhashtable_walk_next(&iter);
>>> 		}
>>> 
>>> -- 
>>> 2.39.0
>>> 
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 
> 
> -- 
> Jeff Layton <jlayton@kernel.org>

--
Chuck Lever




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nfsd: don't free files unconditionally in __nfsd_file_cache_purge
  2023-01-20 20:21 ` Chuck Lever III
@ 2023-01-20 20:54   ` Jeff Layton
  2023-01-21 17:04     ` Chuck Lever III
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Layton @ 2023-01-20 20:54 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Ruben Vestergaard, Torkil Svensgaard, Shachar Kagan,
	Linux NFS Mailing List, linux-kernel

On Fri, 2023-01-20 at 20:21 +0000, Chuck Lever III wrote:
> 
> > On Jan 20, 2023, at 2:52 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > nfsd_file_cache_purge is called when the server is shutting down, in
> > which case, tearing things down is generally fine, but it also gets
> > called when the exports cache is flushed.
> 
> Yeah... cache flush is probably the case we've been missing.
> 
> 
> > Instead of walking the cache and freeing everything unconditionally,
> > handle it the same as when we have a notification of conflicting access.
> > 
> > Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache")
> > Reported-by: Ruben Vestergaard <rubenv@drcmr.dk>
> > Reported-by: Torkil Svensgaard <torkil@drcmr.dk>
> > Reported-by: Shachar Kagan <skagan@nvidia.com>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > fs/nfsd/filecache.c | 61 +++++++++++++++++++++++++++------------------
> > 1 file changed, 37 insertions(+), 24 deletions(-)
> > 
> > v2: use the same method to purge entries from the cache as we do when
> >    there is a notification of conflicting access.
> > 
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index 58ac93e7e680..397ae212b98d 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -661,6 +661,40 @@ static struct shrinker	nfsd_file_shrinker = {
> > 	.seeks = 1,
> > };
> > 
> > +/**
> > + * maybe_queue_nfsd_file - try to unhash and queue a nfsd_file to be freed
> > + * @nf: nfsd_file to attempt to queue
> > + * @dispose: private list to queue successfully-put objects
> > + *
> > + * Unhash an nfsd_file, try to get a reference to it, and then put that
> > + * reference. If it's the last reference, queue it to the dispose list.
> > + *
> > + * The caller MUST hold the rcu_read_lock() !
> 
> __nfsd_file_cache_purge() isn't holding rcu_read_lock(), it's
> holding the nfsd_mutex. Is this comment incorrect, or is it just
> too specific? Or did I miss something obvious?
> 

It's implicitly taken by rhashtable_walk_start and released by
rhashtable_walk_stop.

FWIW, it'd be nice if there were a lockdep_assert_held equivalent for
the rcu_read_lock() here, but I didn't see one of those. There is a
rcu_read_lock_held(), but I didn't see a good way to get that to compile
out when lockdep was disabled.
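
Concretely, the walk in __nfsd_file_cache_purge() is shaped roughly like the
sketch below (illustrative only; the full function isn't in the hunk, and
the helper keeps its current name here). rhashtable_walk_start() enters the
RCU read-side critical section and rhashtable_walk_stop() leaves it, so
everything in between runs under rcu_read_lock() even though the function
never calls it directly:

	struct rhashtable_iter iter;
	struct nfsd_file *nf;
	LIST_HEAD(dispose);

	rhashtable_walk_enter(&nfsd_file_rhash_tbl, &iter);
	do {
		rhashtable_walk_start(&iter);	/* implicit rcu_read_lock() */

		nf = rhashtable_walk_next(&iter);
		while (!IS_ERR_OR_NULL(nf)) {
			if (!net || nf->nf_net == net)
				maybe_queue_nfsd_file(nf, &dispose);
			nf = rhashtable_walk_next(&iter);
		}

		rhashtable_walk_stop(&iter);	/* implicit rcu_read_unlock() */
	} while (nf == ERR_PTR(-EAGAIN));	/* table resized mid-walk: restart */
	rhashtable_walk_exit(&iter);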

> 
> > + */
> > +static void
> > +maybe_queue_nfsd_file(struct nfsd_file *nf, struct list_head *dispose)
> 
> I prefer the name nfsd_file_try_to_queue() or nfsd_file_try_to_dispose().
> nfsd_file_ should be the prefix where possible. Unless you're
> redriving, I can fix that.
> 
> 

Rename at will.

> > +{
> > +	int decrement = 1;
> > +
> > +	/* If we raced with someone else unhashing, ignore it */
> > +	if (!nfsd_file_unhash(nf))
> > +		return;
> > +
> > +	/* If we can't get a reference, ignore it */
> > +	if (!nfsd_file_get(nf))
> > +		return;
> > +
> > +	/* Extra decrement if we remove from the LRU */
> > +	if (nfsd_file_lru_remove(nf))
> > +		++decrement;
> > +
> > +	/* If refcount goes to 0, then put on the dispose list */
> > +	if (refcount_sub_and_test(decrement, &nf->nf_ref)) {
> > +		list_add(&nf->nf_lru, dispose);
> > +		trace_nfsd_file_closing(nf);
> > +	}
> > +}
> > +
> > /**
> >  * nfsd_file_queue_for_close: try to close out any open nfsd_files for an inode
> >  * @inode:   inode on which to close out nfsd_files
> > @@ -688,30 +722,12 @@ nfsd_file_queue_for_close(struct inode *inode, struct list_head *dispose)
> > 
> > 	rcu_read_lock();
> > 	do {
> > -		int decrement = 1;
> > -
> > 		nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key,
> > 				       nfsd_file_rhash_params);
> > 		if (!nf)
> > 			break;
> > 
> > -		/* If we raced with someone else unhashing, ignore it */
> > -		if (!nfsd_file_unhash(nf))
> > -			continue;
> > -
> > -		/* If we can't get a reference, ignore it */
> > -		if (!nfsd_file_get(nf))
> > -			continue;
> > -
> > -		/* Extra decrement if we remove from the LRU */
> > -		if (nfsd_file_lru_remove(nf))
> > -			++decrement;
> > -
> > -		/* If refcount goes to 0, then put on the dispose list */
> > -		if (refcount_sub_and_test(decrement, &nf->nf_ref)) {
> > -			list_add(&nf->nf_lru, dispose);
> > -			trace_nfsd_file_closing(nf);
> > -		}
> > +		maybe_queue_nfsd_file(nf, dispose);
> > 	} while (1);
> > 	rcu_read_unlock();
> > }
> > @@ -928,11 +944,8 @@ __nfsd_file_cache_purge(struct net *net)
> > 
> > 		nf = rhashtable_walk_next(&iter);
> > 		while (!IS_ERR_OR_NULL(nf)) {
> > -			if (!net || nf->nf_net == net) {
> > -				nfsd_file_unhash(nf);
> > -				nfsd_file_lru_remove(nf);
> > -				list_add(&nf->nf_lru, &dispose);
> > -			}
> > +			if (!net || nf->nf_net == net)
> > +				maybe_queue_nfsd_file(nf, &dispose);
> > 			nf = rhashtable_walk_next(&iter);
> > 		}
> > 
> > -- 
> > 2.39.0
> > 
> 
> --
> Chuck Lever
> 
> 
> 

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nfsd: don't free files unconditionally in __nfsd_file_cache_purge
  2023-01-20 19:52 Jeff Layton
@ 2023-01-20 20:21 ` Chuck Lever III
  2023-01-20 20:54   ` Jeff Layton
  0 siblings, 1 reply; 12+ messages in thread
From: Chuck Lever III @ 2023-01-20 20:21 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Ruben Vestergaard, Torkil Svensgaard, Shachar Kagan,
	Linux NFS Mailing List, linux-kernel



> On Jan 20, 2023, at 2:52 PM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> nfsd_file_cache_purge is called when the server is shutting down, in
> which case, tearing things down is generally fine, but it also gets
> called when the exports cache is flushed.

Yeah... cache flush is probably the case we've been missing.


> Instead of walking the cache and freeing everything unconditionally,
> handle it the same as when we have a notification of conflicting access.
> 
> Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache")
> Reported-by: Ruben Vestergaard <rubenv@drcmr.dk>
> Reported-by: Torkil Svensgaard <torkil@drcmr.dk>
> Reported-by: Shachar Kagan <skagan@nvidia.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> fs/nfsd/filecache.c | 61 +++++++++++++++++++++++++++------------------
> 1 file changed, 37 insertions(+), 24 deletions(-)
> 
> v2: use the same method to purge entries from the cache as we do when
>    there is a notification of conflicting access.
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 58ac93e7e680..397ae212b98d 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -661,6 +661,40 @@ static struct shrinker	nfsd_file_shrinker = {
> 	.seeks = 1,
> };
> 
> +/**
> + * maybe_queue_nfsd_file - try to unhash and queue a nfsd_file to be freed
> + * @nf: nfsd_file to attempt to queue
> + * @dispose: private list to queue successfully-put objects
> + *
> + * Unhash an nfsd_file, try to get a reference to it, and then put that
> + * reference. If it's the last reference, queue it to the dispose list.
> + *
> + * The caller MUST hold the rcu_read_lock() !

__nfsd_file_cache_purge() isn't holding rcu_read_lock(), it's
holding the nfsd_mutex. Is this comment incorrect, or is it just
too specific? Or did I miss something obvious?


> + */
> +static void
> +maybe_queue_nfsd_file(struct nfsd_file *nf, struct list_head *dispose)

I prefer the name nfsd_file_try_to_queue() or nfsd_file_try_to_dispose().
nfsd_file_ should be the prefix where possible. Unless you're
redriving, I can fix that.


> +{
> +	int decrement = 1;
> +
> +	/* If we raced with someone else unhashing, ignore it */
> +	if (!nfsd_file_unhash(nf))
> +		return;
> +
> +	/* If we can't get a reference, ignore it */
> +	if (!nfsd_file_get(nf))
> +		return;
> +
> +	/* Extra decrement if we remove from the LRU */
> +	if (nfsd_file_lru_remove(nf))
> +		++decrement;
> +
> +	/* If refcount goes to 0, then put on the dispose list */
> +	if (refcount_sub_and_test(decrement, &nf->nf_ref)) {
> +		list_add(&nf->nf_lru, dispose);
> +		trace_nfsd_file_closing(nf);
> +	}
> +}
> +
> /**
>  * nfsd_file_queue_for_close: try to close out any open nfsd_files for an inode
>  * @inode:   inode on which to close out nfsd_files
> @@ -688,30 +722,12 @@ nfsd_file_queue_for_close(struct inode *inode, struct list_head *dispose)
> 
> 	rcu_read_lock();
> 	do {
> -		int decrement = 1;
> -
> 		nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key,
> 				       nfsd_file_rhash_params);
> 		if (!nf)
> 			break;
> 
> -		/* If we raced with someone else unhashing, ignore it */
> -		if (!nfsd_file_unhash(nf))
> -			continue;
> -
> -		/* If we can't get a reference, ignore it */
> -		if (!nfsd_file_get(nf))
> -			continue;
> -
> -		/* Extra decrement if we remove from the LRU */
> -		if (nfsd_file_lru_remove(nf))
> -			++decrement;
> -
> -		/* If refcount goes to 0, then put on the dispose list */
> -		if (refcount_sub_and_test(decrement, &nf->nf_ref)) {
> -			list_add(&nf->nf_lru, dispose);
> -			trace_nfsd_file_closing(nf);
> -		}
> +		maybe_queue_nfsd_file(nf, dispose);
> 	} while (1);
> 	rcu_read_unlock();
> }
> @@ -928,11 +944,8 @@ __nfsd_file_cache_purge(struct net *net)
> 
> 		nf = rhashtable_walk_next(&iter);
> 		while (!IS_ERR_OR_NULL(nf)) {
> -			if (!net || nf->nf_net == net) {
> -				nfsd_file_unhash(nf);
> -				nfsd_file_lru_remove(nf);
> -				list_add(&nf->nf_lru, &dispose);
> -			}
> +			if (!net || nf->nf_net == net)
> +				maybe_queue_nfsd_file(nf, &dispose);
> 			nf = rhashtable_walk_next(&iter);
> 		}
> 
> -- 
> 2.39.0
> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] nfsd: don't free files unconditionally in __nfsd_file_cache_purge
@ 2023-01-20 19:52 Jeff Layton
  2023-01-20 20:21 ` Chuck Lever III
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Layton @ 2023-01-20 19:52 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Ruben Vestergaard, Torkil Svensgaard, Shachar Kagan, linux-nfs,
	linux-kernel

nfsd_file_cache_purge is called when the server is shutting down, in
which case, tearing things down is generally fine, but it also gets
called when the exports cache is flushed.

Instead of walking the cache and freeing everything unconditionally,
handle it the same as when we have a notification of conflicting access.

Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache")
Reported-by: Ruben Vestergaard <rubenv@drcmr.dk>
Reported-by: Torkil Svensgaard <torkil@drcmr.dk>
Reported-by: Shachar Kagan <skagan@nvidia.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/filecache.c | 61 +++++++++++++++++++++++++++------------------
 1 file changed, 37 insertions(+), 24 deletions(-)

v2: use the same method to purge entries from the cache as we do when
    there is a notification of conflicting access.

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 58ac93e7e680..397ae212b98d 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -661,6 +661,40 @@ static struct shrinker	nfsd_file_shrinker = {
 	.seeks = 1,
 };
 
+/**
+ * maybe_queue_nfsd_file - try to unhash and queue a nfsd_file to be freed
+ * @nf: nfsd_file to attempt to queue
+ * @dispose: private list to queue successfully-put objects
+ *
+ * Unhash an nfsd_file, try to get a reference to it, and then put that
+ * reference. If it's the last reference, queue it to the dispose list.
+ *
+ * The caller MUST hold the rcu_read_lock() !
+ */
+static void
+maybe_queue_nfsd_file(struct nfsd_file *nf, struct list_head *dispose)
+{
+	int decrement = 1;
+
+	/* If we raced with someone else unhashing, ignore it */
+	if (!nfsd_file_unhash(nf))
+		return;
+
+	/* If we can't get a reference, ignore it */
+	if (!nfsd_file_get(nf))
+		return;
+
+	/* Extra decrement if we remove from the LRU */
+	if (nfsd_file_lru_remove(nf))
+		++decrement;
+
+	/* If refcount goes to 0, then put on the dispose list */
+	if (refcount_sub_and_test(decrement, &nf->nf_ref)) {
+		list_add(&nf->nf_lru, dispose);
+		trace_nfsd_file_closing(nf);
+	}
+}
+
 /**
  * nfsd_file_queue_for_close: try to close out any open nfsd_files for an inode
  * @inode:   inode on which to close out nfsd_files
@@ -688,30 +722,12 @@ nfsd_file_queue_for_close(struct inode *inode, struct list_head *dispose)
 
 	rcu_read_lock();
 	do {
-		int decrement = 1;
-
 		nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key,
 				       nfsd_file_rhash_params);
 		if (!nf)
 			break;
 
-		/* If we raced with someone else unhashing, ignore it */
-		if (!nfsd_file_unhash(nf))
-			continue;
-
-		/* If we can't get a reference, ignore it */
-		if (!nfsd_file_get(nf))
-			continue;
-
-		/* Extra decrement if we remove from the LRU */
-		if (nfsd_file_lru_remove(nf))
-			++decrement;
-
-		/* If refcount goes to 0, then put on the dispose list */
-		if (refcount_sub_and_test(decrement, &nf->nf_ref)) {
-			list_add(&nf->nf_lru, dispose);
-			trace_nfsd_file_closing(nf);
-		}
+		maybe_queue_nfsd_file(nf, dispose);
 	} while (1);
 	rcu_read_unlock();
 }
@@ -928,11 +944,8 @@ __nfsd_file_cache_purge(struct net *net)
 
 		nf = rhashtable_walk_next(&iter);
 		while (!IS_ERR_OR_NULL(nf)) {
-			if (!net || nf->nf_net == net) {
-				nfsd_file_unhash(nf);
-				nfsd_file_lru_remove(nf);
-				list_add(&nf->nf_lru, &dispose);
-			}
+			if (!net || nf->nf_net == net)
+				maybe_queue_nfsd_file(nf, &dispose);
 			nf = rhashtable_walk_next(&iter);
 		}
 
-- 
2.39.0
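
To spell out the refcount arithmetic in the new helper: "decrement"
accumulates the references this function is responsible for dropping,
namely the temporary one taken by nfsd_file_get() plus the LRU's reference
if we were the ones to pull the file off the LRU, and
refcount_sub_and_test() drops them in a single step while reporting whether
that was the last reference. Only in that case is it safe to put the file
on the dispose list. An illustrative trace (the starting value is
hypothetical, not a statement about the cache's invariants):

	/*
	 * Suppose nf is hashed, on the LRU, and nf_ref == 2 (say, one
	 * reference for the LRU and one held by another thread):
	 *
	 *   nfsd_file_unhash(nf)        we win the race to unhash
	 *   nfsd_file_get(nf)           nf_ref: 2 -> 3
	 *   nfsd_file_lru_remove(nf)    removed, so decrement: 1 -> 2
	 *   refcount_sub_and_test(2)    nf_ref: 3 -> 1, returns false
	 *
	 * Someone else still holds a reference, so the file is not queued
	 * here; the last holder's nfsd_file_put() takes care of it. Had
	 * nf_ref been 1 to begin with, the subtraction would have reached
	 * zero and the file would have gone on the dispose list instead.
	 */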


^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-01-23 10:58 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-19 19:20 [PATCH] nfsd: don't free files unconditionally in __nfsd_file_cache_purge Jeff Layton
2023-01-19 22:59 ` Chuck Lever III
2023-01-19 23:29   ` Jeff Layton
2023-01-20 18:36 ` Jeff Layton
2023-01-20 19:52 Jeff Layton
2023-01-20 20:21 ` Chuck Lever III
2023-01-20 20:54   ` Jeff Layton
2023-01-21 17:04     ` Chuck Lever III
2023-01-21 17:20       ` Jeff Layton
2023-01-22 14:58         ` Shachar Kagan
2023-01-22 23:40           ` Chuck Lever III
2023-01-23 10:57             ` Shachar Kagan
