All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init
@ 2022-02-24 16:17 Amir Goldstein
  2022-02-24 16:45 ` Jeff Layton
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Amir Goldstein @ 2022-02-24 16:17 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Jeff Layton, linux-nfs

The nfsd file cache table can be pretty large and its allocation
may require as many as 80 contigious pages.

Employ the same fix that was employed for similar issue that was
reported for the reply cache hash table allocation several years ago
by commit 8f97514b423a ("nfsd: more robust allocation failure handling
in nfsd_reply_cache_init").

Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd")
Link: https://lore.kernel.org/linux-nfs/e3cdaeec85a6cfec980e87fc294327c0381c1778.camel@kernel.org/
Suggested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---

Since v1:
- Use kvcalloc()
- Use kvfree()

 fs/nfsd/filecache.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 8bc807c5fea4..cc2831cec669 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -632,7 +632,7 @@ nfsd_file_cache_init(void)
 	if (!nfsd_filecache_wq)
 		goto out;
 
-	nfsd_file_hashtbl = kcalloc(NFSD_FILE_HASH_SIZE,
+	nfsd_file_hashtbl = kvcalloc(NFSD_FILE_HASH_SIZE,
 				sizeof(*nfsd_file_hashtbl), GFP_KERNEL);
 	if (!nfsd_file_hashtbl) {
 		pr_err("nfsd: unable to allocate nfsd_file_hashtbl\n");
@@ -700,7 +700,7 @@ nfsd_file_cache_init(void)
 	nfsd_file_slab = NULL;
 	kmem_cache_destroy(nfsd_file_mark_slab);
 	nfsd_file_mark_slab = NULL;
-	kfree(nfsd_file_hashtbl);
+	kvfree(nfsd_file_hashtbl);
 	nfsd_file_hashtbl = NULL;
 	destroy_workqueue(nfsd_filecache_wq);
 	nfsd_filecache_wq = NULL;
@@ -811,7 +811,7 @@ nfsd_file_cache_shutdown(void)
 	fsnotify_wait_marks_destroyed();
 	kmem_cache_destroy(nfsd_file_mark_slab);
 	nfsd_file_mark_slab = NULL;
-	kfree(nfsd_file_hashtbl);
+	kvfree(nfsd_file_hashtbl);
 	nfsd_file_hashtbl = NULL;
 	destroy_workqueue(nfsd_filecache_wq);
 	nfsd_filecache_wq = NULL;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init
  2022-02-24 16:17 [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init Amir Goldstein
@ 2022-02-24 16:45 ` Jeff Layton
  2022-02-24 20:17 ` Chuck Lever III
  2022-02-24 21:49 ` NeilBrown
  2 siblings, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2022-02-24 16:45 UTC (permalink / raw)
  To: Amir Goldstein, Chuck Lever; +Cc: linux-nfs

On Thu, 2022-02-24 at 18:17 +0200, Amir Goldstein wrote:
> The nfsd file cache table can be pretty large and its allocation
> may require as many as 80 contigious pages.
> 
> Employ the same fix that was employed for similar issue that was
> reported for the reply cache hash table allocation several years ago
> by commit 8f97514b423a ("nfsd: more robust allocation failure handling
> in nfsd_reply_cache_init").
> 
> Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd")
> Link: https://lore.kernel.org/linux-nfs/e3cdaeec85a6cfec980e87fc294327c0381c1778.camel@kernel.org/
> Suggested-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
> 
> Since v1:
> - Use kvcalloc()
> - Use kvfree()
> 
>  fs/nfsd/filecache.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 8bc807c5fea4..cc2831cec669 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -632,7 +632,7 @@ nfsd_file_cache_init(void)
>  	if (!nfsd_filecache_wq)
>  		goto out;
>  
> -	nfsd_file_hashtbl = kcalloc(NFSD_FILE_HASH_SIZE,
> +	nfsd_file_hashtbl = kvcalloc(NFSD_FILE_HASH_SIZE,
>  				sizeof(*nfsd_file_hashtbl), GFP_KERNEL);
>  	if (!nfsd_file_hashtbl) {
>  		pr_err("nfsd: unable to allocate nfsd_file_hashtbl\n");
> @@ -700,7 +700,7 @@ nfsd_file_cache_init(void)
>  	nfsd_file_slab = NULL;
>  	kmem_cache_destroy(nfsd_file_mark_slab);
>  	nfsd_file_mark_slab = NULL;
> -	kfree(nfsd_file_hashtbl);
> +	kvfree(nfsd_file_hashtbl);
>  	nfsd_file_hashtbl = NULL;
>  	destroy_workqueue(nfsd_filecache_wq);
>  	nfsd_filecache_wq = NULL;
> @@ -811,7 +811,7 @@ nfsd_file_cache_shutdown(void)
>  	fsnotify_wait_marks_destroyed();
>  	kmem_cache_destroy(nfsd_file_mark_slab);
>  	nfsd_file_mark_slab = NULL;
> -	kfree(nfsd_file_hashtbl);
> +	kvfree(nfsd_file_hashtbl);
>  	nfsd_file_hashtbl = NULL;
>  	destroy_workqueue(nfsd_filecache_wq);
>  	nfsd_filecache_wq = NULL;

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init
  2022-02-24 16:17 [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init Amir Goldstein
  2022-02-24 16:45 ` Jeff Layton
@ 2022-02-24 20:17 ` Chuck Lever III
  2022-02-24 21:39   ` Amir Goldstein
  2022-02-24 21:49 ` NeilBrown
  2 siblings, 1 reply; 8+ messages in thread
From: Chuck Lever III @ 2022-02-24 20:17 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jeff Layton, Linux NFS Mailing List

Hi Amir-

> On Feb 24, 2022, at 11:17 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> 
> The nfsd file cache table can be pretty large and its allocation
> may require as many as 80 contigious pages.
> 
> Employ the same fix that was employed for similar issue that was
> reported for the reply cache hash table allocation several years ago
> by commit 8f97514b423a ("nfsd: more robust allocation failure handling
> in nfsd_reply_cache_init").
> 
> Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd")
> Link: https://lore.kernel.org/linux-nfs/e3cdaeec85a6cfec980e87fc294327c0381c1778.camel@kernel.org/
> Suggested-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
> 
> Since v1:
> - Use kvcalloc()
> - Use kvfree()
> 
> fs/nfsd/filecache.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)

v2 passes some simple testing, so I've applied it to NFSD for-next.
It should get 0-day and merge testing and is available for others
to try out.

I don't have anything that exercises low memory scenarios, though.
Do you have anything like this to try?


> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 8bc807c5fea4..cc2831cec669 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -632,7 +632,7 @@ nfsd_file_cache_init(void)
> 	if (!nfsd_filecache_wq)
> 		goto out;
> 
> -	nfsd_file_hashtbl = kcalloc(NFSD_FILE_HASH_SIZE,
> +	nfsd_file_hashtbl = kvcalloc(NFSD_FILE_HASH_SIZE,
> 				sizeof(*nfsd_file_hashtbl), GFP_KERNEL);
> 	if (!nfsd_file_hashtbl) {
> 		pr_err("nfsd: unable to allocate nfsd_file_hashtbl\n");
> @@ -700,7 +700,7 @@ nfsd_file_cache_init(void)
> 	nfsd_file_slab = NULL;
> 	kmem_cache_destroy(nfsd_file_mark_slab);
> 	nfsd_file_mark_slab = NULL;
> -	kfree(nfsd_file_hashtbl);
> +	kvfree(nfsd_file_hashtbl);
> 	nfsd_file_hashtbl = NULL;
> 	destroy_workqueue(nfsd_filecache_wq);
> 	nfsd_filecache_wq = NULL;
> @@ -811,7 +811,7 @@ nfsd_file_cache_shutdown(void)
> 	fsnotify_wait_marks_destroyed();
> 	kmem_cache_destroy(nfsd_file_mark_slab);
> 	nfsd_file_mark_slab = NULL;
> -	kfree(nfsd_file_hashtbl);
> +	kvfree(nfsd_file_hashtbl);
> 	nfsd_file_hashtbl = NULL;
> 	destroy_workqueue(nfsd_filecache_wq);
> 	nfsd_filecache_wq = NULL;
> -- 
> 2.25.1
> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init
  2022-02-24 20:17 ` Chuck Lever III
@ 2022-02-24 21:39   ` Amir Goldstein
  2022-02-26 18:37     ` Amir Goldstein
  0 siblings, 1 reply; 8+ messages in thread
From: Amir Goldstein @ 2022-02-24 21:39 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Jeff Layton, Linux NFS Mailing List

On Thu, Feb 24, 2022 at 10:41 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
> Hi Amir-
>
> > On Feb 24, 2022, at 11:17 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > The nfsd file cache table can be pretty large and its allocation
> > may require as many as 80 contigious pages.
> >
> > Employ the same fix that was employed for similar issue that was
> > reported for the reply cache hash table allocation several years ago
> > by commit 8f97514b423a ("nfsd: more robust allocation failure handling
> > in nfsd_reply_cache_init").
> >
> > Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd")
> > Link: https://lore.kernel.org/linux-nfs/e3cdaeec85a6cfec980e87fc294327c0381c1778.camel@kernel.org/
> > Suggested-by: Jeff Layton <jlayton@kernel.org>
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >
> > Since v1:
> > - Use kvcalloc()
> > - Use kvfree()
> >
> > fs/nfsd/filecache.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
>
> v2 passes some simple testing, so I've applied it to NFSD for-next.
> It should get 0-day and merge testing and is available for others
> to try out.
>
> I don't have anything that exercises low memory scenarios, though.
> Do you have anything like this to try?

Well, it is not low memory really it's fragmented memory.
I would try setting:

CONFIG_FAIL_PAGE_ALLOC=y

echo 5 > /sys/kernel/debug/fail_page_alloc/min-order
echo 100 > /sys/kernel/debug/fail_page_alloc/probability

and starting (or restarting) nfsd.
hoping that other large page allocations won't get in the way.

I gave it a shot, but couldn't figure out why nfsd4_files slab
is still there after stopping nfs-server service, meaning that
nfsd_file_cache_shutdown() was not called - I must be missing
something. I may play with this some more tomorrow.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init
  2022-02-24 16:17 [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init Amir Goldstein
  2022-02-24 16:45 ` Jeff Layton
  2022-02-24 20:17 ` Chuck Lever III
@ 2022-02-24 21:49 ` NeilBrown
  2022-02-24 21:56   ` Chuck Lever III
  2 siblings, 1 reply; 8+ messages in thread
From: NeilBrown @ 2022-02-24 21:49 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Chuck Lever, Jeff Layton, linux-nfs

On Fri, 25 Feb 2022, Amir Goldstein wrote:
> The nfsd file cache table can be pretty large and its allocation
> may require as many as 80 contigious pages.
> 
> Employ the same fix that was employed for similar issue that was
> reported for the reply cache hash table allocation several years ago
> by commit 8f97514b423a ("nfsd: more robust allocation failure handling
> in nfsd_reply_cache_init").
> 
> Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd")
> Link: https://lore.kernel.org/linux-nfs/e3cdaeec85a6cfec980e87fc294327c0381c1778.camel@kernel.org/
> Suggested-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
> 
> Since v1:
> - Use kvcalloc()
> - Use kvfree()

I think this is a good improvement, but it would be really nice to
replace this bespoke hash table with an rhashtable.  They we wouldn't
need to worry about these trivial details.

NeilBrown

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init
  2022-02-24 21:49 ` NeilBrown
@ 2022-02-24 21:56   ` Chuck Lever III
  0 siblings, 0 replies; 8+ messages in thread
From: Chuck Lever III @ 2022-02-24 21:56 UTC (permalink / raw)
  To: Neil Brown; +Cc: Amir Goldstein, Jeff Layton, Linux NFS Mailing List



> On Feb 24, 2022, at 4:49 PM, NeilBrown <neilb@suse.de> wrote:
> 
> On Fri, 25 Feb 2022, Amir Goldstein wrote:
>> The nfsd file cache table can be pretty large and its allocation
>> may require as many as 80 contigious pages.
>> 
>> Employ the same fix that was employed for similar issue that was
>> reported for the reply cache hash table allocation several years ago
>> by commit 8f97514b423a ("nfsd: more robust allocation failure handling
>> in nfsd_reply_cache_init").
>> 
>> Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd")
>> Link: https://lore.kernel.org/linux-nfs/e3cdaeec85a6cfec980e87fc294327c0381c1778.camel@kernel.org/
>> Suggested-by: Jeff Layton <jlayton@kernel.org>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>> 
>> Since v1:
>> - Use kvcalloc()
>> - Use kvfree()
> 
> I think this is a good improvement, but it would be really nice to
> replace this bespoke hash table with an rhashtable.  They we wouldn't
> need to worry about these trivial details.

I agree -- I didn't want to saddle Jeff or Amir with pulling
on that chain. But I'm willing to review patches that attempt
that kind of replacement (same for the DRC hash table).


--
Chuck Lever




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init
  2022-02-24 21:39   ` Amir Goldstein
@ 2022-02-26 18:37     ` Amir Goldstein
  2022-02-26 20:00       ` Chuck Lever III
  0 siblings, 1 reply; 8+ messages in thread
From: Amir Goldstein @ 2022-02-26 18:37 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Jeff Layton, Linux NFS Mailing List

On Thu, Feb 24, 2022 at 11:39 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Feb 24, 2022 at 10:41 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >
> > Hi Amir-
> >
> > > On Feb 24, 2022, at 11:17 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > The nfsd file cache table can be pretty large and its allocation
> > > may require as many as 80 contigious pages.
> > >
> > > Employ the same fix that was employed for similar issue that was
> > > reported for the reply cache hash table allocation several years ago
> > > by commit 8f97514b423a ("nfsd: more robust allocation failure handling
> > > in nfsd_reply_cache_init").
> > >
> > > Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd")
> > > Link: https://lore.kernel.org/linux-nfs/e3cdaeec85a6cfec980e87fc294327c0381c1778.camel@kernel.org/
> > > Suggested-by: Jeff Layton <jlayton@kernel.org>
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > ---
> > >
> > > Since v1:
> > > - Use kvcalloc()
> > > - Use kvfree()
> > >
> > > fs/nfsd/filecache.c | 6 +++---
> > > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > v2 passes some simple testing, so I've applied it to NFSD for-next.
> > It should get 0-day and merge testing and is available for others
> > to try out.
> >
> > I don't have anything that exercises low memory scenarios, though.
> > Do you have anything like this to try?
>
> Well, it is not low memory really it's fragmented memory.
> I would try setting:
>
> CONFIG_FAIL_PAGE_ALLOC=y
>
> echo 5 > /sys/kernel/debug/fail_page_alloc/min-order
> echo 100 > /sys/kernel/debug/fail_page_alloc/probability
>
> and starting (or restarting) nfsd.
> hoping that other large page allocations won't get in the way.
>
> I gave it a shot, but couldn't figure out why nfsd4_files slab
> is still there after stopping nfs-server service, meaning that
> nfsd_file_cache_shutdown() was not called - I must be missing
> something. I may play with this some more tomorrow.
>

Ok, I was missing some parameters.
This configuration reproduces and failure and verified that the
kvcalloc() fix solves the issue:

$ systemctl stop nfs-server
$ echo 5 > /sys/kernel/debug/fail_page_alloc/min-order
$ echo 100 > /sys/kernel/debug/fail_page_alloc/probability
$ echo 1 > /sys/kernel/debug/fail_page_alloc/times
$ echo N > /sys/kernel/debug/fail_page_alloc/ignore-gfp-wait
$ systemctl start nfs-server

[   24.410560] FAULT_INJECTION: forcing a failure.
[   24.410560] name fail_page_alloc, interval 1, probability 100,
space 0, times 1
[   24.413887] CPU: 1 PID: 1218 Comm: rpc.nfsd Not tainted
5.17.0-rc2-xfstests #5927
[   24.415625] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.13.0-1ubuntu1.1 04/01/2014
[   24.417098] Call Trace:
[   24.417098]  <TASK>
[   24.417098]  dump_stack_lvl+0x45/0x59
[   24.418999]  should_fail+0x11a/0x13d
[   24.418999]  prepare_alloc_pages.isra.0+0x97/0xc5
[   24.418999]  __alloc_pages+0x76/0x1c7
[   24.418999]  kmalloc_order+0x35/0xa7
[   24.418999]  kmalloc_order_trace+0x1b/0xf3
[   24.418999]  nfsd_file_cache_init+0x5b/0x2d8
[   24.418999]  nfsd_svc+0xcd/0x2b2
[   24.427086]  write_threads+0x6d/0xb5
[   24.427086]  ? get_int+0x70/0x70
[   24.429020]  nfsctl_transaction_write+0x4f/0x67
[   24.429020]  vfs_write+0xe3/0x14b
[   24.429020]  ksys_write+0x7f/0xcb
[   24.429020]  do_syscall_64+0x6d/0x80
[   24.429020]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   24.429020] RIP: 0033:0x7f29d80d6504
[   24.429020] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f
1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89
f5 53
[   24.439028] RSP: 002b:00007ffe867a47f8 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[   24.439028] RAX: ffffffffffffffda RBX: 00007f29d8219560 RCX: 00007f29d80d6504
[   24.442325] RDX: 0000000000000002 RSI: 00007f29d8219560 RDI: 0000000000000003
[   24.442325] RBP: 0000000000000003 R08: 0000000000000000 R09: 00007ffe867a4557
[   24.445644] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[   24.445644] R13: 0000000000000008 R14: 0000000000000000 R15: 00007f29d83572a0
[   24.449026]  </TASK>
[   24.450496] nfsd: unable to allocate nfsd_file_hashtbl
Job for nfs-server.service canceled.

With the fix patch, nfsd starts despite the injected failure.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init
  2022-02-26 18:37     ` Amir Goldstein
@ 2022-02-26 20:00       ` Chuck Lever III
  0 siblings, 0 replies; 8+ messages in thread
From: Chuck Lever III @ 2022-02-26 20:00 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jeff Layton, Linux NFS Mailing List



> On Feb 26, 2022, at 1:37 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> 
> On Thu, Feb 24, 2022 at 11:39 PM Amir Goldstein <amir73il@gmail.com> wrote:
>> 
>> On Thu, Feb 24, 2022 at 10:41 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>> 
>>> Hi Amir-
>>> 
>>>> On Feb 24, 2022, at 11:17 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>> 
>>>> The nfsd file cache table can be pretty large and its allocation
>>>> may require as many as 80 contigious pages.
>>>> 
>>>> Employ the same fix that was employed for similar issue that was
>>>> reported for the reply cache hash table allocation several years ago
>>>> by commit 8f97514b423a ("nfsd: more robust allocation failure handling
>>>> in nfsd_reply_cache_init").
>>>> 
>>>> Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd")
>>>> Link: https://lore.kernel.org/linux-nfs/e3cdaeec85a6cfec980e87fc294327c0381c1778.camel@kernel.org/
>>>> Suggested-by: Jeff Layton <jlayton@kernel.org>
>>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>>> ---
>>>> 
>>>> Since v1:
>>>> - Use kvcalloc()
>>>> - Use kvfree()
>>>> 
>>>> fs/nfsd/filecache.c | 6 +++---
>>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>> 
>>> v2 passes some simple testing, so I've applied it to NFSD for-next.
>>> It should get 0-day and merge testing and is available for others
>>> to try out.
>>> 
>>> I don't have anything that exercises low memory scenarios, though.
>>> Do you have anything like this to try?
>> 
>> Well, it is not low memory really it's fragmented memory.
>> I would try setting:
>> 
>> CONFIG_FAIL_PAGE_ALLOC=y
>> 
>> echo 5 > /sys/kernel/debug/fail_page_alloc/min-order
>> echo 100 > /sys/kernel/debug/fail_page_alloc/probability
>> 
>> and starting (or restarting) nfsd.
>> hoping that other large page allocations won't get in the way.
>> 
>> I gave it a shot, but couldn't figure out why nfsd4_files slab
>> is still there after stopping nfs-server service, meaning that
>> nfsd_file_cache_shutdown() was not called - I must be missing
>> something. I may play with this some more tomorrow.
>> 
> 
> Ok, I was missing some parameters.
> This configuration reproduces and failure and verified that the
> kvcalloc() fix solves the issue:
> 
> $ systemctl stop nfs-server
> $ echo 5 > /sys/kernel/debug/fail_page_alloc/min-order
> $ echo 100 > /sys/kernel/debug/fail_page_alloc/probability
> $ echo 1 > /sys/kernel/debug/fail_page_alloc/times
> $ echo N > /sys/kernel/debug/fail_page_alloc/ignore-gfp-wait
> $ systemctl start nfs-server
> 
> [   24.410560] FAULT_INJECTION: forcing a failure.
> [   24.410560] name fail_page_alloc, interval 1, probability 100,
> space 0, times 1
> [   24.413887] CPU: 1 PID: 1218 Comm: rpc.nfsd Not tainted
> 5.17.0-rc2-xfstests #5927
> [   24.415625] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.13.0-1ubuntu1.1 04/01/2014
> [   24.417098] Call Trace:
> [   24.417098]  <TASK>
> [   24.417098]  dump_stack_lvl+0x45/0x59
> [   24.418999]  should_fail+0x11a/0x13d
> [   24.418999]  prepare_alloc_pages.isra.0+0x97/0xc5
> [   24.418999]  __alloc_pages+0x76/0x1c7
> [   24.418999]  kmalloc_order+0x35/0xa7
> [   24.418999]  kmalloc_order_trace+0x1b/0xf3
> [   24.418999]  nfsd_file_cache_init+0x5b/0x2d8
> [   24.418999]  nfsd_svc+0xcd/0x2b2
> [   24.427086]  write_threads+0x6d/0xb5
> [   24.427086]  ? get_int+0x70/0x70
> [   24.429020]  nfsctl_transaction_write+0x4f/0x67
> [   24.429020]  vfs_write+0xe3/0x14b
> [   24.429020]  ksys_write+0x7f/0xcb
> [   24.429020]  do_syscall_64+0x6d/0x80
> [   24.429020]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [   24.429020] RIP: 0033:0x7f29d80d6504
> [   24.429020] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f
> 1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00
> 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89
> f5 53
> [   24.439028] RSP: 002b:00007ffe867a47f8 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000001
> [   24.439028] RAX: ffffffffffffffda RBX: 00007f29d8219560 RCX: 00007f29d80d6504
> [   24.442325] RDX: 0000000000000002 RSI: 00007f29d8219560 RDI: 0000000000000003
> [   24.442325] RBP: 0000000000000003 R08: 0000000000000000 R09: 00007ffe867a4557
> [   24.445644] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [   24.445644] R13: 0000000000000008 R14: 0000000000000000 R15: 00007f29d83572a0
> [   24.449026]  </TASK>
> [   24.450496] nfsd: unable to allocate nfsd_file_hashtbl
> Job for nfs-server.service canceled.
> 
> With the fix patch, nfsd starts despite the injected failure.

Thanks. I will add your Tested-by: .


--
Chuck Lever




^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-02-26 20:01 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-24 16:17 [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init Amir Goldstein
2022-02-24 16:45 ` Jeff Layton
2022-02-24 20:17 ` Chuck Lever III
2022-02-24 21:39   ` Amir Goldstein
2022-02-26 18:37     ` Amir Goldstein
2022-02-26 20:00       ` Chuck Lever III
2022-02-24 21:49 ` NeilBrown
2022-02-24 21:56   ` Chuck Lever III

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.