From: Chuck Lever III <chuck.lever@oracle.com>
To: Wang Yugui <wangyugui@e16-tech.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"david@fromorbit.com" <david@fromorbit.com>,
	"tgraf@suug.ch" <tgraf@suug.ch>, Jeff Layton <jlayton@redhat.com>
Subject: Re: [PATCH RFC 00/30] Overhaul NFSD filecache
Date: Wed, 22 Jun 2022 19:04:39 +0000
Message-ID: <FE520DC8-3C8F-4974-9F3B-84DE822CB899@oracle.com>
In-Reply-To: <20220623023645.F914.409509F4@e16-tech.com>



> On Jun 22, 2022, at 2:36 PM, Wang Yugui <wangyugui@e16-tech.com> wrote:
> 
> Hi,
> 
> fstests generic/531 triggered a panic on kernel 5.19.0-rc3 with this
> patchset.

As I mentioned in the cover letter, I haven't tried running
generic/531 yet. I make no claim that this is finished work or that
#386 has been fixed at this point. I'm merely interested in comments
on the general approach.


> [  405.478056] BUG: kernel NULL pointer dereference, address: 0000000000000049

The "RIP: " line gives the location of the crash. Note that the call
trace quoted here does not include that information. From your
attachment:

[  405.518022] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]

To match that to a line of source code:

[cel@manet ~]$ cd src/linux/linux/
[cel@manet linux]$ scripts/faddr2line ../obj/manet/fs/nfsd/filecache.o nfsd_do_file_acquire+0x4e1
nfsd_do_file_acquire+0x4e1/0xfc0:
rht_bucket_insert at /home/cel/src/linux/linux/include/linux/rhashtable.h:303
(inlined by) __rhashtable_insert_fast at /home/cel/src/linux/linux/include/linux/rhashtable.h:718
(inlined by) rhashtable_lookup_get_insert_key at /home/cel/src/linux/linux/include/linux/rhashtable.h:982
(inlined by) nfsd_file_insert at /home/cel/src/linux/linux/fs/nfsd/filecache.c:1031
(inlined by) nfsd_do_file_acquire at /home/cel/src/linux/linux/fs/nfsd/filecache.c:1089
[cel@manet linux]$

This is only an example; I'm sure my compiled objects don't match
yours.
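
If it helps, here is a rough sketch of pulling the symbol+offset out
of a "RIP: " line so it can be handed to faddr2line. The helper name
rip_to_sym and the object path in the usage comment are made up for
illustration; only scripts/faddr2line and the RIP line itself come
from the text above.

```shell
# Hypothetical helper: read dmesg text on stdin and print the
# "symbol+0xoffset" part of any kernel "RIP: " line it finds.
rip_to_sym() {
    sed -n 's/.*RIP: [0-9a-f]*:\([A-Za-z0-9_.]*+0x[0-9a-f]*\)\/0x[0-9a-f]*.*/\1/p'
}

# Example with the RIP line quoted above.
echo '[  405.518022] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]' | rip_to_sym
# prints nfsd_do_file_acquire+0x4e1

# Possible usage from the top of the kernel source tree (the object
# path is an assumption; use the one from your own build):
#   scripts/faddr2line path/to/fs/nfsd/filecache.o "$(rip_to_sym < 531.dmesg)"
```

The result is only meaningful when the object file comes from the
same build as the kernel that crashed.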

And, now that I've added observability, you should be able to do:

  # watch cat /proc/fs/nfsd/filecache

to see how many items are in the hash and LRU list while the test
is running.
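
Since watch only shows the latest values, it may be handy to keep a
history instead. The following is a minimal sketch, not something
from the series: log_stats is a hypothetical helper that appends
timestamped copies of a stats file to a log, and it makes no
assumption about the layout of /proc/fs/nfsd/filecache.

```shell
# Hypothetical helper: append timestamped snapshots of a stats file
# to a log file.
# Usage: log_stats <stats-file> <log-file> <samples> <interval-seconds>
log_stats() {
    stats=$1
    log=$2
    samples=$3
    interval=$4
    i=0
    while [ "$i" -lt "$samples" ]; do
        # Header line with the sample time in seconds since the epoch.
        printf '=== %s ===\n' "$(date +%s)" >> "$log"
        cat "$stats" >> "$log"
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}

# Possible usage while the test runs (ten minutes at one sample per
# second; the log path is an arbitrary choice):
#   log_stats /proc/fs/nfsd/filecache /tmp/filecache.log 600 1
```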


> [  405.608016] Call Trace:
> [  405.608016]  <TASK>
> [  405.613020]  nfs4_get_vfs_file+0x325/0x410 [nfsd]
> [  405.618018]  nfsd4_process_open2+0x4ba/0x16d0 [nfsd]
> [  405.623016]  ? inode_get_bytes+0x38/0x40
> [  405.623016]  ? nfsd_permission+0x97/0xf0 [nfsd]
> [  405.628022]  ? fh_verify+0x1cc/0x6f0 [nfsd]
> [  405.633025]  nfsd4_open+0x640/0xb30 [nfsd]
> [  405.638025]  nfsd4_proc_compound+0x3bd/0x710 [nfsd]
> [  405.643017]  nfsd_dispatch+0x143/0x270 [nfsd]
> [  405.648019]  svc_process_common+0x3bf/0x5b0 [sunrpc]
> 
> more detail in attachment file(531.dmesg)
> 
> local.config of fstests:
> 	export NFS_MOUNT_OPTIONS="-o rw,relatime,vers=4.2,nconnect=8"
> changes of generic/531
> 	max_allowable_files=$(( 1 * 1024 * 1024 / $nr_cpus / 2 ))

Changed from:

	max_allowable_files=$(( $(cat /proc/sys/fs/file-max) / $nr_cpus / 2 ))

For my own information, what's $nr_cpus in your test?

Aside from the max_allowable_files setting, can you tell how the
test determines when it should stop creating files? Is it looking
for a particular error code from open(2), for instance?

On my client:

[cel@morisot generic]$ cat /proc/sys/fs/file-max
9223372036854775807
[cel@morisot generic]$

I wonder if it's realistic to expect an NFSv4 server to support
that many open files. Is 9 quintillion files really something
I'm going to have to engineer for, or is this just a crazy
test?
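
To put numbers on the difference between the two settings, here is a
quick sketch of the arithmetic. The nr_cpus value of 8 is purely
hypothetical, since I don't know the value in your test, and the
function name simply mirrors the variable in generic/531.

```shell
# Evaluate the limit expression used for max_allowable_files.
# $1 = system-wide cap (file-max or a fixed number), $2 = number of CPUs
max_allowable_files() {
    echo $(( $1 / $2 / 2 ))
}

nr_cpus=8   # hypothetical value, not taken from the report

# Modified test: fixed cap of 1 * 1024 * 1024
max_allowable_files $(( 1 * 1024 * 1024 )) "$nr_cpus"
# prints 65536

# Unmodified test with the file-max value from my client
max_allowable_files 9223372036854775807 "$nr_cpus"
# prints 576460752303423487
```

With these assumptions the unmodified expression yields a limit
roughly thirteen orders of magnitude larger than the modified one.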


> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2022/06/23
> 
>> This series overhauls the NFSD filecache, a cache of server-side
>> "struct file" objects recently used by NFS clients. The purposes of
>> this overhaul are an immediate improvement in cache scalability in
>> the number of open files, and preparation for further improvements.
>> 
>> There are three categories of patches in this series:
>> 
>> 1. Add observability of cache operation so we can see what we're
>> doing as changes are made to the code.
>> 
>> 2. Improve the scalability of filecache garbage collection,
>> addressing several bugs along the way.
>> 
>> 3. Improve the scalability of the filecache hash table by converting
>> it to use rhashtable.
>> 
>> The series as it stands survives typical test workloads. Running
>> stress-tests like generic/531 is the next step.
>> 
>> These patches are also available in the linux-nfs-bugzilla-386
>> branch of
>> 
>>  https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git 
>> 
>> ---
>> 
>> Chuck Lever (30):
>>      NFSD: Report filecache LRU size
>>      NFSD: Report count of calls to nfsd_file_acquire()
>>      NFSD: Report count of freed filecache items
>>      NFSD: Report average age of filecache items
>>      NFSD: Add nfsd_file_lru_dispose_list() helper
>>      NFSD: Refactor nfsd_file_gc()
>>      NFSD: Refactor nfsd_file_lru_scan()
>>      NFSD: Report the number of items evicted by the LRU walk
>>      NFSD: Record number of flush calls
>>      NFSD: Report filecache item construction failures
>>      NFSD: Zero counters when the filecache is re-initialized
>>      NFSD: Hook up the filecache stat file
>>      NFSD: WARN when freeing an item still linked via nf_lru
>>      NFSD: Trace filecache LRU activity
>>      NFSD: Leave open files out of the filecache LRU
>>      NFSD: Fix the filecache LRU shrinker
>>      NFSD: Never call nfsd_file_gc() in foreground paths
>>      NFSD: No longer record nf_hashval in the trace log
>>      NFSD: Remove lockdep assertion from unhash_and_release_locked()
>>      NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode
>>      NFSD: Refactor __nfsd_file_close_inode()
>>      NFSD: nfsd_file_hash_remove can compute hashval
>>      NFSD: Remove nfsd_file::nf_hashval
>>      NFSD: Remove stale comment from nfsd_file_acquire()
>>      NFSD: Clean up "open file" case in nfsd_file_acquire()
>>      NFSD: Document nfsd_file_cache_purge() API contract
>>      NFSD: Replace the "init once" mechanism
>>      NFSD: Set up an rhashtable for the filecache
>>      NFSD: Convert the filecache to use rhashtable
>>      NFSD: Clean up unusued code after rhashtable conversion
>> 
>> 
>> fs/nfsd/filecache.c | 677 +++++++++++++++++++++++++++-----------------
>> fs/nfsd/filecache.h |   6 +-
>> fs/nfsd/nfsctl.c    |  10 +
>> fs/nfsd/trace.h     | 117 ++++++--
>> 4 files changed, 522 insertions(+), 288 deletions(-)
>> 
>> --
>> Chuck Lever
> 
> <531.dmesg>

--
Chuck Lever



