From: Dai Ngo <dai.ngo@oracle.com>
To: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: 'ls -lrt' performance issue on large dir while dir is being modified
Date: Thu, 19 Dec 2019 20:01:16 -0800 [thread overview]
Message-ID: <a41af3d6-8280-e315-fb65-a9285bad50ec@oracle.com> (raw)
In-Reply-To: <e04baa28-2460-4ced-e387-618ea32d827c@oracle.com>
Hi Anna, Trond,
I made a mistake with the 5.5 numbers. The VM that runs 5.5 has some
problems. There is no regression with 5.5, here are the new numbers:
Upstream Linux 5.5.0-rc1 [ORI] 93296: 3m10.917s 197891: 10m35.789s
Upstream Linux 5.5.0-rc1 [MOD] 98614: 1m59.649s 192801: 3m55.003s
My apologies for the mistake.
Now that there is no regression with 5.5, I'd like to get your
opinion on changing nfs_force_use_readdirplus back from
invalidate_mapping_pages to nfs_zap_mapping, to prevent the
current 'ls' from restarting READDIRPLUS3 from cookie 0. I'm
not quite sure about the intention of the earlier change from
nfs_zap_mapping to invalidate_mapping_pages, which is why I'm
seeking advice. Or do you have any other suggestions to achieve
the same result?
Thanks,
-Dai
On 12/17/19 4:34 PM, Dai Ngo wrote:
> Hi,
>
> I'd like to report an issue where 'ls -lrt' on an NFSv3 client takes
> a very long time to display the contents of a large directory
> (100k - 200k files) while the directory is being modified by
> another NFSv3 client.
>
> The problem can be reproduced using three systems: one serves
> as the NFS server, one runs the client doing the 'ls -lrt',
> and another runs the client that creates files on the server.
> Client1 creates files using this simple script:
>
>> #!/bin/sh
>>
>> if [ $# -lt 2 ]; then
>>     echo "Usage: $0 number_of_files base_filename"
>>     exit 1
>> fi
>>
>> nfiles=$1
>> fname=$2
>> echo "creating $nfiles files using filename[$fname]..."
>> i=0
>> while [ $i -lt $nfiles ]; do
>>     i=`expr $i + 1`
>>     echo "xyz" > $fname$i
>>     echo "$fname$i"
>> done
>
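As a side note, that loop forks `expr` once per file, which adds up at 100k+ files. A sketch of an equivalent loop using POSIX arithmetic expansion instead (the function name is my own) should generate the same workload with less client-side overhead:

```shell
#!/bin/sh
# Sketch: same file-creation workload as the script above, but using
# POSIX $((...)) arithmetic instead of forking `expr` per iteration.
create_files() {
    nfiles=$1
    fname=$2
    i=0
    while [ "$i" -lt "$nfiles" ]; do
        i=$((i + 1))
        echo "xyz" > "$fname$i"
    done
}
```

Called as e.g. `create_files 100000 /tmp/mnt/bd1/f` against the mount point.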
> Client2 runs 'time ls -lrt /tmp/mnt/bd1 |wc -l' in a loop.
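The Client2 side can be sketched as the loop below (the mount point /tmp/mnt/bd1 is from my setup; each iteration was wrapped in time(1) to get the elapsed-time numbers reported later):

```shell
#!/bin/sh
# Sketch of the Client2 measurement loop: repeatedly list the
# directory and count the lines of `ls -lrt` output.
run_ls_loop() {
    dir=$1
    iters=$2
    i=0
    while [ "$i" -lt "$iters" ]; do
        i=$((i + 1))
        ls -lrt "$dir" | wc -l
    done
}
run_ls_loop /tmp/mnt/bd1 3
```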
>
> The network traces and dtrace probes showed numerous READDIRPLUS3
> requests restarting from cookie 0, which seemed to indicate that the
> cached pages of the directory were invalidated, causing the pages
> to be refilled starting from cookie 0 up to the currently requested
> cookie. The cached page invalidation was traced to
> nfs_force_use_readdirplus(). To verify, I made the modification
> below, ran the test on various kernel versions, and captured the
> results shown further down.
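To put a rough number on that refill cost: this is only a toy model (plain shell arithmetic, not the NFS client code), but if reaching cookie i requires re-reading everything from cookie 0, then walking n cached pages costs on the order of n*(n+1)/2 page reads instead of n:

```shell
#!/bin/sh
# Toy model (not NFS code): if the directory's cached pages are
# invalidated between readdir calls, reaching page i means re-reading
# pages 1..i, so a full listing of n pages costs n*(n+1)/2 reads.
refill_cost() {
    n=$1
    total=0
    i=0
    while [ "$i" -lt "$n" ]; do
        i=$((i + 1))
        total=$((total + i))
    done
    echo "$total"
}
refill_cost 100   # prints 5050, vs. 100 reads without invalidation
```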
>
> The modification is:
>
>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>> index a73e2f8bd8ec..5d4a64555fa7 100644
>> --- a/fs/nfs/dir.c
>> +++ b/fs/nfs/dir.c
>> @@ -444,7 +444,7 @@ void nfs_force_use_readdirplus(struct inode *dir)
>> if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) &&
>> !list_empty(&nfsi->open_files)) {
>> set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags);
>> - invalidate_mapping_pages(dir->i_mapping, 0, -1);
>> + nfs_zap_mapping(dir, dir->i_mapping);
>> }
>> }
>
> Note that after this change, I did not see READDIRPLUS3 restarting
> with cookie 0 anymore.
>
> Below are the summary results of 'ls -lrt'. Each entry is the line
> count reported by 'wc -l' followed by the elapsed time. For each
> kernel version compared, there is one row for the original kernel
> [ORI] and one for the kernel with the above modification [MOD].
>
> I cloned dtrace-linux from here:
> github.com/oracle/dtrace-linux-kernel
>
> dtrace-linux 5.1.0-rc4 [ORI] 89191: 2m59.32s 193071: 6m7.810s
> dtrace-linux 5.1.0-rc4 [MOD] 98771: 1m55.900s 191322: 3m48.668s
>
> I cloned upstream Linux from here:
> git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>
> Upstream Linux 5.5.0-rc1 [ORI] 87891: 5m11.089s 160974: 14m4.384s
> Upstream Linux 5.5.0-rc1 [MOD] 87075: 5m2.057s 161421: 14m33.615s
>
> Please note that these are relative performance numbers and are used
> to illustrate the issue only.
>
> For reference, on the original dtrace-linux it takes about 9s for
> 'ls -lrt' to complete on a directory with 200k files if the directory
> is not modified while 'ls' is running.
>
> The numbers for the original upstream Linux are *really* bad, and
> the modification did not seem to have any effect. I'm not sure why;
> it could be that something else is going on here.
>
> The cache invalidation in nfs_force_use_readdirplus seems too
> drastic and might need to be reviewed. Even though this change
> helps, it did not get the 'ls' performance to where it's expected
> to be. I think that even though READDIRPLUS3 was used, the
> attribute cache was invalidated due to the directory modification,
> causing attribute cache misses that result in the calls to
> nfs_force_use_readdirplus shown in this stack trace:
>
> 0 17586 page_cache_tree_delete:entry
> vmlinux`remove_mapping+0x14
> vmlinux`invalidate_inode_page+0x7c
> vmlinux`invalidate_mapping_pages+0x1dd
> nfs`nfs_force_use_readdirplus+0x47
> nfs`__dta_nfs_lookup_revalidate_478+0x5dd
> vmlinux`d_revalidate.part.24+0x10
> vmlinux`lookup_fast+0x254
> vmlinux`walk_component+0x49
> vmlinux`path_lookupat+0x79
> vmlinux`filename_lookup+0xaf
> vmlinux`user_path_at_empty+0x36
> vmlinux`vfs_statx+0x77
> vmlinux`SYSC_newlstat+0x3d
> vmlinux`SyS_newlstat+0xe
> vmlinux`do_syscall_64+0x79
> vmlinux`entry_SYSCALL_64+0x18d
>
> Besides the overhead of refilling the page cache from cookie 0,
> I think the reason 'ls' still takes so long to complete is that
> the client has to send a bunch of additional LOOKUP/ACCESS requests
> over the wire to service the stat(2) calls from 'ls', due to the
> attribute cache misses.
>
> Please let me know what you think and whether any additional
> information is needed.
>
> Thanks,
> -Dai
>
>
Thread overview: 12+ messages
2019-12-18 0:34 'ls -lrt' performance issue on large dir while dir is being modified Dai Ngo
2019-12-20 4:01 ` Dai Ngo [this message]
2020-01-15 18:11 ` Dai Ngo
2020-01-15 18:54 ` Trond Myklebust
2020-01-15 19:06 ` Trond Myklebust
2020-01-15 19:28 ` Dai Ngo
2020-01-18 2:29 ` Dai Ngo
2020-01-18 15:58 ` Trond Myklebust
2020-01-18 17:26 ` Chuck Lever
2020-01-18 17:31 ` Trond Myklebust
2020-01-18 18:03 ` Dai Ngo
2020-01-20 20:52 ` Trond Myklebust