linux-nfs.vger.kernel.org archive mirror
From: Dai Ngo <dai.ngo@oracle.com>
To: "Schumaker, Anna" <Anna.Schumaker@netapp.com>,
	Trond Myklebust <trondmy@hammerspace.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: 'ls -lrt' performance issue on large dir while dir is being modified
Date: Wed, 15 Jan 2020 10:11:12 -0800
Message-ID: <770937d3-9439-db4a-1f6e-59a59f2c08b9@oracle.com>
In-Reply-To: <a41af3d6-8280-e315-fb65-a9285bad50ec@oracle.com>

Hi Anna, Trond,

Could you please let me know your opinion on changing nfs_force_use_readdirplus
to call nfs_zap_mapping instead of invalidate_mapping_pages, which reverts an
earlier change? The goal is to prevent the READDIRPLUS cookie from being reset
to 0 while an instance of 'ls' is running and the directory is being modified.

> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index a73e2f8bd8ec..5d4a64555fa7 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -444,7 +444,7 @@ void nfs_force_use_readdirplus(struct inode *dir)
>      if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) &&
>          !list_empty(&nfsi->open_files)) {
>          set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags);
> -        invalidate_mapping_pages(dir->i_mapping, 0, -1);
> +        nfs_zap_mapping(dir, dir->i_mapping);
>      }
>  }
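
To illustrate why restarting from cookie 0 is costly, here is a toy shell
model (not kernel code). It assumes the listing needs a fixed number of
directory cache pages and that the cache is dropped after a fixed number of
pages of forward progress. The function name and both parameters are made up
for this sketch.

```shell
#!/bin/sh
# Toy model, not kernel code: count directory pages fetched over the wire
# when the cached pages are dropped at a regular interval during a listing.
#   $1 = pages needed to hold the whole directory (hypothetical parameter)
#   $2 = pages of progress between invalidations; 0 means never invalidated
pages_fetched() {
    total=$1
    interval=$2
    fetched=0
    pos=0
    while [ "$pos" -lt "$total" ]; do
        pos=$((pos + 1))
        fetched=$((fetched + 1))        # fetch the next new page
        if [ "$interval" -gt 0 ] && [ "$pos" -lt "$total" ] &&
           [ $((pos % interval)) -eq 0 ]; then
            fetched=$((fetched + pos))  # refill pages 1..pos from cookie 0
        fi
    done
    echo "$fetched"
}
```

Under these assumptions, `pages_fetched 100 0` prints 100 while
`pages_fetched 100 10` prints 550, so the refill traffic grows much faster
than the directory itself.
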


Thanks,
-Dai

On 12/19/19 8:01 PM, Dai Ngo wrote:
> Hi Anna, Trond,
>
> I made a mistake with the 5.5 numbers. The VM that runs 5.5 has some
> problems. There is no regression with 5.5, here are the new numbers:
>
> Upstream Linux 5.5.0-rc1 [ORI] 93296: 3m10.917s  197891: 10m35.789s
> Upstream Linux 5.5.0-rc1 [MOD] 98614: 1m59.649s  192801: 3m55.003s
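
As a rough way to compare the [ORI] and [MOD] rows, the timings can be
converted to seconds and divided. This is only a sketch: the helper names are
hypothetical, and the ratio is approximate because the two runs did not list
exactly the same number of entries.

```shell
#!/bin/sh
# Convert a duration such as "10m35.789s" into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print how many times longer the first duration is than the second.
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}
```

For the larger directory above, `speedup 10m35.789s 3m55.003s` prints 2.71.
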
>
> My apologies for the mistake.
>
> Now that there is no regression with 5.5, I'd like to get your opinion
> on reverting the call from invalidate_mapping_pages back to
> nfs_zap_mapping in nfs_force_use_readdirplus, to prevent the
> current 'ls' from restarting READDIRPLUS3 from cookie 0. I'm
> not sure about the intention of the prior change from
> nfs_zap_mapping to invalidate_mapping_pages, which is why I'm
> seeking advice. Or do you have any other suggestions to achieve the same?
>
> Thanks,
> -Dai
>
> On 12/17/19 4:34 PM, Dai Ngo wrote:
>> Hi,
>>
>> I'd like to report an issue where 'ls -lrt' on an NFSv3 client takes
>> a very long time to display the contents of a large directory
>> (100k - 200k files) while the directory is being modified by
>> another NFSv3 client.
>>
>> The problem can be reproduced using 3 systems. One system serves
>> as the NFS server, one runs the client that does the
>> 'ls -lrt', and another runs the client that creates files
>> on the server.
>>
>> Client1 creates files using this simple script:
>>
>>> #!/bin/sh
>>>
>>> if [ $# -lt 2 ]; then
>>>         echo "Usage: $0 number_of_files base_filename"
>>>         exit
>>> fi
>>> nfiles=$1
>>> fname=$2
>>> echo "creating $nfiles files using filename[$fname]..."
>>> i=0
>>> while [ $i -lt $nfiles ] ;
>>> do
>>>         i=`expr $i + 1`
>>>         echo "xyz" > $fname$i
>>>         echo "$fname$i"
>>> done
>>
>> Client2 runs 'time ls -lrt /tmp/mnt/bd1 |wc -l' in a loop.
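
The Client2 loop could look roughly like the sketch below. The function name
and its two arguments are illustrative, and it measures elapsed time with
date(1) at one-second resolution instead of the shell's 'time', which is
coarse but adequate for runs that take minutes.

```shell
#!/bin/sh
# Sketch of the Client2 side: list a directory repeatedly and report the
# number of output lines and the elapsed wall-clock seconds for each run.
#   $1 = directory to list (e.g. the NFS mount point)
#   $2 = number of runs (hypothetical knob, not in the original report)
list_loop() {
    dir=$1
    runs=$2
    n=0
    while [ "$n" -lt "$runs" ]; do
        n=$((n + 1))
        start=$(date +%s)
        count=$(ls -lrt "$dir" | wc -l | tr -d ' ')
        end=$(date +%s)
        echo "run $n: $count lines in $((end - start))s"
    done
}
```

For example, `list_loop /tmp/mnt/bd1 5` would time five consecutive listings
of the directory used in this test.
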
>>
>> The network traces and dtrace probes showed numerous READDIRPLUS3
>> requests restarting from cookie 0, which seemed to indicate that the
>> cached pages of the directory were invalidated, causing the pages
>> to be refilled starting from cookie 0 up to the currently requested
>> cookie. The cached page invalidation was traced to
>> nfs_force_use_readdirplus(). To verify, I made the modification
>> below, ran the test on various kernel versions, and
>> captured the results shown below.
>>
>> The modification is:
>>
>>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>>> index a73e2f8bd8ec..5d4a64555fa7 100644
>>> --- a/fs/nfs/dir.c
>>> +++ b/fs/nfs/dir.c
>>> @@ -444,7 +444,7 @@ void nfs_force_use_readdirplus(struct inode *dir)
>>>      if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) &&
>>>          !list_empty(&nfsi->open_files)) {
>>>          set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags);
>>> -        invalidate_mapping_pages(dir->i_mapping, 0, -1);
>>> +        nfs_zap_mapping(dir, dir->i_mapping);
>>>      }
>>>  }
>>
>> Note that after this change, I did not see READDIRPLUS3 restarting
>> with cookie 0 anymore.
>>
>> Below are the summary results of 'ls -lrt'. For each kernel version
>> compared, there is one row for the original kernel [ORI] and one row
>> for the kernel with the above modification [MOD].
>>
>> I cloned dtrace-linux from here:
>> github.com/oracle/dtrace-linux-kernel
>>
>> dtrace-linux 5.1.0-rc4 [ORI] 89191: 2m59.32s   193071: 6m7.810s
>> dtrace-linux 5.1.0-rc4 [MOD] 98771: 1m55.900s  191322: 3m48.668s
>>
>> I cloned upstream Linux from here:
>> git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>
>> Upstream Linux 5.5.0-rc1 [ORI] 87891: 5m11.089s  160974: 14m4.384s
>> Upstream Linux 5.5.0-rc1 [MOD] 87075: 5m2.057s   161421: 14m33.615s
>>
>> Please note that these are relative performance numbers and are used
>> to illustrate the issue only.
>>
>> For reference, on the original dtrace-linux it takes about 9s for
>> 'ls -ltr' to complete on a directory with 200k files if the directory
>> is not modified while 'ls' is running.
>>
>> The numbers for the original upstream Linux are *really* bad, and the
>> modification did not seem to have any effect. I'm not sure why;
>> something else could be going on here.
>>
>> The cache invalidation in nfs_force_use_readdirplus seems too
>> drastic and might need to be reviewed. Although this change
>> helps, it does not get the 'ls' performance to where it is
>> expected to be. I think that even though READDIRPLUS3 was used, the
>> attribute cache was invalidated due to the directory modification,
>> causing attribute cache misses that result in calls to
>> nfs_force_use_readdirplus, as shown in this stack trace:
>>
>>   0  17586     page_cache_tree_delete:entry
>>               vmlinux`remove_mapping+0x14
>>               vmlinux`invalidate_inode_page+0x7c
>>               vmlinux`invalidate_mapping_pages+0x1dd
>>               nfs`nfs_force_use_readdirplus+0x47
>>               nfs`__dta_nfs_lookup_revalidate_478+0x5dd
>>               vmlinux`d_revalidate.part.24+0x10
>>               vmlinux`lookup_fast+0x254
>>               vmlinux`walk_component+0x49
>>               vmlinux`path_lookupat+0x79
>>               vmlinux`filename_lookup+0xaf
>>               vmlinux`user_path_at_empty+0x36
>>               vmlinux`vfs_statx+0x77
>>               vmlinux`SYSC_newlstat+0x3d
>>               vmlinux`SyS_newlstat+0xe
>>               vmlinux`do_syscall_64+0x79
>>               vmlinux`entry_SYSCALL_64+0x18d
>>
>> Besides the overhead of refilling the page cache from cookie 0,
>> I think the reason 'ls' still takes so long to complete is that the
>> client has to send a bunch of additional LOOKUP/ACCESS requests
>> over the wire to service the stat(2) calls from 'ls' due to the
>> attribute cache misses.
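
One way to check this from the client, sketched below, is to read the NFSv3
per-procedure counters from /proc/net/rpc/nfs before and after an 'ls' run
and compare them. The function name is hypothetical, and the field positions
assume the usual "proc3" line layout: a counter total followed by one counter
per procedure in NFSv3 procedure-number order, starting with NULL.

```shell
#!/bin/sh
# Print selected NFSv3 client call counters from a /proc/net/rpc/nfs dump
# supplied on stdin.
# Assumed layout: "proc3 <n> null getattr setattr lookup access ...", so
# getattr is field 4, lookup 6, access 7 and readdirplus 20.
nfs3_counts() {
    awk '$1 == "proc3" {
        printf "getattr=%s lookup=%s access=%s readdirplus=%s\n", $4, $6, $7, $20
    }'
}
```

Running `nfs3_counts < /proc/net/rpc/nfs` before and after a listing, then
subtracting the values, gives the number of LOOKUP and ACCESS calls that the
listing generated.
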
>>
>> Please let me know what you think and whether any additional
>> information is needed.
>>
>> Thanks,
>> -Dai
>>
>>


Thread overview: 12+ messages
2019-12-18  0:34 'ls -lrt' performance issue on large dir while dir is being modified Dai Ngo
2019-12-20  4:01 ` Dai Ngo
2020-01-15 18:11   ` Dai Ngo [this message]
2020-01-15 18:54     ` Trond Myklebust
2020-01-15 19:06       ` Trond Myklebust
2020-01-15 19:28         ` Dai Ngo
2020-01-18  2:29         ` Dai Ngo
2020-01-18 15:58           ` Trond Myklebust
2020-01-18 17:26             ` Chuck Lever
2020-01-18 17:31               ` Trond Myklebust
2020-01-18 18:03                 ` Dai Ngo
2020-01-20 20:52                   ` Trond Myklebust
