'ls -lrt' performance issue on large dir while dir is being modified

* 'ls -lrt' performance issue on large dir while dir is being modified
@ 2019-12-18  0:34 Dai Ngo
  2019-12-20  4:01 ` Dai Ngo
  0 siblings, 1 reply; 12+ messages in thread
From: Dai Ngo @ 2019-12-18  0:34 UTC (permalink / raw)
  To: linux-nfs

Hi,

I'd like to report an issue with 'ls -lrt' on NFSv3 client takes
a very long time to display the content of a large directory
(100k - 200k files) while the directory is being modified by
another NFSv3 client.

The problem can be reproduced using 3 systems. One system serves
as the NFS server, one system runs as the client that doing the
'ls -lrt' and another system runs the client that creates files
on the server.

Client1 creates files using this simple script:

> #!/bin/sh
>
> if [ $# -lt 2 ]; then
>         echo "Usage: $0 number_of_files base_filename"
>         exit
> fi    
> nfiles=$1
> fname=$2
> echo "creating $nfiles files using filename[$fname]..."
> i=0   
>       
> while [ i -lt $nfiles ] ;
> do    
>         i=`expr $i + 1`
>         echo "xyz" > $fname$i
>         echo "$fname$i" 
> done

Client2 runs 'time ls -lrt /tmp/mnt/bd1 |wc -l' in a loop.

The network traces and dtrace probes showed numerous READDIRPLUS3
requests restarting  from cookie 0 which seemed to indicate the
cached pages of the directory were invalidated causing the pages
to be refilled starting from cookie 0 until the current requested
cookie.  The cached page invalidation were tracked to
nfs_force_use_readdirplus().  To verify, I made the below
modification, ran the test for various kernel versions and
captured the results shown below.

The modification is:

> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index a73e2f8bd8ec..5d4a64555fa7 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -444,7 +444,7 @@ void nfs_force_use_readdirplus(struct inode *dir)
>      if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) &&
>          !list_empty(&nfsi->open_files)) {
>          set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags);
> -        invalidate_mapping_pages(dir->i_mapping, 0, -1);
> +        nfs_zap_mapping(dir, dir->i_mapping);
>      }
>  }

Note that after this change, I did not see READDIRPLUS3 restarting
with cookie 0 anymore.

Below are the summary results of 'ls -lrt'.  For each kernel version
to be compared, one row for the original kernel and one row for the
kernel with the above modification.

I cloned dtrace-linux from here:
github.com/oracle/dtrace-linux-kernel

dtrace-linux 5.1.0-rc4 [ORI] 89191: 2m59.32s   193071: 6m7.810s
dtrace-linux 5.1.0-rc4 [MOD] 98771: 1m55.900s  191322: 3m48.668s

I cloned upstream Linux from here:
git.kernel.org/pub/scm/linux/kernel/git/tovards/linux.git

Upstream Linux 5.5.0-rc1 [ORI] 87891: 5m11.089s  160974: 14m4.384s
Upstream Linux 5.5.0-rc1 [MOD] 87075: 5m2.057s   161421: 14m33.615s

Please note that these are relative performance numbers and are used
to illustrate the issue only.

For reference, on the original dtrace-linux it takes about 9s for
'ls -ltr' to complete on a directory with 200k files if the directory
is not modified while 'ls' is running.

The number of the original Upstream Linux is *really* bad, and the
modification did not seem to have any effect, not sure why...
it could be something else is going on here.

The cache invalidation in nfs_force_use_readdirplus seems too
drastic and might need to be reviewed. Even though this change
helps but it did not get the 'ls' performance to where it's
expected to be. I think even though READDIRPLUS3 was used, the
attribute cache was invalidated due to the directory modification,
causing attribute cache misses resulting in the calls to
nfs_force_use_readdirplus as shown in this stack trace:

   0  17586     page_cache_tree_delete:entry
               vmlinux`remove_mapping+0x14
               vmlinux`invalidate_inode_page+0x7c
               vmlinux`invalidate_mapping_pages+0x1dd
               nfs`nfs_force_use_readdirplus+0x47
               nfs`__dta_nfs_lookup_revalidate_478+0x5dd
               vmlinux`d_revalidate.part.24+0x10
               vmlinux`lookup_fast+0x254
               vmlinux`walk_component+0x49
               vmlinux`path_lookupat+0x79
               vmlinux`filename_lookup+0xaf
               vmlinux`user_path_at_empty+0x36
               vmlinux`vfs_statx+0x77
               vmlinux`SYSC_newlstat+0x3d
               vmlinux`SyS_newlstat+0xe
               vmlinux`do_syscall_64+0x79
               vmlinux`entry_SYSCALL_64+0x18d

Besides the overhead of refilling the page caches from cookie 0,
I think the reason 'ls' still takes so long to compete because the
client has to send a bunch of additional LOOKUP/ACCESS requests
over the wire to service the stat(2) calls from 'ls' due to the
attribute cache misses.

Please let me know you what you think and if there is any addition
information is needed.

Thanks,
-Dai

^ permalink raw reply	[flat|nested] 12+ messages in thread