From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: 'ls -lrt' performance issue on large dir while dir is being modified
From: Dai Ngo
To: "linux-nfs@vger.kernel.org"
Date: Thu, 19 Dec 2019 20:01:16 -0800

Hi Anna, Trond,

I made a mistake with the 5.5 numbers. The VM that runs 5.5 has some
problems. There is no regression with 5.5; here are the new numbers:

Upstream Linux 5.5.0-rc1 [ORI] 93296: 3m10.917s   197891: 10m35.789s
Upstream Linux 5.5.0-rc1 [MOD] 98614: 1m59.649s   192801:  3m55.003s

My apologies for the mistake.

Now that there is no regression with 5.5, I'd like to get your opinion
on the change that reverts the call in nfs_force_use_readdirplus from
invalidate_mapping_pages back to nfs_zap_mapping, to prevent the
current 'ls' from restarting READDIRPLUS3 from cookie 0. I'm not quite
sure about the intention of the earlier change from nfs_zap_mapping to
invalidate_mapping_pages, which is why I'm seeking advice. Or do you
have any suggestions on how to achieve the same result?

Thanks,
-Dai

On 12/17/19 4:34 PM, Dai Ngo wrote:
> Hi,
>
> I'd like to report an issue where 'ls -lrt' on an NFSv3 client takes
> a very long time to display the contents of a large directory
> (100k - 200k files) while the directory is being modified by
> another NFSv3 client.
>
> The problem can be reproduced using 3 systems. One system serves
> as the NFS server, one runs the client doing the 'ls -lrt', and
> another runs the client that creates files on the server.
>
> Client1 creates files using this simple script:
>
>> #!/bin/sh
>>
>> if [ $# -lt 2 ]; then
>>         echo "Usage: $0 number_of_files base_filename"
>>         exit
>> fi
>>
>> nfiles=$1
>> fname=$2
>> echo "creating $nfiles files using filename[$fname]..."
>> i=0
>> while [ $i -lt $nfiles ]
>> do
>>         i=`expr $i + 1`
>>         echo "xyz" > $fname$i
>>         echo "$fname$i"
>> done
>
> Client2 runs 'time ls -lrt /tmp/mnt/bd1 | wc -l' in a loop.
>
> The network traces and dtrace probes showed numerous READDIRPLUS3
> requests restarting from cookie 0, which seemed to indicate that the
> cached pages of the directory were invalidated, causing the pages
> to be refilled starting from cookie 0 up to the currently requested
> cookie. The cached page invalidation was tracked down to
> nfs_force_use_readdirplus(). To verify, I made the modification
> below, ran the test for various kernel versions, and captured the
> results shown below.
>
> The modification is:
>
>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>> index a73e2f8bd8ec..5d4a64555fa7 100644
>> --- a/fs/nfs/dir.c
>> +++ b/fs/nfs/dir.c
>> @@ -444,7 +444,7 @@ void nfs_force_use_readdirplus(struct inode *dir)
>>      if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) &&
>>          !list_empty(&nfsi->open_files)) {
>>          set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags);
>> -        invalidate_mapping_pages(dir->i_mapping, 0, -1);
>> +        nfs_zap_mapping(dir, dir->i_mapping);
>>      }
>>  }
>
> Note that after this change, I did not see READDIRPLUS3 restarting
> with cookie 0 anymore.
>
> Below are the summary results of 'ls -lrt'. For each kernel version
> being compared, there is one row for the original kernel and one row
> for the kernel with the above modification.
>
> I cloned dtrace-linux from here:
> github.com/oracle/dtrace-linux-kernel
>
> dtrace-linux 5.1.0-rc4 [ORI] 89191: 2m59.32s   193071: 6m7.810s
> dtrace-linux 5.1.0-rc4 [MOD] 98771: 1m55.900s  191322: 3m48.668s
>
> I cloned upstream Linux from here:
> git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>
> Upstream Linux 5.5.0-rc1 [ORI] 87891: 5m11.089s  160974: 14m4.384s
> Upstream Linux 5.5.0-rc1 [MOD] 87075: 5m2.057s   161421: 14m33.615s
>
> Please note that these are relative performance numbers and are used
> only to illustrate the issue.
>
> For reference, on the original dtrace-linux it takes about 9s for
> 'ls -lrt' to complete on a directory with 200k files if the
> directory is not modified while 'ls' is running.
>
> The numbers for the original upstream Linux are *really* bad, and
> the modification did not seem to have any effect; I'm not sure
> why... it could be that something else is going on here.
>
> The cache invalidation in nfs_force_use_readdirplus seems too
> drastic and might need to be reviewed. Even though this change
> helps, it did not get the 'ls' performance to where it's expected
> to be. I think that even though READDIRPLUS3 was used, the
> attribute cache was invalidated due to the directory modification,
> causing attribute cache misses that result in the calls to
> nfs_force_use_readdirplus shown in this stack trace:
>
>   0  17586     page_cache_tree_delete:entry
>               vmlinux`remove_mapping+0x14
>               vmlinux`invalidate_inode_page+0x7c
>               vmlinux`invalidate_mapping_pages+0x1dd
>               nfs`nfs_force_use_readdirplus+0x47
>               nfs`__dta_nfs_lookup_revalidate_478+0x5dd
>               vmlinux`d_revalidate.part.24+0x10
>               vmlinux`lookup_fast+0x254
>               vmlinux`walk_component+0x49
>               vmlinux`path_lookupat+0x79
>               vmlinux`filename_lookup+0xaf
>               vmlinux`user_path_at_empty+0x36
>               vmlinux`vfs_statx+0x77
>               vmlinux`SYSC_newlstat+0x3d
>               vmlinux`SyS_newlstat+0xe
>               vmlinux`do_syscall_64+0x79
>               vmlinux`entry_SYSCALL_64+0x18d
>
> Besides the overhead of refilling the page cache from cookie 0,
> I think the reason 'ls' still takes so long to complete is that the
> client has to send a bunch of additional LOOKUP/ACCESS requests
> over the wire to service the stat(2) calls from 'ls', due to the
> attribute cache misses.
>
> Please let me know what you think and whether any additional
> information is needed.
>
> Thanks,
> -Dai
>
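
P.S. To make the question above more concrete, here is my rough
understanding of why the two calls behave so differently for a reader
in the middle of a large 'ls'. This is only a simplified sketch of the
deferred-invalidation idea, not the exact upstream code, and the
function name sketch_zap_dir_mapping_deferred is made up for
illustration:

    /*
     * invalidate_mapping_pages(dir->i_mapping, 0, -1) drops the
     * directory's cached pages immediately, so an in-progress
     * nfs_readdir() has to refill the page cache starting from
     * cookie 0 on its next getdents() call.
     *
     * nfs_zap_mapping(), by contrast, only flags the cached
     * directory data as invalid so it is flushed at a later
     * revalidation point; an in-progress 'ls' can keep using the
     * cookies it has already cached. Roughly:
     */
    static void sketch_zap_dir_mapping_deferred(struct inode *dir)
    {
            if (dir->i_mapping->nrpages != 0) {
                    spin_lock(&dir->i_lock);
                    /* defer: mark the data invalid instead of
                     * dropping the pages right now */
                    NFS_I(dir)->cache_validity |= NFS_INO_INVALID_DATA;
                    spin_unlock(&dir->i_lock);
            }
    }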