From mboxrd@z Thu Jan 1 00:00:00 1970
From: Xiaoxi Chen
Subject: Re: Huge lookup when recursively mkdir
Date: Mon, 23 Oct 2017 14:01:32 +0800
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Received: from mail-it0-f54.google.com ([209.85.214.54]:53017 "EHLO
	mail-it0-f54.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751043AbdJWGBe (ORCPT ); Mon, 23 Oct 2017 02:01:34 -0400
Received: by mail-it0-f54.google.com with SMTP id j140so4648904itj.1
	for ; Sun, 22 Oct 2017 23:01:34 -0700 (PDT)
Sender: ceph-devel-owner@vger.kernel.org
To: "Yan, Zheng"
Cc: Ceph Development, John Spray

Yes, actually lots of clients (50+) are trying to create the same large
directory tree concurrently, so most of the mkdir calls will get
-EEXIST.

I don't quite understand how a mkdir call at the application level
finally turns into a lookup on the MDS. Could you please explain a bit
more?

2017-10-23 8:44 GMT+08:00 Yan, Zheng:
> On Sun, Oct 22, 2017 at 11:27 PM, Xiaoxi Chen wrote:
>> To add another data point, switched to ceph-fuse 12.2.0, still seeing
>> lots of lookups.
>> lookup avg 1892
>> mkdir avg 367
>> create avg 222
>> open avg 228
>>
>
> But in your test, mkdir avg was about 1.5 times open avg. I think
> your test created millions of directories, and the lookups were from
> cache misses. You can try enlarging client_cache_size. But I don't
> think it will help much when the active set of directories is so large.
>
>
>> 2017-10-21 2:09 GMT+08:00 Xiaoxi Chen:
>>> @Zheng, my kernel doesn't even have c3f4688a08f. But 200fd27 ("ceph:
>>> use lookup request to revalidate dentry") is there.
>>>
>>> 2017-10-21 0:54 GMT+08:00 Xiaoxi Chen:
>>>> Thanks, will check.
>>>>
>>>> A general question: does the cephfs kernel client drop its
>>>> dentry/inode cache aggressively? What I know is that if the MDS issues
>>>> CEPH_SESSION_RECALL_STATE, the client will drop, but are there other
>>>> cases where the client will drop its cache?
>>>>
>>>>
>>>>
>>>> 2017-10-20 16:39 GMT+08:00 Yan, Zheng:
>>>>> On Fri, Oct 20, 2017 at 3:28 PM, Xiaoxi Chen wrote:
>>>>>> Centos 7.3, kernel version 3.10.0-514.26.2.el7.x86_64.
>>>>>>
>>>>>> I extracted the file creation logic of our workload into a
>>>>>> reproducer, like below.
>>>>>>
>>>>>> Running the reproducer concurrently on 2+ nodes shows a lot of
>>>>>> lookup ops. I thought the lookups were for opening the directory
>>>>>> tree, so I tried to pre-make most of the dirs and used ls -i to
>>>>>> read the dentries and cache them, then re-ran the reproducer. It
>>>>>> made no difference.
>>>>>>
>>>>>> #include <stdio.h>
>>>>>> #include <stdlib.h>
>>>>>> #include <fcntl.h>
>>>>>> #include <unistd.h>
>>>>>> #include <sys/stat.h>
>>>>>>
>>>>>> /* Pick `count` random names (0..max-1) under `base`. Recurse through
>>>>>>  * `depth` levels of directories, then create a file at the leaf. */
>>>>>> void create_file(const char *base, int count, int max, int depth)
>>>>>> {
>>>>>>     int i;
>>>>>>     for (i = 0; i < count; i++) {
>>>>>>         char dir[256];
>>>>>>         int mydir = rand() % max;
>>>>>>         snprintf(dir, sizeof(dir), "%s/%d", base, mydir);
>>>>>>         if (depth >= 1) {
>>>>>>             mkdir(dir, 0777);   /* EEXIST is expected and ignored */
>>>>>>             create_file(dir, count, max, depth - 1);
>>>>>>         } else {
>>>>>>             int fd = open(dir, O_CREAT | O_EXCL | O_WRONLY, 0666);
>>>>>>             printf("opened path : %s = %d\n", dir, fd);
>>>>>>             if (fd >= 0)
>>>>>>                 close(fd);
>>>>>>         }
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> int main(int argc, char *argv[])
>>>>>> {
>>>>>>     while (1) {
>>>>>>         create_file("/import/SQL01", 1, 4, 10);
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>
>>>>> I still don't see this behavior on a 4.13 kernel. I suspect there is
>>>>> something wrong with the dentry lease. Please check if your kernel
>>>>> includes:
>>>>>
>>>>> commit c3f4688a08f (ceph: don't set req->r_locked_dir in ceph_d_revalidate)
>>>>> commit 5eb9f6040f3 (ceph: do a LOOKUP in d_revalidate instead of GETATTR)
>>>>>
>>>>> The first commit can cause this issue, the second one fixes it.
>>>>>
>>>>> Regards
>>>>> Yan, Zheng
>>>>>
>>>>>>
>>>>>>
>>>>>> 2017-10-20 10:55 GMT+08:00 Yan, Zheng:
>>>>>>> On Fri, Oct 20, 2017 at 12:49 AM, Xiaoxi Chen wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am seeing a lot of lookup requests when doing recursive mkdir.
>>>>>>>> The workload behavior is like:
>>>>>>>> mkdir DIR0
>>>>>>>> mkdir DIR0/DIR1
>>>>>>>> mkdir DIR0/DIR1/DIR2
>>>>>>>> ....
>>>>>>>> mkdir DIR0/DIR1/DIR2......./DIR7
>>>>>>>> create DIR0/DIR1/DIR2......./DIR7/FILE1
>>>>>>>>
>>>>>>>> and it runs concurrently on 50+ clients; the dir names on
>>>>>>>> different clients may or may not be the same.
>>>>>>>>
>>>>>>>> From the admin socket I was seeing ~50K create requests, but
>>>>>>>> got 400K lookup requests. The lookups eat up most of the MDS
>>>>>>>> capacity, so file creation is slow.
>>>>>>>>
>>>>>>>> Where do the lookups come from, and is there any way to
>>>>>>>> optimize them out?
>>>>>>>>
>>>>>>>
>>>>>>> I don't see this behavior when running the following commands with
>>>>>>> the 4.13 kernel client and luminous ceph-fuse. Which client do you
>>>>>>> use?
>>>>>>>
>>>>>>> mkdir d1
>>>>>>> mkdir d1/d2
>>>>>>> mkdir d1/d2/d3
>>>>>>> mkdir d1/d2/d3/d4/
>>>>>>> mkdir d1/d2/d3/d4/d5
>>>>>>> touch d1/d2/d3/d4/d5/f
>>>>>>>
>>>>>>>> Xiaoxi
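
A rough way to reason about where the lookups in this thread could come
from: before a mkdir or create reaches the filesystem, the path walk has
to resolve every parent component, and each parent whose dentry cannot be
reused from the client cache may turn into a lookup on the MDS. The
sketch below only counts those parent components, so it is an upper bound
under the assumption that no parent dentry is usable from cache. The
names parent_components and worst_case_lookups are hypothetical helpers
for illustration, not part of ceph or the kernel.

```c
#include <stddef.h>

/* Number of directory components that must be resolved before the final
 * name in `path` can be created. "d1/d2/d3" has 2 parents (d1 and d2).
 * Leading, trailing and repeated slashes are ignored. */
size_t parent_components(const char *path)
{
    size_t components = 0;
    int in_name = 0;                 /* currently inside a component name */

    for (const char *p = path; *p != '\0'; p++) {
        if (*p == '/') {
            in_name = 0;
        } else if (!in_name) {
            in_name = 1;
            components++;            /* a new component starts here */
        }
    }
    /* the last component is the name being created, not a parent */
    return components > 0 ? components - 1 : 0;
}

/* Assumed worst case (nothing cached) for building one branch of `depth`
 * nested directories one mkdir at a time, then creating a file in it. */
size_t worst_case_lookups(size_t depth)
{
    size_t total = 0;

    for (size_t level = 1; level <= depth; level++)
        total += level - 1;          /* mkdir at this level walks level-1 parents */
    return total + depth;            /* the create walks all `depth` directories */
}
```

For the eight-level example (DIR0 to DIR7 plus FILE1),
worst_case_lookups(8) gives 36 parent resolutions for 9 operations if no
parent dentry were reusable. With a warm dentry cache and valid leases
most of these should not reach the MDS, which is why the dentry lease
handling is the suspect here.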