From: Dongdong Tao
Subject: Re: Huge lookup when recursively mkdir
Date: Mon, 23 Oct 2017 14:58:30 +0800
Message-ID: <2E203A2E-DED0-456B-9354-AC1AFB9961E7@gmail.com>
To: Xiaoxi Chen
Cc: "Yan, Zheng", Ceph Development, John Spray

Hi Xiaoxi,

As far as I know, for mkdir -p "/a/b/c/d/e" the client first looks up the
path components that already exist, and then issues real mkdir requests only
for the remaining ones.
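Roughly, the pattern is like the sketch below. It is only an illustration of
the lookup-then-mkdir idea, not the actual coreutils or cephfs client code;
mkdir_p and its error handling are made up for the example. Each stat() here
corresponds to a lookup (when the dentry is not already cached), and only the
missing components turn into real mkdir requests:

#include <errno.h>
#include <limits.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Illustration only: walk an absolute path left to right, check ("lookup")
 * each prefix, and mkdir only the components that do not exist yet. */
int mkdir_p(const char *path, mode_t mode)
{
    char buf[PATH_MAX];
    struct stat st;
    char *p;

    strncpy(buf, path, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    for (p = buf + 1; *p; p++) {
        if (*p != '/')
            continue;
        *p = '\0';                              /* cut the path at this component */
        if (stat(buf, &st) != 0 &&              /* lookup of the existing prefix  */
            mkdir(buf, mode) != 0 && errno != EEXIST)
            return -1;                          /* mkdir only when it is missing  */
        *p = '/';
    }
    if (stat(buf, &st) != 0 &&
        mkdir(buf, mode) != 0 && errno != EEXIST)
        return -1;
    return 0;
}

With 50+ clients doing this concurrently over the same tree, most of the
mkdirs come back with EEXIST and the lookups dominate, which matches what you
see on the MDS.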
Thanks,
Dongdong.

> On 23 Oct 2017, at 2:01 PM, Xiaoxi Chen wrote:
>
> Yes, actually lots of (50+) clients are trying to create the same
> large directory tree concurrently, so most of the mkdir calls get
> -EEXIST.
>
> I don't quite understand how a mkdir call at the application level
> ends up as a lookup on the MDS. Could you please explain a bit more?
>
> 2017-10-23 8:44 GMT+08:00 Yan, Zheng:
>> On Sun, Oct 22, 2017 at 11:27 PM, Xiaoxi Chen wrote:
>>> To add another data point: switched to ceph-fuse 12.2.0, still seeing
>>> lots of lookups.
>>> lookup avg 1892
>>> mkdir avg 367
>>> create avg 222
>>> open avg 228
>>>
>> But in your test, the mkdir avg was about 1.5 times the open avg. I think
>> your test created millions of directories, so the lookups came from cache
>> misses. You can try enlarging client_cache_size, but I don't think it
>> will help much when the active set of directories is so large.
>>
>>> 2017-10-21 2:09 GMT+08:00 Xiaoxi Chen:
>>>> @Zheng, my kernel doesn't even have c3f4688a08f. But 200fd27 ("ceph:
>>>> use lookup request to revalidate dentry") is there.
>>>>
>>>> 2017-10-21 0:54 GMT+08:00 Xiaoxi Chen:
>>>>> Thanks, will check.
>>>>>
>>>>> A general question: does the cephfs kernel client drop its dentry/inode
>>>>> cache aggressively? What I know is that if the MDS issues
>>>>> CEPH_SESSION_RECALL_STATE the client will drop it, but are there other
>>>>> cases where the client drops its cache?
>>>>>
>>>>> 2017-10-20 16:39 GMT+08:00 Yan, Zheng:
>>>>>> On Fri, Oct 20, 2017 at 3:28 PM, Xiaoxi Chen wrote:
>>>>>>> CentOS 7.3, kernel version 3.10.0-514.26.2.el7.x86_64.
>>>>>>>
>>>>>>> I extracted the file-creation logic of our workload into a
>>>>>>> reproducer, shown below.
>>>>>>>
>>>>>>> Running the reproducer concurrently on 2+ nodes shows a lot of lookup OPs.
>>>>>>> I thought the lookups were for opening the directory tree, so I tried
>>>>>>> pre-creating most of the dirs and used ls -i to read the dentries and
>>>>>>> cache them, then re-ran the reproducer; nothing changed.
>>>>>>>
>>>>>>> #include <stdio.h>
>>>>>>> #include <stdlib.h>
>>>>>>> #include <fcntl.h>
>>>>>>> #include <unistd.h>
>>>>>>> #include <sys/types.h>
>>>>>>> #include <sys/stat.h>
>>>>>>>
>>>>>>> int create_file(char *base, int count, int max, int depth)
>>>>>>> {
>>>>>>>     int i;
>>>>>>>     for (i = 0; i < count; i++) {
>>>>>>>         char dir[256];
>>>>>>>         int mydir = rand() % max;
>>>>>>>         sprintf(dir, "%s/%d", base, mydir);
>>>>>>>         if (depth >= 1) {
>>>>>>>             mkdir(dir, 0777);
>>>>>>>             create_file(dir, count, max, depth - 1);
>>>>>>>         } else {
>>>>>>>             int fd = open(dir, O_CREAT | O_EXCL | O_WRONLY, 0666);
>>>>>>>             printf("opened path : %s = %d\n", dir, fd);
>>>>>>>             close(fd);
>>>>>>>         }
>>>>>>>     }
>>>>>>>     return 0;
>>>>>>> }
>>>>>>>
>>>>>>> int main(int argc, char *argv[])
>>>>>>> {
>>>>>>>     while (1) {
>>>>>>>         create_file("/import/SQL01", 1, 4, 10);
>>>>>>>     }
>>>>>>>     return 0;
>>>>>>> }
>>>>>>>
>>>>>> Still don't see this behavior on a 4.13 kernel. I suspect there is
>>>>>> something wrong with the dentry lease. Please check whether your kernel
>>>>>> includes:
>>>>>>
>>>>>> commit c3f4688a08f (ceph: don't set req->r_locked_dir in ceph_d_revalidate)
>>>>>> commit 5eb9f6040f3 (ceph: do a LOOKUP in d_revalidate instead of GETATTR)
>>>>>>
>>>>>> The first commit can cause this issue; the second one fixes it.
>>>>>>
>>>>>> Regards
>>>>>> Yan, Zheng
>>>>>>
>>>>>>> 2017-10-20 10:55 GMT+08:00 Yan, Zheng:
>>>>>>>> On Fri, Oct 20, 2017 at 12:49 AM, Xiaoxi Chen wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am seeing a lot of lookup requests when doing recursive mkdir.
>>>>>>>>> The workload behaves like:
>>>>>>>>> mkdir DIR0
>>>>>>>>> mkdir DIR0/DIR1
>>>>>>>>> mkdir DIR0/DIR1/DIR2
>>>>>>>>> ....
>>>>>>>>> mkdir DIR0/DIR1/DIR2......./DIR7
>>>>>>>>> create DIR0/DIR1/DIR2......./DIR7/FILE1
>>>>>>>>>
>>>>>>>>> It runs concurrently on 50+ clients; the dir names on different
>>>>>>>>> clients may or may not be the same.
>>>>>>>>>
>>>>>>>>> From the admin socket I was seeing ~50K create requests but
>>>>>>>>> 400K lookup requests. The lookups eat up most of the MDS capacity,
>>>>>>>>> so file creation is slow.
>>>>>>>>>
>>>>>>>>> Where do the lookups come from, and is there any way to
>>>>>>>>> optimize them out?
>>>>>>>>>
>>>>>>>> I don't see this behavior when running the following commands with a
>>>>>>>> 4.13 kernel client or a luminous ceph-fuse. Which client do you use?
>>>>>>>>
>>>>>>>> mkdir d1
>>>>>>>> mkdir d1/d2
>>>>>>>> mkdir d1/d2/d3
>>>>>>>> mkdir d1/d2/d3/d4/
>>>>>>>> mkdir d1/d2/d3/d4/d5
>>>>>>>> touch d1/d2/d3/d4/d5/f
>>>>>>>>
>>>>>>>>> Xiaoxi