* Huge lookup when recursively mkdir

From: Xiaoxi Chen @ 2017-10-19 16:49 UTC
To: Ceph Development, John Spray, Yan, Zheng

Hi,

     I am seeing a lot of lookup requests when doing recursive mkdir.
The workload looks like this:

    mkdir DIR0
    mkdir DIR0/DIR1
    mkdir DIR0/DIR1/DIR2
    ....
    mkdir DIR0/DIR1/DIR2......./DIR7
    create DIR0/DIR1/DIR2......./DIR7/FILE1

and it runs concurrently on 50+ clients; the dir names on different
clients may or may not be the same.

     From the admin socket I was seeing ~50K create requests, but
400K lookup requests. The lookups eat up most of the MDS capacity,
so file creation is slow.

     Where do the lookups come from, and is there any way to
optimize them out?

Xiaoxi

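As a side note on measurement: one way to break the request mix out per op
type is the MDS admin socket's perf dump. A rough sketch, assuming jq is
available, that "mds.a" is replaced with the local MDS name, and that your
Ceph release exposes req_* counters under mds_server (counter names vary
across versions):

    # list per-request-type counters (lookup, mkdir, create, ...)
    ceph daemon mds.a perf dump mds_server |
        jq '.mds_server | with_entries(select(.key | startswith("req_")))'
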
* Re: Huge lookup when recursively mkdir

From: Yan, Zheng @ 2017-10-20 2:55 UTC
To: Xiaoxi Chen
Cc: Ceph Development, John Spray

On Fri, Oct 20, 2017 at 12:49 AM, Xiaoxi Chen <superdebuger@gmail.com> wrote:
> Hi,
>
>      I am seeing a lot of lookup requests when doing recursive mkdir.
> The workload looks like this:
>
>     mkdir DIR0
>     mkdir DIR0/DIR1
>     mkdir DIR0/DIR1/DIR2
>     ....
>     mkdir DIR0/DIR1/DIR2......./DIR7
>     create DIR0/DIR1/DIR2......./DIR7/FILE1
>
> [...]
>
>      Where do the lookups come from, and is there any way to
> optimize them out?

I don't see this behavior when running the following commands with a
4.13 kernel client or a luminous ceph-fuse. Which client do you use?

mkdir d1
mkdir d1/d2
mkdir d1/d2/d3
mkdir d1/d2/d3/d4/
mkdir d1/d2/d3/d4/d5
touch d1/d2/d3/d4/d5/f

* Re: Huge lookup when recursively mkdir

From: Xiaoxi Chen @ 2017-10-20 7:28 UTC
To: Yan, Zheng
Cc: Ceph Development, John Spray

CentOS 7.3, kernel version 3.10.0-514.26.2.el7.x86_64.

I extracted the logic of file creation in our workload into the
reproducer below. Running it concurrently on 2+ nodes shows a lot of
lookup ops.

I thought the lookups were for opening the directory tree, so I tried
pre-making most of the dirs and using `ls -i` to read the dentries and
cache them, then re-ran the reproducer; nothing changed.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>

/* Recursively create a chain of randomly named directories under 'base',
 * then create a file at the deepest level. */
int create_file(char *base, int count, int max, int depth)
{
    int i;
    for (i = 0; i < count; i++) {
        char dir[256];
        int mydir = rand() % max;          /* pick one of 'max' names */
        sprintf(dir, "%s/%d", base, mydir);
        if (depth >= 1) {
            mkdir(dir, 0777);              /* may fail with EEXIST, ignored */
            create_file(dir, count, max, depth - 1);
        } else {
            int fd = open(dir, O_CREAT | O_EXCL | O_WRONLY, 0666);
            printf("opened path : %s = %d\n", dir, fd);
            close(fd);
        }
    }
    return 0;
}

int main(int argc, char *argv[])
{
    while (1) {
        create_file("/import/SQL01", 1, 4, 10);
    }
    return 0;
}

2017-10-20 10:55 GMT+08:00 Yan, Zheng <ukernel@gmail.com>:
> I don't see this behavior when running the following commands with a
> 4.13 kernel client or a luminous ceph-fuse. Which client do you use?
> [...]

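For reference, a minimal way to build and drive the reproducer above; the
source and binary file names are assumptions, and /import/SQL01 is the
CephFS mount point taken from the code:

    gcc -Wall -O2 reproducer.c -o reproducer
    ./reproducer    # run one instance on each of 2+ clients concurrently
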
* Re: Huge lookup when recursively mkdir

From: Yan, Zheng @ 2017-10-20 8:39 UTC
To: Xiaoxi Chen
Cc: Ceph Development, John Spray

On Fri, Oct 20, 2017 at 3:28 PM, Xiaoxi Chen <superdebuger@gmail.com> wrote:
> CentOS 7.3, kernel version 3.10.0-514.26.2.el7.x86_64.
>
> I extracted the logic of file creation in our workload into the
> reproducer below. Running it concurrently on 2+ nodes shows a lot of
> lookup ops.
> [...]

I still don't see this behavior on a 4.13 kernel. I suspect there is
something wrong with dentry leases. Please check whether your kernel
includes:

commit c3f4688a08f (ceph: don't set req->r_locked_dir in ceph_d_revalidate)
commit 5eb9f6040f3 (ceph: do a LOOKUP in d_revalidate instead of GETATTR)

The first commit can cause this issue; the second one fixes it.

Regards
Yan, Zheng

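For a distro kernel like 3.10.0-514.el7, one rough way to check whether
those two fixes were backported is to search the kernel package changelog.
This is only indicative, since vendors sometimes retitle patches:

    rpm -q --changelog kernel-3.10.0-514.26.2.el7 | grep -i 'd_revalidate'
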
* Re: Huge lookup when recursively mkdir

From: Xiaoxi Chen @ 2017-10-20 16:54 UTC
To: Yan, Zheng
Cc: Ceph Development, John Spray

Thanks, will check.

A general question: does the CephFS kernel client drop its dentry/inode
cache aggressively? What I know is that the client drops it when the MDS
issues CEPH_SESSION_RECALL_STATE, but are there other cases where the
client drops its cache?

2017-10-20 16:39 GMT+08:00 Yan, Zheng <ukernel@gmail.com>:
> I still don't see this behavior on a 4.13 kernel. I suspect there is
> something wrong with dentry leases. Please check whether your kernel
> includes:
>
> commit c3f4688a08f (ceph: don't set req->r_locked_dir in ceph_d_revalidate)
> commit 5eb9f6040f3 (ceph: do a LOOKUP in d_revalidate instead of GETATTR)
>
> The first commit can cause this issue; the second one fixes it.
> [...]

* Re: Huge lookup when recursively mkdir

From: Xiaoxi Chen @ 2017-10-20 18:09 UTC
To: Yan, Zheng
Cc: Ceph Development, John Spray

@Zheng, my kernel doesn't even have c3f4688a08f. But 200fd27 ("ceph:
use lookup request to revalidate dentry") is there.

2017-10-21 0:54 GMT+08:00 Xiaoxi Chen <superdebuger@gmail.com>:
> Thanks, will check.
>
> A general question: does the CephFS kernel client drop its dentry/inode
> cache aggressively? [...]

* Re: Huge lookup when recursively mkdir

From: Xiaoxi Chen @ 2017-10-22 15:27 UTC
To: Yan, Zheng
Cc: Ceph Development, John Spray

To add another data point: I switched to ceph-fuse 12.2.0 and am still
seeing lots of lookups.

    lookup avg 1892
    mkdir  avg  367
    create avg  222
    open   avg  228

2017-10-21 2:09 GMT+08:00 Xiaoxi Chen <superdebuger@gmail.com>:
> @Zheng, my kernel doesn't even have c3f4688a08f. But 200fd27 ("ceph:
> use lookup request to revalidate dentry") is there.
> [...]

* Re: Huge lookup when recursively mkdir

From: Yan, Zheng @ 2017-10-23 0:44 UTC
To: Xiaoxi Chen
Cc: Ceph Development, John Spray

On Sun, Oct 22, 2017 at 11:27 PM, Xiaoxi Chen <superdebuger@gmail.com> wrote:
> To add another data point: I switched to ceph-fuse 12.2.0 and am still
> seeing lots of lookups.
>
>     lookup avg 1892
>     mkdir  avg  367
>     create avg  222
>     open   avg  228

But in your test the mkdir avg was about 1.5 times the open avg. I think
your test created millions of directories, so the lookups came from cache
misses. You can try enlarging client_cache_size, but I don't think it
will help much when the active set of directories is so large.

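For illustration, the suggested tweak as a ceph.conf sketch:
client_cache_size caps the ceph-fuse dentry/inode cache and defaults to
16384 entries; the value below is an arbitrary example for a large active
set, not a tuned recommendation:

    [client]
    # 10x the default; trades client memory for fewer cache-miss lookups
    client cache size = 163840
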
* Re: Huge lookup when recursively mkdir

From: Xiaoxi Chen @ 2017-10-23 6:01 UTC
To: Yan, Zheng
Cc: Ceph Development, John Spray

Yes, lots of (50+) clients are trying to create the same large directory
tree concurrently, so most of the mkdir calls will get -EEXIST.

I don't quite understand how a mkdir call at the application level
finally turns into a lookup on the MDS. Could you please explain a bit
more?

2017-10-23 8:44 GMT+08:00 Yan, Zheng <ukernel@gmail.com>:
> But in your test the mkdir avg was about 1.5 times the open avg. I think
> your test created millions of directories, so the lookups came from cache
> misses. You can try enlarging client_cache_size, but I don't think it
> will help much when the active set of directories is so large.
> [...]

* Re: Huge lookup when recursively mkdir

From: 陶冬冬 @ 2017-10-23 6:58 UTC
To: Xiaoxi Chen
Cc: Yan, Zheng, Ceph Development, John Spray

Hi Xiaoxi,

As far as I know, for `mkdir -p "/a/b/c/d/e"` the client first looks up
the part of the path that already exists, then actually mkdirs the
remaining part.

Thanks,
Dongdong.

> On Oct 23, 2017, at 2:01 PM, Xiaoxi Chen <superdebuger@gmail.com> wrote:
>
> Yes, lots of (50+) clients are trying to create the same large directory
> tree concurrently, so most of the mkdir calls will get -EEXIST.
>
> I don't quite understand how a mkdir call at the application level
> finally turns into a lookup on the MDS. Could you please explain a bit
> more?
> [...]

* Re: Huge lookup when recursively mkdir

From: Yan, Zheng @ 2017-10-23 7:41 UTC
To: Xiaoxi Chen
Cc: Ceph Development, John Spray

On Mon, Oct 23, 2017 at 2:01 PM, Xiaoxi Chen <superdebuger@gmail.com> wrote:
> Yes, lots of (50+) clients are trying to create the same large directory
> tree concurrently, so most of the mkdir calls will get -EEXIST.
>
> I don't quite understand how a mkdir call at the application level
> finally turns into a lookup on the MDS. Could you please explain a bit
> more?

For 'mkdir a', the kernel first does 'lookup a'. If 'a' does not exist,
it does the real mkdir; otherwise it returns -EEXIST.

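To make that flow concrete: the lookup happens inside the kernel for every
path component, so an application cannot skip it, but concurrent creators
can at least avoid extra round trips by not probing with stat() first and
simply tolerating EEXIST. A minimal C sketch; the helper name and
structure are illustrative, not from the thread:

    #include <errno.h>
    #include <sys/stat.h>

    /* For each component the kernel issues LOOKUP first; the real mkdir
     * only runs if the lookup misses, otherwise -EEXIST comes back. A
     * racing creator should therefore treat EEXIST as success rather than
     * stat()ing first, which would just add another lookup. */
    static int mkdir_tolerant(const char *path, mode_t mode)
    {
        if (mkdir(path, mode) == 0 || errno == EEXIST)
            return 0;   /* created, or another client already had it */
        return -1;      /* genuine failure: ENOENT on parent, EACCES, ... */
    }
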
Thread overview: 11+ messages (newest: 2017-10-23 7:41 UTC)

2017-10-19 16:49 Huge lookup when recursively mkdir  Xiaoxi Chen
2017-10-20  2:55 ` Yan, Zheng
2017-10-20  7:28   ` Xiaoxi Chen
2017-10-20  8:39     ` Yan, Zheng
2017-10-20 16:54       ` Xiaoxi Chen
2017-10-20 18:09         ` Xiaoxi Chen
2017-10-22 15:27           ` Xiaoxi Chen
2017-10-23  0:44             ` Yan, Zheng
2017-10-23  6:01               ` Xiaoxi Chen
2017-10-23  6:58                 ` 陶冬冬
2017-10-23  7:41                 ` Yan, Zheng