Sorry for the late reply.

On 04/23, Joonsoo Kim wrote:
>2018-04-13 10:18 GMT+09:00 Ye Xiaolong :
>> On 04/12, Joonsoo Kim wrote:
>>>
>>>Really thanks for testing!
>>>
>>>Could you give me /proc/zoneinfo dumped before/after this test
>>>for all these three commits?
>>
>> zoneinfo files for these three commits attached. Sample interval is 10s.
>
>Really thanks for giving the info.
>
>During last week, I studied hard on this problem and found a culprit.
>Your report showed that there was some big changes on allocator's numa stat.
>It seems that it's caused by classzone_idx.
>
>Assume the following setup that is the same with your machine.
>Assume that my patch is applied so there is a movable zone on Node1.
>
>Node0
>DMA
>DMA32
>Normal
>
>Node1
>Normal
>Movable
>
>classzone_idx of GFP_HIGHUSER_MOVABLE allocation happened on node0
>will be 2 and classzone_idx of the same allocation happened on node1 will be 3.
>So, different lowmem_reserve is applied to them and it causes bad numa
>effects in allocator.
>
>I can't think a perfect solution now but following patch that removes
>lowmem_reserve for ZONE_MOVABLE will remove bad numa effect to your
>machine setup.
>
>I'm not sure whether performance will be restored or not by this patch, however,
>at least, bad numa stat will be disappeared. (I observed in my test!!!)
>
>Could you test it?
>
>https://github.com/JoonsooKim/linux.git cma-fixup-thp-next-20180403
>
>Related commits are:
>2f54bc6 mm/page_alloc: workaround for node balance issue during allocation
>f11152b mm/thp: don't count ZONE_MOVABLE as the target for freepage reserving
>b2adb03 Revert "Revert "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE""
>76148e2 Revert "Revert "mm/page_alloc: don't reserve ZONE_HIGHMEM for ZONE_MOVABLE request""
>
>Base commit is '76148e2' and this regression happens on 'b2adb03'.
>Fix is commit 'f11152b' and '2f54bc6'.
>
>Please also give me vmstat and zoneinfo.
>(before/after the test)
>

Here is the comparison of commit b2adb03 and commit 2f54bc6. It seems the fix
does help recover the performance.

$ compare -at b2adb036011101120cd171db14f5f4f9e8b7938a 2f54bc68e487da5d5602629ff6e71c181d364698 -g fio-basic
tests: 1
testcase/path_params/tbox_group/run: fio-basic/2pmem-ext4-200s-50%-tb-randread-2M-mmap-200G-performance/lkp-hsw-ep6

b2adb03601110112        2f54bc68e487da5d5602629ff6
----------------        --------------------------
         %stddev change          %stddev
             \      |                 \
      9404          18%      11058        fio.read_bw_MBps
      4702          18%       5529        fio.read_iops
     52.93          28%      67.69        fio.latency_4ms%
 4.709e+08          18%  5.552e+08        fio.time.major_page_faults
 3.767e+09          18%  4.441e+09        fio.time.file_system_inputs
    940450          18%    1105848        fio.workload
       498          12%        558        fio.time.user_time
    418315          11%     463880        fio.time.involuntary_context_switches
      5103                    5044        fio.time.system_time
     11837          -6%      11115        fio.read_clat_stddev
     25808          -7%      23936        fio.time.voluntary_context_switches
   2322932          -8%    2134137 ±  3%  fio.time.maximum_resident_set_size
     23.38         -13%      20.32 ±  5%  fio.latency_10ms%
      5949         -15%       5059        fio.read_clat_mean_us
     12864         -23%       9962        fio.read_clat_90%_us
     14058         -23%      10816        fio.read_clat_95%_us
     22485         -31%      15488 ±  8%  fio.read_clat_99%_us
     94150 ± 18%   -34%      62254 ± 27%  fio.time.minor_page_faults
      0.39 ±  5%   -44%       0.22 ± 12%  fio.latency_1000us%
     20.03         -53%       9.51 ±  4%  fio.latency_20ms%
      0.11 ± 11%   -66%       0.04 ± 25%  fio.latency_2ms%
      1.38 ±  3%   -67%       0.45 ± 23%  fio.latency_50ms%
     12591          34%      16884 ± 27%  softirqs.NET_RX

vmstat and zoneinfo for both commits attached.

Thanks,
Xiaolong

>
>There is another regression report from mainline related to this patch since
>this patch is merged to mainline.
>
>http://lkml.kernel.org/r/<20180418010753.GA20825@yexl-desktop>
>[lkp-robot] [mm/cma] a57a290bd3: vm-scalability.throughput -15.5% regression
>
>If above fixes works, could you test above regression on the mainline
>kernel with
>'2f54bc6 mm/page_alloc: workaround for node balance issue during allocation'?
>commit 'f11152b mm/thp: don't count ZONE_MOVABLE as the target for
>freepage reserving'
>is already merged into mainline so only commit 2f54bc6 will be needed.
>
>Thanks.
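
As a side note on the classzone_idx explanation quoted above, here is a minimal Python sketch of why the two nodes end up with different lowmem_reserve. It is only a toy model, not kernel code: the zone numbering assumes a layout with DMA, DMA32, Normal and Movable as indices 0 to 3, and the helper names, the page counts and the ratio of 32 are hypothetical values picked for illustration.

```python
# Toy model of classzone_idx and lowmem_reserve (not kernel code).
# Assumed zone numbering: DMA=0, DMA32=1, Normal=2, Movable=3.
ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_MOVABLE = range(4)

# Populated zones per node, matching the setup in the quoted mail.
NODES = {
    0: [ZONE_DMA, ZONE_DMA32, ZONE_NORMAL],
    1: [ZONE_NORMAL, ZONE_MOVABLE],
}


def classzone_idx(node, high_zoneidx):
    """Highest populated zone on `node` that the request is allowed to use."""
    return max(z for z in NODES[node] if z <= high_zoneidx)


def lowmem_reserve(zone, classzone, managed_pages, ratio):
    """Pages of `zone` held back from a request with a higher classzone.

    Simplified: sum the pages of every zone above `zone` up to `classzone`
    and divide by `ratio`. Both inputs are hypothetical here.
    """
    higher = sum(managed_pages.get(z, 0) for z in range(zone + 1, classzone + 1))
    return higher // ratio


# A GFP_HIGHUSER_MOVABLE request may use zones up to ZONE_MOVABLE.
high = ZONE_MOVABLE
idx0 = classzone_idx(0, high)  # node0 has no Movable zone
idx1 = classzone_idx(1, high)  # node1 has a Movable zone

# Hypothetical page counts, only to make the difference visible.
node0_pages = {ZONE_DMA: 4000, ZONE_DMA32: 500000, ZONE_NORMAL: 16000000}
node1_pages = {ZONE_NORMAL: 12000000, ZONE_MOVABLE: 4000000}

reserve0 = lowmem_reserve(ZONE_NORMAL, idx0, node0_pages, 32)
reserve1 = lowmem_reserve(ZONE_NORMAL, idx1, node1_pages, 32)

print("node0: classzone_idx", idx0, "Normal reserve", reserve0)
print("node1: classzone_idx", idx1, "Normal reserve", reserve1)
```

In this model the Normal zone on node0 holds nothing back from the request, while the Normal zone on node1 holds back a share of the Movable zone's pages, so the same allocation is treated differently depending on the node it starts from.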