* Test report for kernel direct mapping performance
@ 2021-01-15  7:23 Xing Zhengjun
  2021-01-26 15:00 ` Michal Hocko
  0 siblings, 1 reply; 3+ messages in thread
From: Xing Zhengjun @ 2021-01-15  7:23 UTC (permalink / raw)
  To: linux-mm, LKML; +Cc: Dave Hansen, Tony, Tim C Chen, Huang, Ying, Du, Julie

Hi,

There is currently a bit of a debate about the kernel direct map. Does 
using 2M/1G pages aggressively for the kernel direct map help 
performance? Or, is it an old optimization which is not as helpful on 
modern CPUs as it was in the old days? What is the penalty of a kernel 
feature that heavily demotes this mapping from larger to smaller pages? 
We did a set of runs with 1G and 2M pages enabled/disabled and saw the 
changes.

[Conclusions]

Assuming that this was a good representative set of workloads and that 
the data are good, for server usage, we conclude that the existing 
aggressive use of 1G mappings is a good choice since it represents the 
best in a plurality of the workloads. However, in a *majority* of cases, 
another mapping size (2M or 4k) potentially offers a performance 
improvement. This leads us to conclude that although 1G mappings are a 
good default choice, there is no compelling evidence that it must be the 
only choice, or that folks deriving benefits (like hardening) from 
smaller mapping sizes should avoid the smaller mapping sizes.

[Summary of results]

1. The test was done on 4 different server platforms with 11 
benchmarks. Each platform was tested with three different maximum 
kernel mapping sizes: 4k, 2M, and 1G. Each system has enough memory to 
effectively deploy 1G mappings. Not every benchmark was run on every 
system, so there was a total of 259 tests.

2. For each benchmark/system combination, the 1G mapping had the highest 
performance for 45% of the tests, 2M for ~30%, and 4k for ~20%.

3. Comparing the average deltas among 1G/2M/4K, 4K gives the lowest 
performance on all 4 test machines, while 1G gives the best performance 
on 2 of the machines and 2M gives the best performance on the other 2.

4. Testing with machine memory from 256G to 512G, we observed that 
larger memory leads to better performance for the 1G page size. With 
large memory, Will-it-scale/vm-scalability/unixbench/reaim/hackbench 
show the best performance with 1G, while kbuild/memtier/netperf show 
the best performance with 4K.

For more details please see the following web link:

https://01.org/sites/default/files/documentation/test_report_for_kernel_direct_mapping_performance_0.pdf

-- 
Zhengjun Xing


* Re: Test report for kernel direct mapping performance
  2021-01-15  7:23 Test report for kernel direct mapping performance Xing Zhengjun
@ 2021-01-26 15:00 ` Michal Hocko
  2021-01-27  7:50   ` Xing Zhengjun
  0 siblings, 1 reply; 3+ messages in thread
From: Michal Hocko @ 2021-01-26 15:00 UTC (permalink / raw)
  To: Xing Zhengjun
  Cc: linux-mm, LKML, Dave Hansen, Tony, Tim C Chen, Huang, Ying, Du, Julie

On Fri 15-01-21 15:23:07, Xing Zhengjun wrote:
> Hi,
> 
> There is currently a bit of a debate about the kernel direct map. Does using
> 2M/1G pages aggressively for the kernel direct map help performance? Or, is
> it an old optimization which is not as helpful on modern CPUs as it was in
> the old days? What is the penalty of a kernel feature that heavily demotes
> this mapping from larger to smaller pages? We did a set of runs with 1G and
> 2M pages enabled/disabled and saw the changes.
> 
> [Conclusions]
> 
> Assuming that this was a good representative set of workloads and that the
> data are good, for server usage, we conclude that the existing aggressive
> use of 1G mappings is a good choice since it represents the best in a
> plurality of the workloads. However, in a *majority* of cases, another
> mapping size (2M or 4k) potentially offers a performance improvement. This
> leads us to conclude that although 1G mappings are a good default choice,
> there is no compelling evidence that it must be the only choice, or that
> folks deriving benefits (like hardening) from smaller mapping sizes should
> avoid the smaller mapping sizes.

Thanks for conducting these tests! This is definitely useful and quite
honestly I would have expected much more noticeable differences.
Please note that I am not really deep into benchmarking but one thing
that popped in my mind was whether these (micro)benchmarks are really
representative workloads. Some of them tend to be rather narrow in
executed code paths or data structures used AFAIU. Is it possible they
simply didn't generate sufficient TLB pressure?

Have you tried to look closer at profiles of the respective
configurations to see where the overhead comes from?
-- 
Michal Hocko
SUSE Labs


* Re: Test report for kernel direct mapping performance
  2021-01-26 15:00 ` Michal Hocko
@ 2021-01-27  7:50   ` Xing Zhengjun
  0 siblings, 0 replies; 3+ messages in thread
From: Xing Zhengjun @ 2021-01-27  7:50 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, LKML, Dave Hansen, Tony, Tim C Chen, Huang, Ying, Du, Julie



On 1/26/2021 11:00 PM, Michal Hocko wrote:
> On Fri 15-01-21 15:23:07, Xing Zhengjun wrote:
>> Hi,
>>
>> There is currently a bit of a debate about the kernel direct map. Does using
>> 2M/1G pages aggressively for the kernel direct map help performance? Or, is
>> it an old optimization which is not as helpful on modern CPUs as it was in
>> the old days? What is the penalty of a kernel feature that heavily demotes
>> this mapping from larger to smaller pages? We did a set of runs with 1G and
>> 2M pages enabled/disabled and saw the changes.
>>
>> [Conclusions]
>>
>> Assuming that this was a good representative set of workloads and that the
>> data are good, for server usage, we conclude that the existing aggressive
>> use of 1G mappings is a good choice since it represents the best in a
>> plurality of the workloads. However, in a *majority* of cases, another
>> mapping size (2M or 4k) potentially offers a performance improvement. This
>> leads us to conclude that although 1G mappings are a good default choice,
>> there is no compelling evidence that it must be the only choice, or that
>> folks deriving benefits (like hardening) from smaller mapping sizes should
>> avoid the smaller mapping sizes.
> 
> Thanks for conducting these tests! This is definitely useful and quite
> honestly I would have expected much more noticeable differences.
> Please note that I am not really deep into benchmarking but one thing
> that popped in my mind was whether these (micro)benchmarks are really
> representative workloads. Some of them tend to be rather narrow in
> executed code paths or data structures used AFAIU. Is it possible they
> simply didn't generate sufficient TLB pressure?
> 

The test was done on 4 server platforms with the 11 benchmarks that 0day 
runs daily. Each of the 11 benchmarks has a lot of subcases, so there 
was a total of 259 test cases. The test memory size for the 4 server 
platforms ranges from 128GB to 512GB. Yes, some of the benchmarks tend 
to be narrow in executed code paths or data structures, so the 259 cases 
cover memory, CPU scheduling, network, IO, and database workloads to try 
to cover most of the code paths. Some of the 11 benchmarks may not 
generate sufficient TLB pressure, but I think the cases in 
vm-scalability and will-it-scale may generate enough (a rough way to 
check this is sketched after the link below). I have provided the test 
results for the different benchmarks; if you are interested, you can see 
the details in the test report: 
https://01.org/sites/default/files/documentation/test_report_for_kernel_direct_mapping_performance_0.pdf
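
(For reference, a rough way to check whether a given case generates TLB 
pressure is to run it under perf and look at the dTLB miss counters. 
This is only a minimal sketch: it assumes perf is installed and the 
generic dTLB-loads/dTLB-load-misses events exist on the CPU (exact event 
names vary by PMU), and the example command in the comment is just a 
placeholder, not one of the actual 0day invocations.)

#!/usr/bin/env python3
# Sketch: run one test case under "perf stat" and report its dTLB miss
# rate.  Assumes the generic dTLB-loads/dTLB-load-misses events exist;
# on some PMUs the event names differ.
import subprocess
import sys

EVENTS = ["dTLB-loads", "dTLB-load-misses"]

def dtlb_miss_rate(cmd):
    # "-x ," gives CSV output on stderr: count,unit,event,...
    result = subprocess.run(
        ["perf", "stat", "-x", ",", "-e", ",".join(EVENTS)] + cmd,
        capture_output=True, text=True)
    counts = {}
    for line in result.stderr.splitlines():
        fields = line.split(",")
        if len(fields) > 2 and fields[2] in EVENTS and fields[0].isdigit():
            counts[fields[2]] = int(fields[0])
    return counts["dTLB-load-misses"] / counts["dTLB-loads"]

if __name__ == "__main__":
    # e.g. ./tlb_check.py ./page_fault1_processes -t 16   (placeholder)
    print("dTLB load miss rate: %.4f%%" % (100.0 * dtlb_miss_rate(sys.argv[1:])))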


> Have you tried to look closer at profiles of the respective
> configurations to see where the overhead comes from?
>

The test cases were selected from the 0day daily run cases, just using 
different kernel settings (a sketch of how the resulting direct-map 
split can be checked follows below):
Enable both 2M and 1G huge pages (up to 1G, so named "1G" in the test 
report):
           no extra kernel command line needed
Disable 1G pages (up to 2M, so named "2M" in the test report):
           add kernel command line "nogbpages"
Disable both 2M and 1G huge pages (up to 4k, so named "4K" in the test 
report):
           add kernel command line "nohugepages_mapping" (via a debug patch)

User space adds the THP enabled setting for all three kernels (1G/2M/4K):
           transparent_hugepage:
               thp_enabled: always
               thp_defrag: always
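
(The effective direct-map split and the THP policy can be read back from 
/proc/meminfo and sysfs on each of the three kernels. A minimal sketch, 
assuming an x86_64 kernel that exposes the DirectMap4k/DirectMap2M/
DirectMap1G counters; DirectMap1G only appears when 1G mappings are 
supported and in use.)

#!/usr/bin/env python3
# Sketch: print the kernel direct-map split and the current THP policy.
# Assumes x86_64, where /proc/meminfo exposes DirectMap4k/2M/1G counters
# (DirectMap1G is only present when 1G mappings are supported and used).

THP_ENABLED = "/sys/kernel/mm/transparent_hugepage/enabled"

def direct_map_split(path="/proc/meminfo"):
    split = {}
    with open(path) as f:
        for line in f:
            if line.startswith("DirectMap"):
                key, value = line.split(":", 1)
                split[key] = value.strip()      # e.g. "DirectMap2M" -> "2097152 kB"
    return split

def thp_policy(path=THP_ENABLED):
    with open(path) as f:
        return f.read().strip()                 # e.g. "[always] madvise never"

if __name__ == "__main__":
    for key, value in direct_map_split().items():
        print(f"{key}: {value}")
    print("THP enabled:", thp_policy())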

During the test, we enabled some monitors, but their overhead should not 
be too big; most of the overhead should come from the test cases 
themselves. I will study some test cases to find the hotspots where the 
overhead comes from and provide that later if someone is interested 
(roughly along the lines of the sketch below).
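
(A minimal sketch of that kind of profiling step, assuming perf 
record/report with call-graph collection works on these machines; the 
script name and the example command in the comment are only 
placeholders.)

#!/usr/bin/env python3
# Sketch: profile one test case with "perf record -g" and print the
# hottest symbols from "perf report".  Assumes perf is installed and
# call-graph collection works on the machine.
import subprocess
import sys

def hotspots(cmd, data="perf.data", top_n=20):
    # Record a call-graph profile while the test case runs.
    subprocess.run(["perf", "record", "-g", "-o", data, "--"] + cmd, check=True)
    # Summarize: one line per symbol, sorted by overhead.
    report = subprocess.run(
        ["perf", "report", "-i", data, "--stdio", "--sort", "symbol"],
        capture_output=True, text=True, check=True)
    lines = [l for l in report.stdout.splitlines()
             if l.strip() and not l.startswith("#")]
    print("\n".join(lines[:top_n]))

if __name__ == "__main__":
    # e.g. ./hotspots.py ./usemem -t 16 1G   (placeholder)
    hotspots(sys.argv[1:])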


-- 
Zhengjun Xing

