* [PATCH v4 0/4] make vm_committed_as_batch aware of vm overcommit policy
From: Feng Tang
Date: 2020-05-29 1:06 UTC
To: Andrew Morton, Michal Hocko, Johannes Weiner, Matthew Wilcox, Mel Gorman, Kees Cook, Qian Cai, andi.kleen, tim.c.chen, dave.hansen, ying.huang, linux-mm, linux-kernel
Cc: Feng Tang

When checking a performance change for the will-it-scale scalability mmap
test [1], we found very high lock contention on the spinlock of the percpu
counter 'vm_committed_as':

    94.14%  0.35%  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
      48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap;
      45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap;

This heavy lock contention is not always necessary. 'vm_committed_as' only
needs to be very precise when the strict OVERCOMMIT_NEVER policy is set,
which requires a rather small batch number for the percpu counter.

So keep the 'batch' number unchanged for the strict OVERCOMMIT_NEVER
policy, and enlarge it for the not-so-strict OVERCOMMIT_ALWAYS and
OVERCOMMIT_GUESS policies.

A benchmark with the same testcase in [1] shows a 53% improvement on an
8C/16T desktop, and 2097% (20X) on a 4S/72C/144T server. Whether a given
case shows an improvement depends on whether the test's mmap size is
bigger than the computed batch number. We tested 10+ platforms in 0day
(server, desktop and laptop): if we lift the batch to 64X, 80%+ of the
platforms show improvements, while with a 16X lift, 1/3 of the platforms
do. Generally it should help mmap/munmap usage, as Michal Hocko mentioned:

"I believe that there are non-synthetic workloads which would benefit
from a larger batch. E.g. large in memory databases which do large
mmaps during startups from multiple threads."

Note: there are some style complaints from checkpatch for patch 3, as the
sysctl handler declaration follows the format of its sibling functions.

patch1: a cleanup for /proc/meminfo
patch2: a preparation patch which also improves the accuracy of
        vm_memory_committed
patch3: remove the VM_WARN_ONCE for the vm_committed_as underflow check
patch4: the main change

This is against today's linux-mm git tree on github.

Please help to review, thanks!

- Feng

----------------------------------------------------------------
Changelog:

  v4:
      * Remove the VM_WARN_ONCE check for vm_committed_as underflow;
        thanks to Qian Cai for finding and testing the warning

  v3:
      * refine the commit log and clean up the code, according to comments
        from Michal Hocko and Matthew Wilcox
      * change the lift from 16X to 64X after tests

  v2:
      * add the sysctl handler to cover runtime overcommit policy
        changes, as suggested by Andrew Morton
      * address the accuracy concern of vm_memory_committed()
        from Andi Kleen

Feng Tang (4):
  proc/meminfo: avoid open coded reading of vm_committed_as
  mm/util.c: make vm_memory_committed() more accurate
  mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check
  mm: adjust vm_committed_as_batch according to vm overcommit policy

 fs/proc/meminfo.c    |  2 +-
 include/linux/mm.h   |  2 ++
 include/linux/mman.h |  4 ++++
 kernel/sysctl.c      |  2 +-
 mm/mm_init.c         | 18 ++++++++++++++----
 mm/util.c            | 22 +++++++++++++---------
 6 files changed, 35 insertions(+), 15 deletions(-)

-- 
2.7.4
* [PATCH v4 1/4] proc/meminfo: avoid open coded reading of vm_committed_as
From: Feng Tang
Date: 2020-05-29 1:06 UTC
To: Andrew Morton, Michal Hocko, Johannes Weiner, Matthew Wilcox, Mel Gorman, Kees Cook, Qian Cai, andi.kleen, tim.c.chen, dave.hansen, ying.huang, linux-mm, linux-kernel
Cc: Feng Tang

Use the existing vm_memory_committed() instead, which is also convenient
for future changes.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 fs/proc/meminfo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index b030d8b..e3d14ee 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -41,7 +41,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 	si_meminfo(&i);
 	si_swapinfo(&i);
 
-	committed = percpu_counter_read_positive(&vm_committed_as);
+	committed = vm_memory_committed();
 	cached = global_node_page_state(NR_FILE_PAGES) -
 			total_swapcache_pages() - i.bufferram;
-- 
2.7.4
* [PATCH v4 2/4] mm/util.c: make vm_memory_committed() more accurate
From: Feng Tang
Date: 2020-05-29 1:06 UTC
To: Andrew Morton, Michal Hocko, Johannes Weiner, Matthew Wilcox, Mel Gorman, Kees Cook, Qian Cai, andi.kleen, tim.c.chen, dave.hansen, ying.huang, linux-mm, linux-kernel
Cc: Feng Tang

percpu_counter_sum_positive() will provide more accurate info.

As with percpu_counter_read_positive(), in the worst case the deviation
could be 'batch * nr_cpus', which is totalram_pages/256 for now, and
will be more when the batch gets enlarged.

Its time cost is about 800 nanoseconds on a 2C/4T platform and 2~3
microseconds on a 2S/36C/72T server in the normal case, and in the worst
case, where vm_committed_as's spinlock is under severe contention, it
costs 30~40 microseconds on the 2S/36C/72T server. That should be fine
for its only two users: /proc/meminfo and the HyperV balloon driver's
per-second status trace.

Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 mm/util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/util.c b/mm/util.c
index 9b3be03..3c7a08c 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -790,7 +790,7 @@ struct percpu_counter vm_committed_as ____cacheline_aligned_in_smp;
  */
 unsigned long vm_memory_committed(void)
 {
-	return percpu_counter_read_positive(&vm_committed_as);
+	return percpu_counter_sum_positive(&vm_committed_as);
 }
 EXPORT_SYMBOL_GPL(vm_memory_committed);

-- 
2.7.4
* Re: [PATCH v4 2/4] mm/util.c: make vm_memory_committed() more accurate
From: Michal Hocko
Date: 2020-06-03 13:35 UTC
To: Feng Tang
Cc: Andrew Morton, Johannes Weiner, Matthew Wilcox, Mel Gorman, Kees Cook, Qian Cai, andi.kleen, tim.c.chen, dave.hansen, ying.huang, linux-mm, linux-kernel, K. Y. Srinivasan, Haiyang Zhang

On Fri 29-05-20 09:06:08, Feng Tang wrote:
> percpu_counter_sum_positive() will provide more accurate info.
>
> As with percpu_counter_read_positive(), in the worst case the deviation
> could be 'batch * nr_cpus', which is totalram_pages/256 for now, and
> will be more when the batch gets enlarged.
>
> Its time cost is about 800 nanoseconds on a 2C/4T platform and 2~3
> microseconds on a 2S/36C/72T server in the normal case, and in the worst
> case, where vm_committed_as's spinlock is under severe contention, it
> costs 30~40 microseconds on the 2S/36C/72T server. That should be fine
> for its only two users: /proc/meminfo and the HyperV balloon driver's
> per-second status trace.
>
> Signed-off-by: Feng Tang <feng.tang@intel.com>

I cannot speak for the HyperV part (Cc'ing the maintainers), but this
shouldn't be a problem for meminfo.

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/util.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/util.c b/mm/util.c
> index 9b3be03..3c7a08c 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -790,7 +790,7 @@ struct percpu_counter vm_committed_as ____cacheline_aligned_in_smp;
>   */
>  unsigned long vm_memory_committed(void)
>  {
> -	return percpu_counter_read_positive(&vm_committed_as);
> +	return percpu_counter_sum_positive(&vm_committed_as);
>  }
>  EXPORT_SYMBOL_GPL(vm_memory_committed);
>
> -- 
> 2.7.4

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH v4 2/4] mm/util.c: make vm_memory_committed() more accurate
From: Andi Kleen
Date: 2020-06-03 14:28 UTC
To: Feng Tang
Cc: Andrew Morton, Michal Hocko, Johannes Weiner, Matthew Wilcox, Mel Gorman, Kees Cook, Qian Cai, tim.c.chen, dave.hansen, ying.huang, linux-mm, linux-kernel

> Its time cost is about 800 nanoseconds on a 2C/4T platform and 2~3
> microseconds on a 2S/36C/72T server in the normal case, and in the worst
> case, where vm_committed_as's spinlock is under severe contention, it
> costs 30~40 microseconds on the 2S/36C/72T server.

This will likely be 40-80us on larger systems, although the overhead
is often non-linear, so it might get worse.

> That should be fine for its only two users: /proc/meminfo and the
> HyperV balloon driver's per-second status trace.

There are some setups that do frequent sampling of /proc/meminfo in
the background. The increased overhead could be a problem for them.
But I'm not proposing a change now. If someone complains we'll have to
revisit, I guess, perhaps adding a rate limit of some sort.

-Andi
* Re: [PATCH v4 2/4] mm/util.c: make vm_memory_committed() more accurate
From: Feng Tang
Date: 2020-06-04 1:38 UTC
To: Andi Kleen
Cc: Andrew Morton, Michal Hocko, Johannes Weiner, Matthew Wilcox, Mel Gorman, Kees Cook, Qian Cai, tim.c.chen, dave.hansen, ying.huang, linux-mm, linux-kernel

On Wed, Jun 03, 2020 at 07:28:53AM -0700, Andi Kleen wrote:
> > Its time cost is about 800 nanoseconds on a 2C/4T platform and 2~3
> > microseconds on a 2S/36C/72T server in the normal case, and in the
> > worst case, where vm_committed_as's spinlock is under severe
> > contention, it costs 30~40 microseconds on the 2S/36C/72T server.
>
> This will likely be 40-80us on larger systems, although the overhead
> is often non-linear, so it might get worse.
>
> > That should be fine for its only two users: /proc/meminfo and the
> > HyperV balloon driver's per-second status trace.
>
> There are some setups that do frequent sampling of /proc/meminfo in
> the background. The increased overhead could be a problem for them.
> But I'm not proposing a change now. If someone complains we'll have to
> revisit, I guess, perhaps adding a rate limit of some sort.

Agree. Maybe I should also put the time cost info into the code
comments in case someone notices the slowdown.

Thanks,
Feng
* [PATCH v4 3/4] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check
From: Feng Tang
Date: 2020-05-29 1:06 UTC
To: Andrew Morton, Michal Hocko, Johannes Weiner, Matthew Wilcox, Mel Gorman, Kees Cook, Qian Cai, andi.kleen, tim.c.chen, dave.hansen, ying.huang, linux-mm, linux-kernel
Cc: Feng Tang, Konstantin Khlebnikov

As explained by Michal Hocko:

: Looking at the history, this has been added by 82f71ae4a2b8
: ("mm: catch memory commitment underflow") to have a safety check
: for issues which have been fixed. There doesn't seem to be any bug
: reports mentioning this splat since then so it is likely just
: spending cycles for a hot path (yes many people run with DEBUG_VM)
: without a strong reason.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andi Kleen <andi.kleen@intel.com>
---
 mm/util.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/mm/util.c b/mm/util.c
index 3c7a08c..fe63271 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -814,14 +814,6 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 {
 	long allowed;
 
-	/*
-	 * A transient decrease in the value is unlikely, so no need
-	 * READ_ONCE() for vm_committed_as.count.
-	 */
-	VM_WARN_ONCE(data_race(percpu_counter_read(&vm_committed_as) <
-			-(s64)vm_committed_as_batch * num_online_cpus()),
-			"memory commitment underflow");
-
 	vm_acct_memory(pages);
 
 	/*
-- 
2.7.4
* Re: [PATCH v4 3/4] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check
From: Qian Cai
Date: 2020-05-29 2:49 UTC
To: Feng Tang
Cc: Andrew Morton, Michal Hocko, Johannes Weiner, Matthew Wilcox, Mel Gorman, Kees Cook, andi.kleen, tim.c.chen, dave.hansen, ying.huang, linux-mm, linux-kernel, Konstantin Khlebnikov

On Fri, May 29, 2020 at 09:06:09AM +0800, Feng Tang wrote:
> As explained by Michal Hocko:
>
> : Looking at the history, this has been added by 82f71ae4a2b8
> : ("mm: catch memory commitment underflow") to have a safety check
> : for issues which have been fixed. There doesn't seem to be any bug
> : reports mentioning this splat since then so it is likely just
> : spending cycles for a hot path (yes many people run with DEBUG_VM)
> : without a strong reason.

Hmm, it looks like the warning is still useful to catch issues in:

https://lore.kernel.org/linux-mm/20140624201606.18273.44270.stgit@zurg
https://lore.kernel.org/linux-mm/54BB9A32.7080703@oracle.com/

After reading the whole discussion in that thread, I actually disagree
with Michal. In order to get rid of this existing warning, it is rather
the person removing it who needs a strong reason, with data proving that
the performance hit is noticeable.

> Signed-off-by: Feng Tang <feng.tang@intel.com>
> Cc: Konstantin Khlebnikov <koct9i@gmail.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andi Kleen <andi.kleen@intel.com>
> ---
>  mm/util.c | 8 --------
>  1 file changed, 8 deletions(-)
>
> diff --git a/mm/util.c b/mm/util.c
> index 3c7a08c..fe63271 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -814,14 +814,6 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
>  {
>  	long allowed;
>  
> -	/*
> -	 * A transient decrease in the value is unlikely, so no need
> -	 * READ_ONCE() for vm_committed_as.count.
> -	 */
> -	VM_WARN_ONCE(data_race(percpu_counter_read(&vm_committed_as) <
> -			-(s64)vm_committed_as_batch * num_online_cpus()),
> -			"memory commitment underflow");
> -
>  	vm_acct_memory(pages);
>  
>  	/*
> -- 
> 2.7.4
* Re: [PATCH v4 3/4] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check
From: Feng Tang
Date: 2020-05-29 5:37 UTC
To: Qian Cai
Cc: Andrew Morton, Michal Hocko, Johannes Weiner, Matthew Wilcox, Mel Gorman, Kees Cook, andi.kleen, tim.c.chen, dave.hansen, ying.huang, linux-mm, linux-kernel, Konstantin Khlebnikov

On Thu, May 28, 2020 at 10:49:28PM -0400, Qian Cai wrote:
> Hmm, it looks like the warning is still useful to catch issues in:
>
> https://lore.kernel.org/linux-mm/20140624201606.18273.44270.stgit@zurg
> https://lore.kernel.org/linux-mm/54BB9A32.7080703@oracle.com/
>
> After reading the whole discussion in that thread, I actually disagree
> with Michal. In order to get rid of this existing warning, it is rather
> the person removing it who needs a strong reason, with data proving that
> the performance hit is noticeable.

One problem with the current check is that
percpu_counter_read(&vm_committed_as) is not accurate, and
percpu_counter_sum() is way too heavy.

Thanks,
Feng
* Re: [PATCH v4 3/4] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check
From: Feng Tang
Date: 2020-06-02 3:37 UTC
To: Qian Cai
Cc: Andrew Morton, Michal Hocko, Johannes Weiner, Matthew Wilcox, Mel Gorman, Kees Cook, andi.kleen, tim.c.chen, dave.hansen, ying.huang, linux-mm, linux-kernel, Konstantin Khlebnikov

Hi Qian,

On Thu, May 28, 2020 at 10:49:28PM -0400, Qian Cai wrote:
> Hmm, it looks like the warning is still useful to catch issues in:
>
> https://lore.kernel.org/linux-mm/20140624201606.18273.44270.stgit@zurg
> https://lore.kernel.org/linux-mm/54BB9A32.7080703@oracle.com/
>
> After reading the whole discussion in that thread, I actually disagree
> with Michal. In order to get rid of this existing warning, it is rather
> the person removing it who needs a strong reason, with data proving that
> the performance hit is noticeable.

I re-ran the same benchmark with the v5.7 and v5.7+remove_warning
kernels. The overall performance change is trivial (which is expected):

  1330147  +0.1%  1331032  will-it-scale.72.processes

But the perf "self" stats show a big change for __vm_enough_memory():

  0.27  -0.3  0.00  pp.self.__vm_enough_memory

I've posted the full comparison result at the end.
Thanks, Feng ========================================================================================= tbox_group/testcase/rootfs/kconfig/compiler/nr_task/mode/test/cpufreq_governor/ucode: lkp-skl-2sp7/will-it-scale/debian-x86_64-20191114.cgz/x86_64-rhel-7.6-vm-debug/gcc-7/100%/process/mmap2/performance/0x2000065 commit: v5.7 af3eca72dc43078e1ee4a38b0ecc0225b659f345 v5.7 af3eca72dc43078e1ee4a38b0ec ---------------- --------------------------- fail:runs %reproduction fail:runs | | | 850:3 -12130% 486:2 dmesg.timestamp:last 2:3 -67% :2 kmsg.Firmware_Bug]:the_BIOS_has_corrupted_hw-PMU_resources(MSR#is#bb) :3 33% 1:2 kmsg.Firmware_Bug]:the_BIOS_has_corrupted_hw-PMU_resources(MSR#is#e08) 5:3 -177% :2 kmsg.timestamp:Firmware_Bug]:the_BIOS_has_corrupted_hw-PMU_resources(MSR#is#bb) :3 88% 2:2 kmsg.timestamp:Firmware_Bug]:the_BIOS_has_corrupted_hw-PMU_resources(MSR#is#e08) 398:3 -4444% 265:2 kmsg.timestamp:last %stddev %change %stddev \ | \ 1330147 +0.1% 1331032 will-it-scale.72.processes 0.02 +0.0% 0.02 will-it-scale.72.processes_idle 18474 +0.1% 18486 will-it-scale.per_process_ops 301.18 -0.0% 301.16 will-it-scale.time.elapsed_time 301.18 -0.0% 301.16 will-it-scale.time.elapsed_time.max 1.00 ± 81% +100.0% 2.00 will-it-scale.time.involuntary_context_switches 9452 +0.0% 9452 will-it-scale.time.maximum_resident_set_size 5925 +0.1% 5932 will-it-scale.time.minor_page_faults 4096 +0.0% 4096 will-it-scale.time.page_size 0.01 ± 35% +12.5% 0.01 ± 33% will-it-scale.time.system_time 0.03 ± 14% +5.0% 0.04 ± 14% will-it-scale.time.user_time 83.33 +0.2% 83.50 will-it-scale.time.voluntary_context_switches 1330147 +0.1% 1331032 will-it-scale.workload 0.45 ± 29% +0.0 0.50 ± 28% mpstat.cpu.all.idle% 98.41 -0.1 98.34 mpstat.cpu.all.sys% 1.14 +0.0 1.16 mpstat.cpu.all.usr% 200395 ± 18% +11.9% 224282 ± 14% cpuidle.C1.time 4008 ± 38% -2.1% 3924 ± 15% cpuidle.C1.usage 1.222e+08 ± 19% -29.2% 86444161 cpuidle.C1E.time 254203 ± 19% -23.2% 195198 ± 4% cpuidle.C1E.usage 8145747 ± 31% +339.9% 35830338 
± 72% cpuidle.C6.time 22878 ± 9% +288.2% 88823 ± 70% cpuidle.C6.usage 8891 ± 7% -7.4% 8229 cpuidle.POLL.time 3111 ± 18% -11.1% 2766 cpuidle.POLL.usage 0.00 -100.0% 0.00 numa-numastat.node0.interleave_hit 314399 ± 2% -1.0% 311244 ± 3% numa-numastat.node0.local_node 322209 +2.4% 329909 numa-numastat.node0.numa_hit 7814 ± 73% +138.9% 18670 ± 24% numa-numastat.node0.other_node 0.00 -100.0% 0.00 numa-numastat.node1.interleave_hit 343026 ± 2% -0.3% 341980 numa-numastat.node1.local_node 358632 -3.3% 346708 numa-numastat.node1.numa_hit 15613 ± 36% -69.7% 4728 ± 98% numa-numastat.node1.other_node 301.18 -0.0% 301.16 time.elapsed_time 301.18 -0.0% 301.16 time.elapsed_time.max 1.00 ± 81% +100.0% 2.00 time.involuntary_context_switches 9452 +0.0% 9452 time.maximum_resident_set_size 5925 +0.1% 5932 time.minor_page_faults 4096 +0.0% 4096 time.page_size 0.01 ± 35% +12.5% 0.01 ± 33% time.system_time 0.03 ± 14% +5.0% 0.04 ± 14% time.user_time 83.33 +0.2% 83.50 time.voluntary_context_switches 0.33 ±141% +50.0% 0.50 ±100% vmstat.cpu.id 97.00 +0.0% 97.00 vmstat.cpu.sy 1.00 +0.0% 1.00 vmstat.cpu.us 0.00 -100.0% 0.00 vmstat.io.bi 4.00 +0.0% 4.00 vmstat.memory.buff 1391751 +0.1% 1392746 vmstat.memory.cache 1.294e+08 -0.0% 1.294e+08 vmstat.memory.free 71.00 +0.0% 71.00 vmstat.procs.r 1315 -0.7% 1305 vmstat.system.cs 147433 -0.0% 147369 vmstat.system.in 0.00 -100.0% 0.00 proc-vmstat.compact_isolated 85060 +0.4% 85431 proc-vmstat.nr_active_anon 37.00 -1.4% 36.50 proc-vmstat.nr_active_file 71111 +0.1% 71200 proc-vmstat.nr_anon_pages 77.33 ± 17% +12.5% 87.00 proc-vmstat.nr_anon_transparent_hugepages 54.00 +1.9% 55.00 proc-vmstat.nr_dirtied 5.00 +0.0% 5.00 proc-vmstat.nr_dirty 3215506 -0.0% 3215471 proc-vmstat.nr_dirty_background_threshold 6438875 -0.0% 6438805 proc-vmstat.nr_dirty_threshold 327936 +0.1% 328237 proc-vmstat.nr_file_pages 50398 +0.0% 50398 proc-vmstat.nr_free_cma 32356721 -0.0% 32356374 proc-vmstat.nr_free_pages 4640 -0.1% 4636 proc-vmstat.nr_inactive_anon 82.67 ± 2% -0.8% 82.00 
± 2% proc-vmstat.nr_inactive_file 13256 -0.3% 13211 proc-vmstat.nr_kernel_stack 8057 -0.5% 8017 proc-vmstat.nr_mapped 134.00 ±141% +49.3% 200.00 ±100% proc-vmstat.nr_mlock 2229 +0.2% 2234 proc-vmstat.nr_page_table_pages 18609 ± 3% +1.6% 18898 ± 3% proc-vmstat.nr_shmem 19964 -0.3% 19901 proc-vmstat.nr_slab_reclaimable 34003 +0.1% 34025 proc-vmstat.nr_slab_unreclaimable 309227 +0.0% 309249 proc-vmstat.nr_unevictable 0.00 -100.0% 0.00 proc-vmstat.nr_writeback 53.00 +0.0% 53.00 proc-vmstat.nr_written 85060 +0.4% 85431 proc-vmstat.nr_zone_active_anon 37.00 -1.4% 36.50 proc-vmstat.nr_zone_active_file 4640 -0.1% 4636 proc-vmstat.nr_zone_inactive_anon 82.67 ± 2% -0.8% 82.00 ± 2% proc-vmstat.nr_zone_inactive_file 309227 +0.0% 309249 proc-vmstat.nr_zone_unevictable 5.00 +0.0% 5.00 proc-vmstat.nr_zone_write_pending 2181 ±124% -68.6% 685.50 ± 80% proc-vmstat.numa_hint_faults 37.67 ±109% +77.9% 67.00 ± 91% proc-vmstat.numa_hint_faults_local 702373 +0.7% 707116 proc-vmstat.numa_hit 35.33 ± 85% -70.3% 10.50 ± 4% proc-vmstat.numa_huge_pte_updates 0.00 -100.0% 0.00 proc-vmstat.numa_interleave 678938 +0.7% 683714 proc-vmstat.numa_local 23435 -0.1% 23401 proc-vmstat.numa_other 4697 ± 68% -86.8% 618.50 ± 98% proc-vmstat.numa_pages_migrated 25844 ± 52% -79.1% 5406 ± 4% proc-vmstat.numa_pte_updates 20929 ± 4% +1.9% 21332 ± 5% proc-vmstat.pgactivate 0.00 -100.0% 0.00 proc-vmstat.pgalloc_dma32 760325 -0.4% 756908 proc-vmstat.pgalloc_normal 801566 -0.5% 797832 proc-vmstat.pgfault 714690 -0.2% 713286 proc-vmstat.pgfree 4697 ± 68% -86.8% 618.50 ± 98% proc-vmstat.pgmigrate_success 0.00 -100.0% 0.00 proc-vmstat.pgpgin 103.00 +0.5% 103.50 proc-vmstat.thp_collapse_alloc 5.00 +0.0% 5.00 proc-vmstat.thp_fault_alloc 0.00 -100.0% 0.00 proc-vmstat.thp_zero_page_alloc 41.00 ± 98% +35.4% 55.50 ± 80% proc-vmstat.unevictable_pgs_culled 183.00 ±141% +50.0% 274.50 ±100% proc-vmstat.unevictable_pgs_mlocked 2.59 +0.5% 2.60 perf-stat.i.MPKI 4.854e+09 +0.0% 4.856e+09 perf-stat.i.branch-instructions 0.45 -0.0 
0.43 perf-stat.i.branch-miss-rate% 21296577 -2.3% 20817170 perf-stat.i.branch-misses 39.98 -0.2 39.81 perf-stat.i.cache-miss-rate% 21372778 +0.0% 21380457 perf-stat.i.cache-misses 53441942 +0.5% 53705724 perf-stat.i.cache-references 1285 -0.7% 1277 perf-stat.i.context-switches 10.67 -0.0% 10.67 perf-stat.i.cpi 71998 -0.0% 71998 perf-stat.i.cpu-clock 2.21e+11 +0.0% 2.21e+11 perf-stat.i.cpu-cycles 117.36 +0.3% 117.71 perf-stat.i.cpu-migrations 10322 -0.0% 10321 perf-stat.i.cycles-between-cache-misses 0.05 +0.0 0.05 perf-stat.i.dTLB-load-miss-rate% 2709233 +0.1% 2712427 perf-stat.i.dTLB-load-misses 5.785e+09 +0.0% 5.787e+09 perf-stat.i.dTLB-loads 0.00 +0.0 0.00 ± 2% perf-stat.i.dTLB-store-miss-rate% 8967 -3.0% 8701 perf-stat.i.dTLB-store-misses 1.97e+09 +0.1% 1.971e+09 perf-stat.i.dTLB-stores 94.02 +0.2 94.24 perf-stat.i.iTLB-load-miss-rate% 2732366 -1.4% 2694372 perf-stat.i.iTLB-load-misses 173049 -5.7% 163172 ± 2% perf-stat.i.iTLB-loads 2.07e+10 +0.0% 2.071e+10 perf-stat.i.instructions 7671 +1.0% 7747 perf-stat.i.instructions-per-iTLB-miss 0.10 +0.1% 0.10 perf-stat.i.ipc 3.07 +0.0% 3.07 perf-stat.i.metric.GHz 0.42 +0.5% 0.43 perf-stat.i.metric.K/sec 175.98 +0.1% 176.08 perf-stat.i.metric.M/sec 2565 -0.7% 2547 perf-stat.i.minor-faults 99.55 -0.0 99.53 perf-stat.i.node-load-miss-rate% 5949351 +0.3% 5969805 perf-stat.i.node-load-misses 22301 ± 6% +5.6% 23543 ± 8% perf-stat.i.node-loads 99.73 -0.0 99.72 perf-stat.i.node-store-miss-rate% 5314673 -0.1% 5310449 perf-stat.i.node-store-misses 4704 ± 4% -1.8% 4619 perf-stat.i.node-stores 2565 -0.7% 2547 perf-stat.i.page-faults 71998 -0.0% 71998 perf-stat.i.task-clock 2.58 +0.5% 2.59 perf-stat.overall.MPKI 0.44 -0.0 0.43 perf-stat.overall.branch-miss-rate% 39.99 -0.2 39.81 perf-stat.overall.cache-miss-rate% 10.67 -0.0% 10.67 perf-stat.overall.cpi 10340 -0.0% 10337 perf-stat.overall.cycles-between-cache-misses 0.05 +0.0 0.05 perf-stat.overall.dTLB-load-miss-rate% 0.00 -0.0 0.00 perf-stat.overall.dTLB-store-miss-rate% 94.04 +0.2 
94.29 perf-stat.overall.iTLB-load-miss-rate% 7577 +1.4% 7686 perf-stat.overall.instructions-per-iTLB-miss 0.09 +0.0% 0.09 perf-stat.overall.ipc 99.62 -0.0 99.60 perf-stat.overall.node-load-miss-rate% 99.91 +0.0 99.91 perf-stat.overall.node-store-miss-rate% 4691551 +0.0% 4693151 perf-stat.overall.path-length 4.838e+09 +0.0% 4.84e+09 perf-stat.ps.branch-instructions 21230930 -2.3% 20750859 perf-stat.ps.branch-misses 21302195 +0.0% 21309444 perf-stat.ps.cache-misses 53273375 +0.5% 53531696 perf-stat.ps.cache-references 1281 -0.7% 1273 perf-stat.ps.context-switches 71760 -0.0% 71759 perf-stat.ps.cpu-clock 2.203e+11 +0.0% 2.203e+11 perf-stat.ps.cpu-cycles 117.02 +0.3% 117.33 perf-stat.ps.cpu-migrations 2702184 +0.1% 2704689 perf-stat.ps.dTLB-load-misses 5.766e+09 +0.0% 5.767e+09 perf-stat.ps.dTLB-loads 9028 -3.2% 8736 perf-stat.ps.dTLB-store-misses 1.963e+09 +0.1% 1.965e+09 perf-stat.ps.dTLB-stores 2723237 -1.4% 2685365 perf-stat.ps.iTLB-load-misses 172573 -5.7% 162735 ± 2% perf-stat.ps.iTLB-loads 2.063e+10 +0.0% 2.064e+10 perf-stat.ps.instructions 2559 -0.7% 2540 perf-stat.ps.minor-faults 5929506 +0.3% 5949863 perf-stat.ps.node-load-misses 22689 ± 5% +4.7% 23766 ± 8% perf-stat.ps.node-loads 5296902 -0.1% 5292690 perf-stat.ps.node-store-misses 4724 ± 4% -2.2% 4622 perf-stat.ps.node-stores 2559 -0.7% 2540 perf-stat.ps.page-faults 71760 -0.0% 71759 perf-stat.ps.task-clock 6.24e+12 +0.1% 6.247e+12 perf-stat.total.instructions 47.20 -0.2 47.05 pp.bt.percpu_counter_add_batch.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64 50.12 -0.2 49.97 pp.bt.entry_SYSCALL_64_after_hwframe.munmap 50.10 -0.1 49.95 pp.bt.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap 46.75 -0.1 46.60 pp.bt._raw_spin_lock_irqsave.percpu_counter_add_batch.__do_munmap.__vm_munmap.__x64_sys_munmap 49.36 -0.1 49.22 pp.bt.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe 49.78 -0.1 49.64 pp.bt.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap 49.75 
-0.1 49.61 pp.bt.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap 50.48 -0.1 50.34 pp.bt.munmap 46.56 -0.1 46.41 pp.bt.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.percpu_counter_add_batch.__do_munmap.__vm_munmap 1.88 -0.0 1.88 pp.bt.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64 1.32 +0.0 1.33 pp.bt.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap 1.42 +0.0 1.44 pp.bt.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap 0.51 +0.0 0.53 pp.bt.___might_sleep.unmap_page_range.unmap_vmas.unmap_region.__do_munmap 48.27 +0.1 48.39 pp.bt.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64 48.67 +0.1 48.80 pp.bt.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.mmap64 48.51 +0.1 48.65 pp.bt.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe 47.10 +0.1 47.24 pp.bt.__vm_enough_memory.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff 48.74 +0.1 48.88 pp.bt.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.mmap64 49.08 +0.1 49.23 pp.bt.entry_SYSCALL_64_after_hwframe.mmap64 46.48 +0.1 46.62 pp.bt.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.percpu_counter_add_batch.__vm_enough_memory.mmap_region 49.06 +0.1 49.21 pp.bt.do_syscall_64.entry_SYSCALL_64_after_hwframe.mmap64 49.46 +0.1 49.61 pp.bt.mmap64 46.66 +0.1 46.80 pp.bt._raw_spin_lock_irqsave.percpu_counter_add_batch.__vm_enough_memory.mmap_region.do_mmap 46.84 +0.4 47.23 pp.bt.percpu_counter_add_batch.__vm_enough_memory.mmap_region.do_mmap.vm_mmap_pgoff 49.78 -0.1 49.64 pp.child.__x64_sys_munmap 50.51 -0.1 50.36 pp.child.munmap 49.76 -0.1 49.61 pp.child.__vm_munmap 49.36 -0.1 49.22 pp.child.__do_munmap 0.45 -0.0 0.41 pp.child.perf_event_mmap 0.03 ± 70% -0.0 0.00 pp.child.strlen 0.02 ±141% -0.0 0.00 pp.child.common_file_perm 0.30 -0.0 0.29 pp.child.free_pgd_range 1.88 -0.0 1.88 pp.child.unmap_region 0.28 -0.0 0.27 pp.child.free_p4d_range 0.28 ± 4% 
                 -0.0       0.27        pp.child.apic_timer_interrupt
      0.19 ± 4%  -0.0       0.18 ± 2%   pp.child.hrtimer_interrupt
      0.09       -0.0       0.08 ± 5%   pp.child.find_vma
      0.40 ± 2%  -0.0       0.40 ± 3%   pp.child.vm_area_alloc
      0.35       -0.0       0.34        pp.child.syscall_return_via_sysret
      0.35 ± 2%  -0.0       0.34        pp.child.up_read
     93.41       -0.0      93.41        pp.child._raw_spin_lock_irqsave
      0.05 ± 8%  -0.0       0.05        pp.child.kmem_cache_alloc_trace
      0.11 ± 4%  -0.0       0.11        pp.child.d_path
      0.11 ± 4%  -0.0       0.11        pp.child.perf_iterate_sb
      0.08 ± 5%  -0.0       0.08        pp.child.prepend_path
     99.23       -0.0      99.23        pp.child.entry_SYSCALL_64_after_hwframe
     93.04       -0.0      93.04        pp.child.native_queued_spin_lock_slowpath
      0.09       -0.0       0.09        pp.child.tick_sched_timer
      0.05       -0.0       0.05        pp.child.unlink_file_vma
      0.05       -0.0       0.05        pp.child.task_tick_fair
      0.05       -0.0       0.05        pp.child.perf_event_mmap_output
     99.20       +0.0      99.20        pp.child.do_syscall_64
      0.21 ± 3%  +0.0       0.21        pp.child.smp_apic_timer_interrupt
      0.08       +0.0       0.08        pp.child.update_process_times
      0.06       +0.0       0.06        pp.child.down_write_killable
      0.15       +0.0       0.15 ± 6%   pp.child.rcu_all_qs
      0.37 ± 2%  +0.0       0.37 ± 5%   pp.child.kmem_cache_alloc
      0.28       +0.0       0.29 ± 5%   pp.child._cond_resched
      0.12 ± 3%  +0.0       0.12 ± 4%   pp.child.vma_link
      0.09 ± 5%  +0.0       0.10 ± 5%   pp.child.security_mmap_file
      0.06 ± 7%  +0.0       0.07 ± 7%   pp.child.scheduler_tick
      0.13 ± 3%  +0.0       0.14 ± 3%   pp.child.__hrtimer_run_queues
      0.32 ± 2%  +0.0       0.32        pp.child._raw_spin_unlock_irqrestore
      0.08       +0.0       0.08 ± 5%   pp.child.tick_sched_handle
      0.06       +0.0       0.07 ± 7%   pp.child.down_write
      0.06       +0.0       0.07 ± 7%   pp.child.remove_vma
      1.43       +0.0       1.44        pp.child.unmap_vmas
      0.05       +0.0       0.06        pp.child.__vma_rb_erase
      0.08       +0.0       0.09        pp.child.free_pgtables
      1.39       +0.0       1.40        pp.child.unmap_page_range
      0.35       +0.0       0.36 ± 4%   pp.child.entry_SYSCALL_64
      0.10 ± 4%  +0.0       0.11        pp.child.arch_get_unmapped_area_topdown
      0.58       +0.0       0.59        pp.child.___might_sleep
      0.06       +0.0       0.08 ± 6%   pp.child.shmem_mmap
      0.03 ± 70% +0.0       0.05        pp.child.up_write
      0.03 ± 70% +0.0       0.05        pp.child.vm_unmapped_area
      0.03 ± 70% +0.0       0.05        pp.child.__vma_link_rb
      0.15 ± 3%  +0.0       0.17 ± 3%   pp.child.shmem_get_unmapped_area
      0.18 ± 2%  +0.0       0.20 ± 2%   pp.child.get_unmapped_area
      0.00       +0.0       0.03 ±100%  pp.child.prepend_name
      0.02 ±141% +0.0       0.05        pp.child.touch_atime
     48.28       +0.1      48.39        pp.child.mmap_region
     47.10       +0.1      47.24        pp.child.__vm_enough_memory
     48.51       +0.1      48.65        pp.child.do_mmap
     48.74       +0.1      48.88        pp.child.ksys_mmap_pgoff
     48.67       +0.1      48.81        pp.child.vm_mmap_pgoff
     49.49       +0.1      49.64        pp.child.mmap64
     94.04       +0.2      94.28        pp.child.percpu_counter_add_batch
      0.27       -0.3       0.00        pp.self.__vm_enough_memory
      0.03 ± 70% -0.0       0.00        pp.self.strlen
      0.02 ±141% -0.0       0.00        pp.self.prepend_path
      0.28       -0.0       0.27        pp.self.free_p4d_range
      0.07       -0.0       0.06        pp.self.perf_iterate_sb
      0.08       -0.0       0.07        pp.self.perf_event_mmap
      0.35 ± 2%  -0.0       0.34 ± 5%   pp.self.kmem_cache_alloc
      0.11       -0.0       0.11 ± 4%   pp.self.rcu_all_qs
      0.37       -0.0       0.36        pp.self._raw_spin_lock_irqsave
      0.35       -0.0       0.34        pp.self.syscall_return_via_sysret
      0.35 ± 2%  -0.0       0.34        pp.self.up_read
      0.14 ± 3%  -0.0       0.14 ± 3%   pp.self._cond_resched
     93.04       -0.0      93.04        pp.self.native_queued_spin_lock_slowpath
      0.31       +0.0       0.32 ± 4%   pp.self.entry_SYSCALL_64
      0.69       +0.0       0.69        pp.self.unmap_page_range
      0.10 ± 4%  +0.0       0.10        pp.self._raw_spin_unlock_irqrestore
      0.06 ± 8%  +0.0       0.06        pp.self.find_vma
      0.06 ± 8%  +0.0       0.06        pp.self.__do_munmap
      0.06 ± 7%  +0.0       0.07        pp.self.mmap_region
      0.61 ± 3%  +0.0       0.62        pp.self.do_syscall_64
      0.02 ±141% +0.0       0.03 ±100%  pp.self.up_write
      0.05       +0.0       0.06        pp.self.__vma_rb_erase
      0.54       +0.0       0.56        pp.self.___might_sleep
      0.03 ± 70% +0.0       0.05        pp.self.vm_unmapped_area
      0.03 ± 70% +0.0       0.05        pp.self.shmem_get_unmapped_area
      0.03 ± 70% +0.0       0.05        pp.self.__vma_link_rb
      0.00       +0.0       0.03 ±100%  pp.self.prepend_name
      0.00       +0.0       0.03 ±100%  pp.self.do_mmap
      0.00       +0.0       0.03 ±100%  pp.self.arch_get_unmapped_area_topdown
      0.02 ±141% +0.0       0.05        pp.self.perf_event_mmap_output
      0.32 ± 2%  +0.2       0.56        pp.self.percpu_counter_add_batch
    552.67 ± 2%  -5.2%    524.00 ± 6%   softirqs.BLOCK
      2.00       +0.0%      2.00        softirqs.HI
    911.00 ± 47% -31.4%   625.00 ± 2%   softirqs.NET_RX
     63.67 ± 3%  -5.0%     60.50 ± 4%   softirqs.NET_TX
    312414       -1.1%   309101         softirqs.RCU
    228903       -1.4%   225602 ± 3%    softirqs.SCHED
    265.67       -1.0%   263.00         softirqs.TASKLET
   8777267       +0.1%  8789634         softirqs.TIMER
     23504       -0.1%    23472         interrupts.CAL:Function_call_interrupts
    144.00       -0.3%   143.50         interrupts.IWI:IRQ_work_interrupts
  43641147       -0.0% 43621882         interrupts.LOC:Local_timer_interrupts
      0.00     -100.0%     0.00         interrupts.MCP:Machine_check_polls
    570736 ± 5%  +2.1%   582977 ± 5%    interrupts.NMI:Non-maskable_interrupts
    570736 ± 5%  +2.1%   582977 ± 5%    interrupts.PMI:Performance_monitoring_interrupts
     45097       +0.7%    45407         interrupts.RES:Rescheduling_interrupts
      0.00     -100.0%     0.00         interrupts.RTR:APIC_ICR_read_retries
    193.67 ± 26% -21.3%   152.50 ± 29%  interrupts.TLB:TLB_shootdowns
* [PATCH v4 4/4] mm: adjust vm_committed_as_batch according to vm overcommit policy
  2020-05-29  1:06 [PATCH v4 0/4] make vm_committed_as_batch aware of vm overcommit policy Feng Tang
  ` (2 preceding siblings ...)
  2020-05-29  1:06 ` [PATCH v4 3/4] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check Feng Tang
@ 2020-05-29  1:06 ` Feng Tang
  2020-06-03 13:38   ` Michal Hocko
  3 siblings, 1 reply; 16+ messages in thread
From: Feng Tang @ 2020-05-29  1:06 UTC (permalink / raw)
  To: Andrew Morton, Michal Hocko, Johannes Weiner, Matthew Wilcox,
	Mel Gorman, Kees Cook, Qian Cai, andi.kleen, tim.c.chen,
	dave.hansen, ying.huang, linux-mm, linux-kernel
  Cc: Feng Tang

When checking a performance change for the will-it-scale scalability mmap
test [1], we found very high lock contention on the spinlock of the percpu
counter 'vm_committed_as':

    94.14%     0.35%  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
    48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap;
    45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap;

This heavy lock contention is not always necessary: 'vm_committed_as' only
needs to be very precise when the strict OVERCOMMIT_NEVER policy is set,
which requires a rather small batch number for the percpu counter.

So keep the 'batch' number unchanged for the strict OVERCOMMIT_NEVER
policy, and lift it to 64X for the OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS
policies. Also add a sysctl handler to adjust it when the policy is
reconfigured at runtime.

Benchmarking with the same testcase as in [1] shows a 53% improvement on
an 8C/16T desktop, and 2097% (20X) on a 4S/72C/144T server. We tested with
the test platforms in 0day (server, desktop and laptop), and 80%+ of the
platforms show improvements with that test. Whether a platform shows an
improvement depends on whether the test's mmap size is bigger than the
computed batch number.

With a 16X lift, only 1/3 of the platforms show improvements, though it
should still help general mmap/munmap usage, as Michal Hocko mentioned:

: I believe that there are non-synthetic workloads which would benefit from
: a larger batch. E.g. large in memory databases which do large mmaps
: during startups from multiple threads.

[1] https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/

Link: http://lkml.kernel.org/r/1589611660-89854-4-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/mm.h   |  2 ++
 include/linux/mman.h |  4 ++++
 kernel/sysctl.c      |  2 +-
 mm/mm_init.c         | 18 ++++++++++++++----
 mm/util.c            | 12 ++++++++++++
 5 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 573947c..c2efea6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -206,6 +206,8 @@ int overcommit_ratio_handler(struct ctl_table *, int, void *, size_t *,
 		loff_t *);
 int overcommit_kbytes_handler(struct ctl_table *, int, void *, size_t *,
 		loff_t *);
+int overcommit_policy_handler(struct ctl_table *, int, void *, size_t *,
+		loff_t *);
 
 #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
 
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 4b08e9c..91c93c1 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -57,8 +57,12 @@ extern struct percpu_counter vm_committed_as;
 
 #ifdef CONFIG_SMP
 extern s32 vm_committed_as_batch;
+extern void mm_compute_batch(void);
 #else
 #define vm_committed_as_batch 0
+static inline void mm_compute_batch(void)
+{
+}
 #endif
 
 unsigned long vm_memory_committed(void);
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index db1ce7a..9456c86 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2650,7 +2650,7 @@ static struct ctl_table vm_table[] = {
 		.data		= &sysctl_overcommit_memory,
 		.maxlen		= sizeof(sysctl_overcommit_memory),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
+		.proc_handler	= overcommit_policy_handler,
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= &two,
 	},
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 435e5f7..c5a6fb1 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -13,6 +13,7 @@
 #include <linux/memory.h>
 #include <linux/notifier.h>
 #include <linux/sched.h>
+#include <linux/mman.h>
 #include "internal.h"
 
 #ifdef CONFIG_DEBUG_MEMORY_INIT
@@ -144,14 +145,23 @@ EXPORT_SYMBOL_GPL(mm_kobj);
 #ifdef CONFIG_SMP
 s32 vm_committed_as_batch = 32;
 
-static void __meminit mm_compute_batch(void)
+void mm_compute_batch(void)
 {
 	u64 memsized_batch;
 	s32 nr = num_present_cpus();
 	s32 batch = max_t(s32, nr*2, 32);
-
-	/* batch size set to 0.4% of (total memory/#cpus), or max int32 */
-	memsized_batch = min_t(u64, (totalram_pages()/nr)/256, 0x7fffffff);
+	unsigned long ram_pages = totalram_pages();
+
+	/*
+	 * For the OVERCOMMIT_NEVER policy, set batch size to 0.4%
+	 * of (total memory/#cpus), and lift it to 25% for the other
+	 * policies to ease the possible lock contention on percpu_counter
+	 * vm_committed_as, while the max limit is INT_MAX
+	 */
+	if (sysctl_overcommit_memory == OVERCOMMIT_NEVER)
+		memsized_batch = min_t(u64, ram_pages/nr/256, INT_MAX);
+	else
+		memsized_batch = min_t(u64, ram_pages/nr/4, INT_MAX);
 
 	vm_committed_as_batch = max_t(s32, memsized_batch, batch);
 }
diff --git a/mm/util.c b/mm/util.c
index fe63271..580d268 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -746,6 +746,18 @@ int overcommit_ratio_handler(struct ctl_table *table, int write, void *buffer,
 	return ret;
 }
 
+int overcommit_policy_handler(struct ctl_table *table, int write, void *buffer,
+		size_t *lenp, loff_t *ppos)
+{
+	int ret;
+
+	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+	if (ret == 0 && write)
+		mm_compute_batch();
+
+	return ret;
+}
+
 int overcommit_kbytes_handler(struct ctl_table *table, int write, void *buffer,
 		size_t *lenp, loff_t *ppos)
 {
-- 
2.7.4
* Re: [PATCH v4 4/4] mm: adjust vm_committed_as_batch according to vm overcommit policy
  2020-05-29  1:06 ` [PATCH v4 4/4] mm: adjust vm_committed_as_batch according to vm overcommit policy Feng Tang
@ 2020-06-03 13:38   ` Michal Hocko
  0 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2020-06-03 13:38 UTC (permalink / raw)
  To: Feng Tang
  Cc: Andrew Morton, Johannes Weiner, Matthew Wilcox, Mel Gorman,
	Kees Cook, Qian Cai, andi.kleen, tim.c.chen, dave.hansen,
	ying.huang, linux-mm, linux-kernel

On Fri 29-05-20 09:06:10, Feng Tang wrote:
> When checking a performance change for will-it-scale scalability mmap test
> [1], we found very high lock contention for spinlock of percpu counter
> 'vm_committed_as':
>
> 94.14% 0.35% [kernel.kallsyms] [k] _raw_spin_lock_irqsave
> 48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap;
> 45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap;
>
> Actually this heavy lock contention is not always necessary. The
> 'vm_committed_as' needs to be very precise when the strict
> OVERCOMMIT_NEVER policy is set, which requires a rather small batch number
> for the percpu counter.
>
> So keep 'batch' number unchanged for strict OVERCOMMIT_NEVER policy, and
> lift it to 64X for OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS policies. Also
> add a sysctl handler to adjust it when the policy is reconfigured.
>
> [...]

Acked-by: Michal Hocko <mhocko@suse.com>

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH v4 3/4] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check
@ 2020-06-02 4:02 Qian Cai
2020-06-03 9:48 ` Feng Tang
0 siblings, 1 reply; 16+ messages in thread
From: Qian Cai @ 2020-06-02 4:02 UTC (permalink / raw)
To: Feng Tang
Cc: Andrew Morton, Michal Hocko, Johannes Weiner, Matthew Wilcox,
Mel Gorman, Kees Cook, andi.kleen, tim.c.chen, dave.hansen,
ying.huang, linux-mm, linux-kernel, Konstantin Khlebnikov
> On Jun 1, 2020, at 11:37 PM, Feng Tang <feng.tang@intel.com> wrote:
>
> I re-run the same benchmark with v5.7 and 5.7+remove_warning kernels,
> the overall performance change is trivial (which is expected)
>
> 1330147 +0.1% 1331032 will-it-scale.72.processes
>
> But the perf stats of "self" shows big change for __vm_enough_memory()
>
> 0.27 -0.3 0.00 pp.self.__vm_enough_memory
>
> I post the full compare result in the end.
I don’t really see what that means exactly, but the warning has been there for a long time and no one seems to have noticed much trouble (or benefit) from it. So I think you will probably need to come up with a proper justification explaining why it is a problem now, how your patchset suddenly starts to trigger the warning, and why there is no better way than to suffer this debuggability regression (probably tiny, but still).
* Re: [PATCH v4 3/4] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check
  2020-06-02  4:02 [PATCH v4 3/4] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check Qian Cai
@ 2020-06-03  9:48 ` Feng Tang
  2020-06-03 11:51   ` Qian Cai
  2020-06-03 13:36   ` Michal Hocko
  0 siblings, 2 replies; 16+ messages in thread
From: Feng Tang @ 2020-06-03  9:48 UTC (permalink / raw)
  To: Qian Cai
  Cc: Andrew Morton, Michal Hocko, Johannes Weiner, Matthew Wilcox,
	Mel Gorman, Kees Cook, andi.kleen, tim.c.chen, dave.hansen,
	ying.huang, linux-mm, linux-kernel, Konstantin Khlebnikov

On Tue, Jun 02, 2020 at 12:02:22AM -0400, Qian Cai wrote:
> > On Jun 1, 2020, at 11:37 PM, Feng Tang <feng.tang@intel.com> wrote:
> >
> > I re-run the same benchmark with v5.7 and 5.7+remove_warning kernels,
> > the overall performance change is trivial (which is expected)
> >
> > 1330147 +0.1% 1331032 will-it-scale.72.processes
> >
> > But the perf stats of "self" shows big change for __vm_enough_memory()
> >
> > 0.27 -0.3 0.00 pp.self.__vm_enough_memory
> >
> > I post the full compare result in the end.
>
> [...]

Thanks for the suggestion, and I updated the commit log.

From 1633da8228bd3d0dcbbd8df982977ad4594962a1 Mon Sep 17 00:00:00 2001
From: Feng Tang <feng.tang@intel.com>
Date: Fri, 29 May 2020 08:48:48 +0800
Subject: [PATCH] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as
 underflow check

This check was added by commit 82f71ae4a2b8 ("mm: catch memory commitment
underflow") in 2014 as a safety check for issues which had already been
fixed, and few reports have been caught by it since, as described in its
commit log:

: This shouldn't happen any more - the previous two patches fixed
: the committed_as underflow issues.

But it was actually triggered by Qian Cai when he used the LTP memory
stress suite to test an RFC patchset, which tries to improve the
scalability of the per-CPU counter 'vm_committed_as' by choosing a bigger
'batch' number for the loose overcommit policies (OVERCOMMIT_ALWAYS and
OVERCOMMIT_GUESS), while keeping the current number for OVERCOMMIT_NEVER.

With that patchset, when the system first uses a loose policy, the
'vm_committed_as' count can be a large negative value, as its big 'batch'
number allows a big deviation. When the policy is then changed to
OVERCOMMIT_NEVER, the 'batch' is decreased to a much smaller value, thus
hitting this WARN check.

To mitigate this, one proposed solution is to queue work on all online
CPUs to do a local sync of 'vm_committed_as' when changing the policy to
OVERCOMMIT_NEVER, plus some global syncing to guarantee the case won't be
hit. But that solution is costly and slow; given that this check hasn't
shown real trouble or benefit, simply drop it from one hot path of MM.
Perf stats do show some tiny savings from removing it.

Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andi Kleen <andi.kleen@intel.com>
---
 mm/util.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/mm/util.c b/mm/util.c
index 9b3be03..c63c8e4 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -814,14 +814,6 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 {
 	long allowed;
 
-	/*
-	 * A transient decrease in the value is unlikely, so no need
-	 * READ_ONCE() for vm_committed_as.count.
-	 */
-	VM_WARN_ONCE(data_race(percpu_counter_read(&vm_committed_as) <
-			-(s64)vm_committed_as_batch * num_online_cpus()),
-			"memory commitment underflow");
-
 	vm_acct_memory(pages);
 
 	/*
-- 
2.7.4
* Re: [PATCH v4 3/4] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check
  2020-06-03  9:48 ` Feng Tang
@ 2020-06-03 11:51   ` Qian Cai
  2020-06-03 13:36   ` Michal Hocko
  0 siblings, 0 replies; 16+ messages in thread
From: Qian Cai @ 2020-06-03 11:51 UTC (permalink / raw)
  To: Feng Tang
  Cc: Andrew Morton, Michal Hocko, Johannes Weiner, Matthew Wilcox,
	Mel Gorman, Kees Cook, andi.kleen, tim.c.chen, dave.hansen,
	ying.huang, linux-mm, linux-kernel, Konstantin Khlebnikov

> On Jun 3, 2020, at 5:48 AM, Feng Tang <feng.tang@intel.com> wrote:
>
> This check was added by 82f71ae4a2b8 ("mm: catch memory commitment underflow")
> in 2014 to have a safety check for issues which have been fixed.
>
> [...]

The text looks more reasonable than the previous one.

Reviewed-by: Qian Cai <cai@lca.pw>
* Re: [PATCH v4 3/4] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check
  2020-06-03  9:48 ` Feng Tang
  2020-06-03 11:51   ` Qian Cai
@ 2020-06-03 13:36   ` Michal Hocko
  1 sibling, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2020-06-03 13:36 UTC (permalink / raw)
  To: Feng Tang
  Cc: Qian Cai, Andrew Morton, Johannes Weiner, Matthew Wilcox,
	Mel Gorman, Kees Cook, andi.kleen, tim.c.chen, dave.hansen,
	ying.huang, linux-mm, linux-kernel, Konstantin Khlebnikov

On Wed 03-06-20 17:48:04, Feng Tang wrote:
> Thanks for the suggestion, and I updated the commit log.
>
> From 1633da8228bd3d0dcbbd8df982977ad4594962a1 Mon Sep 17 00:00:00 2001
> From: Feng Tang <feng.tang@intel.com>
> Date: Fri, 29 May 2020 08:48:48 +0800
> Subject: [PATCH] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as
>  underflow check
>
> [...]

Acked-by: Michal Hocko <mhocko@suse.com>

-- 
Michal Hocko
SUSE Labs
end of thread, other threads:[~2020-06-04  1:38 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-29  1:06 [PATCH v4 0/4] make vm_committed_as_batch aware of vm overcommit policy Feng Tang
2020-05-29  1:06 ` [PATCH v4 1/4] proc/meminfo: avoid open coded reading of vm_committed_as Feng Tang
2020-05-29  1:06 ` [PATCH v4 2/4] mm/util.c: make vm_memory_committed() more accurate Feng Tang
2020-06-03 13:35   ` Michal Hocko
2020-06-03 14:28     ` Andi Kleen
2020-06-04  1:38       ` Feng Tang
2020-05-29  1:06 ` [PATCH v4 3/4] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check Feng Tang
2020-05-29  2:49   ` Qian Cai
2020-05-29  5:37     ` Feng Tang
2020-06-02  3:37       ` Feng Tang
2020-05-29  1:06 ` [PATCH v4 4/4] mm: adjust vm_committed_as_batch according to vm overcommit policy Feng Tang
2020-06-03 13:38   ` Michal Hocko
2020-06-02  4:02 [PATCH v4 3/4] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check Qian Cai
2020-06-03  9:48 ` Feng Tang
2020-06-03 11:51   ` Qian Cai
2020-06-03 13:36   ` Michal Hocko