linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
@ 2020-07-07 11:43 Qian Cai
  2020-07-07 12:06 ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: Qian Cai @ 2020-07-07 11:43 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Feng Tang, kernel test robot, Andrew Morton, Johannes Weiner,
	Matthew Wilcox, Mel Gorman, Kees Cook, Luis Chamberlain,
	Iurii Zaikin, andi.kleen, tim.c.chen, dave.hansen, ying.huang,
	linux-mm, linux-kernel, lkp



> On Jul 7, 2020, at 6:28 AM, Michal Hocko <mhocko@kernel.org> wrote:
> 
> Would you have any examples? Because I find this highly unlikely.
> OVERCOMMIT_NEVER only works when virtual memory is not largerly
> overcommited wrt to real memory demand. And that tends to be more of
> an exception rather than a rule. "Modern" userspace (whatever that
> means) tends to be really hungry with virtual memory which is only used
> very sparsely.
> 
> I would argue that either somebody is running an "OVERCOMMIT_NEVER"
> friendly SW and this is a permanent setting or this is not used at all.
> At least this is my experience.
> 
> So I strongly suspect that LTP test failure is not something we should
> really lose sleep over. It would be nice to find a way to flush existing
> batches but I would rather see a real workload that would suffer from
> this imprecision.

I hear you many times that you really don’t care about those use cases unless you hear exactly people are using in your world.

For example, when you said LTP oom tests are totally artificial last time and how less you care about if they are failing, and I could only enjoy their efficiencies to find many issues like race conditions and bad error accumulation handling etc that your “real world use cases” are going to take ages or no way to flag them.

There are just too many valid use cases in this wild world. The difference is that I admit that I don’t know or even aware all the use cases, and I don’t believe you do as well.

If a patchset broke the existing behaviors that written exactly in the spec, it is then someone has to prove its innocent. For example, if nobody is going to rely on something like this now and future, and then fix the spec and explain exactly nobody should be rely upon.

^ permalink raw reply	[flat|nested] 26+ messages in thread
* [PATCH v5 3/3] mm: adjust vm_committed_as_batch according to vm overcommit policy
@ 2020-06-21  7:36 Feng Tang
       [not found] ` <20200702063201.GG3874@shao2-debian>
  0 siblings, 1 reply; 26+ messages in thread
From: Feng Tang @ 2020-06-21  7:36 UTC (permalink / raw)
  To: Andrew Morton, Michal Hocko, Johannes Weiner, Matthew Wilcox,
	Mel Gorman, Kees Cook, Luis Chamberlain, Iurii Zaikin,
	andi.kleen, tim.c.chen, dave.hansen, ying.huang, linux-mm,
	linux-kernel
  Cc: Feng Tang

When checking a performance change for will-it-scale scalability mmap test
[1], we found very high lock contention for spinlock of percpu counter
'vm_committed_as':

    94.14%     0.35%  [kernel.kallsyms]         [k] _raw_spin_lock_irqsave
    48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap;
    45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap;

Actually this heavy lock contention is not always necessary.  The
'vm_committed_as' needs to be very precise when the strict
OVERCOMMIT_NEVER policy is set, which requires a rather small batch number
for the percpu counter.

So keep 'batch' number unchanged for strict OVERCOMMIT_NEVER policy, and
lift it to 64X for OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS policies.  Also
add a sysctl handler to adjust it when the policy is reconfigured.

Benchmark with the same testcase in [1] shows 53% improvement on a 8C/16T
desktop, and 2097%(20X) on a 4S/72C/144T server.  We tested with test
platforms in 0day (server, desktop and laptop), and 80%+ platforms shows
improvements with that test.  And whether it shows improvements depends on
if the test mmap size is bigger than the batch number computed.

And if the lift is 16X, 1/3 of the platforms will show improvements,
though it should help the mmap/unmap usage generally, as Michal Hocko
mentioned:

: I believe that there are non-synthetic worklaods which would benefit from
: a larger batch.  E.g.  large in memory databases which do large mmaps
: during startups from multiple threads.

[1] https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/

Link: http://lkml.kernel.org/r/1589611660-89854-4-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
---
 include/linux/mm.h   |  2 ++
 include/linux/mman.h |  4 ++++
 kernel/sysctl.c      |  2 +-
 mm/mm_init.c         | 18 ++++++++++++++----
 mm/util.c            | 12 ++++++++++++
 5 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e6ff54a..d00facb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -206,6 +206,8 @@ int overcommit_ratio_handler(struct ctl_table *, int, void *, size_t *,
 		loff_t *);
 int overcommit_kbytes_handler(struct ctl_table *, int, void *, size_t *,
 		loff_t *);
+int overcommit_policy_handler(struct ctl_table *, int, void *, size_t *,
+		loff_t *);
 
 #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
 
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 4b08e9c..91c93c1 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -57,8 +57,12 @@ extern struct percpu_counter vm_committed_as;
 
 #ifdef CONFIG_SMP
 extern s32 vm_committed_as_batch;
+extern void mm_compute_batch(void);
 #else
 #define vm_committed_as_batch 0
+static inline void mm_compute_batch(void)
+{
+}
 #endif
 
 unsigned long vm_memory_committed(void);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 40180cd..10dcc06 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2650,7 +2650,7 @@ static struct ctl_table vm_table[] = {
 		.data		= &sysctl_overcommit_memory,
 		.maxlen		= sizeof(sysctl_overcommit_memory),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
+		.proc_handler	= overcommit_policy_handler,
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= &two,
 	},
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 435e5f7..c5a6fb1 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -13,6 +13,7 @@
 #include <linux/memory.h>
 #include <linux/notifier.h>
 #include <linux/sched.h>
+#include <linux/mman.h>
 #include "internal.h"
 
 #ifdef CONFIG_DEBUG_MEMORY_INIT
@@ -144,14 +145,23 @@ EXPORT_SYMBOL_GPL(mm_kobj);
 #ifdef CONFIG_SMP
 s32 vm_committed_as_batch = 32;
 
-static void __meminit mm_compute_batch(void)
+void mm_compute_batch(void)
 {
 	u64 memsized_batch;
 	s32 nr = num_present_cpus();
 	s32 batch = max_t(s32, nr*2, 32);
-
-	/* batch size set to 0.4% of (total memory/#cpus), or max int32 */
-	memsized_batch = min_t(u64, (totalram_pages()/nr)/256, 0x7fffffff);
+	unsigned long ram_pages = totalram_pages();
+
+	/*
+	 * For policy of OVERCOMMIT_NEVER, set batch size to 0.4%
+	 * of (total memory/#cpus), and lift it to 25% for other
+	 * policies to easy the possible lock contention for percpu_counter
+	 * vm_committed_as, while the max limit is INT_MAX
+	 */
+	if (sysctl_overcommit_memory == OVERCOMMIT_NEVER)
+		memsized_batch = min_t(u64, ram_pages/nr/256, INT_MAX);
+	else
+		memsized_batch = min_t(u64, ram_pages/nr/4, INT_MAX);
 
 	vm_committed_as_batch = max_t(s32, memsized_batch, batch);
 }
diff --git a/mm/util.c b/mm/util.c
index 1c9d097..52ed9c1 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -746,6 +746,18 @@ int overcommit_ratio_handler(struct ctl_table *table, int write, void *buffer,
 	return ret;
 }
 
+int overcommit_policy_handler(struct ctl_table *table, int write, void *buffer,
+		size_t *lenp, loff_t *ppos)
+{
+	int ret;
+
+	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+	if (ret == 0 && write)
+		mm_compute_batch();
+
+	return ret;
+}
+
 int overcommit_kbytes_handler(struct ctl_table *table, int write, void *buffer,
 		size_t *lenp, loff_t *ppos)
 {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2020-07-10  3:26 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-07 11:43 [mm] 4e2c82a409: ltp.overcommit_memory01.fail Qian Cai
2020-07-07 12:06 ` Michal Hocko
2020-07-07 13:04   ` Qian Cai
2020-07-07 13:56     ` Michal Hocko
  -- strict thread matches above, loose matches on Subject: below --
2020-06-21  7:36 [PATCH v5 3/3] mm: adjust vm_committed_as_batch according to vm overcommit policy Feng Tang
     [not found] ` <20200702063201.GG3874@shao2-debian>
2020-07-02  7:12   ` [mm] 4e2c82a409: ltp.overcommit_memory01.fail Feng Tang
2020-07-05  3:20     ` Qian Cai
2020-07-05  4:44     ` Feng Tang
2020-07-05 12:15       ` Qian Cai
2020-07-05 12:58         ` Feng Tang
2020-07-05 15:52           ` Qian Cai
2020-07-06  1:43             ` Feng Tang
2020-07-06  2:36               ` Qian Cai
2020-07-06 13:24                 ` Feng Tang
2020-07-06 13:34                   ` Andi Kleen
2020-07-06 23:42                     ` Andrew Morton
2020-07-07  2:38                     ` Feng Tang
2020-07-07  4:00                       ` Huang, Ying
2020-07-07  5:41                         ` Feng Tang
2020-07-09  4:55                           ` Feng Tang
2020-07-09 13:40                             ` Qian Cai
2020-07-09 14:15                               ` Feng Tang
2020-07-10  1:38                                 ` Feng Tang
2020-07-10  3:26                                   ` Qian Cai
2020-07-07  1:06                   ` Dennis Zhou
2020-07-07  3:24                     ` Feng Tang
2020-07-07 10:28             ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).