LKML Archive on lore.kernel.org
* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
@ 2020-07-07 11:43 Qian Cai
  2020-07-07 12:06 ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: Qian Cai @ 2020-07-07 11:43 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Feng Tang, kernel test robot, Andrew Morton, Johannes Weiner,
	Matthew Wilcox, Mel Gorman, Kees Cook, Luis Chamberlain,
	Iurii Zaikin, andi.kleen, tim.c.chen, dave.hansen, ying.huang,
	linux-mm, linux-kernel, lkp



> On Jul 7, 2020, at 6:28 AM, Michal Hocko <mhocko@kernel.org> wrote:
> 
> Would you have any examples? Because I find this highly unlikely.
> OVERCOMMIT_NEVER only works when virtual memory is not largely
> overcommitted with respect to real memory demand. And that tends to be
> more of an exception than a rule. "Modern" userspace (whatever that
> means) tends to be really hungry for virtual memory which it then uses
> only very sparsely.
> 
> I would argue that either somebody is running "OVERCOMMIT_NEVER"-friendly
> SW and this is a permanent setting, or this is not used at all.
> At least this is my experience.
> 
> So I strongly suspect that the LTP test failure is not something we should
> really lose sleep over. It would be nice to find a way to flush the existing
> batches, but I would rather see a real workload that suffers from
> this imprecision.

I have heard you say many times that you really don’t care about those use cases unless you hear exactly how people use them in your world.

For example, last time you said the LTP OOM tests are totally artificial and that you care little whether they fail, while I could only appreciate how efficiently they find issues like race conditions and bad error-accumulation handling that your “real world use cases” would take ages to flag, or never flag at all.

There are just too many valid use cases out in the wild. The difference is that I admit I don’t know, or am not even aware of, all the use cases, and I don’t believe you are either.

If a patchset breaks existing behavior that is written exactly in the spec, then someone has to prove its innocence. For example, if nobody is going to rely on something like this now or in the future, then fix the spec and explain exactly why nobody should rely upon it.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-07 11:43 [mm] 4e2c82a409: ltp.overcommit_memory01.fail Qian Cai
@ 2020-07-07 12:06 ` Michal Hocko
  2020-07-07 13:04   ` Qian Cai
  0 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2020-07-07 12:06 UTC (permalink / raw)
  To: Qian Cai
  Cc: Feng Tang, kernel test robot, Andrew Morton, Johannes Weiner,
	Matthew Wilcox, Mel Gorman, Kees Cook, Luis Chamberlain,
	Iurii Zaikin, andi.kleen, tim.c.chen, dave.hansen, ying.huang,
	linux-mm, linux-kernel, lkp

On Tue 07-07-20 07:43:48, Qian Cai wrote:
> > [...]
> 
> I have heard you say many times that you really don’t care about those
> use cases unless you hear exactly how people use them in your world.
> 
> For example, last time you said the LTP OOM tests are totally artificial
> and that you care little whether they fail, while I could only
> appreciate how efficiently they find issues like race conditions
> and bad error-accumulation handling that your “real world use
> cases” would take ages to flag, or never flag at all.

Yes, they are effective at hitting corner cases and that is fine. I
am not dismissing their usefulness. I have tried to explain that many
times but let me try again. Seeing a corner case and thinking about a
potential fix is one thing. On the other hand, it is not really ideal to
treat such a failure as a hard regression and consider otherwise useful
functionality/improvements for revert without a proper cost-benefit
analysis. Sure, having corner cases is not nice, but really, look at
this example again. The overcommit setting is a global thing; it is hard
to change it at runtime willy-nilly, because that might have really
detrimental side effects on all running workloads. So it is quite
reasonable to expect that it is changed either early after boot or when
the system is in a quiescent state, when almost nothing but very core
services are running and the mode of operation is unlikely to change
mid-flight.

> There are just too many valid use cases out in the wild. The
> difference is that I admit I don’t know, or am not even aware of,
> all the use cases, and I don’t believe you are either.

Me neither, and I am not claiming to. All I am saying is that the real
risk of a regression is low enough that I wouldn't lose sleep over
it. It is perfectly fine to address this proactively if the fix is
reasonably maintainable. I was mostly reacting to your pushing for a
revert based solely on LTP results.

LTP is a very useful tool for raising awareness of potential problems,
but you shouldn't follow its results blindly.

> If a patchset breaks existing behavior that is written exactly in
> the spec, then someone has to prove its innocence. For example,
> if nobody is going to rely on something like this now or in the future,
> then fix the spec and explain exactly why nobody should rely upon it.

I am all for clarifications in the documentation.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-07 12:06 ` Michal Hocko
@ 2020-07-07 13:04   ` Qian Cai
  2020-07-07 13:56     ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: Qian Cai @ 2020-07-07 13:04 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Feng Tang, kernel test robot, Andrew Morton, Johannes Weiner,
	Matthew Wilcox, Mel Gorman, Kees Cook, Luis Chamberlain,
	Iurii Zaikin, andi.kleen, tim.c.chen, dave.hansen, ying.huang,
	linux-mm, linux-kernel, lkp

On Tue, Jul 07, 2020 at 02:06:19PM +0200, Michal Hocko wrote:
> On Tue 07-07-20 07:43:48, Qian Cai wrote:
> > > [...]
> > 
> > I have heard you say many times that you really don’t care about those
> > use cases unless you hear exactly how people use them in your world.
> > 
> > For example, last time you said the LTP OOM tests are totally artificial
> > and that you care little whether they fail, while I could only
> > appreciate how efficiently they find issues like race conditions
> > and bad error-accumulation handling that your “real world use
> > cases” would take ages to flag, or never flag at all.
> 
> Yes, they are effective at hitting corner cases and that is fine. I
> am not dismissing their usefulness. I have tried to explain that many
> times but let me try again. Seeing a corner case and thinking about a
> potential fix is one thing. On the other hand, it is not really ideal to
> treat such a failure as a hard regression and consider otherwise useful

Well, terms like "corner cases" and "hard regression" are rather
subjective.

> functionality/improvements for revert without a proper cost-benefit
> analysis. Sure, having corner cases is not nice, but really, look at
> this example again. The overcommit setting is a global thing; it is hard
> to change it at runtime willy-nilly, because that might have really
> detrimental side effects on all running workloads. So it is quite
> reasonable to expect that it is changed either early after boot or when
> the system is in a quiescent state, when almost nothing but very core
> services are running and the mode of operation is unlikely to change
> mid-flight.

I am not really convinced that is the only way people will use those tunables.

> 
> > There are just too many valid use cases out in the wild. The
> > difference is that I admit I don’t know, or am not even aware of,
> > all the use cases, and I don’t believe you are either.
> 
> Me neither, and I am not claiming to. All I am saying is that the real
> risk of a regression is low enough that I wouldn't lose sleep over
> it. It is perfectly fine to address this proactively if the fix is
> reasonably maintainable. I was mostly reacting to your pushing for a
> revert based solely on LTP results.
> 
> LTP is a very useful tool for raising awareness of potential problems,
> but you shouldn't follow its results blindly.

You must think I am a newbie tester to be giving me this piece of
advice.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-07 13:04   ` Qian Cai
@ 2020-07-07 13:56     ` Michal Hocko
  0 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2020-07-07 13:56 UTC (permalink / raw)
  To: Qian Cai
  Cc: Feng Tang, kernel test robot, Andrew Morton, Johannes Weiner,
	Matthew Wilcox, Mel Gorman, Kees Cook, Luis Chamberlain,
	Iurii Zaikin, andi.kleen, tim.c.chen, dave.hansen, ying.huang,
	linux-mm, linux-kernel, lkp

On Tue 07-07-20 09:04:36, Qian Cai wrote:
> On Tue, Jul 07, 2020 at 02:06:19PM +0200, Michal Hocko wrote:
> > On Tue 07-07-20 07:43:48, Qian Cai wrote:
> > > [...]
> > 
> > Yes, they are effective at hitting corner cases and that is fine. I
> > am not dismissing their usefulness. I have tried to explain that many
> > times but let me try again. Seeing a corner case and thinking about a
> > potential fix is one thing. On the other hand, it is not really ideal to
> > treat such a failure as a hard regression and consider otherwise useful
> 
> Well, terms like "corner cases" and "hard regression" are rather
> subjective.

Existing real-life examples really make them less subjective, though.

[...]

> > LTP is a very useful tool for raising awareness of potential problems,
> > but you shouldn't follow its results blindly.
> 
> You must think I am a newbie tester to be giving me this piece of
> advice.

Not even close. I can clearly see your involvement in testing and how
many good bug reports it results in.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-10  1:38                                 ` Feng Tang
@ 2020-07-10  3:26                                   ` Qian Cai
  0 siblings, 0 replies; 26+ messages in thread
From: Qian Cai @ 2020-07-10  3:26 UTC (permalink / raw)
  To: Feng Tang
  Cc: Huang, Ying, Andi Kleen, Andrew Morton, Michal Hocko,
	Dennis Zhou, Tejun Heo, Christoph Lameter, kernel test robot,
	Johannes Weiner, Matthew Wilcox, Mel Gorman, Kees Cook,
	Luis Chamberlain, Iurii Zaikin, tim.c.chen, dave.hansen,
	linux-mm, linux-kernel, lkp



> On Jul 9, 2020, at 9:38 PM, Feng Tang <feng.tang@intel.com> wrote:
> 
> On second thought, my previous version has more indentation and lines,
> but it makes it more obvious that we have special handling for the
> 'write' case. So I would prefer to keep it.
> 
> Thoughts?

I don’t feel it is easier to understand. I generally prefer to bail out early when possible, which also makes the code a bit more robust for future extensions (once the indentation reaches 3+ levels, we will need to rework it).

But I realize that I have spent more time debugging than actually writing code these days, so my taste is probably not all that good. So feel free to submit whichever style you prefer, and other people with more coding experience can review it.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-09 14:15                               ` Feng Tang
@ 2020-07-10  1:38                                 ` Feng Tang
  2020-07-10  3:26                                   ` Qian Cai
  0 siblings, 1 reply; 26+ messages in thread
From: Feng Tang @ 2020-07-10  1:38 UTC (permalink / raw)
  To: Qian Cai
  Cc: Huang, Ying, Andi Kleen, Andrew Morton, Michal Hocko,
	Dennis Zhou, Tejun Heo, Christoph Lameter, kernel test robot,
	Johannes Weiner, Matthew Wilcox, Mel Gorman, Kees Cook,
	Luis Chamberlain, Iurii Zaikin, tim.c.chen, dave.hansen,
	linux-mm, linux-kernel, lkp

On Thu, Jul 09, 2020 at 10:15:19PM +0800, Feng Tang wrote:
> Hi Qian Cai,
> 
> On Thu, Jul 09, 2020 at 09:40:40AM -0400, Qian Cai wrote:
> > > [...]
> > 
> > Rather than having to indent those many lines, how about this?
> 
> Thanks for the cleanup suggestion.
> 
> > t = *table;
> > t.data = &new_policy;
> 
> The input table->data is actually &sysctl_overcommit_memory, so
> there is a problem in the "read" case: it would return the 'new_policy'
> value instead of the real sysctl_overcommit_memory.
> 
> It should work after adding a check
> 	if (write)
> 		t.data = &new_policy;
> 
> > ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> 			    --> &t
 
On second thought, my previous version has more indentation and lines,
but it makes it more obvious that we have special handling for the
'write' case. So I would prefer to keep it.

Thoughts?

Thanks,
Feng

> Thanks,
> Feng
> 	
> > if (ret || !write)
> > 	return ret;
> > mm_compute_batch(new_policy);
> > if (new_policy == OVERCOMMIT_NEVER)
> > 	schedule_on_each_cpu(sync_overcommit_as);
> > 
> > sysctl_overcommit_memory = new_policy;
> > return ret;

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-09 13:40                             ` Qian Cai
@ 2020-07-09 14:15                               ` Feng Tang
  2020-07-10  1:38                                 ` Feng Tang
  0 siblings, 1 reply; 26+ messages in thread
From: Feng Tang @ 2020-07-09 14:15 UTC (permalink / raw)
  To: Qian Cai
  Cc: Huang, Ying, Andi Kleen, Andrew Morton, Michal Hocko,
	Dennis Zhou, Tejun Heo, Christoph Lameter, kernel test robot,
	Johannes Weiner, Matthew Wilcox, Mel Gorman, Kees Cook,
	Luis Chamberlain, Iurii Zaikin, tim.c.chen, dave.hansen,
	linux-mm, linux-kernel, lkp

Hi Qian Cai,

On Thu, Jul 09, 2020 at 09:40:40AM -0400, Qian Cai wrote:
> > [...]
> 
> Rather than having to indent those many lines, how about this?

Thanks for the cleanup suggestion.

> t = *table;
> t.data = &new_policy;

The input table->data is actually &sysctl_overcommit_memory, so
there is a problem in the "read" case: it would return the 'new_policy'
value instead of the real sysctl_overcommit_memory.

It should work after adding a check
	if (write)
		t.data = &new_policy;

> ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
			    --> &t

Thanks,
Feng
	
> if (ret || !write)
> 	return ret;
> mm_compute_batch(new_policy);
> if (new_policy == OVERCOMMIT_NEVER)
> 	schedule_on_each_cpu(sync_overcommit_as);
> 
> sysctl_overcommit_memory = new_policy;
> return ret;

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-09  4:55                           ` Feng Tang
@ 2020-07-09 13:40                             ` Qian Cai
  2020-07-09 14:15                               ` Feng Tang
  0 siblings, 1 reply; 26+ messages in thread
From: Qian Cai @ 2020-07-09 13:40 UTC (permalink / raw)
  To: Feng Tang
  Cc: Huang, Ying, Andi Kleen, Andrew Morton, Michal Hocko,
	Dennis Zhou, Tejun Heo, Christoph Lameter, kernel test robot,
	Johannes Weiner, Matthew Wilcox, Mel Gorman, Kees Cook,
	Luis Chamberlain, Iurii Zaikin, tim.c.chen, dave.hansen,
	linux-mm, linux-kernel, lkp

On Thu, Jul 09, 2020 at 12:55:54PM +0800, Feng Tang wrote:
> On Tue, Jul 07, 2020 at 01:41:20PM +0800, Feng Tang wrote:
> > [...]
> 
> I tested the reordered approach, and the test passed in 100 runs. The
> new order when changing the policy to OVERCOMMIT_NEVER:
>   1. re-compute the batch (to the smaller one)
>   2. do the on_each_cpu sync
>   3. really change the policy to NEVER.
> 
> It solves one of the previous concerns: after the sync is done on cpuX,
> but before the sync on all CPUs is done, there is a window in which
> the percpu counter could be enlarged again.
> 
> IIRC Andi had a concern about the read-side cost of doing the sync; my
> understanding is that most readers (malloc/free/map/unmap) use
> percpu_counter_read_positive, which is a fast path that does not involve a lock.
> 
> As for the problem itself, I agree with Michal's point that usually there
> is no normal case that changes the overcommit policy too frequently.
> 
> The code logic is mainly in overcommit_policy_handler(), based on the
> previous sync fix. Please help review it, thanks!
> 
> int overcommit_policy_handler(struct ctl_table *table, int write, void *buffer,
> 		size_t *lenp, loff_t *ppos)
> {
> 	int ret;
> 
> 	if (write) {
> 		int new_policy;
> 		struct ctl_table t;
> 
> 		t = *table;
> 		t.data = &new_policy;
> 		ret = proc_dointvec_minmax(&t, write, buffer, lenp, ppos);
> 		if (ret)
> 			return ret;
> 
> 		mm_compute_batch(new_policy);
> 		if (new_policy == OVERCOMMIT_NEVER)
> 			schedule_on_each_cpu(sync_overcommit_as);
> 		sysctl_overcommit_memory = new_policy;
> 	} else {
> 		ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> 	}
> 
> 	return ret;
> }

Rather than having to indent those many lines, how about this?

t = *table;
t.data = &new_policy;
ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
if (ret || !write)
	return ret;

mm_compute_batch(new_policy);
if (new_policy == OVERCOMMIT_NEVER)
	schedule_on_each_cpu(sync_overcommit_as);

sysctl_overcommit_memory = new_policy;
return ret;

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-07  5:41                         ` Feng Tang
@ 2020-07-09  4:55                           ` Feng Tang
  2020-07-09 13:40                             ` Qian Cai
  0 siblings, 1 reply; 26+ messages in thread
From: Feng Tang @ 2020-07-09  4:55 UTC (permalink / raw)
  To: Huang, Ying, Andi Kleen, Qian Cai, Andrew Morton, Michal Hocko
  Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, kernel test robot,
	Johannes Weiner, Matthew Wilcox, Mel Gorman, Kees Cook,
	Luis Chamberlain, Iurii Zaikin, tim.c.chen, dave.hansen,
	linux-mm, linux-kernel, lkp

On Tue, Jul 07, 2020 at 01:41:20PM +0800, Feng Tang wrote:
> On Tue, Jul 07, 2020 at 12:00:09PM +0800, Huang, Ying wrote:
> > Feng Tang <feng.tang@intel.com> writes:
> > 
> > > On Mon, Jul 06, 2020 at 06:34:34AM -0700, Andi Kleen wrote:
> > >> >  	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> > >> > -	if (ret == 0 && write)
> > >> > +	if (ret == 0 && write) {
> > >> > +		if (sysctl_overcommit_memory == OVERCOMMIT_NEVER)
> > >> > +			schedule_on_each_cpu(sync_overcommit_as);
> > >> 
> > >> The schedule_on_each_cpu is not atomic, so the problem could still happen
> > >> in that window.
> > >> 
> > >> I think it may be ok if it eventually resolves, but certainly needs
> > >> a comment explaining it. Can you do some stress testing toggling the
> > >> policy all the time on different CPUs and running the test on
> > >> other CPUs and see if the test fails?
> > >
> > > For the raw test case reported by 0day, this patch passed 200 runs.
> > > And I will read the LTP code and try stress testing it as you
> > > suggested.
> > >
> > >
> > >> The other alternative would be to define some intermediate state
> > >> for the sysctl variable and only switch to never once the schedule_on_each_cpu
> > >> returned. But that's more complexity.
> > >
> > > One thought I had is to put this schedule_on_each_cpu() before
> > > the proc_dointvec_minmax() to do the sync before sysctl_overcommit_memory
> > > is really changed. But the window still exists, as the batch is
> > > still the larger one. 
> > 
> > Can we change the batch first, then sync the global counter, and
> > finally change the overcommit policy?
> 
> These reorderings are really head-scratching :)
> 
> I thought about this before, when Qian Cai first reported the warning
> message, as the kernel had a check: 
> 
> 	VM_WARN_ONCE(percpu_counter_read(&vm_committed_as) <
> 			-(s64)vm_committed_as_batch * num_online_cpus(),
> 			"memory commitment underflow");
> 
> If the batch is decreased first, the warning will be easier/earlier to
> trigger, so I didn't bring this up when handling the warning message.
> 
> But it might work now, as the warning has been removed.

I tested the reordered approach, and the test passed in 100 runs. The
new order when changing the policy to OVERCOMMIT_NEVER:
  1. re-compute the batch (to the smaller one)
  2. do the on_each_cpu sync
  3. really change the policy to NEVER.

It solves one of the previous concerns: after the sync is done on cpuX,
but before the sync on all CPUs is done, there is a window in which
the percpu counter could be enlarged again.

IIRC Andi had a concern about the read-side cost of doing the sync; my
understanding is that most readers (malloc/free/map/unmap) use
percpu_counter_read_positive, which is a fast path that does not involve a lock.

As for the problem itself, I agree with Michal's point that usually there
is no normal case that changes the overcommit policy too frequently.

The code logic is mainly in overcommit_policy_handler(), based on the
previous sync fix. Please help review it, thanks!

int overcommit_policy_handler(struct ctl_table *table, int write, void *buffer,
		size_t *lenp, loff_t *ppos)
{
	int ret;

	if (write) {
		int new_policy;
		struct ctl_table t;

		t = *table;
		t.data = &new_policy;
		ret = proc_dointvec_minmax(&t, write, buffer, lenp, ppos);
		if (ret)
			return ret;

		mm_compute_batch(new_policy);
		if (new_policy == OVERCOMMIT_NEVER)
			schedule_on_each_cpu(sync_overcommit_as);
		sysctl_overcommit_memory = new_policy;
	} else {
		ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
	}

	return ret;
}

- Feng



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-05 15:52           ` Qian Cai
  2020-07-06  1:43             ` Feng Tang
@ 2020-07-07 10:28             ` Michal Hocko
  1 sibling, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2020-07-07 10:28 UTC (permalink / raw)
  To: Qian Cai
  Cc: Feng Tang, kernel test robot, Andrew Morton, Johannes Weiner,
	Matthew Wilcox, Mel Gorman, Kees Cook, Luis Chamberlain,
	Iurii Zaikin, andi.kleen, tim.c.chen, dave.hansen, ying.huang,
	linux-mm, linux-kernel, lkp

On Sun 05-07-20 11:52:32, Qian Cai wrote:
> On Sun, Jul 05, 2020 at 08:58:54PM +0800, Feng Tang wrote:
> > On Sun, Jul 05, 2020 at 08:15:03AM -0400, Qian Cai wrote:
> > > 
> > > 
> > > > On Jul 5, 2020, at 12:45 AM, Feng Tang <feng.tang@intel.com> wrote:
> > > > 
> > > > I did reproduce the problem, and from the debugging, this should
> > > > be the same root cause as lore.kernel.org/lkml/20200526181459.GD991@lca.pw/
> > > > > that losing the batch causes some accuracy problems, and the solution of
> > > > > adding some sync is still needed, which is discussed in
> > > 
> > > Well, before taking any of those patches now to fix the regression,
> > > we will need some performance data first. If it turned out the
> > > original performance gain is no longer relevant anymore due to this
> > > regression fix on top, it is best to drop this patchset and restore
> > > that VM_WARN_ONCE, so you can retry later once you found a better
> > > way to optimize.
> > 
> > The fix of adding sync only happens when the memory policy is being
> > changed to OVERCOMMIT_NEVER, which is not a frequent operation in
> > normal cases.
> > 
> > For the performance improvement data, both in the commit log and the 0day
> > report https://lore.kernel.org/lkml/20200622132548.GS5535@shao2-debian/
> > it is for will-it-scale's mmap testcase, which does not change the memory
> > overcommit policy at runtime, so the data should still be valid
> > with this fix.
> 
> Well, I would expect it is perfectly reasonable for people to use
> OVERCOMMIT_NEVER for some workloads, making it a more frequent operation.

Would you have any examples? Because I find this highly unlikely.
OVERCOMMIT_NEVER only works when virtual memory is not largely
overcommitted wrt real memory demand. And that tends to be more of
an exception than a rule. "Modern" userspace (whatever that
means) tends to be really hungry for virtual memory which is only used
very sparsely.

I would argue that either somebody is running "OVERCOMMIT_NEVER"-friendly
SW and this is a permanent setting, or this is not used at all.
At least this is my experience.

So I strongly suspect that LTP test failure is not something we should
really lose sleep over. It would be nice to find a way to flush existing
batches but I would rather see a real workload that would suffer from
this imprecision.

On the other hand, the perf boost from larger batches with the default
overcommit setting sounds like a nice improvement to have.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-07  4:00                       ` Huang, Ying
@ 2020-07-07  5:41                         ` Feng Tang
  2020-07-09  4:55                           ` Feng Tang
  0 siblings, 1 reply; 26+ messages in thread
From: Feng Tang @ 2020-07-07  5:41 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Andi Kleen, Qian Cai, Andrew Morton, Dennis Zhou, Tejun Heo,
	Christoph Lameter, kernel test robot, Michal Hocko,
	Johannes Weiner, Matthew Wilcox, Mel Gorman, Kees Cook,
	Luis Chamberlain, Iurii Zaikin, tim.c.chen, dave.hansen,
	linux-mm, linux-kernel, lkp

On Tue, Jul 07, 2020 at 12:00:09PM +0800, Huang, Ying wrote:
> Feng Tang <feng.tang@intel.com> writes:
> 
> > On Mon, Jul 06, 2020 at 06:34:34AM -0700, Andi Kleen wrote:
> >> >  	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> >> > -	if (ret == 0 && write)
> >> > +	if (ret == 0 && write) {
> >> > +		if (sysctl_overcommit_memory == OVERCOMMIT_NEVER)
> >> > +			schedule_on_each_cpu(sync_overcommit_as);
> >> 
> >> The schedule_on_each_cpu is not atomic, so the problem could still happen
> >> in that window.
> >> 
> >> I think it may be ok if it eventually resolves, but certainly needs
> >> a comment explaining it. Can you do some stress testing toggling the
> >> policy all the time on different CPUs and running the test on
> >> other CPUs and see if the test fails?
> >
> > For the raw test case reported by 0day, this patch passed 200 consecutive
> > runs. And I will read the ltp code and try stress-testing it as you
> > suggested.
> >
> >
> >> The other alternative would be to define some intermediate state
> >> for the sysctl variable and only switch to never once the schedule_on_each_cpu
> >> returned. But that's more complexity.
> >
> > One thought I had is to put this schedule_on_each_cpu() before
> > the proc_dointvec_minmax() to do the sync before sysctl_overcommit_memory
> > is really changed. But the window still exists, as the batch is
> > still the larger one. 
> 
> Can we change the batch first, then sync the global counter, and finally
> change the overcommit policy?

These reorderings are really head scratching :)

I've thought about this before, when Qian Cai first reported the warning
message, as the kernel had a check:

	VM_WARN_ONCE(percpu_counter_read(&vm_committed_as) <
			-(s64)vm_committed_as_batch * num_online_cpus(),
			"memory commitment underflow");

If the batch is decreased first, the warning will be triggered earlier and
more easily, so I didn't bring this up when handling the warning message.

But it might work now, as the warning has been removed.

Thanks,
Feng




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-07  2:38                     ` Feng Tang
@ 2020-07-07  4:00                       ` Huang, Ying
  2020-07-07  5:41                         ` Feng Tang
  0 siblings, 1 reply; 26+ messages in thread
From: Huang, Ying @ 2020-07-07  4:00 UTC (permalink / raw)
  To: Feng Tang
  Cc: Andi Kleen, Qian Cai, Andrew Morton, Dennis Zhou, Tejun Heo,
	Christoph Lameter, kernel test robot, Michal Hocko,
	Johannes Weiner, Matthew Wilcox, Mel Gorman, Kees Cook,
	Luis Chamberlain, Iurii Zaikin, tim.c.chen, dave.hansen,
	linux-mm, linux-kernel, lkp

Feng Tang <feng.tang@intel.com> writes:

> On Mon, Jul 06, 2020 at 06:34:34AM -0700, Andi Kleen wrote:
>> >  	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>> > -	if (ret == 0 && write)
>> > +	if (ret == 0 && write) {
>> > +		if (sysctl_overcommit_memory == OVERCOMMIT_NEVER)
>> > +			schedule_on_each_cpu(sync_overcommit_as);
>> 
>> The schedule_on_each_cpu is not atomic, so the problem could still happen
>> in that window.
>> 
>> I think it may be ok if it eventually resolves, but certainly needs
>> a comment explaining it. Can you do some stress testing toggling the
>> policy all the time on different CPUs and running the test on
>> other CPUs and see if the test fails?
>
> For the raw test case reported by 0day, this patch passed 200 consecutive
> runs. And I will read the ltp code and try stress-testing it as you
> suggested.
>
>
>> The other alternative would be to define some intermediate state
>> for the sysctl variable and only switch to never once the schedule_on_each_cpu
>> returned. But that's more complexity.
>
> One thought I had is to put this schedule_on_each_cpu() before
> the proc_dointvec_minmax() to do the sync before sysctl_overcommit_memory
> is really changed. But the window still exists, as the batch is
> still the larger one. 

Can we change the batch first, then sync the global counter, and finally
change the overcommit policy?

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-07  1:06                   ` Dennis Zhou
@ 2020-07-07  3:24                     ` Feng Tang
  0 siblings, 0 replies; 26+ messages in thread
From: Feng Tang @ 2020-07-07  3:24 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Qian Cai, Andrew Morton, Tejun Heo, Christoph Lameter,
	kernel test robot, Michal Hocko, Johannes Weiner, Matthew Wilcox,
	Mel Gorman, Kees Cook, Luis Chamberlain, Iurii Zaikin,
	andi.kleen, tim.c.chen, dave.hansen, ying.huang, linux-mm,
	linux-kernel, lkp

On Tue, Jul 07, 2020 at 01:06:51AM +0000, Dennis Zhou wrote:
> On Mon, Jul 06, 2020 at 09:24:43PM +0800, Feng Tang wrote:
> > Hi All,
> > 
> > Please help to review this fix patch, thanks!
> > 
> > It is against today's linux-mm tree. For easy review, I put the fix
> > into one patch, and I could split it to 2 parts for percpu-counter
> > and mm/util.c if it's preferred.
> > 
> > From 593f9dc139181a7c3bb1705aacd1f625f400e458 Mon Sep 17 00:00:00 2001
> > From: Feng Tang <feng.tang@intel.com>
> > Date: Mon, 6 Jul 2020 14:48:29 +0800
> > Subject: [PATCH] mm/util.c: sync vm_committed_as when changing memory policy
> >  to OVERCOMMIT_NEVER
> > 
> > With the patch to improve scalability of vm_committed_as [1], 0day reported
> > the ltp overcommit_memory test case could fail (fail rate is about 5/50) [2].
> > The root cause is that when the system is running with a loose memory
> > overcommit policy like OVERCOMMIT_GUESS/ALWAYS, the deviation of
> > vm_committed_as can be big, and once the policy is changed at runtime to
> > OVERCOMMIT_NEVER, vm_committed_as's batch is decreased to 1/64 of the
> > original one, but the deviation is not compensated accordingly, so the
> > following __vm_enough_memory() check for vm overcommit can be wrong due
> > to this deviation, which breaks the ltp overcommit_memory case.
> > 
> > Fix it by forcing a sync of the percpu counter vm_committed_as when the
> > overcommit policy is changed to OVERCOMMIT_NEVER (sysctl -w vm.overcommit_memory=2).
> > The sync itself is not a fast operation, but that is tolerable given that
> > the user is not expected to change the policy to OVERCOMMIT_NEVER frequently.
> > 
> > [1] https://lore.kernel.org/lkml/1592725000-73486-1-git-send-email-feng.tang@intel.com/
> > [2] https://marc.info/?l=linux-mm&m=159367156428286 (can't find a link in lore.kernel.org)
> > 
> > Reported-by: kernel test robot <rong.a.chen@intel.com>
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > ---
> >  include/linux/percpu_counter.h |  4 ++++
> >  lib/percpu_counter.c           | 14 ++++++++++++++
> >  mm/util.c                      | 11 ++++++++++-
> >  3 files changed, 28 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
> > index 0a4f54d..01861ee 100644
> > --- a/include/linux/percpu_counter.h
> > +++ b/include/linux/percpu_counter.h
> > @@ -44,6 +44,7 @@ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount,
> >  			      s32 batch);
> >  s64 __percpu_counter_sum(struct percpu_counter *fbc);
> >  int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch);
> > +void percpu_counter_sync(struct percpu_counter *fbc);
> >  
> >  static inline int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
> >  {
> > @@ -172,6 +173,9 @@ static inline bool percpu_counter_initialized(struct percpu_counter *fbc)
> >  	return true;
> >  }
> >  
> > +static inline void percpu_counter_sync(struct percpu_counter *fbc)
> > +{
> > +}
> >  #endif	/* CONFIG_SMP */
> >  
> >  static inline void percpu_counter_inc(struct percpu_counter *fbc)
> > diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
> > index a66595b..02d87fc 100644
> > --- a/lib/percpu_counter.c
> > +++ b/lib/percpu_counter.c
> > @@ -98,6 +98,20 @@ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
> >  }
> >  EXPORT_SYMBOL(percpu_counter_add_batch);
> >  
> > +void percpu_counter_sync(struct percpu_counter *fbc)
> > +{
> > +	unsigned long flags;
> > +	s64 count;
> > +
> > +	raw_spin_lock_irqsave(&fbc->lock, flags);
> > +	count = __this_cpu_read(*fbc->counters);
> > +	fbc->count += count;
> > +	__this_cpu_sub(*fbc->counters, count);
> > +	raw_spin_unlock_irqrestore(&fbc->lock, flags);
> > +}
> > +EXPORT_SYMBOL(percpu_counter_sync);
> > +
> > +
> >  /*
> >   * Add up all the per-cpu counts, return the result.  This is a more accurate
> >   * but much slower version of percpu_counter_read_positive()
> > diff --git a/mm/util.c b/mm/util.c
> > index 52ed9c1..5fb62c0 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -746,14 +746,23 @@ int overcommit_ratio_handler(struct ctl_table *table, int write, void *buffer,
> >  	return ret;
> >  }
> >  
> > +static void sync_overcommit_as(struct work_struct *dummy)
> > +{
> > +	percpu_counter_sync(&vm_committed_as);
> > +}
> > +
> 
> This seems like a rather niche use case as it's currently coupled with a
> schedule_on_each_cpu(). I can't imagine a use case where you'd want to
> do this without being called by schedule_on_each_cpu().

Yes!

> 
> Would it be better to modify or introduce something akin to
> percpu_counter_sum() which sums and folds in the counter state? I'd be
> curious to see what the cost of always folding would be as this is
> already considered the cold path and would help with the next batch too.

Initially, I also thought about doing the sync just like percpu_counter_sum():

	raw_spin_lock_irqsave
	for_each_online_cpu(cpu)
		do-the-sync
	raw_spin_unlock_irqrestore

One problem is that per_cpu_ptr(fbc->counters, cpu) could still be
updated on other CPUs, as the fast-path update is not protected by
fbc->lock.

As for the cost, it is about 800 nanoseconds on a 2C/4T platform
and 2~3 microseconds on a 2S/36C/72T Skylake server in the normal case,
and in the worst case, where vm_committed_as's spinlock is under severe
contention, it costs 30~40 microseconds on the 2S/36C/72T Skylake
server.

Thanks,
Feng


> >  int overcommit_policy_handler(struct ctl_table *table, int write, void *buffer,
> >  		size_t *lenp, loff_t *ppos)
> >  {
> >  	int ret;
> >  
> >  	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> > -	if (ret == 0 && write)
> > +	if (ret == 0 && write) {
> > +		if (sysctl_overcommit_memory == OVERCOMMIT_NEVER)
> > +			schedule_on_each_cpu(sync_overcommit_as);
> > +
> >  		mm_compute_batch();
> > +	}
> >  
> >  	return ret;
> >  }
> > -- 
> > 2.7.4

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-06 13:34                   ` Andi Kleen
  2020-07-06 23:42                     ` Andrew Morton
@ 2020-07-07  2:38                     ` Feng Tang
  2020-07-07  4:00                       ` Huang, Ying
  1 sibling, 1 reply; 26+ messages in thread
From: Feng Tang @ 2020-07-07  2:38 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Qian Cai, Andrew Morton, Dennis Zhou, Tejun Heo,
	Christoph Lameter, kernel test robot, Michal Hocko,
	Johannes Weiner, Matthew Wilcox, Mel Gorman, Kees Cook,
	Luis Chamberlain, Iurii Zaikin, tim.c.chen, dave.hansen,
	ying.huang, linux-mm, linux-kernel, lkp

On Mon, Jul 06, 2020 at 06:34:34AM -0700, Andi Kleen wrote:
> >  	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> > -	if (ret == 0 && write)
> > +	if (ret == 0 && write) {
> > +		if (sysctl_overcommit_memory == OVERCOMMIT_NEVER)
> > +			schedule_on_each_cpu(sync_overcommit_as);
> 
> The schedule_on_each_cpu is not atomic, so the problem could still happen
> in that window.
> 
> I think it may be ok if it eventually resolves, but certainly needs
> a comment explaining it. Can you do some stress testing toggling the
> policy all the time on different CPUs and running the test on
> other CPUs and see if the test fails?

For the raw test case reported by 0day, this patch passed in 200 times
run. And I will read the ltp code and try stress testing it as you
suggested.


> The other alternative would be to define some intermediate state
> for the sysctl variable and only switch to never once the schedule_on_each_cpu
> returned. But that's more complexity.

One thought I had is to put this schedule_on_each_cpu() before
the proc_dointvec_minmax() to do the sync before sysctl_overcommit_memory
is really changed. But the window still exists, as the batch is
still the larger one. 

Thanks,
Feng

> 
> 
> -Andi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-06 13:24                 ` Feng Tang
  2020-07-06 13:34                   ` Andi Kleen
@ 2020-07-07  1:06                   ` Dennis Zhou
  2020-07-07  3:24                     ` Feng Tang
  1 sibling, 1 reply; 26+ messages in thread
From: Dennis Zhou @ 2020-07-07  1:06 UTC (permalink / raw)
  To: Feng Tang
  Cc: Qian Cai, Andrew Morton, Dennis Zhou, Tejun Heo,
	Christoph Lameter, kernel test robot, Michal Hocko,
	Johannes Weiner, Matthew Wilcox, Mel Gorman, Kees Cook,
	Luis Chamberlain, Iurii Zaikin, andi.kleen, tim.c.chen,
	dave.hansen, ying.huang, linux-mm, linux-kernel, lkp

On Mon, Jul 06, 2020 at 09:24:43PM +0800, Feng Tang wrote:
> Hi All,
> 
> Please help to review this fix patch, thanks!
> 
> It is against today's linux-mm tree. For easy review, I put the fix
> into one patch, and I could split it to 2 parts for percpu-counter
> and mm/util.c if it's preferred.
> 
> From 593f9dc139181a7c3bb1705aacd1f625f400e458 Mon Sep 17 00:00:00 2001
> From: Feng Tang <feng.tang@intel.com>
> Date: Mon, 6 Jul 2020 14:48:29 +0800
> Subject: [PATCH] mm/util.c: sync vm_committed_as when changing memory policy
>  to OVERCOMMIT_NEVER
> 
> With the patch to improve scalability of vm_committed_as [1], 0day reported
> the ltp overcommit_memory test case could fail (fail rate is about 5/50) [2].
> The root cause is that when the system is running with a loose memory
> overcommit policy like OVERCOMMIT_GUESS/ALWAYS, the deviation of
> vm_committed_as can be big, and once the policy is changed at runtime to
> OVERCOMMIT_NEVER, vm_committed_as's batch is decreased to 1/64 of the
> original one, but the deviation is not compensated accordingly, so the
> following __vm_enough_memory() check for vm overcommit can be wrong due
> to this deviation, which breaks the ltp overcommit_memory case.
> 
> Fix it by forcing a sync of the percpu counter vm_committed_as when the
> overcommit policy is changed to OVERCOMMIT_NEVER (sysctl -w vm.overcommit_memory=2).
> The sync itself is not a fast operation, but that is tolerable given that
> the user is not expected to change the policy to OVERCOMMIT_NEVER frequently.
> 
> [1] https://lore.kernel.org/lkml/1592725000-73486-1-git-send-email-feng.tang@intel.com/
> [2] https://marc.info/?l=linux-mm&m=159367156428286 (can't find a link in lore.kernel.org)
> 
> Reported-by: kernel test robot <rong.a.chen@intel.com>
> Signed-off-by: Feng Tang <feng.tang@intel.com>
> ---
>  include/linux/percpu_counter.h |  4 ++++
>  lib/percpu_counter.c           | 14 ++++++++++++++
>  mm/util.c                      | 11 ++++++++++-
>  3 files changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
> index 0a4f54d..01861ee 100644
> --- a/include/linux/percpu_counter.h
> +++ b/include/linux/percpu_counter.h
> @@ -44,6 +44,7 @@ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount,
>  			      s32 batch);
>  s64 __percpu_counter_sum(struct percpu_counter *fbc);
>  int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch);
> +void percpu_counter_sync(struct percpu_counter *fbc);
>  
>  static inline int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
>  {
> @@ -172,6 +173,9 @@ static inline bool percpu_counter_initialized(struct percpu_counter *fbc)
>  	return true;
>  }
>  
> +static inline void percpu_counter_sync(struct percpu_counter *fbc)
> +{
> +}
>  #endif	/* CONFIG_SMP */
>  
>  static inline void percpu_counter_inc(struct percpu_counter *fbc)
> diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
> index a66595b..02d87fc 100644
> --- a/lib/percpu_counter.c
> +++ b/lib/percpu_counter.c
> @@ -98,6 +98,20 @@ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
>  }
>  EXPORT_SYMBOL(percpu_counter_add_batch);
>  
> +void percpu_counter_sync(struct percpu_counter *fbc)
> +{
> +	unsigned long flags;
> +	s64 count;
> +
> +	raw_spin_lock_irqsave(&fbc->lock, flags);
> +	count = __this_cpu_read(*fbc->counters);
> +	fbc->count += count;
> +	__this_cpu_sub(*fbc->counters, count);
> +	raw_spin_unlock_irqrestore(&fbc->lock, flags);
> +}
> +EXPORT_SYMBOL(percpu_counter_sync);
> +
> +
>  /*
>   * Add up all the per-cpu counts, return the result.  This is a more accurate
>   * but much slower version of percpu_counter_read_positive()
> diff --git a/mm/util.c b/mm/util.c
> index 52ed9c1..5fb62c0 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -746,14 +746,23 @@ int overcommit_ratio_handler(struct ctl_table *table, int write, void *buffer,
>  	return ret;
>  }
>  
> +static void sync_overcommit_as(struct work_struct *dummy)
> +{
> +	percpu_counter_sync(&vm_committed_as);
> +}
> +

This seems like a rather niche use case as it's currently coupled with a
schedule_on_each_cpu(). I can't imagine a use case where you'd want to
do this without being called by schedule_on_each_cpu().

Would it be better to modify or introduce something akin to
percpu_counter_sum() which sums and folds in the counter state? I'd be
curious to see what the cost of always folding would be as this is
already considered the cold path and would help with the next batch too.

>  int overcommit_policy_handler(struct ctl_table *table, int write, void *buffer,
>  		size_t *lenp, loff_t *ppos)
>  {
>  	int ret;
>  
>  	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> -	if (ret == 0 && write)
> +	if (ret == 0 && write) {
> +		if (sysctl_overcommit_memory == OVERCOMMIT_NEVER)
> +			schedule_on_each_cpu(sync_overcommit_as);
> +
>  		mm_compute_batch();
> +	}
>  
>  	return ret;
>  }
> -- 
> 2.7.4
>      
> 
> On Sun, Jul 05, 2020 at 10:36:14PM -0400, Qian Cai wrote:
> > > In my last email, I was not saying OVERCOMMIT_NEVER is not a normal case,
> > > but I don't think users will change the overcommit policy at runtime
> > > too frequently. And the fix patch of syncing 'vm_committed_as' is only
> > > called when the user calls 'sysctl -w vm.overcommit_memory=2'.
> > > 
> > > > The question is now if any of those regression fixes would now regress
> > > > performance of OVERCOMMIT_NEVER workloads or just in-par with the data
> > > > before the patchset?
> > > 
> > > For the original patchset, it keeps vm_committed_as unchanged for
> > the OVERCOMMIT_NEVER policy and enlarges it for the other 2 loose policies
> > OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS, and I don't expect the "OVERCOMMIT_NEVER
> > workloads" performance will be impacted. If you have suggestions for this
> > kind of benchmark, I can test them to better verify the patchset, thanks!
> > 
> > Then, please capture that information in a proper commit log when you
> > submit the regression fix on top of the patchset, and CC the PER-CPU MEMORY
> > ALLOCATOR maintainers, so they might be able to review it properly.
> 
> 
> 

Thanks,
Dennis

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-06 13:34                   ` Andi Kleen
@ 2020-07-06 23:42                     ` Andrew Morton
  2020-07-07  2:38                     ` Feng Tang
  1 sibling, 0 replies; 26+ messages in thread
From: Andrew Morton @ 2020-07-06 23:42 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Feng Tang, Qian Cai, Dennis Zhou, Tejun Heo, Christoph Lameter,
	kernel test robot, Michal Hocko, Johannes Weiner, Matthew Wilcox,
	Mel Gorman, Kees Cook, Luis Chamberlain, Iurii Zaikin,
	tim.c.chen, dave.hansen, ying.huang, linux-mm, linux-kernel, lkp

On Mon, 6 Jul 2020 06:34:34 -0700 Andi Kleen <andi.kleen@intel.com> wrote:

> >  	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> > -	if (ret == 0 && write)
> > +	if (ret == 0 && write) {
> > +		if (sysctl_overcommit_memory == OVERCOMMIT_NEVER)
> > +			schedule_on_each_cpu(sync_overcommit_as);
> 
> The schedule_on_each_cpu is not atomic, so the problem could still happen
> in that window.
> 
> I think it may be ok if it eventually resolves, but certainly needs
> a comment explaining it.

It sure does.

The new exported-to-everything percpu_counter_sync() should have full
formal documentation as well, please.

> Can you do some stress testing toggling the
> policy all the time on different CPUs and running the test on
> other CPUs and see if the test fails?
> 
> The other alternative would be to define some intermediate state
> for the sysctl variable and only switch to never once the schedule_on_each_cpu
> returned. But that's more complexity.
> 
> 
> -Andi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-06 13:24                 ` Feng Tang
@ 2020-07-06 13:34                   ` Andi Kleen
  2020-07-06 23:42                     ` Andrew Morton
  2020-07-07  2:38                     ` Feng Tang
  2020-07-07  1:06                   ` Dennis Zhou
  1 sibling, 2 replies; 26+ messages in thread
From: Andi Kleen @ 2020-07-06 13:34 UTC (permalink / raw)
  To: Feng Tang
  Cc: Qian Cai, Andrew Morton, Dennis Zhou, Tejun Heo,
	Christoph Lameter, kernel test robot, Michal Hocko,
	Johannes Weiner, Matthew Wilcox, Mel Gorman, Kees Cook,
	Luis Chamberlain, Iurii Zaikin, tim.c.chen, dave.hansen,
	ying.huang, linux-mm, linux-kernel, lkp

>  	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> -	if (ret == 0 && write)
> +	if (ret == 0 && write) {
> +		if (sysctl_overcommit_memory == OVERCOMMIT_NEVER)
> +			schedule_on_each_cpu(sync_overcommit_as);

The schedule_on_each_cpu is not atomic, so the problem could still happen
in that window.

I think it may be ok if it eventually resolves, but certainly needs
a comment explaining it. Can you do some stress testing toggling the
policy all the time on different CPUs and running the test on
other CPUs and see if the test fails?

The other alternative would be to define some intermediate state
for the sysctl variable and only switch to never once the schedule_on_each_cpu
returned. But that's more complexity.


-Andi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-06  2:36               ` Qian Cai
@ 2020-07-06 13:24                 ` Feng Tang
  2020-07-06 13:34                   ` Andi Kleen
  2020-07-07  1:06                   ` Dennis Zhou
  0 siblings, 2 replies; 26+ messages in thread
From: Feng Tang @ 2020-07-06 13:24 UTC (permalink / raw)
  To: Qian Cai, Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter
  Cc: kernel test robot, Michal Hocko, Johannes Weiner, Matthew Wilcox,
	Mel Gorman, Kees Cook, Luis Chamberlain, Iurii Zaikin,
	andi.kleen, tim.c.chen, dave.hansen, ying.huang, linux-mm,
	linux-kernel, lkp

Hi All,

Please help to review this fix patch, thanks!

It is against today's linux-mm tree. For easy review, I put the fix
into one patch, and I could split it into 2 parts for percpu-counter
and mm/util.c if it's preferred.

From 593f9dc139181a7c3bb1705aacd1f625f400e458 Mon Sep 17 00:00:00 2001
From: Feng Tang <feng.tang@intel.com>
Date: Mon, 6 Jul 2020 14:48:29 +0800
Subject: [PATCH] mm/util.c: sync vm_committed_as when changing memory policy
 to OVERCOMMIT_NEVER

With the patch to improve scalability of vm_committed_as [1], 0day reported
the ltp overcommit_memory test case could fail (fail rate is about 5/50) [2].
The root cause is that when the system is running with a loose memory
overcommit policy like OVERCOMMIT_GUESS/ALWAYS, the deviation of
vm_committed_as can be big, and once the policy is changed at runtime to
OVERCOMMIT_NEVER, vm_committed_as's batch is decreased to 1/64 of the
original one, but the deviation is not compensated accordingly, so the
following __vm_enough_memory() check for vm overcommit can be wrong due
to this deviation, which breaks the ltp overcommit_memory case.

Fix it by forcing a sync of the percpu counter vm_committed_as when the
overcommit policy is changed to OVERCOMMIT_NEVER (sysctl -w vm.overcommit_memory=2).
The sync itself is not a fast operation, but that is tolerable given that
the user is not expected to change the policy to OVERCOMMIT_NEVER frequently.

[1] https://lore.kernel.org/lkml/1592725000-73486-1-git-send-email-feng.tang@intel.com/
[2] https://marc.info/?l=linux-mm&m=159367156428286 (can't find a link in lore.kernel.org)

Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 include/linux/percpu_counter.h |  4 ++++
 lib/percpu_counter.c           | 14 ++++++++++++++
 mm/util.c                      | 11 ++++++++++-
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index 0a4f54d..01861ee 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -44,6 +44,7 @@ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount,
 			      s32 batch);
 s64 __percpu_counter_sum(struct percpu_counter *fbc);
 int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch);
+void percpu_counter_sync(struct percpu_counter *fbc);
 
 static inline int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
 {
@@ -172,6 +173,9 @@ static inline bool percpu_counter_initialized(struct percpu_counter *fbc)
 	return true;
 }
 
+static inline void percpu_counter_sync(struct percpu_counter *fbc)
+{
+}
 #endif	/* CONFIG_SMP */
 
 static inline void percpu_counter_inc(struct percpu_counter *fbc)
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index a66595b..02d87fc 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -98,6 +98,20 @@ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
 }
 EXPORT_SYMBOL(percpu_counter_add_batch);
 
+void percpu_counter_sync(struct percpu_counter *fbc)
+{
+	unsigned long flags;
+	s64 count;
+
+	raw_spin_lock_irqsave(&fbc->lock, flags);
+	count = __this_cpu_read(*fbc->counters);
+	fbc->count += count;
+	__this_cpu_sub(*fbc->counters, count);
+	raw_spin_unlock_irqrestore(&fbc->lock, flags);
+}
+EXPORT_SYMBOL(percpu_counter_sync);
+
+
 /*
  * Add up all the per-cpu counts, return the result.  This is a more accurate
  * but much slower version of percpu_counter_read_positive()
diff --git a/mm/util.c b/mm/util.c
index 52ed9c1..5fb62c0 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -746,14 +746,23 @@ int overcommit_ratio_handler(struct ctl_table *table, int write, void *buffer,
 	return ret;
 }
 
+static void sync_overcommit_as(struct work_struct *dummy)
+{
+	percpu_counter_sync(&vm_committed_as);
+}
+
 int overcommit_policy_handler(struct ctl_table *table, int write, void *buffer,
 		size_t *lenp, loff_t *ppos)
 {
 	int ret;
 
 	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
-	if (ret == 0 && write)
+	if (ret == 0 && write) {
+		if (sysctl_overcommit_memory == OVERCOMMIT_NEVER)
+			schedule_on_each_cpu(sync_overcommit_as);
+
 		mm_compute_batch();
+	}
 
 	return ret;
 }
-- 
2.7.4
     

On Sun, Jul 05, 2020 at 10:36:14PM -0400, Qian Cai wrote:
> > In my last email, I was not saying OVERCOMMIT_NEVER is not a normal case,
> > but I don't think users will change the overcommit policy at runtime
> > too frequently. And the fix patch of syncing 'vm_committed_as' is only
> > called when the user calls 'sysctl -w vm.overcommit_memory=2'.
> > 
> > > The question is now if any of those regression fixes would now regress
> > > performance of OVERCOMMIT_NEVER workloads or just in-par with the data
> > > before the patchset?
> > 
> > For the original patchset, it keeps vm_committed_as unchanged for
> > the OVERCOMMIT_NEVER policy and enlarges it for the other 2 loose policies
> > OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS, and I don't expect the "OVERCOMMIT_NEVER
> > workloads" performance will be impacted. If you have suggestions for this
> > kind of benchmark, I can test them to better verify the patchset, thanks!
> 
> Then, please capture that information in a proper commit log when you
> submit the regression fix on top of the patchset, and CC the PER-CPU
> MEMORY ALLOCATOR maintainers so they can review it properly.




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-06  1:43             ` Feng Tang
@ 2020-07-06  2:36               ` Qian Cai
  2020-07-06 13:24                 ` Feng Tang
  0 siblings, 1 reply; 26+ messages in thread
From: Qian Cai @ 2020-07-06  2:36 UTC (permalink / raw)
  To: Feng Tang
  Cc: kernel test robot, Andrew Morton, Michal Hocko, Johannes Weiner,
	Matthew Wilcox, Mel Gorman, Kees Cook, Luis Chamberlain,
	Iurii Zaikin, andi.kleen, tim.c.chen, dave.hansen, ying.huang,
	linux-mm, linux-kernel, lkp

On Mon, Jul 06, 2020 at 09:43:13AM +0800, Feng Tang wrote:
> On Sun, Jul 05, 2020 at 11:52:32AM -0400, Qian Cai wrote:
> > On Sun, Jul 05, 2020 at 08:58:54PM +0800, Feng Tang wrote:
> > > On Sun, Jul 05, 2020 at 08:15:03AM -0400, Qian Cai wrote:
> > > > 
> > > > 
> > > > > On Jul 5, 2020, at 12:45 AM, Feng Tang <feng.tang@intel.com> wrote:
> > > > > 
> > > > > I did reproduce the problem, and from the debugging it should have
> > > > > the same root cause as lore.kernel.org/lkml/20200526181459.GD991@lca.pw/:
> > > > > losing the batch causes some accuracy problems, and the solution of
> > > > > adding some syncing is still needed, which is discussed in
> > > > 
> > > > Well, before taking any of those patches now to fix the regression,
> > > > we will need some performance data first. If it turns out the
> > > > original performance gain is no longer relevant due to this
> > > > regression fix on top, it is best to drop this patchset and restore
> > > > that VM_WARN_ONCE, so you can retry later once you find a better
> > > > way to optimize.
> > > 
> > > The added sync only happens when the overcommit policy is being
> > > changed to OVERCOMMIT_NEVER, which is not a frequent operation in
> > > normal cases.
> > > 
> > > The performance improvement data, both in the commit log and in the
> > > 0day report https://lore.kernel.org/lkml/20200622132548.GS5535@shao2-debian/,
> > > is for will-it-scale's mmap testcase, which does not change the
> > > overcommit policy at runtime, so the data should still be valid
> > > with this fix.
> > 
> > Well, I would expect it is perfectly reasonable for people to use
> > OVERCOMMIT_NEVER for some workloads, making policy changes a more
> > frequent operation.
> 
> In my last email, I was not saying OVERCOMMIT_NEVER is not a normal case,
> but I don't think users will change the overcommit policy at runtime
> very often. And the fix patch that syncs 'vm_committed_as' only runs when
> a user calls 'sysctl -w vm.overcommit_memory=2'.
> 
> > The question now is whether any of those regression fixes would regress
> > the performance of OVERCOMMIT_NEVER workloads, or stay on par with the
> > data from before the patchset.
> 
> The original patchset keeps the vm_committed_as batch unchanged for the
> OVERCOMMIT_NEVER policy and enlarges it for the other two loose policies,
> OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS, so I don't expect the performance
> of "OVERCOMMIT_NEVER workloads" to be impacted. If you have suggestions
> for this kind of benchmark, I can run them to better verify the patchset,
> thanks!

Then, please capture that information in a proper commit log when you
submit the regression fix on top of the patchset, and CC the PER-CPU
MEMORY ALLOCATOR maintainers so they can review it properly.

> 
> - Feng
> 
> > 
> > Given how much churn this patchset has had recently, I would think
> > "should still be valid" is not really the answer we are looking for.
> > 
> > > 
> > > Thanks,
> > > Feng
> > > 
> > > 


* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-05 15:52           ` Qian Cai
@ 2020-07-06  1:43             ` Feng Tang
  2020-07-06  2:36               ` Qian Cai
  2020-07-07 10:28             ` Michal Hocko
  1 sibling, 1 reply; 26+ messages in thread
From: Feng Tang @ 2020-07-06  1:43 UTC (permalink / raw)
  To: Qian Cai
  Cc: kernel test robot, Andrew Morton, Michal Hocko, Johannes Weiner,
	Matthew Wilcox, Mel Gorman, Kees Cook, Luis Chamberlain,
	Iurii Zaikin, andi.kleen, tim.c.chen, dave.hansen, ying.huang,
	linux-mm, linux-kernel, lkp

On Sun, Jul 05, 2020 at 11:52:32AM -0400, Qian Cai wrote:
> On Sun, Jul 05, 2020 at 08:58:54PM +0800, Feng Tang wrote:
> > On Sun, Jul 05, 2020 at 08:15:03AM -0400, Qian Cai wrote:
> > > 
> > > 
> > > > On Jul 5, 2020, at 12:45 AM, Feng Tang <feng.tang@intel.com> wrote:
> > > > 
> > > > I did reproduce the problem, and from the debugging it should have
> > > > the same root cause as lore.kernel.org/lkml/20200526181459.GD991@lca.pw/:
> > > > losing the batch causes some accuracy problems, and the solution of
> > > > adding some syncing is still needed, which is discussed in
> > > 
> > > Well, before taking any of those patches now to fix the regression,
> > > we will need some performance data first. If it turns out the
> > > original performance gain is no longer relevant due to this
> > > regression fix on top, it is best to drop this patchset and restore
> > > that VM_WARN_ONCE, so you can retry later once you find a better
> > > way to optimize.
> > 
> > The added sync only happens when the overcommit policy is being
> > changed to OVERCOMMIT_NEVER, which is not a frequent operation in
> > normal cases.
> > 
> > The performance improvement data, both in the commit log and in the
> > 0day report https://lore.kernel.org/lkml/20200622132548.GS5535@shao2-debian/,
> > is for will-it-scale's mmap testcase, which does not change the
> > overcommit policy at runtime, so the data should still be valid
> > with this fix.
> 
> Well, I would expect it is perfectly reasonable for people to use
> OVERCOMMIT_NEVER for some workloads, making policy changes a more
> frequent operation.

In my last email, I was not saying OVERCOMMIT_NEVER is not a normal case,
but I don't think users will change the overcommit policy at runtime
very often. And the fix patch that syncs 'vm_committed_as' only runs when
a user calls 'sysctl -w vm.overcommit_memory=2'.

> The question now is whether any of those regression fixes would regress
> the performance of OVERCOMMIT_NEVER workloads, or stay on par with the
> data from before the patchset.

The original patchset keeps the vm_committed_as batch unchanged for the
OVERCOMMIT_NEVER policy and enlarges it for the other two loose policies,
OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS, so I don't expect the performance
of "OVERCOMMIT_NEVER workloads" to be impacted. If you have suggestions
for this kind of benchmark, I can run them to better verify the patchset,
thanks!

- Feng

> 
> Given how much churn this patchset has had recently, I would think
> "should still be valid" is not really the answer we are looking for.
> 
> > 
> > Thanks,
> > Feng
> > 
> > 


* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-05 12:58         ` Feng Tang
@ 2020-07-05 15:52           ` Qian Cai
  2020-07-06  1:43             ` Feng Tang
  2020-07-07 10:28             ` Michal Hocko
  0 siblings, 2 replies; 26+ messages in thread
From: Qian Cai @ 2020-07-05 15:52 UTC (permalink / raw)
  To: Feng Tang
  Cc: kernel test robot, Andrew Morton, Michal Hocko, Johannes Weiner,
	Matthew Wilcox, Mel Gorman, Kees Cook, Luis Chamberlain,
	Iurii Zaikin, andi.kleen, tim.c.chen, dave.hansen, ying.huang,
	linux-mm, linux-kernel, lkp

On Sun, Jul 05, 2020 at 08:58:54PM +0800, Feng Tang wrote:
> On Sun, Jul 05, 2020 at 08:15:03AM -0400, Qian Cai wrote:
> > 
> > 
> > > On Jul 5, 2020, at 12:45 AM, Feng Tang <feng.tang@intel.com> wrote:
> > > 
> > > I did reproduce the problem, and from the debugging it should have
> > > the same root cause as lore.kernel.org/lkml/20200526181459.GD991@lca.pw/:
> > > losing the batch causes some accuracy problems, and the solution of
> > > adding some syncing is still needed, which is discussed in
> > 
> > Well, before taking any of those patches now to fix the regression,
> > we will need some performance data first. If it turns out the
> > original performance gain is no longer relevant due to this
> > regression fix on top, it is best to drop this patchset and restore
> > that VM_WARN_ONCE, so you can retry later once you find a better
> > way to optimize.
> 
> The added sync only happens when the overcommit policy is being
> changed to OVERCOMMIT_NEVER, which is not a frequent operation in
> normal cases.
> 
> The performance improvement data, both in the commit log and in the
> 0day report https://lore.kernel.org/lkml/20200622132548.GS5535@shao2-debian/,
> is for will-it-scale's mmap testcase, which does not change the
> overcommit policy at runtime, so the data should still be valid
> with this fix.

Well, I would expect it is perfectly reasonable for people to use
OVERCOMMIT_NEVER for some workloads, making policy changes a more
frequent operation. The question now is whether any of those regression
fixes would regress the performance of OVERCOMMIT_NEVER workloads, or
stay on par with the data from before the patchset.

Given how much churn this patchset has had recently, I would think
"should still be valid" is not really the answer we are looking for.

> 
> Thanks,
> Feng
> 
> 


* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-05 12:15       ` Qian Cai
@ 2020-07-05 12:58         ` Feng Tang
  2020-07-05 15:52           ` Qian Cai
  0 siblings, 1 reply; 26+ messages in thread
From: Feng Tang @ 2020-07-05 12:58 UTC (permalink / raw)
  To: Qian Cai
  Cc: kernel test robot, Andrew Morton, Michal Hocko, Johannes Weiner,
	Matthew Wilcox, Mel Gorman, Kees Cook, Luis Chamberlain,
	Iurii Zaikin, andi.kleen, tim.c.chen, dave.hansen, ying.huang,
	linux-mm, linux-kernel, lkp

On Sun, Jul 05, 2020 at 08:15:03AM -0400, Qian Cai wrote:
> 
> 
> > On Jul 5, 2020, at 12:45 AM, Feng Tang <feng.tang@intel.com> wrote:
> > 
> > I did reproduce the problem, and from the debugging it should have
> > the same root cause as lore.kernel.org/lkml/20200526181459.GD991@lca.pw/:
> > losing the batch causes some accuracy problems, and the solution of
> > adding some syncing is still needed, which is discussed in
> 
> Well, before taking any of those patches now to fix the regression, we will need some performance data first. If it turns out the original performance gain is no longer relevant due to this regression fix on top, it is best to drop this patchset and restore that VM_WARN_ONCE, so you can retry later once you find a better way to optimize.

The added sync only happens when the overcommit policy is being
changed to OVERCOMMIT_NEVER, which is not a frequent operation in
normal cases.

The performance improvement data, both in the commit log and in the
0day report https://lore.kernel.org/lkml/20200622132548.GS5535@shao2-debian/,
is for will-it-scale's mmap testcase, which does not change the
overcommit policy at runtime, so the data should still be valid
with this fix.

Thanks,
Feng




* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-05  4:44     ` Feng Tang
@ 2020-07-05 12:15       ` Qian Cai
  2020-07-05 12:58         ` Feng Tang
  0 siblings, 1 reply; 26+ messages in thread
From: Qian Cai @ 2020-07-05 12:15 UTC (permalink / raw)
  To: Feng Tang
  Cc: kernel test robot, Andrew Morton, Michal Hocko, Johannes Weiner,
	Matthew Wilcox, Mel Gorman, Kees Cook, Luis Chamberlain,
	Iurii Zaikin, andi.kleen, tim.c.chen, dave.hansen, ying.huang,
	linux-mm, linux-kernel, lkp



> On Jul 5, 2020, at 12:45 AM, Feng Tang <feng.tang@intel.com> wrote:
> 
> I did reproduce the problem, and from the debugging it should have
> the same root cause as lore.kernel.org/lkml/20200526181459.GD991@lca.pw/:
> losing the batch causes some accuracy problems, and the solution of
> adding some syncing is still needed, which is discussed in

Well, before taking any of those patches now to fix the regression, we will need some performance data first. If it turns out the original performance gain is no longer relevant due to this regression fix on top, it is best to drop this patchset and restore that VM_WARN_ONCE, so you can retry later once you find a better way to optimize.


* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-02  7:12   ` [mm] 4e2c82a409: ltp.overcommit_memory01.fail Feng Tang
  2020-07-05  3:20     ` Qian Cai
@ 2020-07-05  4:44     ` Feng Tang
  2020-07-05 12:15       ` Qian Cai
  1 sibling, 1 reply; 26+ messages in thread
From: Feng Tang @ 2020-07-05  4:44 UTC (permalink / raw)
  To: kernel test robot
  Cc: Andrew Morton, cai, Michal Hocko, Johannes Weiner,
	Matthew Wilcox, Mel Gorman, Kees Cook, Luis Chamberlain,
	Iurii Zaikin, andi.kleen, tim.c.chen, dave.hansen, ying.huang,
	linux-mm, linux-kernel, lkp

On Thu, Jul 02, 2020 at 03:12:30PM +0800, Feng Tang wrote:
> > <<<test_start>>>
> > tag=overcommit_memory01 stime=1593425044
> > cmdline="overcommit_memory"
> > contacts=""
> > analysis=exit
> > <<<test_output>>>
> > tst_test.c:1247: INFO: Timeout per run is 0h 05m 00s
> > overcommit_memory.c:116: INFO: MemTotal is 16394252 kB
> > overcommit_memory.c:118: INFO: SwapTotal is 268435452 kB
> > overcommit_memory.c:122: INFO: CommitLimit is 276632576 kB
> > mem.c:817: INFO: set overcommit_ratio to 50
> > mem.c:817: INFO: set overcommit_memory to 2
> > overcommit_memory.c:187: INFO: malloc 551061440 kB failed
> > overcommit_memory.c:208: PASS: alloc failed as expected
> > overcommit_memory.c:183: INFO: malloc 276632576 kB successfully
> > overcommit_memory.c:210: FAIL: alloc passed, expected to fail
> 
> Thanks for the report!
> 
> I took a rough look, and it all happens after changing the
> overcommit policy from a looser one to OVERCOMMIT_NEVER. I suspect 
> it is due to the same cause as the previous warning message reported
> by Qian Cai https://lore.kernel.org/lkml/20200526181459.GD991@lca.pw/ 
> 
> Will further check it.

I did reproduce the problem, and from the debugging it should have
the same root cause as https://lore.kernel.org/lkml/20200526181459.GD991@lca.pw/:
losing the batch causes some accuracy problems, and the solution of
adding some syncing is still needed, which is discussed in

The first thing I tried was a simple patch using percpu_counter_sum(),
and with it the problem can't be reproduced:

--- a/mm/util.c
+++ b/mm/util.c
@@ -845,7 +845,7 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 		allowed -= min_t(long, mm->total_vm / 32, reserve);
 	}
 
-	if (percpu_counter_read_positive(&vm_committed_as) < allowed)
+	if (percpu_counter_sum(&vm_committed_as) < allowed)
 		return 0;
 error:
 	vm_unacct_memory(pages);
 

Then I tried the sync patch we discussed one month ago:
https://lore.kernel.org/lkml/20200529154315.GI93879@shbuild999.sh.intel.com/
With it, I ran the case 200 times and the problem was not reproduced.
Can we consider taking this patch?

Thanks,
Feng
 
diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index 0a4f54d..01861ee 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -44,6 +44,7 @@ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount,
 			      s32 batch);
 s64 __percpu_counter_sum(struct percpu_counter *fbc);
 int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch);
+void percpu_counter_sync(struct percpu_counter *fbc);
 
 static inline int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
 {
@@ -172,6 +173,9 @@ static inline bool percpu_counter_initialized(struct percpu_counter *fbc)
 	return true;
 }
 
+static inline void percpu_counter_sync(struct percpu_counter *fbc)
+{
+}
 #endif	/* CONFIG_SMP */
 
 static inline void percpu_counter_inc(struct percpu_counter *fbc)
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index a66595b..d025137 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -98,6 +98,20 @@ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
 }
 EXPORT_SYMBOL(percpu_counter_add_batch);
 
+void percpu_counter_sync(struct percpu_counter *fbc)
+{
+	unsigned long flags;
+	s64 count;
+
+	raw_spin_lock_irqsave(&fbc->lock, flags);
+	count = __this_cpu_read(*fbc->counters);
+	fbc->count += count;
+	__this_cpu_sub(*fbc->counters, count);
+	raw_spin_unlock_irqrestore(&fbc->lock, flags);
+}
+EXPORT_SYMBOL(percpu_counter_sync);
+
+
 /*
  * Add up all the per-cpu counts, return the result.  This is a more accurate
  * but much slower version of percpu_counter_read_positive()
diff --git a/mm/util.c b/mm/util.c
index 98813da..8b9664e 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -746,14 +746,23 @@ int overcommit_ratio_handler(struct ctl_table *table, int write, void *buffer,
 	return ret;
 }
 
+static void sync_overcommit_as(struct work_struct *dummy)
+{
+	percpu_counter_sync(&vm_committed_as);
+}
+
 int overcommit_policy_handler(struct ctl_table *table, int write, void *buffer,
 		size_t *lenp, loff_t *ppos)
 {
 	int ret;
 
 	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
-	if (ret == 0 && write)
+	if (ret == 0 && write) {
+		if (sysctl_overcommit_memory == OVERCOMMIT_NEVER)
+			schedule_on_each_cpu(sync_overcommit_as);
+
 		mm_compute_batch();
+	}
 
 	pr_info("ocommit=%lld, real=%lld policy[%d] ratio=%d\n\n\n",
 			percpu_counter_read_positive(&vm_committed_as),




* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
  2020-07-02  7:12   ` [mm] 4e2c82a409: ltp.overcommit_memory01.fail Feng Tang
@ 2020-07-05  3:20     ` Qian Cai
  2020-07-05  4:44     ` Feng Tang
  1 sibling, 0 replies; 26+ messages in thread
From: Qian Cai @ 2020-07-05  3:20 UTC (permalink / raw)
  To: Feng Tang
  Cc: kernel test robot, Andrew Morton, Michal Hocko, Johannes Weiner,
	Matthew Wilcox, Mel Gorman, Kees Cook, Luis Chamberlain,
	Iurii Zaikin, andi.kleen, tim.c.chen, dave.hansen, ying.huang,
	linux-mm, linux-kernel, lkp

On Thu, Jul 02, 2020 at 03:12:30PM +0800, Feng Tang wrote:
> Hi,
> 
> On Thu, Jul 02, 2020 at 02:32:01PM +0800, kernel test robot wrote:
> > Greeting,
> > 
> > FYI, we noticed the following commit (built with gcc-9):
> > 
> > commit: 4e2c82a40911c19419349918e675aa202b113b4d ("[PATCH v5 3/3] mm: adjust vm_committed_as_batch according to vm overcommit policy")
> > url: https://github.com/0day-ci/linux/commits/Feng-Tang/make-vm_committed_as_batch-aware-of-vm-overcommit-policy/20200621-153906
> > 
> > 
> > in testcase: ltp
> > with following parameters:
> > 
> > 	disk: 1HDD
> > 	test: mm-01
> > 
> > test-description: The LTP testsuite contains a collection of tools for testing the Linux kernel and related features.
> > test-url: http://linux-test-project.github.io/
> > 
> > 
> > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> > 
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> > 
> > 
> > 
> > 
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot <rong.a.chen@intel.com>
> > 
> > 
> > 
> > <<<test_start>>>
> > tag=overcommit_memory01 stime=1593425044
> > cmdline="overcommit_memory"
> > contacts=""
> > analysis=exit
> > <<<test_output>>>
> > tst_test.c:1247: INFO: Timeout per run is 0h 05m 00s
> > overcommit_memory.c:116: INFO: MemTotal is 16394252 kB
> > overcommit_memory.c:118: INFO: SwapTotal is 268435452 kB
> > overcommit_memory.c:122: INFO: CommitLimit is 276632576 kB
> > mem.c:817: INFO: set overcommit_ratio to 50
> > mem.c:817: INFO: set overcommit_memory to 2
> > overcommit_memory.c:187: INFO: malloc 551061440 kB failed
> > overcommit_memory.c:208: PASS: alloc failed as expected
> > overcommit_memory.c:183: INFO: malloc 276632576 kB successfully
> > overcommit_memory.c:210: FAIL: alloc passed, expected to fail
> 
> Thanks for the report!
> 
> I took a rough look, and it all happens after changing the
> overcommit policy from a looser one to OVERCOMMIT_NEVER. I suspect 
> it is due to the same cause as the previous warning message reported
> by Qian Cai https://lore.kernel.org/lkml/20200526181459.GD991@lca.pw/ 

Hmm, the LTP test [1] looks like a faithful implementation of
Documentation/vm/overcommit-accounting.rst and is now failing because
of this patchset.

Also, it was a mistake to merge c571686a92ff ("mm/util.c: remove the
VM_WARN_ONCE for vm_committed_as underflow check") separately (I take
the blame for ACKing it separately and forgetting to run those tests to
double-check earlier), which now makes me wonder whether that
VM_WARN_ONCE was actually legitimate for catching signs of brokenness
in the first place.

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/tunable/overcommit_memory.c
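[Editorial note: the CommitLimit the test prints can be checked against the documented formula CommitLimit = SwapTotal + MemTotal * overcommit_ratio / 100, computed in 4 kB pages the way the kernel does. This is a rough reconstruction; the kernel also subtracts hugetlb pages from MemTotal, which is ignored here.]

```c
#include <assert.h>

/* CommitLimit (kB) = SwapTotal + MemTotal * ratio / 100, with the
 * ratio applied in whole 4 kB pages as the kernel does.  Hugetlb
 * pages, which the kernel subtracts first, are ignored in this
 * sketch. */
static long commit_limit_kb(long mem_kb, long swap_kb, int ratio)
{
	long pages = mem_kb / 4;                 /* 4 kB pages */
	return (pages * ratio / 100) * 4 + swap_kb;
}
```

Plugging in the values from the log above (MemTotal 16394252 kB, SwapTotal 268435452 kB, ratio 50) reproduces the reported CommitLimit of 276632576 kB.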

> 
> Will further check it.
> 
> Thanks,
> Feng
> 
> > overcommit_memory.c:183: INFO: malloc 137765294 kB successfully
> > overcommit_memory.c:202: PASS: alloc passed as expected
> > mem.c:817: INFO: set overcommit_memory to 0
> > overcommit_memory.c:183: INFO: malloc 140770308 kB successfully
> > overcommit_memory.c:202: PASS: alloc passed as expected
> > overcommit_memory.c:187: INFO: malloc 569659408 kB failed
> > overcommit_memory.c:208: PASS: alloc failed as expected
> > mem.c:817: INFO: set overcommit_memory to 1
> > overcommit_memory.c:183: INFO: malloc 142414852 kB successfully
> > overcommit_memory.c:202: PASS: alloc passed as expected
> > overcommit_memory.c:183: INFO: malloc 284829704 kB successfully
> > overcommit_memory.c:202: PASS: alloc passed as expected
> > overcommit_memory.c:183: INFO: malloc 569659408 kB successfully
> > overcommit_memory.c:202: PASS: alloc passed as expected
> > mem.c:817: INFO: set overcommit_memory to 0
> > mem.c:817: INFO: set overcommit_ratio to 50
> > 
> > Summary:
> > passed   7
> > failed   1
> > skipped  0
> > warnings 0
> > <<<execution_status>>>
> > initiation_status="ok"
> > duration=0 termination_type=exited termination_id=1 corefile=no
> > cutime=0 cstime=1
> > <<<test_end>>>
> > <<<test_start>>>
> > tag=overcommit_memory02 stime=1593425044
> > cmdline="overcommit_memory -R 0"
> > contacts=""
> > analysis=exit
> > <<<test_output>>>
> > tst_test.c:1247: INFO: Timeout per run is 0h 05m 00s
> > overcommit_memory.c:116: INFO: MemTotal is 16394252 kB
> > overcommit_memory.c:118: INFO: SwapTotal is 268435452 kB
> > overcommit_memory.c:122: INFO: CommitLimit is 276632576 kB
> > mem.c:817: INFO: set overcommit_ratio to 0
> > mem.c:817: INFO: set overcommit_memory to 2
> > overcommit_memory.c:187: INFO: malloc 534667184 kB failed
> > overcommit_memory.c:208: PASS: alloc failed as expected
> > overcommit_memory.c:183: INFO: malloc 268435452 kB successfully
> > overcommit_memory.c:210: FAIL: alloc passed, expected to fail
> > overcommit_memory.c:183: INFO: malloc 133666730 kB successfully
> > overcommit_memory.c:202: PASS: alloc passed as expected
> > mem.c:817: INFO: set overcommit_memory to 0
> > overcommit_memory.c:183: INFO: malloc 140770304 kB successfully
> > overcommit_memory.c:202: PASS: alloc passed as expected
> > overcommit_memory.c:187: INFO: malloc 569659408 kB failed
> > overcommit_memory.c:208: PASS: alloc failed as expected
> > mem.c:817: INFO: set overcommit_memory to 1
> > overcommit_memory.c:183: INFO: malloc 142414852 kB successfully
> > overcommit_memory.c:202: PASS: alloc passed as expected
> > overcommit_memory.c:183: INFO: malloc 284829704 kB successfully
> > overcommit_memory.c:202: PASS: alloc passed as expected
> > overcommit_memory.c:183: INFO: malloc 569659408 kB successfully
> > overcommit_memory.c:202: PASS: alloc passed as expected
> > mem.c:817: INFO: set overcommit_memory to 0
> > mem.c:817: INFO: set overcommit_ratio to 50
> > 
> 


* Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
       [not found] ` <20200702063201.GG3874@shao2-debian>
@ 2020-07-02  7:12   ` Feng Tang
  2020-07-05  3:20     ` Qian Cai
  2020-07-05  4:44     ` Feng Tang
  0 siblings, 2 replies; 26+ messages in thread
From: Feng Tang @ 2020-07-02  7:12 UTC (permalink / raw)
  To: kernel test robot
  Cc: Andrew Morton, Michal Hocko, Johannes Weiner, Matthew Wilcox,
	Mel Gorman, Kees Cook, Luis Chamberlain, Iurii Zaikin,
	andi.kleen, tim.c.chen, dave.hansen, ying.huang, linux-mm,
	linux-kernel, lkp

Hi,

On Thu, Jul 02, 2020 at 02:32:01PM +0800, kernel test robot wrote:
> Greeting,
> 
> FYI, we noticed the following commit (built with gcc-9):
> 
> commit: 4e2c82a40911c19419349918e675aa202b113b4d ("[PATCH v5 3/3] mm: adjust vm_committed_as_batch according to vm overcommit policy")
> url: https://github.com/0day-ci/linux/commits/Feng-Tang/make-vm_committed_as_batch-aware-of-vm-overcommit-policy/20200621-153906
> 
> 
> in testcase: ltp
> with following parameters:
> 
> 	disk: 1HDD
> 	test: mm-01
> 
> test-description: The LTP testsuite contains a collection of tools for testing the Linux kernel and related features.
> test-url: http://linux-test-project.github.io/
> 
> 
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> 
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> 
> 
> 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <rong.a.chen@intel.com>
> 
> 
> 
> <<<test_start>>>
> tag=overcommit_memory01 stime=1593425044
> cmdline="overcommit_memory"
> contacts=""
> analysis=exit
> <<<test_output>>>
> tst_test.c:1247: INFO: Timeout per run is 0h 05m 00s
> overcommit_memory.c:116: INFO: MemTotal is 16394252 kB
> overcommit_memory.c:118: INFO: SwapTotal is 268435452 kB
> overcommit_memory.c:122: INFO: CommitLimit is 276632576 kB
> mem.c:817: INFO: set overcommit_ratio to 50
> mem.c:817: INFO: set overcommit_memory to 2
> overcommit_memory.c:187: INFO: malloc 551061440 kB failed
> overcommit_memory.c:208: PASS: alloc failed as expected
> overcommit_memory.c:183: INFO: malloc 276632576 kB successfully
> overcommit_memory.c:210: FAIL: alloc passed, expected to fail

Thanks for the report!

I took a rough look, and it all happens after changing the
overcommit policy from a looser one to OVERCOMMIT_NEVER. I suspect 
it is due to the same cause as the previous warning message reported
by Qian Cai https://lore.kernel.org/lkml/20200526181459.GD991@lca.pw/ 

Will further check it.

Thanks,
Feng

> overcommit_memory.c:183: INFO: malloc 137765294 kB successfully
> overcommit_memory.c:202: PASS: alloc passed as expected
> mem.c:817: INFO: set overcommit_memory to 0
> overcommit_memory.c:183: INFO: malloc 140770308 kB successfully
> overcommit_memory.c:202: PASS: alloc passed as expected
> overcommit_memory.c:187: INFO: malloc 569659408 kB failed
> overcommit_memory.c:208: PASS: alloc failed as expected
> mem.c:817: INFO: set overcommit_memory to 1
> overcommit_memory.c:183: INFO: malloc 142414852 kB successfully
> overcommit_memory.c:202: PASS: alloc passed as expected
> overcommit_memory.c:183: INFO: malloc 284829704 kB successfully
> overcommit_memory.c:202: PASS: alloc passed as expected
> overcommit_memory.c:183: INFO: malloc 569659408 kB successfully
> overcommit_memory.c:202: PASS: alloc passed as expected
> mem.c:817: INFO: set overcommit_memory to 0
> mem.c:817: INFO: set overcommit_ratio to 50
> 
> Summary:
> passed   7
> failed   1
> skipped  0
> warnings 0
> <<<execution_status>>>
> initiation_status="ok"
> duration=0 termination_type=exited termination_id=1 corefile=no
> cutime=0 cstime=1
> <<<test_end>>>
> <<<test_start>>>
> tag=overcommit_memory02 stime=1593425044
> cmdline="overcommit_memory -R 0"
> contacts=""
> analysis=exit
> <<<test_output>>>
> tst_test.c:1247: INFO: Timeout per run is 0h 05m 00s
> overcommit_memory.c:116: INFO: MemTotal is 16394252 kB
> overcommit_memory.c:118: INFO: SwapTotal is 268435452 kB
> overcommit_memory.c:122: INFO: CommitLimit is 276632576 kB
> mem.c:817: INFO: set overcommit_ratio to 0
> mem.c:817: INFO: set overcommit_memory to 2
> overcommit_memory.c:187: INFO: malloc 534667184 kB failed
> overcommit_memory.c:208: PASS: alloc failed as expected
> overcommit_memory.c:183: INFO: malloc 268435452 kB successfully
> overcommit_memory.c:210: FAIL: alloc passed, expected to fail
> overcommit_memory.c:183: INFO: malloc 133666730 kB successfully
> overcommit_memory.c:202: PASS: alloc passed as expected
> mem.c:817: INFO: set overcommit_memory to 0
> overcommit_memory.c:183: INFO: malloc 140770304 kB successfully
> overcommit_memory.c:202: PASS: alloc passed as expected
> overcommit_memory.c:187: INFO: malloc 569659408 kB failed
> overcommit_memory.c:208: PASS: alloc failed as expected
> mem.c:817: INFO: set overcommit_memory to 1
> overcommit_memory.c:183: INFO: malloc 142414852 kB successfully
> overcommit_memory.c:202: PASS: alloc passed as expected
> overcommit_memory.c:183: INFO: malloc 284829704 kB successfully
> overcommit_memory.c:202: PASS: alloc passed as expected
> overcommit_memory.c:183: INFO: malloc 569659408 kB successfully
> overcommit_memory.c:202: PASS: alloc passed as expected
> mem.c:817: INFO: set overcommit_memory to 0
> mem.c:817: INFO: set overcommit_ratio to 50
> 


Thread overview: 26+ messages
2020-07-07 11:43 [mm] 4e2c82a409: ltp.overcommit_memory01.fail Qian Cai
2020-07-07 12:06 ` Michal Hocko
2020-07-07 13:04   ` Qian Cai
2020-07-07 13:56     ` Michal Hocko
  -- strict thread matches above, loose matches on Subject: below --
2020-06-21  7:36 [PATCH v5 3/3] mm: adjust vm_committed_as_batch according to vm overcommit policy Feng Tang
     [not found] ` <20200702063201.GG3874@shao2-debian>
2020-07-02  7:12   ` [mm] 4e2c82a409: ltp.overcommit_memory01.fail Feng Tang
2020-07-05  3:20     ` Qian Cai
2020-07-05  4:44     ` Feng Tang
2020-07-05 12:15       ` Qian Cai
2020-07-05 12:58         ` Feng Tang
2020-07-05 15:52           ` Qian Cai
2020-07-06  1:43             ` Feng Tang
2020-07-06  2:36               ` Qian Cai
2020-07-06 13:24                 ` Feng Tang
2020-07-06 13:34                   ` Andi Kleen
2020-07-06 23:42                     ` Andrew Morton
2020-07-07  2:38                     ` Feng Tang
2020-07-07  4:00                       ` Huang, Ying
2020-07-07  5:41                         ` Feng Tang
2020-07-09  4:55                           ` Feng Tang
2020-07-09 13:40                             ` Qian Cai
2020-07-09 14:15                               ` Feng Tang
2020-07-10  1:38                                 ` Feng Tang
2020-07-10  3:26                                   ` Qian Cai
2020-07-07  1:06                   ` Dennis Zhou
2020-07-07  3:24                     ` Feng Tang
2020-07-07 10:28             ` Michal Hocko
