All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm/util.c: Add error logs for commitment overflow
@ 2020-10-02 11:57 Pintu Kumar
  2020-10-02 12:11 ` Matthew Wilcox
  2020-10-02 12:17 ` Michal Hocko
  0 siblings, 2 replies; 8+ messages in thread
From: Pintu Kumar @ 2020-10-02 11:57 UTC (permalink / raw)
  To: linux-kernel, akpm, linux-mm; +Cc: pintu, pintu.ping

The headless embedded devices often come with very limited amount
of RAM such as: 256MB or even lesser.
These types of system often rely on command line interface which can
execute system commands in the background using the fork/exec combination.
There could be even many child tasks invoked internally to handle multiple
requests.
In this scenario, if the parent task keeps committing large amount of
memory, there are chances that this commitment can easily overflow the
total RAM available in the system. Now if the parent process invokes fork
or system commands (using system() call) and the commitment ratio is at
50%, the request fails with the following, even though there are large
amount of free memory available in the system:
fork failed: Cannot allocate memory

If there are too many 3rd party tasks calling fork, it becomes difficult to
identify exactly which parent process is overcommitting memory.
Since, free memory is also available, this "Cannot allocate memory" from
fork creates confusion to application developer.

Thus, I found that this simple print message (even once) is helping in
quickly identifying the culprit.

This is the output we can see on a 256MB system and with a simple malloc
and fork program.

[root@ ~]# cat /proc/meminfo
MemTotal:         249520 kB   ==> 243MB
MemFree:          179100 kB

PPID	PID	USER	RSS	VSZ	STAT	ARGS
 150	164	root	1440	250580	S	./consume-and-fork.out 243

__vm_enough_memory: commitment overflow: ppid:150, pid:164, pages:62451
fork failed[count:0]: Cannot allocate memory

Signed-off-by: Pintu Kumar <pintu@codeaurora.org>
---
 mm/util.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/util.c b/mm/util.c
index 5ef378a..9431ce7a 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -895,6 +895,9 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 error:
 	vm_unacct_memory(pages);
 
+	pr_err_once("%s: commitment overflow: ppid:%d, pid:%d, pages:%ld\n",
+			__func__, current->parent->pid, current->pid, pages);
+
 	return -ENOMEM;
 }
 
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.,
is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/util.c: Add error logs for commitment overflow
  2020-10-02 11:57 [PATCH] mm/util.c: Add error logs for commitment overflow Pintu Kumar
@ 2020-10-02 12:11 ` Matthew Wilcox
  2020-10-02 16:18   ` pintu
  2020-10-02 12:17 ` Michal Hocko
  1 sibling, 1 reply; 8+ messages in thread
From: Matthew Wilcox @ 2020-10-02 12:11 UTC (permalink / raw)
  To: Pintu Kumar; +Cc: linux-kernel, akpm, linux-mm, pintu.ping

On Fri, Oct 02, 2020 at 05:27:41PM +0530, Pintu Kumar wrote:
> This is the output we can see on a 256MB system and with a simple malloc
> and fork program.
> 
> [root@ ~]# cat /proc/meminfo
> MemTotal:         249520 kB   ==> 243MB
> MemFree:          179100 kB
> 
> PPID	PID	USER	RSS	VSZ	STAT	ARGS
>  150	164	root	1440	250580	S	./consume-and-fork.out 243
> 
> __vm_enough_memory: commitment overflow: ppid:150, pid:164, pages:62451
> fork failed[count:0]: Cannot allocate memory

I don't think the __func__ is particularly useful information.  I would
also expect the name of the process to be more interesting than the PID.
And why is the ppid useful?

Wouldn't this message be more useful?

fork: Would overcommit system (pid:162 name:./consume-and-fork.out)

ie put it in dup_mmap() and use current->comm

> +	pr_err_once("%s: commitment overflow: ppid:%d, pid:%d, pages:%ld\n",
> +			__func__, current->parent->pid, current->pid, pages);
> +

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/util.c: Add error logs for commitment overflow
  2020-10-02 11:57 [PATCH] mm/util.c: Add error logs for commitment overflow Pintu Kumar
  2020-10-02 12:11 ` Matthew Wilcox
@ 2020-10-02 12:17 ` Michal Hocko
  2020-10-02 16:23   ` pintu
  1 sibling, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2020-10-02 12:17 UTC (permalink / raw)
  To: Pintu Kumar; +Cc: linux-kernel, akpm, linux-mm, pintu.ping

On Fri 02-10-20 17:27:41, Pintu Kumar wrote:
> The headless embedded devices often come with very limited amount
> of RAM such as: 256MB or even lesser.
> These types of system often rely on command line interface which can
> execute system commands in the background using the fork/exec combination.
> There could be even many child tasks invoked internally to handle multiple
> requests.
> In this scenario, if the parent task keeps committing large amount of
> memory, there are chances that this commitment can easily overflow the
> total RAM available in the system. Now if the parent process invokes fork
> or system commands (using system() call) and the commitment ratio is at
> 50%, the request fails with the following, even though there are large
> amount of free memory available in the system:
> fork failed: Cannot allocate memory
> 
> If there are too many 3rd party tasks calling fork, it becomes difficult to
> identify exactly which parent process is overcommitting memory.
> Since, free memory is also available, this "Cannot allocate memory" from
> fork creates confusion to application developer.
> 
> Thus, I found that this simple print message (even once) is helping in
> quickly identifying the culprit.
> 
> This is the output we can see on a 256MB system and with a simple malloc
> and fork program.
> 
> [root@ ~]# cat /proc/meminfo
> MemTotal:         249520 kB   ==> 243MB
> MemFree:          179100 kB
> 
> PPID	PID	USER	RSS	VSZ	STAT	ARGS
>  150	164	root	1440	250580	S	./consume-and-fork.out 243
> 
> __vm_enough_memory: commitment overflow: ppid:150, pid:164, pages:62451
> fork failed[count:0]: Cannot allocate memory

While I understand that fork failing due to overrcomit heuristic is non
intuitive and I have seen people scratching heads due to this in the
past I am not convinced this is a right approach to tackle the problem.
First off, referencing pids is not really going to help much if process
is short lived. Secondly, __vm_enough_memory is about any address space
allocation. Why would you be interested in parent when doing mmap?

Last but not least _once is questionable as well. The first instance
might happen early during the system lifetime and you will not learn
about future failures so the overall point of debuggability is seriously
inhibited.

Maybe what you want is to report higher up the call chain (fork?) and
have it ratelimited rather than _once? Or maybe just try to live with
the confusing situation?

> Signed-off-by: Pintu Kumar <pintu@codeaurora.org>
> ---
>  mm/util.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/util.c b/mm/util.c
> index 5ef378a..9431ce7a 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -895,6 +895,9 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
>  error:
>  	vm_unacct_memory(pages);
>  
> +	pr_err_once("%s: commitment overflow: ppid:%d, pid:%d, pages:%ld\n",
> +			__func__, current->parent->pid, current->pid, pages);
> +
>  	return -ENOMEM;
>  }
>  
> -- 
> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.,
> is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/util.c: Add error logs for commitment overflow
  2020-10-02 12:11 ` Matthew Wilcox
@ 2020-10-02 16:18   ` pintu
  0 siblings, 0 replies; 8+ messages in thread
From: pintu @ 2020-10-02 16:18 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-kernel, akpm, linux-mm, pintu.ping

On 2020-10-02 17:41, Matthew Wilcox wrote:

> 
> I don't think the __func__ is particularly useful information.  I would
> also expect the name of the process to be more interesting than the 
> PID.
> And why is the ppid useful?
> 

Dear Matthew, First, thank you so much for your review and comments.
I totally agree with you.
Yes, initially I included process-name but later removed it to shrink 
some lines.
I thought just pid should be enough to figure out the culprit process 
from dumps.
Okay, I agree __func__ can also be removed.
ppid, I thought might be useful, so I included it. Okay I will remove 
that too.

> Wouldn't this message be more useful?
> 
> fork: Would overcommit system (pid:162 name:./consume-and-fork.out)
> 

Okay, yes I think this should be sufficient.
But I think printing pages also should be good to indicate the users to 
lack
information about this commitment.

> ie put it in dup_mmap() and use current->comm

Sorry, this part about dup_mmap(), I could not understand.


Thanks,
Pintu

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/util.c: Add error logs for commitment overflow
  2020-10-02 12:17 ` Michal Hocko
@ 2020-10-02 16:23   ` pintu
  2020-10-05  7:20     ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: pintu @ 2020-10-02 16:23 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-kernel, akpm, linux-mm, pintu.ping

On 2020-10-02 17:47, Michal Hocko wrote:

>> __vm_enough_memory: commitment overflow: ppid:150, pid:164, 
>> pages:62451
>> fork failed[count:0]: Cannot allocate memory
> 
> While I understand that fork failing due to overrcomit heuristic is non
> intuitive and I have seen people scratching heads due to this in the
> past I am not convinced this is a right approach to tackle the problem.

Dear Michal,
First, thank you so much for your review and comments.
I totally agree with you.

> First off, referencing pids is not really going to help much if process
> is short lived.

Yes, I agree with you.
But I think this is most important mainly for short lived processes 
itself.
Because, when this situation occurs, no one knows who could be the 
culprit.
However, user keeps dumping "ps" or "top" in background to reproduce 
once again.
At this time, we can easily match the pid, process-name (at least in 
most cases).

> Secondly, __vm_enough_memory is about any address space
> allocation. Why would you be interested in parent when doing mmap?
> 

Yes agree, we can remove ppid from here.
I thought it might be useful at least in case of fork (or short lived 
process).

> Last but not least _once is questionable as well. The first instance
> might happen early during the system lifetime and you will not learn
> about future failures so the overall point of debuggability is 
> seriously
> inhibited.
> 
> Maybe what you want is to report higher up the call chain (fork?) and
> have it ratelimited rather than _once? Or maybe just try to live with
> the confusing situation?
> 

Okay agree. I can change to pr_err_ratelimited.
In-fact, initially I thought to use ratelimited itself but then I 
thought
just once also should be fine at least.


Thanks,
Pintu

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/util.c: Add error logs for commitment overflow
  2020-10-02 16:23   ` pintu
@ 2020-10-05  7:20     ` Michal Hocko
  2020-10-19  3:02       ` pintu
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2020-10-05  7:20 UTC (permalink / raw)
  To: pintu; +Cc: linux-kernel, akpm, linux-mm, pintu.ping

On Fri 02-10-20 21:53:37, pintu@codeaurora.org wrote:
> On 2020-10-02 17:47, Michal Hocko wrote:
> 
> > > __vm_enough_memory: commitment overflow: ppid:150, pid:164,
> > > pages:62451
> > > fork failed[count:0]: Cannot allocate memory
> > 
> > While I understand that fork failing due to overrcomit heuristic is non
> > intuitive and I have seen people scratching heads due to this in the
> > past I am not convinced this is a right approach to tackle the problem.
> 
> Dear Michal,
> First, thank you so much for your review and comments.
> I totally agree with you.
> 
> > First off, referencing pids is not really going to help much if process
> > is short lived.
> 
> Yes, I agree with you.
> But I think this is most important mainly for short lived processes itself.
> Because, when this situation occurs, no one knows who could be the culprit.

Pid will not tell you much for those processes, right?

> However, user keeps dumping "ps" or "top" in background to reproduce once
> again.

I do not think this would be an effective way to catch the problem.
Especially with _once reporting. My experience with these reports is
that a reporter notices a malfunctioning (usually more complex)
workload. In some cases ENOMEM from fork is reported into the log by the
userspace.

For others it is strace -f that tells us that fork is failing and a
test with OVERCOMMIT_ALWAYS usually confirms the theory that this is
the culprit. But a rule of thumb is that it is almost always overcommit
to blame. Why? An undocumented secret is that ENOMEM resulting from an
actual memory allocation in the fork/clone path is close to impossible
because kernel does all it can to satisfy them (an even invokes OOM
killer). There are exceptions (e.g. like high order allocation) but
those should be very rare in that path.

> At this time, we can easily match the pid, process-name (at least in most
> cases).

Maybe our definitions of short lived processes differ but in my book
those are pretty hard to catch in flight.

> > Secondly, __vm_enough_memory is about any address space
> > allocation. Why would you be interested in parent when doing mmap?
> > 
> 
> Yes agree, we can remove ppid from here.
> I thought it might be useful at least in case of fork (or short lived
> process).

I suspect you have missed my point here. Let me clarify a bit more.
__vm_enough_memory is called from much more places than fork.
Essentially any mmap, brk etc are going though this. This is where
parent pid certainly doesn't make any sense. In fork this is a different
case because your forked process pid on its own doesn't make much sense
as it is going to die very quickly anyway. This is when parent is likely
a more important information.

That being said the content really depends on the specific path and that
suggestes that you are trying to log at a wrong layer.

Another question is whether we really need a logging done by the kernel.
Ratelimiting would be tricky to get right and we do not want to allow an
easy way to swamp logs either.
As already mentioned ENOMEM usually means overcommit failure. Maybe we
want to be more explicit this in the man page?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/util.c: Add error logs for commitment overflow
  2020-10-05  7:20     ` Michal Hocko
@ 2020-10-19  3:02       ` pintu
  2020-10-19 13:06         ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: pintu @ 2020-10-19  3:02 UTC (permalink / raw)
  To: Michal Hocko, willy; +Cc: linux-kernel, akpm, linux-mm, pintu.ping

On 2020-10-05 12:50, Michal Hocko wrote:
> On Fri 02-10-20 21:53:37, pintu@codeaurora.org wrote:
>> On 2020-10-02 17:47, Michal Hocko wrote:
>> 
>> > > __vm_enough_memory: commitment overflow: ppid:150, pid:164,
>> > > pages:62451
>> > > fork failed[count:0]: Cannot allocate memory
>> >
>> > While I understand that fork failing due to overrcomit heuristic is non
>> > intuitive and I have seen people scratching heads due to this in the
>> > past I am not convinced this is a right approach to tackle the problem.
>> 
>> Dear Michal,
>> First, thank you so much for your review and comments.
>> I totally agree with you.
>> 
>> > First off, referencing pids is not really going to help much if process
>> > is short lived.
>> 
>> Yes, I agree with you.
>> But I think this is most important mainly for short lived processes 
>> itself.
>> Because, when this situation occurs, no one knows who could be the 
>> culprit.
> 
> Pid will not tell you much for those processes, right?
> 
>> However, user keeps dumping "ps" or "top" in background to reproduce 
>> once
>> again.
> 
> I do not think this would be an effective way to catch the problem.
> Especially with _once reporting. My experience with these reports is
> that a reporter notices a malfunctioning (usually more complex)
> workload. In some cases ENOMEM from fork is reported into the log by 
> the
> userspace.
> 
> For others it is strace -f that tells us that fork is failing and a
> test with OVERCOMMIT_ALWAYS usually confirms the theory that this is
> the culprit. But a rule of thumb is that it is almost always overcommit
> to blame. Why? An undocumented secret is that ENOMEM resulting from an
> actual memory allocation in the fork/clone path is close to impossible
> because kernel does all it can to satisfy them (an even invokes OOM
> killer). There are exceptions (e.g. like high order allocation) but
> those should be very rare in that path.
> 
>> At this time, we can easily match the pid, process-name (at least in 
>> most
>> cases).
> 
> Maybe our definitions of short lived processes differ but in my book
> those are pretty hard to catch in flight.
> 
>> > Secondly, __vm_enough_memory is about any address space
>> > allocation. Why would you be interested in parent when doing mmap?
>> >
>> 
>> Yes agree, we can remove ppid from here.
>> I thought it might be useful at least in case of fork (or short lived
>> process).
> 
> I suspect you have missed my point here. Let me clarify a bit more.
> __vm_enough_memory is called from much more places than fork.
> Essentially any mmap, brk etc are going though this. This is where
> parent pid certainly doesn't make any sense. In fork this is a 
> different
> case because your forked process pid on its own doesn't make much sense
> as it is going to die very quickly anyway. This is when parent is 
> likely
> a more important information.
> 
> That being said the content really depends on the specific path and 
> that
> suggestes that you are trying to log at a wrong layer.
> 
> Another question is whether we really need a logging done by the 
> kernel.
> Ratelimiting would be tricky to get right and we do not want to allow 
> an
> easy way to swamp logs either.
> As already mentioned ENOMEM usually means overcommit failure. Maybe we
> want to be more explicit this in the man page?


Thank you so much for your feedback.
First of all I am sorry for my delayed response.

As I understand, the conclusion here is that:
a) The pr_err_once is not helpful and neither pr_err_ratelimited ?
Even with this below logs:
+ pr_err_ratelimited("vm memory overflow: pid:%d, name: %s, 
pages:%ld\n",
+     current->pid, current->comm, pages);

b) This can be invoked from many places so we are adding the logging at 
wrong layer?
If so, any other better places which can be explored?

c) Adding logging at kernel layer is not the right approach to tackle 
this problem ?

d) Another thing we can do is, update the man page with more detailed 
information about this commitment overflow ?

e) May be returning ENOMEM (Cannot allocate memory) from VM path is 
slightly misleading for user space folks even though there are enough 
memory?
=> Either we can introduce ENOVMEM (Cannot create virtual memory 
mapping)
=> Or, update the documentation with approach to further debug this 
issue?

If there are any more suggestions to easily catch this issue please let 
us know, we can explore further.


Thanks,
Pintu

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/util.c: Add error logs for commitment overflow
  2020-10-19  3:02       ` pintu
@ 2020-10-19 13:06         ` Michal Hocko
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2020-10-19 13:06 UTC (permalink / raw)
  To: pintu; +Cc: willy, linux-kernel, akpm, linux-mm, pintu.ping

On Mon 19-10-20 08:32:34, pintu@codeaurora.org wrote:
[...]
> b) This can be invoked from many places so we are adding the logging at
> wrong layer?
> If so, any other better places which can be explored?

dup_mmap?

> c) Adding logging at kernel layer is not the right approach to tackle this
> problem ?
> 
> d) Another thing we can do is, update the man page with more detailed
> information about this commitment overflow ?

This is a good thing in general

> e) May be returning ENOMEM (Cannot allocate memory) from VM path is slightly
> misleading for user space folks even though there are enough memory?
> => Either we can introduce ENOVMEM (Cannot create virtual memory mapping)
> => Or, update the documentation with approach to further debug this issue?

No, it is close to impossible to add a new error code for interface that
is used so heavily.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-10-19 13:06 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-02 11:57 [PATCH] mm/util.c: Add error logs for commitment overflow Pintu Kumar
2020-10-02 12:11 ` Matthew Wilcox
2020-10-02 16:18   ` pintu
2020-10-02 12:17 ` Michal Hocko
2020-10-02 16:23   ` pintu
2020-10-05  7:20     ` Michal Hocko
2020-10-19  3:02       ` pintu
2020-10-19 13:06         ` Michal Hocko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.