* [question] how to figure out OOM reason? should dump slab/vmalloc info when OOM?
@ 2014-01-20 10:36 ` Jianguo Wu
From: Jianguo Wu @ 2014-01-20 10:36 UTC (permalink / raw)
  To: Andrew Morton, Johannes Weiner, Rik van Riel, David Rientjes,
	linux-mm, linux-kernel

When an OOM occurs, the kernel dumps the buddy free-area info, hugetlb page
info, the memory state of all eligible tasks, and per-cpu memory info.
But it does not dump slab/vmalloc info, and sometimes what is dumped is not
enough to figure out why the OOM happened.

So, my questions are:
1. Should slab/vmalloc info be dumped when an OOM happens? We can get it from
the proc files, but usually we are not monitoring the logs, so we cannot check
the proc files immediately when the OOM happens.

2. /proc/$pid/smaps and pagecache info are also helpful on OOM; should they be
dumped as well?

3. Without this info, how does one usually figure out the reason for an OOM?


* Re: [question] how to figure out OOM reason? should dump slab/vmalloc info when OOM?
  2014-01-20 10:36 ` Jianguo Wu
@ 2014-01-21  5:34   ` David Rientjes
From: David Rientjes @ 2014-01-21  5:34 UTC (permalink / raw)
  To: Jianguo Wu
  Cc: Andrew Morton, Johannes Weiner, Rik van Riel, linux-mm, linux-kernel

On Mon, 20 Jan 2014, Jianguo Wu wrote:

> When OOM happen, will dump buddy free areas info, hugetlb pages info,
> memory state of all eligible tasks, per-cpu memory info.
> But do not dump slab/vmalloc info, sometime, it's not enough to figure out the
> reason OOM happened.
> 
> So, my questions are:
> 1. Should dump slab/vmalloc info when OOM happen? Though we can get these from proc file,
> but usually we do not monitor the logs and check proc file immediately when OOM happened.
> 

The problem is that slabinfo becomes excessively verbose and dumping it 
all to the kernel log often times causes important messages to be lost.  
This is why we control things like the tasklist dump with a VM sysctl.  It 
would be possible to dump, say, the top ten slab caches with the highest 
memory usage, but it will only be helpful for slab leaks.  Typically there 
are better debugging tools available than analyzing the kernel log; if you 
see unusually high slab memory in the meminfo dump, you can enable it.

> 2. /proc/$pid/smaps and pagecache info also helpful when OOM, should also be dumped?
> 

Also very verbose and would cause important messages to be lost, we try to 
avoid spamming the kernel log with all of this information as much as 
possible.

> 3. Without these info, usually how to figure out OOM reason?
> 

Analyze the memory usage in the meminfo and determine what is unusually 
high; if it's mostly anonymous memory, you can usually correlate it back 
to a high rss for a process in the tasklist that you didn't suspect to be 
using that much memory, for example.
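
The correlation step above can be sketched as a small log parser: pull the
tasklist dump out of the kernel log and sort it by rss. This is only a
sketch under assumptions. The tasklist is printed only when the
vm.oom_dump_tasks sysctl is enabled, the exact columns vary between kernel
versions (so the code reads them from the "[ pid ]" header row instead of
hard-coding them), and the name tasks_by_rss is made up for the example.

```python
import re

# Header row of the tasklist dump, e.g. "[ pid ]   uid  tgid total_vm rss ..."
HEADER_RE = re.compile(r"\[\s*pid\s*\]\s+(.*)$")
# Task row, e.g. "[  321]     0   321     5000      150 ..."
ROW_RE = re.compile(r"\[\s*(\d+)\]\s+(.*)$")


def tasks_by_rss(log_text):
    """Return (pid, name, rss_pages) from the last tasklist dump, largest rss first."""
    columns = None
    tasks = []
    for line in log_text.splitlines():
        header = HEADER_RE.search(line)
        if header:
            columns = header.group(1).split()
            tasks = []  # a new dump starts; keep only the most recent one
            continue
        row = ROW_RE.search(line)
        if columns is None or not row:
            continue
        # The last column is the task name, which may contain spaces.
        values = row.group(2).split(None, len(columns) - 1)
        if len(values) != len(columns):
            continue
        record = dict(zip(columns, values))
        try:
            rss = int(record["rss"])
        except (KeyError, ValueError):
            continue
        tasks.append((int(row.group(1)), record.get("name", "?"), rss))
    tasks.sort(key=lambda task: task[2], reverse=True)
    return tasks
```

The rss column is in pages, so multiply by the page size before comparing
it with the kB figures in the meminfo dump.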

* Re: [question] how to figure out OOM reason? should dump slab/vmalloc info when OOM?
  2014-01-21  5:34   ` David Rientjes
@ 2014-01-21 12:40     ` Jianguo Wu
From: Jianguo Wu @ 2014-01-21 12:40 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Johannes Weiner, Rik van Riel, linux-mm, linux-kernel

On 2014/1/21 13:34, David Rientjes wrote:

> On Mon, 20 Jan 2014, Jianguo Wu wrote:
> 
>> When OOM happen, will dump buddy free areas info, hugetlb pages info,
>> memory state of all eligible tasks, per-cpu memory info.
>> But do not dump slab/vmalloc info, sometime, it's not enough to figure out the
>> reason OOM happened.
>>
>> So, my questions are:
>> 1. Should dump slab/vmalloc info when OOM happen? Though we can get these from proc file,
>> but usually we do not monitor the logs and check proc file immediately when OOM happened.
>>
> 

Hi David,
Thank you for your patient answer!

> The problem is that slabinfo becomes excessively verbose and dumping it 
> all to the kernel log often times causes important messages to be lost.  
> This is why we control things like the tasklist dump with a VM sysctl.  It 
> would be possible to dump, say, the top ten slab caches with the highest 
> memory usage, but it will only be helpful for slab leaks.  Typically there 
> are better debugging tools available than analyzing the kernel log; if you 
> see unusually high slab memory in the meminfo dump, you can enable it.
> 

But once an OOM has happened, we can only use the kernel log; the slab/vmalloc
info from proc is stale by then. Maybe we could dump slab/vmalloc info under a
VM sysctl, and only the top 10/20 entries?

Thanks.

>> 2. /proc/$pid/smaps and pagecache info also helpful when OOM, should also be dumped?

>>
> 
> Also very verbose and would cause important messages to be lost, we try to 
> avoid spamming the kernel log with all of this information as much as 
> possible.
> 
>> 3. Without these info, usually how to figure out OOM reason?
>>
> 
> Analyze the memory usage in the meminfo and determine what is unusually 
> high; if it's mostly anonymous memory, you can usually correlate it back 
> to a high rss for a process in the tasklist that you didn't suspect to be 
> using that much memory, for example.
> 
> 




* Re: [question] how to figure out OOM reason? should dump slab/vmalloc info when OOM?
  2014-01-21 12:40     ` Jianguo Wu
@ 2014-01-21 20:41       ` David Rientjes
From: David Rientjes @ 2014-01-21 20:41 UTC (permalink / raw)
  To: Jianguo Wu
  Cc: Andrew Morton, Johannes Weiner, Rik van Riel, linux-mm, linux-kernel

On Tue, 21 Jan 2014, Jianguo Wu wrote:

> > The problem is that slabinfo becomes excessively verbose and dumping it 
> > all to the kernel log often times causes important messages to be lost.  
> > This is why we control things like the tasklist dump with a VM sysctl.  It 
> > would be possible to dump, say, the top ten slab caches with the highest 
> > memory usage, but it will only be helpful for slab leaks.  Typically there 
> > are better debugging tools available than analyzing the kernel log; if you 
> > see unusually high slab memory in the meminfo dump, you can enable it.
> > 
> 
> But, when OOM has happened, we can only use kernel log, slab/vmalloc info from proc
> is stale. Maybe we can dump slab/vmalloc with a VM sysctl, and only top 10/20 entrys?
> 

You could, but it's a tradeoff between how much to dump to a general 
resource such as the kernel log and how many sysctls we add that control 
every possible thing.  Slab leaks would definitely be a minority of oom 
conditions and you should normally be able to reproduce them by running 
the same workload; just use slabtop(1) or manually inspect /proc/slabinfo 
while such a workload is running for indicators.  I don't think we want to 
add the information by default, though, nor do we want to add sysctls to 
control the behavior (you'd still need to reproduce the issue after 
enabling it).

We are currently discussing userspace oom handlers, though, that would 
allow you to run a process that would be notified and allowed to allocate 
a small amount of memory on oom conditions.  It would then be trivial to 
dump any information you feel pertinent in userspace prior to killing 
something.  I like to inspect heap profiles for memory hogs while 
debugging our malloc() issues, for example, and you could look more 
closely at kernel memory.
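
As a rough idea of what such a handler might dump once it is notified, the
sketch below reads a few proc files into memory. The notification
mechanism itself was still under discussion, so it is not shown. The
function name collect_snapshot and the default list of files are
assumptions for the example, not part of any proposed interface.

```python
# Files a handler might want to capture; this list is an assumption.
DEFAULT_SOURCES = ("/proc/meminfo", "/proc/slabinfo", "/proc/vmallocinfo")


def collect_snapshot(sources=DEFAULT_SOURCES):
    """Read each file and return {path: contents}.

    Unreadable files are recorded with a placeholder instead of raising,
    so one missing or root-only file does not lose the rest of the dump.
    """
    snapshot = {}
    for path in sources:
        try:
            with open(path) as handle:
                snapshot[path] = handle.read()
        except OSError as exc:
            snapshot[path] = f"<unreadable: {exc}>"
    return snapshot
```

A real handler would also need to keep its own allocations small, since it
runs while the system is already out of memory.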

I'll cc you on future discussions of that feature.

* Re: [question] how to figure out OOM reason? should dump slab/vmalloc info when OOM?
  2014-01-21 20:41       ` David Rientjes
@ 2014-02-11  4:06         ` Jianguo Wu
From: Jianguo Wu @ 2014-02-11  4:06 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Johannes Weiner, Rik van Riel, linux-mm, linux-kernel

On 2014/1/22 4:41, David Rientjes wrote:

> On Tue, 21 Jan 2014, Jianguo Wu wrote:
> 
>>> The problem is that slabinfo becomes excessively verbose and dumping it 
>>> all to the kernel log often times causes important messages to be lost.  
>>> This is why we control things like the tasklist dump with a VM sysctl.  It 
>>> would be possible to dump, say, the top ten slab caches with the highest 
>>> memory usage, but it will only be helpful for slab leaks.  Typically there 
>>> are better debugging tools available than analyzing the kernel log; if you 
>>> see unusually high slab memory in the meminfo dump, you can enable it.
>>>
>>
>> But, when OOM has happened, we can only use kernel log, slab/vmalloc info from proc
>> is stale. Maybe we can dump slab/vmalloc with a VM sysctl, and only top 10/20 entrys?
>>
> 
> You could, but it's a tradeoff between how much to dump to a general 
> resource such as the kernel log and how many sysctls we add that control 
> every possible thing.  Slab leaks would definitely be a minority of oom 
> conditions and you should normally be able to reproduce them by running 
> the same workload; just use slabtop(1) or manually inspect /proc/slabinfo 
> while such a workload is running for indicators.  I don't think we want to 
> add the information by default, though, nor do we want to add sysctls to 
> control the behavior (you'd still need to reproduce the issue after 
> enabling it).
> 
> We are currently discussing userspace oom handlers, though, that would 
> allow you to run a process that would be notified and allowed to allocate 
> a small amount of memory on oom conditions.  It would then be trivial to 
> dump any information you feel pertinent in userspace prior to killing 
> something.  I like to inspect heap profiles for memory hogs while 
> debugging our malloc() issues, for example, and you could look more 
> closely at kernel memory.
> 
> I'll cc you on future discussions of that feature.
> 

Hi David,

Thanks for your kind explanation. Do you have any specific plans for this?

Thanks,
Jianguo Wu.

> 




* Re: [question] how to figure out OOM reason? should dump slab/vmalloc info when OOM?
  2014-02-11  4:06         ` Jianguo Wu
@ 2014-02-12  0:28           ` David Rientjes
From: David Rientjes @ 2014-02-12  0:28 UTC (permalink / raw)
  To: Jianguo Wu
  Cc: Andrew Morton, Johannes Weiner, Rik van Riel, linux-mm, linux-kernel

On Tue, 11 Feb 2014, Jianguo Wu wrote:

> Thanks for your kindly explanation, do you have any specific plans on this?
> 

We're going to be discussing it at the LSF/mm conference at the end of 
March.
