* Re: [PATCH] mm/page_owner: print largest memory consumer when OOM panic occurs
From: Miles Chen @ 2019-12-26  4:01 UTC
  To: Qian Cai
  Cc: Andrew Morton, Michal Hocko, linux-kernel, linux-mm,
	linux-mediatek, wsd_upstream, Miles Chen

> Not sure if you have code that you can share, but I can't imagine there are many places that would have a single call site in a driver doing alloc_pages() over and over again. For example, there are only two alloc_pages() calls in intel-iommu.c, one of which is only in the cold path. So even if the alloc_pgtable_page() one does leak, it is still up in the air whether your patch will catch it, because it may not be a single call site and it needs to leak a significant amount of memory to become the greatest consumer, which is just not realistic.

That is what the patch does -- it targets the kind of memory leakage that causes an OOM kernel panic, which is exactly where the greatest-consumer information helps (the amount of leakage is large enough to cause an OOM kernel panic).

I've posted the number of real problems I have solved with this approach since 2019/5.

  Miles


* Re: [PATCH] mm/page_owner: print largest memory consumer when OOM panic occurs
From: Qian Cai @ 2019-12-26  5:53 UTC
  To: Miles Chen
  Cc: Andrew Morton, Michal Hocko, linux-kernel, linux-mm,
	linux-mediatek, wsd_upstream



> On Dec 25, 2019, at 11:01 PM, Miles Chen <miles.chen@mediatek.com> wrote:
> 
> That is what the patch does -- it targets the kind of memory leakage that causes an OOM kernel panic, which is exactly where the greatest-consumer information helps (the amount of leakage is large enough to cause an OOM kernel panic).
> 
> I've posted the number of real problems I have solved with this approach since 2019/5.

The point is that in order to get your debugging patch upstream, it has to be generally useful. Right now, it feels rather situational to me, for the reasons given in the previous emails.


* Re: [PATCH] mm/page_owner: print largest memory consumer when OOM panic occurs
From: Miles Chen @ 2019-12-27  7:44 UTC
  To: Qian Cai
  Cc: Andrew Morton, Michal Hocko, linux-kernel, linux-mm,
	linux-mediatek, wsd_upstream

On Thu, 2019-12-26 at 00:53 -0500, Qian Cai wrote:
> 
> > On Dec 25, 2019, at 11:01 PM, Miles Chen <miles.chen@mediatek.com> wrote:
> > 
> > That is what the patch does -- it targets the kind of memory leakage
> > that causes an OOM kernel panic, which is exactly where the
> > greatest-consumer information helps (the amount of leakage is large
> > enough to cause an OOM kernel panic).
> > 
> > I've posted the number of real problems I have solved with this
> > approach since 2019/5.
> 
> The point is that in order to get your debugging patch upstream, it has
> to be generally useful. Right now, it feels rather situational to me,
> for the reasons given in the previous emails.


It's not completely situational.

I've listed the different OOM panic situations in a previous email [1],
along with what we can do about them using the currently available
information.

There are some cases which cannot easily be covered by the current
information, for example a memory leakage caused by alloc_pages() or by
vmalloc() with a large size.
I have kept seeing these issues for years, and that's why I built this
patch. It's like a missing piece of the puzzle.

To prove that the approach is practical and useful, I have collected
test cases from real devices and posted the results in the commit
message. These are real cases, not my imagination.

[1] https://lkml.org/lkml/2019/12/25/53


Thanks again for your comments.

  Miles


* Re: [PATCH] mm/page_owner: print largest memory consumer when OOM panic occurs
From: Qian Cai @ 2019-12-27 13:46 UTC
  To: Miles Chen
  Cc: Andrew Morton, Michal Hocko, linux-kernel, linux-mm,
	linux-mediatek, wsd_upstream



> On Dec 27, 2019, at 2:44 AM, Miles Chen <miles.chen@mediatek.com> wrote:
> 
> It's not completely situational.
> 
> I've listed the different OOM panic situations in a previous email [1],
> along with what we can do about them using the currently available
> information.
> 
> There are some cases which cannot easily be covered by the current
> information, for example a memory leakage caused by alloc_pages() or by
> vmalloc() with a large size.
> I have kept seeing these issues for years, and that's why I built this
> patch. It's like a missing piece of the puzzle.
> 
> To prove that the approach is practical and useful, I have collected
> test cases from real devices and posted the results in the commit
> message. These are real cases, not my imagination.

Of course this may have helped debug *your* problems in the past, but if that is the only requirement for merging a debugging patch like this, we would end up with an endless stream of them. If your goal is to spare developers from unnecessarily reproducing issues in order to debug them with page_owner, then your patch does not help much with the majority of other developers' issues.

page_owner is designed to give information about the top candidates that might be causing issues, so it would make some sense if it dumped, say, the top 10 greatest memory consumers, but that would also clutter the OOM report so much that it is a no-go.


* Re: [PATCH] mm/page_owner: print largest memory consumer when OOM panic occurs
From: Miles Chen @ 2019-12-30  1:30 UTC
  To: Qian Cai
  Cc: Andrew Morton, Michal Hocko, linux-kernel, linux-mm,
	linux-mediatek, wsd_upstream

On Fri, 2019-12-27 at 08:46 -0500, Qian Cai wrote:
> 
> > On Dec 27, 2019, at 2:44 AM, Miles Chen <miles.chen@mediatek.com> wrote:
> > 
> > It's not completely situational.
> > 
> > I've listed the different OOM panic situations in a previous email [1],
> > along with what we can do about them using the currently available
> > information.
> > 
> > There are some cases which cannot easily be covered by the current
> > information, for example a memory leakage caused by alloc_pages() or by
> > vmalloc() with a large size.
> > I have kept seeing these issues for years, and that's why I built this
> > patch. It's like a missing piece of the puzzle.
> > 
> > To prove that the approach is practical and useful, I have collected
> > test cases from real devices and posted the results in the commit
> > message. These are real cases, not my imagination.
> 
> Of course this may have helped debug *your* problems in the past, but if
> that is the only requirement for merging a debugging patch like this, we
> would end up with an endless stream of them. If your goal is to spare
> developers from unnecessarily reproducing issues in order to debug them
> with page_owner, then your patch does not help much with the majority of
> other developers' issues.
> 
> page_owner is designed to give information about the top candidates that
> might be causing issues, so it would make some sense if it dumped, say,
> the top 10 greatest memory consumers, but that would also clutter the
> OOM report so much that it is a no-go.

Yes, printing the top 10 would be too much. That's why I print only the
greatest consumer, and tested whether this approach works.

I will resend this patch after the break. Let's wait for others'
comments.


   Miles


* Re: [PATCH] mm/page_owner: print largest memory consumer when OOM panic occurs
From: Qian Cai @ 2019-12-30  1:51 UTC
  To: Miles Chen
  Cc: Andrew Morton, Michal Hocko, linux-kernel, linux-mm,
	linux-mediatek, wsd_upstream



> On Dec 29, 2019, at 8:30 PM, Miles Chen <miles.chen@mediatek.com> wrote:
> 
> Yes, printing the top 10 would be too much. That's why I print only the
> greatest consumer, and tested whether this approach works.
> 
> I will resend this patch after the break. Let's wait for others'
> comments.

Sure, but to make my point clear:

Nacked-by: Qian Cai <cai@lca.pw>


* Re: [PATCH] mm/page_owner: print largest memory consumer when OOM panic occurs
From: Miles Chen @ 2019-12-30  3:28 UTC
  To: Qian Cai
  Cc: Andrew Morton, Michal Hocko, linux-kernel, linux-mm,
	linux-mediatek, wsd_upstream

On Sun, 2019-12-29 at 20:51 -0500, Qian Cai wrote:
> 
> > On Dec 29, 2019, at 8:30 PM, Miles Chen <miles.chen@mediatek.com> wrote:
> > 
> > Yes, printing the top 10 would be too much. That's why I print only the
> > greatest consumer, and tested whether this approach works.
> > 
> > I will resend this patch after the break. Let's wait for others'
> > comments.
> 
> Sure, but to make my point clear:

No problem. Thanks for your replies.

  Miles
> 
> Nacked-by: Qian Cai <cai@lca.pw>




* Re: [PATCH] mm/page_owner: print largest memory consumer when OOM panic occurs
From: Qian Cai @ 2019-12-25 13:53 UTC
  To: Miles Chen
  Cc: Andrew Morton, Michal Hocko, linux-kernel, linux-mm,
	linux-mediatek, wsd_upstream



> On Dec 25, 2019, at 4:29 AM, Miles Chen <miles.chen@mediatek.com> wrote:
> 
> For example, we're implementing our iommu driver and there are many
> alloc_pages() calls in drivers/iommu.
> This approach helped us locate some memory leakages in our
> implementation.

Not sure if you have code that you can share, but I can't imagine there are many places that would have a single call site in a driver doing alloc_pages() over and over again. For example, there are only two alloc_pages() calls in intel-iommu.c, one of which is only in the cold path. So even if the alloc_pgtable_page() one does leak, it is still up in the air whether your patch will catch it, because it may not be a single call site and it needs to leak a significant amount of memory to become the greatest consumer, which is just not realistic. From a debugging point of view, IMO it is better to annotate this one alloc_pages() call when in doubt, so that kmemleak would catch it instead.
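
A minimal sketch of the annotation described above, assuming a
hypothetical driver wrapper (the wrapper names are made up for
illustration; kmemleak_alloc() and kmemleak_free() are the real kernel
hooks):

/*
 * Wrap a raw page allocation so kmemleak can track it; kmemleak does
 * not see alloc_pages() allocations on its own.
 */
#include <linux/gfp.h>
#include <linux/kmemleak.h>
#include <linux/mm.h>

static void *my_alloc_pgtable_page(gfp_t gfp)
{
	struct page *page = alloc_pages(gfp | __GFP_ZERO, 0);

	if (!page)
		return NULL;

	/* Register the backing memory with kmemleak. */
	kmemleak_alloc(page_address(page), PAGE_SIZE, 1, gfp);
	return page_address(page);
}

static void my_free_pgtable_page(void *vaddr)
{
	kmemleak_free(vaddr);			/* unregister first */
	free_pages((unsigned long)vaddr, 0);
}

Once registered this way, kmemleak scans the page's contents for
references and reports it as a leak when none remain.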


* Re: [PATCH] mm/page_owner: print largest memory consumer when OOM panic occurs
From: Miles Chen @ 2019-12-25  9:29 UTC
  To: Qian Cai
  Cc: Andrew Morton, Michal Hocko, linux-kernel, linux-mm,
	linux-mediatek, wsd_upstream

On Tue, 2019-12-24 at 08:47 -0500, Qian Cai wrote:
> 
> > On Dec 24, 2019, at 1:45 AM, Miles Chen <miles.chen@mediatek.com> wrote:
> > 
> > We use kmemleak too, but a memory leakage which is caused by
> > alloc_pages() in a kernel device driver cannot be caught by kmemleak.
> > We have fought against this kind of real problem for a few years and
> > found a way to make the debugging easier.
> > 
> > We currently have the following information during OOM: Node, zone,
> > swap, process (pid, rss, name), slab usage, and the backtrace, order,
> > and gfp flags of the OOM allocation.
> > We can identify many different types of OOM problems from the
> > information above, except for alloc_pages() leakage.
> > 
> > The patch does work and saves a lot of debugging time.
> > Could we consider the "greatest memory consumer" as another piece of
> > useful OOM information?
> 
> This is rather situational, considering there are memory leaks here and there, but it is not necessarily as straightforward as a single greatest consumer.

Agreed, but having the greatest-memory-consumer information does no harm
here.
Maybe you can share some cases with me?


The greatest memory consumer provides a strong clue about a memory
leakage.
I have seen several different types of OOM issues:

1. Task leakage: we can observe this from the kernel_stack numbers.

2. Memory fragmentation: check the zone memory status and the
allocation order.

3. kmalloc leakage: check the slab numbers. Say the kmalloc-512 number
is abnormal; we can then enable kmemleak and reproduce the issue. Most
of the time, I saw a single backtrace for that leak. It is helpful to
have the greatest memory consumer in this case.

4. vmalloc leakage: we have no vmalloc numbers today. I saw a case
where we passed a large size into vmalloc() in a fuzzing test and it
caused an OOM kernel panic. It was hard to reproduce the issue, and
kmemleak is of little help here because it is an OOM kernel panic.
That is the issue which inspired me to create this patch; we found the
root cause with this approach.

5. OOM due to running out of normal memory (on a 32-bit kernel): we
can check the allocation flags and the zone memory status. In this
case, we can examine the memory allocations and see whether they can
use highmem. Knowing the greatest memory consumer may or may not be
useful here.

6. OOM caused by two or more different backtraces: I saw this once; we
enabled PAGE_OWNER, got the complete memory usage information, and
located the root cause. Again, knowing the greatest memory consumer is
still a help here.

7. OOM caused by alloc_pages(): there is no existing useful
information for this issue. CONFIG_PAGE_OWNER is useful, and we can do
more based on CONFIG_PAGE_OWNER (this patch).

> 
> The other question is: why do the offending drivers use alloc_pages() repeatedly without using any object allocator? Do you have examples of drivers where this could happen?


For example, we're implementing our iommu driver and there are many
alloc_pages() calls in drivers/iommu.
This approach helped us locate some memory leakages in our
implementation.
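
A minimal sketch of the kind of leak pattern meant here, assuming a
hypothetical error path (none of these names are from the actual
driver):

/*
 * Hypothetical iommu-style page-table setup.  Pages from alloc_pages()
 * are invisible to kmemleak, so a page lost on an error path shows up
 * only as an ever-growing consumer in the page_owner data.
 */
#include <linux/gfp.h>
#include <linux/mm.h>

static bool setup_failed;	/* stand-in for a real failure condition */

static int my_map_range(struct page **pgtable)
{
	struct page *page = alloc_pages(GFP_KERNEL | __GFP_ZERO, 0);

	if (!page)
		return -ENOMEM;

	if (setup_failed) {
		/*
		 * The (intentional) bug: returning without
		 * __free_pages(page, 0) leaks the page.  Repeated calls
		 * make this call site the greatest memory consumer.
		 */
		return -EINVAL;
	}

	*pgtable = page;
	return 0;
}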

Thanks again for your comments.
It's Christmas now, so perhaps we can continue the discussion after the
Christmas break?

I have posted the number of issues addressed by this approach (7 real
problems since 2019/5).
I think this approach can help people. :)


  Miles



* Re: [PATCH] mm/page_owner: print largest memory consumer when OOM panic occurs
From: Qian Cai @ 2019-12-24 13:47 UTC
  To: Miles Chen
  Cc: Andrew Morton, Michal Hocko, linux-kernel, linux-mm,
	linux-mediatek, wsd_upstream



> On Dec 24, 2019, at 1:45 AM, Miles Chen <miles.chen@mediatek.com> wrote:
> 
> We use kmemleak too, but a memory leakage which is caused by
> alloc_pages() in a kernel device driver cannot be caught by kmemleak.
> We have fought against this kind of real problem for a few years and
> found a way to make the debugging easier.
> 
> We currently have the following information during OOM: Node, zone,
> swap, process (pid, rss, name), slab usage, and the backtrace, order,
> and gfp flags of the OOM allocation.
> We can identify many different types of OOM problems from the
> information above, except for alloc_pages() leakage.
> 
> The patch does work and saves a lot of debugging time.
> Could we consider the "greatest memory consumer" as another piece of
> useful OOM information?

This is rather situational, considering there are memory leaks here and there, but it is not necessarily as straightforward as a single greatest consumer.

The other question is: why do the offending drivers use alloc_pages() repeatedly without using any object allocator? Do you have examples of drivers where this could happen?


* Re: [PATCH] mm/page_owner: print largest memory consumer when OOM panic occurs
From: Miles Chen @ 2019-12-24  6:45 UTC
  To: Qian Cai
  Cc: Andrew Morton, Michal Hocko, linux-kernel, linux-mm,
	linux-mediatek, wsd_upstream

On Mon, 2019-12-23 at 07:32 -0500, Qian Cai wrote:
> 
> > On Dec 23, 2019, at 6:33 AM, Miles Chen <miles.chen@mediatek.com> wrote:
> > 
> > Motivation:
> > -----------
> > 
> > When debugging an OOM kernel panic, it is difficult to know how much
> > memory was allocated by kernel drivers or by vmalloc() just by
> > checking the Mem-Info or Node/Zone info. For example:
> > 
> >  Mem-Info:
> >  active_anon:5144 inactive_anon:16120 isolated_anon:0
> >   active_file:0 inactive_file:0 isolated_file:0
> >   unevictable:0 dirty:0 writeback:0 unstable:0
> >   slab_reclaimable:739 slab_unreclaimable:442469
> >   mapped:534 shmem:21050 pagetables:21 bounce:0
> >   free:14808 free_pcp:3389 free_cma:8128
> > 
> >  Node 0 active_anon:20576kB inactive_anon:64480kB active_file:0kB
> >  inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> >  mapped:2136kB dirty:0kB writeback:0kB shmem:84200kB shmem_thp: 0kB
> >  shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB
> >  all_unreclaimable? yes
> > 
> >  Node 0 DMA free:14476kB min:21512kB low:26888kB high:32264kB
> >  reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> >  active_file: 0kB inactive_file:0kB unevictable:0kB writepending:0kB
> >  present:1048576kB managed:952736kB mlocked:0kB kernel_stack:0kB
> >  pagetables:0kB bounce:0kB free_pcp:2716kB local_pcp:0kB free_cma:0kB
> > 
> > The information above tells us the memory usage of the known memory
> > categories, and we can check for abnormally large numbers. However,
> > if a memory leakage cannot be observed in the categories above, we
> > need to reproduce the issue with CONFIG_PAGE_OWNER.
> > 
> > It is possible to read the page owner information from coredump files.
> > However, coredump files may not always be available, so my approach is
> > to print out the largest page consumer when an OOM kernel panic occurs.
> 
> Many patches like this, which help debug special cases, have been shot down in the past, and I don't see much difference this time. If you worry about a memory leak, enable kmemleak and then reproduce. Otherwise, we will end up with too many heuristics just for debugging.
> 

Thanks for your comment.

We use kmemleak too, but a memory leakage which is caused by
alloc_pages() in a kernel device driver cannot be caught by kmemleak.
We have fought against this kind of real problem for a few years and
found a way to make the debugging easier.

We currently have the following information during OOM: Node, zone,
swap, process (pid, rss, name), slab usage, and the backtrace, order,
and gfp flags of the OOM allocation.
We can identify many different types of OOM problems from the
information above, except for alloc_pages() leakage.

The patch does work and saves a lot of debugging time.
Could we consider the "greatest memory consumer" as another piece of
useful OOM information?


Miles
> > 
> > The heuristic approach assumes that the OOM kernel panic is caused by
> > a single backtrace. The assumption is not always true, but it worked
> > in many cases during our tests.
> > 
> > We have tested this heuristic approach since 2019/5 on Android devices.
> > In 38 internal OOM kernel panic reports:
> > 
> > 31/38: could be analyzed using existing information
> > 7/38: needed the page owner information; the heuristic approach in
> > this patch printed the correct backtraces of the abnormal memory
> > allocations, with no need to reproduce the issues.



* Re: [PATCH] mm/page_owner: print largest memory consumer when OOM panic occurs
From: Qian Cai @ 2019-12-23 12:32 UTC
  To: Miles Chen
  Cc: Andrew Morton, Michal Hocko, linux-kernel, linux-mm,
	linux-mediatek, wsd_upstream



> On Dec 23, 2019, at 6:33 AM, Miles Chen <miles.chen@mediatek.com> wrote:
> 
> Motivation:
> -----------
> 
> When debugging an OOM kernel panic, it is difficult to know how much
> memory was allocated by kernel drivers or by vmalloc() just by checking
> the Mem-Info or Node/Zone info. For example:
> 
>  Mem-Info:
>  active_anon:5144 inactive_anon:16120 isolated_anon:0
>   active_file:0 inactive_file:0 isolated_file:0
>   unevictable:0 dirty:0 writeback:0 unstable:0
>   slab_reclaimable:739 slab_unreclaimable:442469
>   mapped:534 shmem:21050 pagetables:21 bounce:0
>   free:14808 free_pcp:3389 free_cma:8128
> 
>  Node 0 active_anon:20576kB inactive_anon:64480kB active_file:0kB
>  inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
>  mapped:2136kB dirty:0kB writeback:0kB shmem:84200kB shmem_thp: 0kB
>  shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB
>  all_unreclaimable? yes
> 
>  Node 0 DMA free:14476kB min:21512kB low:26888kB high:32264kB
>  reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
>  active_file: 0kB inactive_file:0kB unevictable:0kB writepending:0kB
>  present:1048576kB managed:952736kB mlocked:0kB kernel_stack:0kB
>  pagetables:0kB bounce:0kB free_pcp:2716kB local_pcp:0kB free_cma:0kB
> 
> The information above tells us the memory usage of the known memory
> categories, and we can check for abnormally large numbers. However, if a
> memory leakage cannot be observed in the categories above, we need to
> reproduce the issue with CONFIG_PAGE_OWNER.
> 
> It is possible to read the page owner information from coredump files.
> However, coredump files may not always be available, so my approach is
> to print out the largest page consumer when an OOM kernel panic occurs.

Many patches like this, which help debug special cases, have been shot down in the past, and I don't see much difference this time. If you worry about a memory leak, enable kmemleak and then reproduce. Otherwise, we will end up with too many heuristics just for debugging.

> 
> The heuristic approach assumes that the OOM kernel panic is caused by
> a single backtrace. The assumption is not always true, but it worked in
> many cases during our tests.
> 
> We have tested this heuristic approach since 2019/5 on Android devices.
> In 38 internal OOM kernel panic reports:
> 
> 31/38: could be analyzed using existing information
> 7/38: needed the page owner information; the heuristic approach in this
> patch printed the correct backtraces of the abnormal memory
> allocations, with no need to reproduce the issues.


* [PATCH] mm/page_owner: print largest memory consumer when OOM panic occurs
From: Miles Chen @ 2019-12-23 11:33 UTC
  To: Andrew Morton
  Cc: Michal Hocko, linux-kernel, linux-mm, linux-mediatek,
	wsd_upstream, Miles Chen

Motivation:
-----------

When debugging an OOM kernel panic, it is difficult to know how much
memory was allocated by kernel drivers or by vmalloc() just by checking
the Mem-Info or Node/Zone info. For example:

  Mem-Info:
  active_anon:5144 inactive_anon:16120 isolated_anon:0
   active_file:0 inactive_file:0 isolated_file:0
   unevictable:0 dirty:0 writeback:0 unstable:0
   slab_reclaimable:739 slab_unreclaimable:442469
   mapped:534 shmem:21050 pagetables:21 bounce:0
   free:14808 free_pcp:3389 free_cma:8128

  Node 0 active_anon:20576kB inactive_anon:64480kB active_file:0kB
  inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
  mapped:2136kB dirty:0kB writeback:0kB shmem:84200kB shmem_thp: 0kB
  shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB
  all_unreclaimable? yes

  Node 0 DMA free:14476kB min:21512kB low:26888kB high:32264kB
  reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
  active_file: 0kB inactive_file:0kB unevictable:0kB writepending:0kB
  present:1048576kB managed:952736kB mlocked:0kB kernel_stack:0kB
  pagetables:0kB bounce:0kB free_pcp:2716kB local_pcp:0kB free_cma:0kB

The information above tells us the memory usage of the known memory
categories, and we can check for abnormally large numbers. However, if a
memory leakage cannot be observed in the categories above, we need to
reproduce the issue with CONFIG_PAGE_OWNER.

It is possible to read the page owner information from coredump files.
However, coredump files may not always be available, so my approach is
to print out the largest page consumer when an OOM kernel panic occurs.

The heuristic approach assumes that the OOM kernel panic is caused by
a single backtrace. The assumption is not always true, but it worked in
many cases during our tests.

We have tested this heuristic approach since 2019/5 on Android devices.
In 38 internal OOM kernel panic reports:

31/38: could be analyzed using existing information
7/38: needed the page owner information; the heuristic approach in this
patch printed the correct backtraces of the abnormal memory allocations,
with no need to reproduce the issues.

Output:
-------

The output below is generated by a dummy infinite
kmalloc(256, GFP_KERNEL) loop:

[   49.691027] OOM: largest memory consumer: 428468 pages are allocated from:
[   49.691278]  prep_new_page+0x198/0x19c
[   49.691390]  get_page_from_freelist+0x1cb4/0x1e54
[   49.691500]  __alloc_pages_nodemask+0x16c/0xe10
[   49.691599]  alloc_pages_current+0x104/0x190
[   49.691697]  alloc_slab_page+0x160/0x4e8
[   49.691782]  new_slab+0xb8/0x510
[   49.691866]  ___slab_alloc+0x294/0x3dc
[   49.691957]  kmem_cache_alloc+0x1f0/0x250
[   49.692047]  meminfo_proc_show+0x68/0x8fc
[   49.692135]  seq_read+0x1dc/0x47c
[   49.692217]  proc_reg_read+0x5c/0xb4
[   49.692303]  do_iter_read+0xdc/0x1c0
[   49.692389]  vfs_readv+0x60/0xa8
[   49.692471]  default_file_splice_read+0x1f0/0x304
[   49.692582]  splice_direct_to_actor+0x100/0x294
[   49.692679]  do_splice_direct+0x78/0xc8
[   39.328607] Kernel panic - not syncing: System is deadlocked on memory

Signed-off-by: Miles Chen <miles.chen@mediatek.com>
---
 include/linux/oom.h |   1 +
 mm/oom_kill.c       |   4 ++
 mm/page_owner.c     | 135 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 140 insertions(+)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index c696c265f019..fe3c923ac8f3 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -121,6 +121,7 @@ extern bool oom_killer_disable(signed long timeout);
 extern void oom_killer_enable(void);
 
 extern struct task_struct *find_lock_task_mm(struct task_struct *p);
+extern void report_largest_page_consumer(void);
 
 /* sysctls */
 extern int sysctl_oom_dump_tasks;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 71e3acea7817..9b069b5a4aff 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -42,6 +42,7 @@
 #include <linux/kthread.h>
 #include <linux/init.h>
 #include <linux/mmu_notifier.h>
+#include <linux/once.h>
 
 #include <asm/tlb.h>
 #include "internal.h"
@@ -1099,6 +1100,9 @@ bool out_of_memory(struct oom_control *oc)
 	if (!oc->chosen) {
 		dump_header(oc, NULL);
 		pr_warn("Out of memory and no killable processes...\n");
+#ifdef CONFIG_PAGE_OWNER
+		DO_ONCE(report_largest_page_consumer);
+#endif
 		/*
 		 * If we got here due to an actual allocation at the
 		 * system level, we cannot survive this and will enter
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 18ecde9f45b2..b23e5fe35dad 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -10,6 +10,8 @@
 #include <linux/migrate.h>
 #include <linux/stackdepot.h>
 #include <linux/seq_file.h>
+#include <linux/stacktrace.h>
+#include <linux/hashtable.h>
 
 #include "internal.h"
 
@@ -19,12 +21,16 @@
  */
 #define PAGE_OWNER_STACK_DEPTH (16)
 
+#define OOM_HANDLE_HASH_BITS	10
+
 struct page_owner {
 	unsigned short order;
 	short last_migrate_reason;
 	gfp_t gfp_mask;
 	depot_stack_handle_t handle;
 	depot_stack_handle_t free_handle;
+	struct hlist_node node;
+	unsigned long page_count; /* number of pages pointing to this handle */
 };
 
 static bool page_owner_enabled = false;
@@ -33,6 +39,8 @@ DEFINE_STATIC_KEY_FALSE(page_owner_inited);
 static depot_stack_handle_t dummy_handle;
 static depot_stack_handle_t failure_handle;
 static depot_stack_handle_t early_handle;
+static DEFINE_HASHTABLE(oom_handle_hash, OOM_HANDLE_HASH_BITS);
+static struct page_owner *most_referenced_page_owner;
 
 static void init_early_allocated_pages(void);
 
@@ -48,6 +56,57 @@ static int __init early_page_owner_param(char *buf)
 }
 early_param("page_owner", early_page_owner_param);
 
+static struct hlist_head *get_bucket(depot_stack_handle_t handle)
+{
+	unsigned long hash;
+
+	hash = hash_long(handle, OOM_HANDLE_HASH_BITS);
+	return &oom_handle_hash[hash];
+}
+
+/*
+ * lookup a page_owner in the hash bucket
+ */
+static struct page_owner *lookup_page_owner(depot_stack_handle_t handle,
+						struct hlist_head *b)
+{
+	struct page_owner *page_owner;
+
+	hlist_for_each_entry(page_owner, b, node) {
+		if (page_owner->handle == handle)
+			return page_owner;
+	}
+
+	return NULL;
+}
+
+/*
+ * Increase the page_owner->page_count in the handle_hash by (1 << order)
+ */
+static void increase_handle_count(struct page_owner *page_owner)
+{
+	struct hlist_head *bucket;
+	struct page_owner *owner;
+
+	bucket = get_bucket(page_owner->handle);
+
+	owner = lookup_page_owner(page_owner->handle, bucket);
+
+	if (!owner) {
+		owner = page_owner;
+		hlist_add_head(&page_owner->node, bucket);
+	}
+
+	/* increase page counter */
+	owner->page_count += (1 << owner->order);
+
+	/* update most_referenced_page_owner */
+	if (!most_referenced_page_owner)
+		most_referenced_page_owner = owner;
+	if (most_referenced_page_owner->page_count < owner->page_count)
+		most_referenced_page_owner = owner;
+}
+
 static bool need_page_owner(void)
 {
 	return page_owner_enabled;
@@ -172,6 +231,7 @@ static inline void __set_page_owner_handle(struct page *page,
 		page_owner->order = order;
 		page_owner->gfp_mask = gfp_mask;
 		page_owner->last_migrate_reason = -1;
+		page_owner->page_count = 0;
 		__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
 		__set_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
 
@@ -216,6 +276,7 @@ void __split_page_owner(struct page *page, unsigned int order)
 	for (i = 0; i < (1 << order); i++) {
 		page_owner = get_page_owner(page_ext);
 		page_owner->order = 0;
+		page_owner->page_count = 0;
 		page_ext = page_ext_next(page_ext);
 	}
 }
@@ -236,6 +297,7 @@ void __copy_page_owner(struct page *oldpage, struct page *newpage)
 	new_page_owner->last_migrate_reason =
 		old_page_owner->last_migrate_reason;
 	new_page_owner->handle = old_page_owner->handle;
+	new_page_owner->page_count = old_page_owner->page_count;
 
 	/*
 	 * We don't clear the bit on the oldpage as it's going to be freed
@@ -615,6 +677,79 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
 		pgdat->node_id, zone->name, count);
 }
 
+static void __report_largest_page_consumer(struct page_owner *page_owner)
+{
+	unsigned long *entries = NULL;
+	unsigned int nr_entries;
+
+	nr_entries = stack_depot_fetch(page_owner->handle, &entries);
+	pr_info("OOM: largest memory consumer: %lu pages are allocated from:\n",
+			page_owner->page_count);
+	stack_trace_print(entries, nr_entries, 0);
+}
+
+void report_largest_page_consumer(void)
+{
+	unsigned long pfn;
+	struct page *page;
+	struct page_ext *page_ext;
+	struct page_owner *page_owner;
+	depot_stack_handle_t handle;
+
+	pfn = min_low_pfn;
+
+	if (!static_branch_unlikely(&page_owner_inited))
+		return;
+
+	/* Find a valid PFN or the start of a MAX_ORDER_NR_PAGES area */
+	while (!pfn_valid(pfn) && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0)
+		pfn++;
+
+	/* Find an allocated page */
+	for (; pfn < max_pfn; pfn++) {
+		if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0 && !pfn_valid(pfn)) {
+			pfn += MAX_ORDER_NR_PAGES - 1;
+			continue;
+		}
+
+		if (!pfn_valid_within(pfn))
+			continue;
+
+		page = pfn_to_page(pfn);
+		if (PageBuddy(page)) {
+			unsigned long freepage_order = page_order_unsafe(page);
+
+			if (freepage_order < MAX_ORDER)
+				pfn += (1UL << freepage_order) - 1;
+			continue;
+		}
+
+		if (PageReserved(page))
+			continue;
+
+		page_ext = lookup_page_ext(page);
+		if (unlikely(!page_ext))
+			continue;
+
+		if (!test_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags))
+			continue;
+
+		page_owner = get_page_owner(page_ext);
+
+		if (!IS_ALIGNED(pfn, 1 << page_owner->order))
+			continue;
+
+		handle = READ_ONCE(page_owner->handle);
+		if (!handle)
+			continue;
+
+		increase_handle_count(page_owner);
+	}
+
+	if (most_referenced_page_owner)
+		__report_largest_page_consumer(most_referenced_page_owner);
+}
+
 static void init_zones_in_node(pg_data_t *pgdat)
 {
 	struct zone *zone;
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH] mm/page_owner: print largest memory consumer when OOM panic occurs
@ 2019-12-23 11:33 ` Miles Chen
  0 siblings, 0 replies; 26+ messages in thread
From: Miles Chen @ 2019-12-23 11:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, wsd_upstream, linux-kernel, linux-mm, Miles Chen,
	linux-mediatek

Motivation:
-----------

When debugging an OOM kernel panic, it is difficult to tell how much
memory has been allocated by kernel drivers via alloc_pages() or
vmalloc() by checking the Mem-Info or Node/Zone info. For example:

  Mem-Info:
  active_anon:5144 inactive_anon:16120 isolated_anon:0
   active_file:0 inactive_file:0 isolated_file:0
   unevictable:0 dirty:0 writeback:0 unstable:0
   slab_reclaimable:739 slab_unreclaimable:442469
   mapped:534 shmem:21050 pagetables:21 bounce:0
   free:14808 free_pcp:3389 free_cma:8128

  Node 0 active_anon:20576kB inactive_anon:64480kB active_file:0kB
  inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
  mapped:2136kB dirty:0kB writeback:0kB shmem:84200kB shmem_thp: 0kB
  shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB
  all_unreclaimable? yes

  Node 0 DMA free:14476kB min:21512kB low:26888kB high:32264kB
  reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
  active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
  present:1048576kB managed:952736kB mlocked:0kB kernel_stack:0kB
  pagetables:0kB bounce:0kB free_pcp:2716kB local_pcp:0kB free_cma:0kB

The information above tells us the memory usage of the known memory
categories, so we can check for abnormally large numbers. However, if a
memory leak cannot be observed in any of these categories, we need to
reproduce the issue with CONFIG_PAGE_OWNER enabled.

It is possible to read the page owner information from coredump files.
However, coredump files may not always be available, so my approach is
to print out the largest page consumer when an OOM kernel panic occurs.

This heuristic approach assumes that the OOM kernel panic is caused by
a single allocation backtrace. The assumption does not always hold, but
it has worked in many cases during our testing.
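
In other words: the patch walks every allocated page at panic time,
groups pages by their page owner stack handle, and sums (1 << order)
per handle; the handle with the largest sum is reported. A simplified
userspace model of this counting step (all names below are
illustrative; the real kernel code is in the diff):

  #include <stdio.h>

  /* one record per allocated (head) page: stack handle and order */
  struct rec { unsigned int handle; unsigned short order; };

  int main(void)
  {
  	/* toy data: three order-0 pages from handle 7,
  	 * one order-1 page from handle 3 */
  	struct rec recs[] = { {7, 0}, {3, 1}, {7, 0}, {7, 0} };
  	/* toy "hash table", keyed directly by handle */
  	unsigned long count[16] = { 0 };
  	unsigned int i, best = 0;

  	for (i = 0; i < sizeof(recs) / sizeof(recs[0]); i++)
  		count[recs[i].handle % 16] += 1UL << recs[i].order;

  	for (i = 1; i < 16; i++)
  		if (count[i] > count[best])
  			best = i;

  	printf("largest consumer: handle %u, %lu pages\n",
  	       best, count[best]);
  	return 0;
  }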

We have tested this heuristic approach on Android devices since 2019/5.
Of 38 internal OOM kernel panic reports:

31/38: could be analyzed by using existing information
7/38: needed page owner information, and the heuristic approach in this
patch printed the correct backtraces of the abnormal memory allocations,
with no need to reproduce the issues.

Output:
-------

The output below is generated by a dummy infinite
kmalloc(256, GFP_KERNEL) loop:

[   49.691027] OOM: largest memory consumer: 428468 pages are allocated from:
[   49.691278]  prep_new_page+0x198/0x19c
[   49.691390]  get_page_from_freelist+0x1cb4/0x1e54
[   49.691500]  __alloc_pages_nodemask+0x16c/0xe10
[   49.691599]  alloc_pages_current+0x104/0x190
[   49.691697]  alloc_slab_page+0x160/0x4e8
[   49.691782]  new_slab+0xb8/0x510
[   49.691866]  ___slab_alloc+0x294/0x3dc
[   49.691957]  kmem_cache_alloc+0x1f0/0x250
[   49.692047]  meminfo_proc_show+0x68/0x8fc
[   49.692135]  seq_read+0x1dc/0x47c
[   49.692217]  proc_reg_read+0x5c/0xb4
[   49.692303]  do_iter_read+0xdc/0x1c0
[   49.692389]  vfs_readv+0x60/0xa8
[   49.692471]  default_file_splice_read+0x1f0/0x304
[   49.692582]  splice_direct_to_actor+0x100/0x294
[   49.692679]  do_splice_direct+0x78/0xc8
[   39.328607] Kernel panic - not syncing: System is deadlocked on memory
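
(For reference, the dummy loop is roughly the following -- a minimal
sketch of a throwaway test module; leak_init, sink and the module
boilerplate are illustrative and not part of this patch:)

  #include <linux/module.h>
  #include <linux/slab.h>

  static void *sink;	/* overwritten every iteration; earlier objects leak */

  /* test only: leak 256-byte objects until the system OOM-panics */
  static int __init leak_init(void)
  {
  	for (;;)
  		sink = kmalloc(256, GFP_KERNEL);

  	return 0;	/* never reached */
  }
  module_init(leak_init);
  MODULE_LICENSE("GPL");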

Signed-off-by: Miles Chen <miles.chen@mediatek.com>
---
 include/linux/oom.h |   1 +
 mm/oom_kill.c       |   4 ++
 mm/page_owner.c     | 135 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 140 insertions(+)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index c696c265f019..fe3c923ac8f3 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -121,6 +121,7 @@ extern bool oom_killer_disable(signed long timeout);
 extern void oom_killer_enable(void);
 
 extern struct task_struct *find_lock_task_mm(struct task_struct *p);
+extern void report_largest_page_consumer(void);
 
 /* sysctls */
 extern int sysctl_oom_dump_tasks;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 71e3acea7817..9b069b5a4aff 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -42,6 +42,7 @@
 #include <linux/kthread.h>
 #include <linux/init.h>
 #include <linux/mmu_notifier.h>
+#include <linux/once.h>
 
 #include <asm/tlb.h>
 #include "internal.h"
@@ -1099,6 +1100,9 @@ bool out_of_memory(struct oom_control *oc)
 	if (!oc->chosen) {
 		dump_header(oc, NULL);
 		pr_warn("Out of memory and no killable processes...\n");
+#ifdef CONFIG_PAGE_OWNER
+		DO_ONCE(report_largest_page_consumer);
+#endif
 		/*
 		 * If we got here due to an actual allocation at the
 		 * system level, we cannot survive this and will enter
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 18ecde9f45b2..b23e5fe35dad 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -10,6 +10,8 @@
 #include <linux/migrate.h>
 #include <linux/stackdepot.h>
 #include <linux/seq_file.h>
+#include <linux/stacktrace.h>
+#include <linux/hashtable.h>
 
 #include "internal.h"
 
@@ -19,12 +21,16 @@
  */
 #define PAGE_OWNER_STACK_DEPTH (16)
 
+#define OOM_HANDLE_HASH_BITS	10
+
 struct page_owner {
 	unsigned short order;
 	short last_migrate_reason;
 	gfp_t gfp_mask;
 	depot_stack_handle_t handle;
 	depot_stack_handle_t free_handle;
+	struct hlist_node node;
+	unsigned long page_count; /* number of pages pointing to this handle */
 };
 
 static bool page_owner_enabled = false;
@@ -33,6 +39,8 @@ DEFINE_STATIC_KEY_FALSE(page_owner_inited);
 static depot_stack_handle_t dummy_handle;
 static depot_stack_handle_t failure_handle;
 static depot_stack_handle_t early_handle;
+static DEFINE_HASHTABLE(oom_handle_hash, OOM_HANDLE_HASH_BITS);
+static struct page_owner *most_referenced_page_owner;
 
 static void init_early_allocated_pages(void);
 
@@ -48,6 +56,57 @@ static int __init early_page_owner_param(char *buf)
 }
 early_param("page_owner", early_page_owner_param);
 
+static struct hlist_head *get_bucket(depot_stack_handle_t handle)
+{
+	unsigned long hash;
+
+	hash = hash_long(handle, OOM_HANDLE_HASH_BITS);
+	return &oom_handle_hash[hash];
+}
+
+/*
+ * lookup a page_owner in the hash bucket
+ */
+static struct page_owner *lookup_page_owner(depot_stack_handle_t handle,
+						struct hlist_head *b)
+{
+	struct page_owner *page_owner;
+
+	hlist_for_each_entry(page_owner, b, node) {
+		if (page_owner->handle == handle)
+			return page_owner;
+	}
+
+	return NULL;
+}
+
+/*
+ * Increase the page_owner->page_count in the handle_hash by (1 << order)
+ */
+static void increase_handle_count(struct page_owner *page_owner)
+{
+	struct hlist_head *bucket;
+	struct page_owner *owner;
+
+	bucket = get_bucket(page_owner->handle);
+
+	owner = lookup_page_owner(page_owner->handle, bucket);
+
+	if (!owner) {
+		owner = page_owner;
+		hlist_add_head(&page_owner->node, bucket);
+	}
+
+	/* increase page counter */
+	owner->page_count += (1UL << page_owner->order);
+
+	/* update most_referenced_page_owner */
+	if (!most_referenced_page_owner)
+		most_referenced_page_owner = owner;
+	if (most_referenced_page_owner->page_count < owner->page_count)
+		most_referenced_page_owner = owner;
+}
+
 static bool need_page_owner(void)
 {
 	return page_owner_enabled;
@@ -172,6 +231,7 @@ static inline void __set_page_owner_handle(struct page *page,
 		page_owner->order = order;
 		page_owner->gfp_mask = gfp_mask;
 		page_owner->last_migrate_reason = -1;
+		page_owner->page_count = 0;
 		__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
 		__set_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
 
@@ -216,6 +276,7 @@ void __split_page_owner(struct page *page, unsigned int order)
 	for (i = 0; i < (1 << order); i++) {
 		page_owner = get_page_owner(page_ext);
 		page_owner->order = 0;
+		page_owner->page_count = 0;
 		page_ext = page_ext_next(page_ext);
 	}
 }
@@ -236,6 +297,7 @@ void __copy_page_owner(struct page *oldpage, struct page *newpage)
 	new_page_owner->last_migrate_reason =
 		old_page_owner->last_migrate_reason;
 	new_page_owner->handle = old_page_owner->handle;
+	new_page_owner->page_count = old_page_owner->page_count;
 
 	/*
 	 * We don't clear the bit on the oldpage as it's going to be freed
@@ -615,6 +677,79 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
 		pgdat->node_id, zone->name, count);
 }
 
+static void __report_largest_page_consumer(struct page_owner *page_owner)
+{
+	unsigned long *entries = NULL;
+	unsigned int nr_entries;
+
+	nr_entries = stack_depot_fetch(page_owner->handle, &entries);
+	pr_info("OOM: largest memory consumer: %lu pages are allocated from:\n",
+			page_owner->page_count);
+	stack_trace_print(entries, nr_entries, 0);
+}
+
+void report_largest_page_consumer(void)
+{
+	unsigned long pfn;
+	struct page *page;
+	struct page_ext *page_ext;
+	struct page_owner *page_owner;
+	depot_stack_handle_t handle;
+
+	pfn = min_low_pfn;
+
+	if (!static_branch_unlikely(&page_owner_inited))
+		return;
+
+	/* Find a valid PFN or the start of a MAX_ORDER_NR_PAGES area */
+	while (!pfn_valid(pfn) && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0)
+		pfn++;
+
+	/* Find an allocated page */
+	for (; pfn < max_pfn; pfn++) {
+		if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0 && !pfn_valid(pfn)) {
+			pfn += MAX_ORDER_NR_PAGES - 1;
+			continue;
+		}
+
+		if (!pfn_valid_within(pfn))
+			continue;
+
+		page = pfn_to_page(pfn);
+		if (PageBuddy(page)) {
+			unsigned long freepage_order = page_order_unsafe(page);
+
+			if (freepage_order < MAX_ORDER)
+				pfn += (1UL << freepage_order) - 1;
+			continue;
+		}
+
+		if (PageReserved(page))
+			continue;
+
+		page_ext = lookup_page_ext(page);
+		if (unlikely(!page_ext))
+			continue;
+
+		if (!test_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags))
+			continue;
+
+		page_owner = get_page_owner(page_ext);
+
+		if (!IS_ALIGNED(pfn, 1 << page_owner->order))
+			continue;
+
+		handle = READ_ONCE(page_owner->handle);
+		if (!handle)
+			continue;
+
+		increase_handle_count(page_owner);
+	}
+
+	if (most_referenced_page_owner)
+		__report_largest_page_consumer(most_referenced_page_owner);
+}
+
 static void init_zones_in_node(pg_data_t *pgdat)
 {
 	struct zone *zone;
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2019-12-30  3:39 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1806FE86-9508-43BC-8E2F-3620CD243B14@lca.pw>
2019-12-26  4:01 ` [PATCH] mm/page_owner: print largest memory consumer when OOM panic occurs Miles Chen
2019-12-26  4:01   ` Miles Chen
2019-12-26  5:53   ` Qian Cai
2019-12-26  5:53     ` Qian Cai
2019-12-27  7:44     ` Miles Chen
2019-12-27  7:44       ` Miles Chen
2019-12-27 13:46       ` Qian Cai
2019-12-27 13:46         ` Qian Cai
2019-12-30  1:30         ` Miles Chen
2019-12-30  1:30           ` Miles Chen
2019-12-30  1:51           ` Qian Cai
2019-12-30  1:51             ` Qian Cai
2019-12-30  3:28             ` Miles Chen
2019-12-30  3:28               ` Miles Chen
2019-12-23 11:33 Miles Chen
2019-12-23 11:33 ` Miles Chen
2019-12-23 12:32 ` Qian Cai
2019-12-23 12:32   ` Qian Cai
2019-12-24  6:45   ` Miles Chen
2019-12-24  6:45     ` Miles Chen
2019-12-24 13:47     ` Qian Cai
2019-12-24 13:47       ` Qian Cai
2019-12-25  9:29       ` Miles Chen
2019-12-25  9:29         ` Miles Chen
2019-12-25 13:53         ` Qian Cai
2019-12-25 13:53           ` Qian Cai
