From: Miles Chen <miles.chen@mediatek.com>
To: Qian Cai <cai@lca.pw>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>, <linux-kernel@vger.kernel.org>,
	<linux-mm@kvack.org>, <linux-mediatek@lists.infradead.org>,
	<wsd_upstream@mediatek.com>
Subject: Re: [PATCH] mm/page_owner: print largest memory consumer when OOM panic occurs
Date: Tue, 24 Dec 2019 14:45:46 +0800	[thread overview]
Message-ID: <1577169946.4959.4.camel@mtkswgap22> (raw)
In-Reply-To: <2B938D94-FFBB-4A3D-AD07-D7D04A4D4161@lca.pw>

On Mon, 2019-12-23 at 07:32 -0500, Qian Cai wrote:
> 
> > On Dec 23, 2019, at 6:33 AM, Miles Chen <miles.chen@mediatek.com> wrote:
> > 
> > Motivation:
> > -----------
> > 
> > When debugging an OOM kernel panic, it is difficult to determine the
> > memory allocated by kernel drivers via alloc_pages() or vmalloc() by
> > checking the Mem-Info or Node/Zone info. For example:
> > 
> >  Mem-Info:
> >  active_anon:5144 inactive_anon:16120 isolated_anon:0
> >   active_file:0 inactive_file:0 isolated_file:0
> >   unevictable:0 dirty:0 writeback:0 unstable:0
> >   slab_reclaimable:739 slab_unreclaimable:442469
> >   mapped:534 shmem:21050 pagetables:21 bounce:0
> >   free:14808 free_pcp:3389 free_cma:8128
> > 
> >  Node 0 active_anon:20576kB inactive_anon:64480kB active_file:0kB
> >  inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> >  mapped:2136kB dirty:0kB writeback:0kB shmem:84200kB shmem_thp: 0kB
> >  shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB
> >  all_unreclaimable? yes
> > 
> >  Node 0 DMA free:14476kB min:21512kB low:26888kB high:32264kB
> >  reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> >  active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
> >  present:1048576kB managed:952736kB mlocked:0kB kernel_stack:0kB
> >  pagetables:0kB bounce:0kB free_pcp:2716kB local_pcp:0kB free_cma:0kB
> > 
> > The information above tells us the memory usage of the known memory
> > categories, and we can check for abnormally large numbers. However, if
> > a memory leak cannot be observed in those categories, we need to
> > reproduce the issue with CONFIG_PAGE_OWNER enabled.
> > 
> > It is possible to read the page owner information from coredump files.
> > However, coredump files may not always be available, so my approach is
> > to print out the largest page consumer when OOM kernel panic occurs.
> 
> Many of those patches helping debugging special cases had been shot down in the past. I don’t see much difference this time. If you worry about memory leak, enable kmemleak and then to reproduce. Otherwise, we will end up with too many heuristics just for debugging.
> 

Thanks for your comment.

We use kmemleak too, but a memory leak caused by alloc_pages() in a
kernel device driver cannot be caught by kmemleak. We have fought this
kind of real-world problem for a few years and found a way to make the
debugging easier.

We currently have the following information during an OOM: Node, zone,
and swap usage, per-process data (pid, rss, name), slab usage, and the
order, gfp flags, and backtrace of the failing allocation.
We can diagnose many different types of OOM problems from this
information, except an alloc_pages() leak.

The patch does work and saves a lot of debugging time.
Could we consider the "largest memory consumer" another useful piece of
OOM information?


Miles
> > 
> > The heuristic approach assumes that the OOM kernel panic is caused by
> > a single backtrace. The assumption is not always true, but it has held
> > in many cases during our testing.
> > 
> > We have tested this heuristic approach since 2019/5 on android devices.
> > In 38 internal OOM kernel panic reports:
> > 
> > 31/38: could be analyzed by using existing information
> > 7/38: needed page owner information; the heuristic approach in this
> > patch printed the correct backtraces of the abnormal memory
> > allocations, with no need to reproduce the issues.
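The heuristic above (group allocated pages by their page_owner backtrace and
report the largest group) can be sketched in user-space terms. This is an
illustrative model only: the record format and the driver/function names are
made-up assumptions, not the kernel's actual page_owner output or code.

```python
from collections import defaultdict

def largest_consumer(records):
    """Group page_owner-style records by allocation backtrace and
    return the (backtrace, pages) pair covering the most pages.

    Each record is an (order, backtrace) tuple: one allocation of
    2**order contiguous pages attributed to a backtrace.
    """
    usage = defaultdict(int)  # backtrace -> total pages
    for order, backtrace in records:
        usage[backtrace] += 1 << order  # an order-n record is 2**n pages
    # The heuristic assumes a single backtrace dominates the OOM,
    # so the top entry is reported as the likely leaker.
    return max(usage.items(), key=lambda kv: kv[1])

# Hypothetical sample: a leaking driver path dwarfs a normal allocation.
sample = [
    (0, "alloc_pages <- leaky_driver_alloc <- leaky_driver_ioctl"),
    (2, "alloc_pages <- leaky_driver_alloc <- leaky_driver_ioctl"),
    (0, "alloc_pages <- __kmalloc <- do_sys_open"),
]
backtrace, pages = largest_consumer(sample)
print(pages, "pages from:", backtrace)
```

The single-backtrace assumption shows up directly in the `max()` call: when
the leak is spread over several call sites, the top entry may not be the
culprit, which matches the caveat stated above.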




