From: Miles Chen <miles.chen@mediatek.com>
To: Qian Cai <cai@lca.pw>
Cc: Andrew Morton <akpm@linux-foundation.org>, Michal Hocko <mhocko@suse.com>, <linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>, <linux-mediatek@lists.infradead.org>, <wsd_upstream@mediatek.com>
Subject: Re: [PATCH] mm/page_owner: print largest memory consumer when OOM panic occurs
Date: Tue, 24 Dec 2019 14:45:46 +0800
Message-ID: <1577169946.4959.4.camel@mtkswgap22>
In-Reply-To: <2B938D94-FFBB-4A3D-AD07-D7D04A4D4161@lca.pw>

On Mon, 2019-12-23 at 07:32 -0500, Qian Cai wrote:
>
> > On Dec 23, 2019, at 6:33 AM, Miles Chen <miles.chen@mediatek.com> wrote:
> >
> > Motivation:
> > -----------
> >
> > When debugging an OOM kernel panic, it is difficult to know the
> > memory allocated by kernel drivers of vmalloc() by checking the
> > Mem-Info or Node/Zone info. For example:
> >
> > Mem-Info:
> > active_anon:5144 inactive_anon:16120 isolated_anon:0
> > active_file:0 inactive_file:0 isolated_file:0
> > unevictable:0 dirty:0 writeback:0 unstable:0
> > slab_reclaimable:739 slab_unreclaimable:442469
> > mapped:534 shmem:21050 pagetables:21 bounce:0
> > free:14808 free_pcp:3389 free_cma:8128
> >
> > Node 0 active_anon:20576kB inactive_anon:64480kB active_file:0kB
> > inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> > mapped:2136kB dirty:0kB writeback:0kB shmem:84200kB shmem_thp: 0kB
> > shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB
> > all_unreclaimable? yes
> >
> > Node 0 DMA free:14476kB min:21512kB low:26888kB high:32264kB
> > reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> > active_file: 0kB inactive_file:0kB unevictable:0kB writepending:0kB
> > present:1048576kB managed:952736kB mlocked:0kB kernel_stack:0kB
> > pagetables:0kB bounce:0kB free_pcp:2716kB local_pcp:0kB free_cma:0kB
> >
> > The information above tells us the memory usage of the known memory
> > categories and we can check the abnormal large numbers. However, if a
> > memory leakage cannot be observed in the categories above, we need to
> > reproduce the issue with CONFIG_PAGE_OWNER.
> >
> > It is possible to read the page owner information from coredump files.
> > However, coredump files may not always be available, so my approach is
> > to print out the largest page consumer when OOM kernel panic occurs.
>
> Many of those patches helping debugging special cases had been shot
> down in the past. I don’t see much difference this time. If you worry
> about memory leak, enable kmemleak and then reproduce. Otherwise, we
> will end up with too many heuristics just for debugging.
>

Thanks for your comment.

We use kmemleak too, but a memory leak caused by alloc_pages() in a
kernel device driver cannot be caught by kmemleak. We have fought this
kind of real problem for a few years and have found a way to make the
debugging easier.

We currently have the following information during OOM: node, zone,
swap, process (pid, rss, name), slab usage, and the backtrace, order,
and gfp flags of the OOM backtrace. We can identify many different types
of OOM problems from the information above, except alloc_pages() leaks.

The patch does work and saves a lot of debugging time. Could we consider
the "greatest memory consumer" as another useful piece of OOM
information?

Miles

> >
> > The heuristic approach assumes that the OOM kernel panic is caused by
> > a single backtrace. The assumption is not always true but it works in
> > many cases during our test.
> >
> > We have tested this heuristic approach since 2019/5 on android devices.
> > In 38 internal OOM kernel panic reports:
> >
> > 31/38: can be analyzed by using existing information
> > 7/38: need page owner information and the heuristic approach in this patch
> > prints the correct backtraces of abnormal memory allocations. No need to
> > reproduce the issues.
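The heuristic described in the quoted patch text (attribute pages to the backtrace that allocated them, then report the backtrace holding the most pages) can be illustrated with a small userspace sketch. This is not the code from the patch. The function name `largest_consumer`, the record format (a tuple of frame names plus an allocation order), and the driver names in the sample data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def largest_consumer(records):
    """Return (stack, pages) for the backtrace holding the most pages.

    `records` is an iterable of (stack, order) pairs, where `stack` is a
    tuple of frame names and `order` is the allocation order, so each
    record accounts for 2**order pages. Returns None when there are no
    records. The record format is an assumption made for this sketch.
    """
    pages_by_stack = defaultdict(int)
    for stack, order in records:
        pages_by_stack[stack] += 1 << order  # 2**order pages per allocation
    if not pages_by_stack:
        return None
    # Pick the backtrace with the largest page total.
    return max(pages_by_stack.items(), key=lambda item: item[1])


# Hypothetical sample data: two made-up driver allocation paths.
records = [
    (("alloc_pages", "drv_a_alloc"), 0),
    (("alloc_pages", "drv_b_alloc"), 2),
    (("alloc_pages", "drv_a_alloc"), 0),
    (("alloc_pages", "drv_b_alloc"), 2),
    (("alloc_pages", "drv_a_alloc"), 1),
]

stack, pages = largest_consumer(records)
print(" <- ".join(stack), pages)
```

As the quoted text notes, this only points at the right place when a single backtrace dominates the leak; several moderate consumers would not stand out in such a ranking.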