From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751224AbdFDLd0 (ORCPT ); Sun, 4 Jun 2017 07:33:26 -0400
Received: from gum.cmpxchg.org ([85.214.110.215]:43334 "EHLO gum.cmpxchg.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750847AbdFDLdT (ORCPT ); Sun, 4 Jun 2017 07:33:19 -0400
Date: Sun, 4 Jun 2017 07:32:57 -0400
From: Johannes Weiner
To: Russell King - ARM Linux
Cc: Tony Lindgren, Andrew Morton, Josef Bacik, Michal Hocko,
	Vladimir Davydov, Rik van Riel, Mark Brown,
	linux-kernel@vger.kernel.org
Subject: Re: Regression on ARMs in next-20170531
Message-ID: <20170604113257.GA1095@cmpxchg.org>
References: <20170531164544.GF3730@atomide.com> <20170531174333.GA27796@n2100.armlinux.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170531174333.GA27796@n2100.armlinux.org.uk>
User-Agent: Mutt/1.8.2 (2017-04-18)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 31, 2017 at 06:43:33PM +0100, Russell King - ARM Linux wrote:
> On Wed, May 31, 2017 at 09:45:45AM -0700, Tony Lindgren wrote:
> > Mark Brown noticed that so far the only booting
> > ARMs are all with CONFIG_SMP disabled and I just
> > confirmed that's the case.
> >
> > 8< --------------------
> > Unable to handle kernel paging request at virtual address 2e116007
> > pgd = c0004000
> > [2e116007] *pgd=00000000
> > Internal error: Oops: 5 [#1] SMP ARM
> > Modules linked in:
> > CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc3-00153-gb6bc6724488a #200
> > Hardware name: Generic DRA74X (Flattened Device Tree)
> > task: c0d0adc0 task.stack: c0d00000
> > PC is at __mod_node_page_state+0x2c/0xc8
> > LR is at __per_cpu_offset+0x0/0x8
> > pc : []    lr : []    psr: 600000d3
> > sp : c0d01eec  ip : 00000000  fp : c15782f4
> > r10: 00000000  r9 : c1591280  r8 : 00004000
> > r7 : 00000001  r6 : 00000006  r5 : 2e116000  r4 : 00000007
> > r3 : 00000007  r2 : 00000001  r1 : 00000006  r0 : c0dc27c0
> > Flags: nZCv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment none
> ...
> > Code: e79e5103 e28c3001 e0833001 e1a04003 (e19440d5)
>
> This disassembles to:
>
>    0:	e79e5103 	ldr	r5, [lr, r3, lsl #2]
>    4:	e28c3001 	add	r3, ip, #1
>    8:	e0833001 	add	r3, r3, r1
>    c:	e1a04003 	mov	r4, r3
>   10:	e19440d5 	ldrsb	r4, [r4, r5]
>
> I don't have a similarly configured kernel, but here I have for the
> start of this function:
>
> 00000680 <__mod_node_page_state>:
>      680:	e1a0c00d 	mov	ip, sp
>      684:	e92dd870 	push	{r4, r5, r6, fp, ip, lr, pc}
>      688:	e24cb004 	sub	fp, ip, #4
>      68c:	e590cc00 	ldr	ip, [r0, #3072]	; 0xc00
>      690:	e1a0400d 	mov	r4, sp
>      694:	ee1d6f90 	mrc	15, 0, r6, cr13, cr0, {4}
>      698:	e08c5001 	add	r5, ip, r1
>      69c:	e2855001 	add	r5, r5, #1
>      6a0:	e1a03005 	mov	r3, r5
>      6a4:	e196c0dc 	ldrsb	ip, [r6, ip]
>      6a8:	e19630d3 	ldrsb	r3, [r6, r3]
>
> r5 in your code is the equivalent of r6, r4 => r3, r3 -> r5.
> lr is the __per_cpu_offset array, so the first instruction is
> trying to load the percpu offset.
>
> The faulting code is:
>
>         x = delta + __this_cpu_read(*p);
>
> specifically "__this_cpu_read(*p)".
>
> "ip" holds "pcp" from:
>
>         struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
>
> and you may notice that it's zero in the register dump.
> So, pgdat->per_cpu_nodestats is NULL here.
>
> This seems to be set up in setup_per_cpu_pageset(), which, in the init
> order, happens way after mm_init() (which contains kmem_cache_init()).

Thanks for the analysis, Russell.

I think it's NULL because the slab allocation happens before even the
root_mem_cgroup is set up, and so root_mem_cgroup -> lruvec -> pgdat
gives us garbage.

Tony, Josef, since the patches are dropped from -next, could you test
the -mm tree at git://git.cmpxchg.org/linux-mmots.git and verify that
this patch below fixes the issue?

---

>>From 47007dfcd7873cb93d11466a93b1f41f6a7a434f Mon Sep 17 00:00:00 2001
From: Johannes Weiner
Date: Sun, 4 Jun 2017 07:02:44 -0400
Subject: [PATCH] mm: memcontrol: per-lruvec stats infrastructure fix 2

Even with the previous fix routing !page->mem_cgroup stats to the root
cgroup, we still see crashes in certain configurations, as the root is
not initialized for the earliest possible accounting sites.

Don't track uncharged pages at all, not even in the root. This takes
care of early accounting as well as special pages that aren't tracked.

Because we still need to account at the pgdat level, we can no longer
implement the lruvec_page_state functions on top of the lruvec_state
ones. But that's okay, it was a little silly to look up the nodeinfo
and descend to the lruvec, only to container_of() back to the nodeinfo
where the lruvec_stat structure is sitting.

Signed-off-by: Johannes Weiner
---
 include/linux/memcontrol.h | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index bea6f08e9e16..da9360885260 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -585,27 +585,27 @@ static inline void mod_lruvec_state(struct lruvec *lruvec,
 static inline void __mod_lruvec_page_state(struct page *page,
 					   enum node_stat_item idx, int val)
 {
-	struct mem_cgroup *memcg;
-	struct lruvec *lruvec;
-
-	/* Special pages in the VM aren't charged, use root */
-	memcg = page->mem_cgroup ? : root_mem_cgroup;
+	struct mem_cgroup_per_node *pn;
 
-	lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
-	__mod_lruvec_state(lruvec, idx, val);
+	__mod_node_page_state(page_pgdat(page), idx, val);
+	if (mem_cgroup_disabled() || !page->mem_cgroup)
+		return;
+	__mod_memcg_state(page->mem_cgroup, idx, val);
+	pn = page->mem_cgroup->nodeinfo[page_to_nid(page)];
+	__this_cpu_add(pn->lruvec_stat->count[idx], val);
 }
 
 static inline void mod_lruvec_page_state(struct page *page,
 					 enum node_stat_item idx, int val)
 {
-	struct mem_cgroup *memcg;
-	struct lruvec *lruvec;
-
-	/* Special pages in the VM aren't charged, use root */
-	memcg = page->mem_cgroup ? : root_mem_cgroup;
+	struct mem_cgroup_per_node *pn;
 
-	lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
-	mod_lruvec_state(lruvec, idx, val);
+	mod_node_page_state(page_pgdat(page), idx, val);
+	if (mem_cgroup_disabled() || !page->mem_cgroup)
+		return;
+	mod_memcg_state(page->mem_cgroup, idx, val);
+	pn = page->mem_cgroup->nodeinfo[page_to_nid(page)];
+	this_cpu_add(pn->lruvec_stat->count[idx], val);
 }
 
 unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
-- 
2.13.0
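
For anyone who wants to poke at the accounting order without booting a
kernel, the logic of the patched helpers can be modelled in user space.
This is only a rough sketch under stated assumptions: every model_* name,
the struct layouts, and the plain long counters are hypothetical stand-ins
for the kernel's per-cpu counters and are not the real definitions. It
only tries to show that the node-level counter is always updated, while
the memcg and lruvec counters are skipped for uncharged pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes; the real enums and per-cpu structures differ. */
enum { MODEL_NR_STATS = 4, MODEL_NR_NODES = 1 };

struct model_memcg {
	/* stands in for the target of __mod_memcg_state() */
	long memcg_stat[MODEL_NR_STATS];
	/* stands in for pn->lruvec_stat->count[] on each node */
	long lruvec_stat[MODEL_NR_NODES][MODEL_NR_STATS];
};

struct model_node {
	/* stands in for the target of __mod_node_page_state() */
	long node_stat[MODEL_NR_STATS];
};

struct model_page {
	struct model_memcg *mem_cgroup;	/* NULL for uncharged pages */
	int nid;
};

static struct model_node model_nodes[MODEL_NR_NODES];
static int model_memcg_disabled;	/* models mem_cgroup_disabled() */

static void model_mod_lruvec_page_state(struct model_page *page,
					int idx, int val)
{
	assert(idx >= 0 && idx < MODEL_NR_STATS);
	assert(page->nid >= 0 && page->nid < MODEL_NR_NODES);

	/* Node-level accounting happens for every page, charged or not. */
	model_nodes[page->nid].node_stat[idx] += val;

	/* Uncharged pages are not tracked in any cgroup, not even the root. */
	if (model_memcg_disabled || !page->mem_cgroup)
		return;

	page->mem_cgroup->memcg_stat[idx] += val;
	page->mem_cgroup->lruvec_stat[page->nid][idx] += val;
}
```

Calling model_mod_lruvec_page_state() on a page whose mem_cgroup is NULL
bumps only model_nodes[nid].node_stat[idx]; the same call on a charged
page bumps all three counters. The early return is what keeps the early
accounting sites away from a not yet initialized root cgroup.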