From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f200.google.com (mail-wr0-f200.google.com [209.85.128.200]) by kanga.kvack.org (Postfix) with ESMTP id 92B9E2802FE for ; Fri, 30 Jun 2017 10:18:51 -0400 (EDT) Received: by mail-wr0-f200.google.com with SMTP id z1so38459802wrz.10 for ; Fri, 30 Jun 2017 07:18:51 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id p91si6107862wrc.257.2017.06.30.07.18.49 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 30 Jun 2017 07:18:50 -0700 (PDT) Date: Fri, 30 Jun 2017 16:18:47 +0200 From: Michal Hocko Subject: "mm: use early_pfn_to_nid in page_ext_init" broken on some configurations? Message-ID: <20170630141847.GN22917@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-linux-mm@kvack.org List-ID: To: Yang Shi Cc: Joonsoo Kim , Mel Gorman , Andrew Morton , Vlastimil Babka , linux-mm@kvack.org, LKML fe53ca54270a ("mm: use early_pfn_to_nid in page_ext_init") seem to silently depend on CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID resp. CONFIG_HAVE_MEMBLOCK_NODE_MAP. early_pfn_to_nid is returning zero with !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID) && !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) I am not sure how widely is this used but such a code is tricky. I see how catching early allocations during defered initialization might be useful but a subtly broken code sounds like a problem to me. So is fe53ca54270a worth this or we should revert it? -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f70.google.com (mail-wm0-f70.google.com [74.125.82.70]) by kanga.kvack.org (Postfix) with ESMTP id 525D82802FE for ; Fri, 30 Jun 2017 11:42:30 -0400 (EDT) Received: by mail-wm0-f70.google.com with SMTP id 130so3165278wmq.4 for ; Fri, 30 Jun 2017 08:42:30 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id 2si6121119wrd.260.2017.06.30.08.42.28 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 30 Jun 2017 08:42:28 -0700 (PDT) Date: Fri, 30 Jun 2017 17:42:24 +0200 From: Michal Hocko Subject: Re: "mm: use early_pfn_to_nid in page_ext_init" broken on some configurations? Message-ID: <20170630154224.GA9714@dhcp22.suse.cz> References: <20170630141847.GN22917@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170630141847.GN22917@dhcp22.suse.cz> Sender: owner-linux-mm@kvack.org List-ID: To: Yang Shi Cc: Joonsoo Kim , Mel Gorman , Andrew Morton , Vlastimil Babka , linux-mm@kvack.org, LKML On Fri 30-06-17 16:18:47, Michal Hocko wrote: > fe53ca54270a ("mm: use early_pfn_to_nid in page_ext_init") seem > to silently depend on CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID resp. > CONFIG_HAVE_MEMBLOCK_NODE_MAP. early_pfn_to_nid is returning zero with > !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID) && !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) > I am not sure how widely is this used but such a code is tricky. I see > how catching early allocations during defered initialization might be > useful but a subtly broken code sounds like a problem to me. So is > fe53ca54270a worth this or we should revert it? I've dug little bit further. It seems that only s390 and ia64 select HAVE_ARCH_EARLY_PFN_TO_NID. Much more architectures enabled HAVE_MEMBLOCK_NODE_MAP though but still alpha, arc, arm, avr32, blackfin, c6x, cris, frv, h8300, hexagon, Kconfig, m32r, m68k, mn10300, nios2, openrisc, parisc, tile, um, unicore32, xtensa do not. I can only see alpha having NUMA and even that is marked BROKEN. So it seems that this is not a real problem after all. Still subtle, so I guess we want to have the following. What do you think? diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 16532fa0bb64..894697c1e6f5 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1055,6 +1055,7 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist, !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) static inline unsigned long early_pfn_to_nid(unsigned long pfn) { + BUILD_BUG_ON(!IS_ENABLED(CONFIG_NUMA)); return 0; } #endif -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f200.google.com (mail-wr0-f200.google.com [209.85.128.200]) by kanga.kvack.org (Postfix) with ESMTP id 152122802FE for ; Fri, 30 Jun 2017 11:44:20 -0400 (EDT) Received: by mail-wr0-f200.google.com with SMTP id x23so38711604wrb.6 for ; Fri, 30 Jun 2017 08:44:20 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id j198si11187038wmg.14.2017.06.30.08.44.18 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 30 Jun 2017 08:44:19 -0700 (PDT) Date: Fri, 30 Jun 2017 17:44:16 +0200 From: Michal Hocko Subject: Re: "mm: use early_pfn_to_nid in page_ext_init" broken on some configurations? Message-ID: <20170630154416.GB9714@dhcp22.suse.cz> References: <20170630141847.GN22917@dhcp22.suse.cz> <20170630154224.GA9714@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170630154224.GA9714@dhcp22.suse.cz> Sender: owner-linux-mm@kvack.org List-ID: To: Yang Shi Cc: Joonsoo Kim , Mel Gorman , Andrew Morton , Vlastimil Babka , linux-mm@kvack.org, LKML On Fri 30-06-17 17:42:24, Michal Hocko wrote: [...] > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index 16532fa0bb64..894697c1e6f5 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -1055,6 +1055,7 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist, > !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) > static inline unsigned long early_pfn_to_nid(unsigned long pfn) > { > + BUILD_BUG_ON(!IS_ENABLED(CONFIG_NUMA)); Err, this should read BUILD_BUG_ON(IS_ENABLED(CONFIG_NUMA)) of course > return 0; > } > #endif -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f197.google.com (mail-wr0-f197.google.com [209.85.128.197]) by kanga.kvack.org (Postfix) with ESMTP id 342E76B0279 for ; Mon, 3 Jul 2017 07:48:09 -0400 (EDT) Received: by mail-wr0-f197.google.com with SMTP id v60so43002505wrc.7 for ; Mon, 03 Jul 2017 04:48:09 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id l11si13034616wrl.217.2017.07.03.04.48.07 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 03 Jul 2017 04:48:07 -0700 (PDT) Subject: Re: "mm: use early_pfn_to_nid in page_ext_init" broken on some configurations? References: <20170630141847.GN22917@dhcp22.suse.cz> From: Vlastimil Babka Message-ID: <54336b9a-6dc7-890f-1900-c4188fb6cf1a@suse.cz> Date: Mon, 3 Jul 2017 13:48:05 +0200 MIME-Version: 1.0 In-Reply-To: <20170630141847.GN22917@dhcp22.suse.cz> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko , Yang Shi Cc: Joonsoo Kim , Mel Gorman , Andrew Morton , linux-mm@kvack.org, LKML On 06/30/2017 04:18 PM, Michal Hocko wrote: > fe53ca54270a ("mm: use early_pfn_to_nid in page_ext_init") seem > to silently depend on CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID resp. > CONFIG_HAVE_MEMBLOCK_NODE_MAP. early_pfn_to_nid is returning zero with > !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID) && !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) > I am not sure how widely is this used but such a code is tricky. I see > how catching early allocations during defered initialization might be > useful but a subtly broken code sounds like a problem to me. So is > fe53ca54270a worth this or we should revert it? There might be more issues with fe53ca54270a, I think. This I've observed on our 4.4-based kernel, which has deferred page struct init, but doesn't have b8f1a75d61d8 ("mm: call page_ext_init() after all struct pages are initialized") nor aforementioned fe53ca54270a: [ 0.000000] allocated 421003264 bytes of page_ext [ 0.000000] Node 0, zone DMA: page owner found early allocated 0 pages [ 0.000000] Node 0, zone DMA32: page owner found early allocated 33 pages [ 0.000000] Node 0, zone Normal: page owner found early allocated 2842622 pages [ 0.000000] BUG: unable to handle kernel NULL pointer dereference at (null) [ 0.000000] IP: [] init_page_owner+0x12a/0x240 [ 0.000000] PGD 0 [ 0.000000] Oops: 0000 [#1] SMP [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.74+ #7 [ 0.000000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 [ 0.000000] task: ffffffff81e104c0 ti: ffffffff81e00000 task.ti: ffffffff81e00000 [ 0.000000] RIP: 0010:[] [] init_page_owner+0x12a/0x240 [ 0.000000] RSP: 0000:ffffffff81e03ed0 EFLAGS: 00010046 [ 0.000000] RAX: 0000000000000000 RBX: ffff88083ffe0210 RCX: ffffea0013000000 [ 0.000000] RDX: 0000000000000300 RSI: ffffffff81f57437 RDI: 00000000004c0000 [ 0.000000] RBP: ffffffff81e03f20 R08: ffffffff81e03e90 R09: 0000000000000000 [ 0.000000] R10: 00000000004c0200 R11: 0000000000000000 R12: ffffea0000000000 [ 0.000000] R13: 00000000004c0200 R14: 00000000004c0000 R15: 0000000000840000 [ 0.000000] FS: 0000000000000000(0000) GS:ffff88042fc00000(0000) knlGS:0000000000000000 [ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.000000] CR2: 0000000000000000 CR3: 0000000001e0b000 CR4: 00000000000406b0 [ 0.000000] Stack: [ 0.000000] 0000000000000206 ffff88083ffe0f90 ffff88083ffdf000 0000000000003181 [ 0.000000] ffffea0013000000 0000000000000040 ffffea0000000000 0000000000840000 [ 0.000000] 0000000000840000 000000008e000010 ffffffff81e03f50 ffffffff81f84145 [ 0.000000] Call Trace: [ 0.000000] [] page_ext_init+0x15e/0x167 [ 0.000000] [] start_kernel+0x351/0x418 [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 [ 0.000000] [] x86_64_start_reservations+0x2a/0x2c [ 0.000000] [] x86_64_start_kernel+0x12c/0x13b [ 0.000000] Code: 81 e2 00 fe ff ff 4d 39 fa 4d 0f 47 d7 4d 39 f2 4d 89 d5 77 34 eb 5e 48 8b 01 f6 c4 04 75 21 48 89 cf 48 89 4d d0 e8 b6 35 00 00 <48> 8b 00 a8 04 75 0e 48 8b 4d d0 e9 c2 00 00 00 48 83 45 c8 01 [ 0.000000] RIP [] init_page_owner+0x12a/0x240 [ 0.000000] RSP [ 0.000000] CR2: 0000000000000000 [ 0.000000] ---[ end trace 19e05592f03a690f ]--- Note that this is different backtrace than in b8f1a75d61d8 log. Still, backporting b8f1a75d61d8 fixes this: [ 1.538379] allocated 738197504 bytes of page_ext [ 1.539340] Node 0, zone DMA: page owner found early allocated 0 pages [ 1.540179] Node 0, zone DMA32: page owner found early allocated 33 pages [ 1.611173] Node 0, zone Normal: page owner found early allocated 96755 pages [ 1.683167] Node 1, zone Normal: page owner found early allocated 96575 pages No panic, notice how it allocated more for page_ext, and found smaller number of early allocated pages. Now backporting fe53ca54270a on top: [ 0.000000] allocated 738197504 bytes of page_ext [ 0.000000] Node 0, zone DMA: page owner found early allocated 0 pages [ 0.000000] Node 0, zone DMA32: page owner found early allocated 33 pages [ 0.000000] Node 0, zone Normal: page owner found early allocated 2842622 pages [ 0.000000] Node 1, zone Normal: page owner found early allocated 3694362 pages Again no panic, and same amount of page_ext usage. But the "early allocated" numbers seem bogus to me. I think it's because init_pages_in_zone() is running and inspecting struct pages that have not been yet initialized. It doesn't end up crashing, but still doesn't seem correct? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f72.google.com (mail-wm0-f72.google.com [74.125.82.72]) by kanga.kvack.org (Postfix) with ESMTP id AFFE26B0292 for ; Mon, 3 Jul 2017 10:18:05 -0400 (EDT) Received: by mail-wm0-f72.google.com with SMTP id 130so14792045wmq.4 for ; Mon, 03 Jul 2017 07:18:05 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id r41si11998747wrb.298.2017.07.03.07.18.03 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 03 Jul 2017 07:18:03 -0700 (PDT) Subject: Re: "mm: use early_pfn_to_nid in page_ext_init" broken on some configurations? From: Vlastimil Babka References: <20170630141847.GN22917@dhcp22.suse.cz> <54336b9a-6dc7-890f-1900-c4188fb6cf1a@suse.cz> Message-ID: Date: Mon, 3 Jul 2017 16:18:01 +0200 MIME-Version: 1.0 In-Reply-To: <54336b9a-6dc7-890f-1900-c4188fb6cf1a@suse.cz> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko , Yang Shi Cc: Joonsoo Kim , Mel Gorman , Andrew Morton , linux-mm@kvack.org, LKML On 07/03/2017 01:48 PM, Vlastimil Babka wrote: > On 06/30/2017 04:18 PM, Michal Hocko wrote: >> fe53ca54270a ("mm: use early_pfn_to_nid in page_ext_init") seem >> to silently depend on CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID resp. >> CONFIG_HAVE_MEMBLOCK_NODE_MAP. early_pfn_to_nid is returning zero with >> !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID) && !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) >> I am not sure how widely is this used but such a code is tricky. I see >> how catching early allocations during defered initialization might be >> useful but a subtly broken code sounds like a problem to me. So is >> fe53ca54270a worth this or we should revert it? > > There might be more issues with fe53ca54270a, I think. This I've > observed on our 4.4-based kernel, which has deferred page struct init, > but doesn't have b8f1a75d61d8 ("mm: call page_ext_init() after all > struct pages are initialized") nor aforementioned fe53ca54270a: > > [ 0.000000] allocated 421003264 bytes of page_ext > [ 0.000000] Node 0, zone DMA: page owner found early allocated 0 pages > [ 0.000000] Node 0, zone DMA32: page owner found early allocated 33 pages > [ 0.000000] Node 0, zone Normal: page owner found early allocated 2842622 pages > [ 0.000000] BUG: unable to handle kernel NULL pointer dereference at (null) > [ 0.000000] IP: [] init_page_owner+0x12a/0x240 > [ 0.000000] PGD 0 > [ 0.000000] Oops: 0000 [#1] SMP > [ 0.000000] Modules linked in: > [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.74+ #7 > [ 0.000000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 > [ 0.000000] task: ffffffff81e104c0 ti: ffffffff81e00000 task.ti: ffffffff81e00000 > [ 0.000000] RIP: 0010:[] [] init_page_owner+0x12a/0x240 > [ 0.000000] RSP: 0000:ffffffff81e03ed0 EFLAGS: 00010046 > [ 0.000000] RAX: 0000000000000000 RBX: ffff88083ffe0210 RCX: ffffea0013000000 > [ 0.000000] RDX: 0000000000000300 RSI: ffffffff81f57437 RDI: 00000000004c0000 > [ 0.000000] RBP: ffffffff81e03f20 R08: ffffffff81e03e90 R09: 0000000000000000 > [ 0.000000] R10: 00000000004c0200 R11: 0000000000000000 R12: ffffea0000000000 > [ 0.000000] R13: 00000000004c0200 R14: 00000000004c0000 R15: 0000000000840000 > [ 0.000000] FS: 0000000000000000(0000) GS:ffff88042fc00000(0000) knlGS:0000000000000000 > [ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 0.000000] CR2: 0000000000000000 CR3: 0000000001e0b000 CR4: 00000000000406b0 > [ 0.000000] Stack: > [ 0.000000] 0000000000000206 ffff88083ffe0f90 ffff88083ffdf000 0000000000003181 > [ 0.000000] ffffea0013000000 0000000000000040 ffffea0000000000 0000000000840000 > [ 0.000000] 0000000000840000 000000008e000010 ffffffff81e03f50 ffffffff81f84145 > [ 0.000000] Call Trace: > [ 0.000000] [] page_ext_init+0x15e/0x167 > [ 0.000000] [] start_kernel+0x351/0x418 > [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 > [ 0.000000] [] x86_64_start_reservations+0x2a/0x2c > [ 0.000000] [] x86_64_start_kernel+0x12c/0x13b > [ 0.000000] Code: 81 e2 00 fe ff ff 4d 39 fa 4d 0f 47 d7 4d 39 f2 4d 89 d5 77 34 eb 5e 48 8b 01 f6 c4 04 75 21 48 89 cf 48 89 4d d0 e8 b6 35 00 00 <48> 8b 00 a8 04 75 0e 48 8b 4d d0 e9 c2 00 00 00 48 83 45 c8 01 > [ 0.000000] RIP [] init_page_owner+0x12a/0x240 > [ 0.000000] RSP > [ 0.000000] CR2: 0000000000000000 > [ 0.000000] ---[ end trace 19e05592f03a690f ]--- > > Note that this is different backtrace than in b8f1a75d61d8 log. > > Still, backporting b8f1a75d61d8 fixes this: > > [ 1.538379] allocated 738197504 bytes of page_ext > [ 1.539340] Node 0, zone DMA: page owner found early allocated 0 pages > [ 1.540179] Node 0, zone DMA32: page owner found early allocated 33 pages > [ 1.611173] Node 0, zone Normal: page owner found early allocated 96755 pages > [ 1.683167] Node 1, zone Normal: page owner found early allocated 96575 pages > > No panic, notice how it allocated more for page_ext, and found smaller number of > early allocated pages. > > Now backporting fe53ca54270a on top: > > [ 0.000000] allocated 738197504 bytes of page_ext > [ 0.000000] Node 0, zone DMA: page owner found early allocated 0 pages > [ 0.000000] Node 0, zone DMA32: page owner found early allocated 33 pages > [ 0.000000] Node 0, zone Normal: page owner found early allocated 2842622 pages > [ 0.000000] Node 1, zone Normal: page owner found early allocated 3694362 pages > > Again no panic, and same amount of page_ext usage. But the "early allocated" numbers > seem bogus to me. I think it's because init_pages_in_zone() is running and inspecting > struct pages that have not been yet initialized. It doesn't end up crashing, but > still doesn't seem correct? Hmm, and for the record, this is recent mainline. Wonder why the "early allocated" looks much more sane there. But there's a warning nevertheless. [ 0.001000] allocated 201326592 bytes of page_ext [ 0.001000] ------------[ cut here ]------------ [ 0.001000] WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:2467 drain_all_pages+0x1a9/0x1d0 [ 0.001000] Modules linked in: [ 0.001000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.12.0-rc7+ #179 [ 0.001000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 [ 0.001000] task: ffffffff8200f4c0 task.stack: ffffffff82000000 [ 0.001000] RIP: 0010:drain_all_pages+0x1a9/0x1d0 [ 0.001000] RSP: 0000:ffffffff82003e18 EFLAGS: 00010246 [ 0.001000] RAX: ffffffff8200f4c0 RBX: 0000000000000040 RCX: 0000000000000000 [ 0.001000] RDX: ffffffff8200f4c0 RSI: 0000000000000000 RDI: 0000000000000000 [ 0.001000] RBP: ffffffff82003e80 R08: 0000000000000040 R09: 0000000000000001 [ 0.001000] R10: 0000000000000080 R11: ffffffff81086356 R12: 0000000000840000 [ 0.001000] R13: 0000000000000040 R14: 0000000000840000 R15: 000000008e000010 [ 0.001000] FS: 0000000000000000(0000) GS:ffff88042fc00000(0000) knlGS:0000000000000000 [ 0.001000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.001000] CR2: 00000000ffffffff CR3: 000000000200a000 CR4: 00000000000406b0 [ 0.001000] Call Trace: [ 0.001000] ? init_page_owner+0x39/0x250 [ 0.001000] ? printk+0x3e/0x46 [ 0.001000] page_ext_init+0x195/0x19e [ 0.001000] start_kernel+0x31a/0x3dc [ 0.001000] ? early_idt_handler_array+0x120/0x120 [ 0.001000] x86_64_start_reservations+0x2a/0x2c [ 0.001000] x86_64_start_kernel+0x12d/0x13c [ 0.001000] secondary_startup_64+0x9f/0x9f [ 0.001000] Code: 52 82 48 63 d2 e8 a8 09 26 00 3b 05 d6 fb fa 00 89 c3 7c cd 48 c7 c7 c0 02 06 82 e8 b2 05 87 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> ff c3 4d 85 e4 74 ed 48 c7 c7 c0 02 06 82 e8 83 0b 87 00 e9 [ 0.001000] ---[ end trace 392dd6a55122ccf6 ]--- [ 0.001000] Node 0, zone DMA: page owner found early allocated 0 pages [ 0.001000] Node 0, zone DMA32: page owner found early allocated 33 pages [ 0.001000] Node 0, zone Normal: page owner found early allocated 24732 pages [ 0.001000] Node 1, zone Normal: page owner found early allocated 24577 pages -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f198.google.com (mail-pf0-f198.google.com [209.85.192.198]) by kanga.kvack.org (Postfix) with ESMTP id 9CAEB6B0279 for ; Tue, 4 Jul 2017 01:11:51 -0400 (EDT) Received: by mail-pf0-f198.google.com with SMTP id j79so218937142pfj.9 for ; Mon, 03 Jul 2017 22:11:51 -0700 (PDT) Received: from mail-pg0-x243.google.com (mail-pg0-x243.google.com. [2607:f8b0:400e:c05::243]) by mx.google.com with ESMTPS id g9si9820665plk.482.2017.07.03.22.11.50 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 03 Jul 2017 22:11:50 -0700 (PDT) Received: by mail-pg0-x243.google.com with SMTP id u36so25138405pgn.3 for ; Mon, 03 Jul 2017 22:11:50 -0700 (PDT) Date: Tue, 4 Jul 2017 14:11:41 +0900 From: Joonsoo Kim Subject: Re: "mm: use early_pfn_to_nid in page_ext_init" broken on some configurations? Message-ID: <20170704051138.GA28589@js1304-desktop> References: <20170630141847.GN22917@dhcp22.suse.cz> <20170630154224.GA9714@dhcp22.suse.cz> <20170630154416.GB9714@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170630154416.GB9714@dhcp22.suse.cz> Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: Yang Shi , Mel Gorman , Andrew Morton , Vlastimil Babka , linux-mm@kvack.org, LKML On Fri, Jun 30, 2017 at 05:44:16PM +0200, Michal Hocko wrote: > On Fri 30-06-17 17:42:24, Michal Hocko wrote: > [...] > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > > index 16532fa0bb64..894697c1e6f5 100644 > > --- a/include/linux/mmzone.h > > +++ b/include/linux/mmzone.h > > @@ -1055,6 +1055,7 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist, > > !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) > > static inline unsigned long early_pfn_to_nid(unsigned long pfn) > > { > > + BUILD_BUG_ON(!IS_ENABLED(CONFIG_NUMA)); > > Err, this should read BUILD_BUG_ON(IS_ENABLED(CONFIG_NUMA)) of course Agreed. However, AFAIK, ARM can set CONFIG_NUMA but it doesn't have CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID and CONFIG_HAVE_MEMBLOCK_NODE_MAP. If page_ext uses early_pfn_to_nid(), it will cause build error in ARM. Therefore, I suggest following change. CONFIG_DEFERRED_STRUCT_PAGE_INIT depends on proper early_pfn_to_nid(). So, following code will always work as long as CONFIG_DEFERRED_STRUCT_PAGE_INIT works. Thanks. ----------->8--------------- diff --git a/mm/page_ext.c b/mm/page_ext.c index 88ccc044..e3db259 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -384,6 +384,7 @@ void __init page_ext_init(void) for_each_node_state(nid, N_MEMORY) { unsigned long start_pfn, end_pfn; + int page_nid; start_pfn = node_start_pfn(nid); end_pfn = node_end_pfn(nid); @@ -405,8 +406,15 @@ void __init page_ext_init(void) * * Take into account DEFERRED_STRUCT_PAGE_INIT. */ - if (early_pfn_to_nid(pfn) != nid) +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT + page_nid = early_pfn_to_nid(pfn); +#else + page_nid = pfn_to_nid(pfn); +#endif + + if (page_nid != nid) continue; + if (init_section_page_ext(pfn, nid)) goto oom; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f198.google.com (mail-pf0-f198.google.com [209.85.192.198]) by kanga.kvack.org (Postfix) with ESMTP id E97D26B0279 for ; Tue, 4 Jul 2017 01:17:21 -0400 (EDT) Received: by mail-pf0-f198.google.com with SMTP id e3so219741446pfc.4 for ; Mon, 03 Jul 2017 22:17:21 -0700 (PDT) Received: from mail-pg0-x235.google.com (mail-pg0-x235.google.com. [2607:f8b0:400e:c05::235]) by mx.google.com with ESMTPS id d23si14762318pli.425.2017.07.03.22.17.21 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 03 Jul 2017 22:17:21 -0700 (PDT) Received: by mail-pg0-x235.google.com with SMTP id j186so104484493pge.2 for ; Mon, 03 Jul 2017 22:17:21 -0700 (PDT) Date: Tue, 4 Jul 2017 14:17:14 +0900 From: Joonsoo Kim Subject: Re: "mm: use early_pfn_to_nid in page_ext_init" broken on some configurations? Message-ID: <20170704051713.GB28589@js1304-desktop> References: <20170630141847.GN22917@dhcp22.suse.cz> <54336b9a-6dc7-890f-1900-c4188fb6cf1a@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <54336b9a-6dc7-890f-1900-c4188fb6cf1a@suse.cz> Sender: owner-linux-mm@kvack.org List-ID: To: Vlastimil Babka Cc: Michal Hocko , Yang Shi , Mel Gorman , Andrew Morton , linux-mm@kvack.org, LKML On Mon, Jul 03, 2017 at 01:48:05PM +0200, Vlastimil Babka wrote: > On 06/30/2017 04:18 PM, Michal Hocko wrote: > > fe53ca54270a ("mm: use early_pfn_to_nid in page_ext_init") seem > > to silently depend on CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID resp. > > CONFIG_HAVE_MEMBLOCK_NODE_MAP. early_pfn_to_nid is returning zero with > > !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID) && !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) > > I am not sure how widely is this used but such a code is tricky. I see > > how catching early allocations during defered initialization might be > > useful but a subtly broken code sounds like a problem to me. So is > > fe53ca54270a worth this or we should revert it? > > There might be more issues with fe53ca54270a, I think. This I've > observed on our 4.4-based kernel, which has deferred page struct init, > but doesn't have b8f1a75d61d8 ("mm: call page_ext_init() after all > struct pages are initialized") nor aforementioned fe53ca54270a: > > [ 0.000000] allocated 421003264 bytes of page_ext > [ 0.000000] Node 0, zone DMA: page owner found early allocated 0 pages > [ 0.000000] Node 0, zone DMA32: page owner found early allocated 33 pages > [ 0.000000] Node 0, zone Normal: page owner found early allocated 2842622 pages > [ 0.000000] BUG: unable to handle kernel NULL pointer dereference at (null) > [ 0.000000] IP: [] init_page_owner+0x12a/0x240 > [ 0.000000] PGD 0 > [ 0.000000] Oops: 0000 [#1] SMP > [ 0.000000] Modules linked in: > [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.74+ #7 > [ 0.000000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 > [ 0.000000] task: ffffffff81e104c0 ti: ffffffff81e00000 task.ti: ffffffff81e00000 > [ 0.000000] RIP: 0010:[] [] init_page_owner+0x12a/0x240 > [ 0.000000] RSP: 0000:ffffffff81e03ed0 EFLAGS: 00010046 > [ 0.000000] RAX: 0000000000000000 RBX: ffff88083ffe0210 RCX: ffffea0013000000 > [ 0.000000] RDX: 0000000000000300 RSI: ffffffff81f57437 RDI: 00000000004c0000 > [ 0.000000] RBP: ffffffff81e03f20 R08: ffffffff81e03e90 R09: 0000000000000000 > [ 0.000000] R10: 00000000004c0200 R11: 0000000000000000 R12: ffffea0000000000 > [ 0.000000] R13: 00000000004c0200 R14: 00000000004c0000 R15: 0000000000840000 > [ 0.000000] FS: 0000000000000000(0000) GS:ffff88042fc00000(0000) knlGS:0000000000000000 > [ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 0.000000] CR2: 0000000000000000 CR3: 0000000001e0b000 CR4: 00000000000406b0 > [ 0.000000] Stack: > [ 0.000000] 0000000000000206 ffff88083ffe0f90 ffff88083ffdf000 0000000000003181 > [ 0.000000] ffffea0013000000 0000000000000040 ffffea0000000000 0000000000840000 > [ 0.000000] 0000000000840000 000000008e000010 ffffffff81e03f50 ffffffff81f84145 > [ 0.000000] Call Trace: > [ 0.000000] [] page_ext_init+0x15e/0x167 > [ 0.000000] [] start_kernel+0x351/0x418 > [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 > [ 0.000000] [] x86_64_start_reservations+0x2a/0x2c > [ 0.000000] [] x86_64_start_kernel+0x12c/0x13b > [ 0.000000] Code: 81 e2 00 fe ff ff 4d 39 fa 4d 0f 47 d7 4d 39 f2 4d 89 d5 77 34 eb 5e 48 8b 01 f6 c4 04 75 21 48 89 cf 48 89 4d d0 e8 b6 35 00 00 <48> 8b 00 a8 04 75 0e 48 8b 4d d0 e9 c2 00 00 00 48 83 45 c8 01 > [ 0.000000] RIP [] init_page_owner+0x12a/0x240 > [ 0.000000] RSP > [ 0.000000] CR2: 0000000000000000 > [ 0.000000] ---[ end trace 19e05592f03a690f ]--- > > Note that this is different backtrace than in b8f1a75d61d8 log. > > Still, backporting b8f1a75d61d8 fixes this: > > [ 1.538379] allocated 738197504 bytes of page_ext > [ 1.539340] Node 0, zone DMA: page owner found early allocated 0 pages > [ 1.540179] Node 0, zone DMA32: page owner found early allocated 33 pages > [ 1.611173] Node 0, zone Normal: page owner found early allocated 96755 pages > [ 1.683167] Node 1, zone Normal: page owner found early allocated 96575 pages > > No panic, notice how it allocated more for page_ext, and found smaller number of > early allocated pages. > > Now backporting fe53ca54270a on top: > > [ 0.000000] allocated 738197504 bytes of page_ext > [ 0.000000] Node 0, zone DMA: page owner found early allocated 0 pages > [ 0.000000] Node 0, zone DMA32: page owner found early allocated 33 pages > [ 0.000000] Node 0, zone Normal: page owner found early allocated 2842622 pages > [ 0.000000] Node 1, zone Normal: page owner found early allocated 3694362 pages > > Again no panic, and same amount of page_ext usage. But the "early allocated" numbers > seem bogus to me. I think it's because init_pages_in_zone() is running and inspecting > struct pages that have not been yet initialized. It doesn't end up crashing, but > still doesn't seem correct? Numbers looks sane to me. fe53ca54270a makes init_pages_in_zone() called before page_alloc_init_late(). So, there would be many uninitialized pages with PageReserved(). Page owner regarded these PageReserved() page as allocated page. We can change the message to "page owner found early reserved N pages" Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f198.google.com (mail-pf0-f198.google.com [209.85.192.198]) by kanga.kvack.org (Postfix) with ESMTP id 5C5B66B0279 for ; Tue, 4 Jul 2017 01:23:13 -0400 (EDT) Received: by mail-pf0-f198.google.com with SMTP id g27so219772608pfj.6 for ; Mon, 03 Jul 2017 22:23:13 -0700 (PDT) Received: from mail-pg0-x236.google.com (mail-pg0-x236.google.com. [2607:f8b0:400e:c05::236]) by mx.google.com with ESMTPS id c1si14456299pld.330.2017.07.03.22.23.12 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 03 Jul 2017 22:23:12 -0700 (PDT) Received: by mail-pg0-x236.google.com with SMTP id u62so104538693pgb.3 for ; Mon, 03 Jul 2017 22:23:12 -0700 (PDT) Date: Tue, 4 Jul 2017 14:23:06 +0900 From: Joonsoo Kim Subject: Re: "mm: use early_pfn_to_nid in page_ext_init" broken on some configurations? Message-ID: <20170704052304.GC28589@js1304-desktop> References: <20170630141847.GN22917@dhcp22.suse.cz> <54336b9a-6dc7-890f-1900-c4188fb6cf1a@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Vlastimil Babka Cc: Michal Hocko , Yang Shi , Mel Gorman , Andrew Morton , linux-mm@kvack.org, LKML On Mon, Jul 03, 2017 at 04:18:01PM +0200, Vlastimil Babka wrote: > On 07/03/2017 01:48 PM, Vlastimil Babka wrote: > > On 06/30/2017 04:18 PM, Michal Hocko wrote: > >> fe53ca54270a ("mm: use early_pfn_to_nid in page_ext_init") seem > >> to silently depend on CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID resp. > >> CONFIG_HAVE_MEMBLOCK_NODE_MAP. early_pfn_to_nid is returning zero with > >> !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID) && !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) > >> I am not sure how widely is this used but such a code is tricky. I see > >> how catching early allocations during defered initialization might be > >> useful but a subtly broken code sounds like a problem to me. So is > >> fe53ca54270a worth this or we should revert it? > > > > There might be more issues with fe53ca54270a, I think. This I've > > observed on our 4.4-based kernel, which has deferred page struct init, > > but doesn't have b8f1a75d61d8 ("mm: call page_ext_init() after all > > struct pages are initialized") nor aforementioned fe53ca54270a: > > > > [ 0.000000] allocated 421003264 bytes of page_ext > > [ 0.000000] Node 0, zone DMA: page owner found early allocated 0 pages > > [ 0.000000] Node 0, zone DMA32: page owner found early allocated 33 pages > > [ 0.000000] Node 0, zone Normal: page owner found early allocated 2842622 pages > > [ 0.000000] BUG: unable to handle kernel NULL pointer dereference at (null) > > [ 0.000000] IP: [] init_page_owner+0x12a/0x240 > > [ 0.000000] PGD 0 > > [ 0.000000] Oops: 0000 [#1] SMP > > [ 0.000000] Modules linked in: > > [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.74+ #7 > > [ 0.000000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 > > [ 0.000000] task: ffffffff81e104c0 ti: ffffffff81e00000 task.ti: ffffffff81e00000 > > [ 0.000000] RIP: 0010:[] [] init_page_owner+0x12a/0x240 > > [ 0.000000] RSP: 0000:ffffffff81e03ed0 EFLAGS: 00010046 > > [ 0.000000] RAX: 0000000000000000 RBX: ffff88083ffe0210 RCX: ffffea0013000000 > > [ 0.000000] RDX: 0000000000000300 RSI: ffffffff81f57437 RDI: 00000000004c0000 > > [ 0.000000] RBP: ffffffff81e03f20 R08: ffffffff81e03e90 R09: 0000000000000000 > > [ 0.000000] R10: 00000000004c0200 R11: 0000000000000000 R12: ffffea0000000000 > > [ 0.000000] R13: 00000000004c0200 R14: 00000000004c0000 R15: 0000000000840000 > > [ 0.000000] FS: 0000000000000000(0000) GS:ffff88042fc00000(0000) knlGS:0000000000000000 > > [ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 0.000000] CR2: 0000000000000000 CR3: 0000000001e0b000 CR4: 00000000000406b0 > > [ 0.000000] Stack: > > [ 0.000000] 0000000000000206 ffff88083ffe0f90 ffff88083ffdf000 0000000000003181 > > [ 0.000000] ffffea0013000000 0000000000000040 ffffea0000000000 0000000000840000 > > [ 0.000000] 0000000000840000 000000008e000010 ffffffff81e03f50 ffffffff81f84145 > > [ 0.000000] Call Trace: > > [ 0.000000] [] page_ext_init+0x15e/0x167 > > [ 0.000000] [] start_kernel+0x351/0x418 > > [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 > > [ 0.000000] [] x86_64_start_reservations+0x2a/0x2c > > [ 0.000000] [] x86_64_start_kernel+0x12c/0x13b > > [ 0.000000] Code: 81 e2 00 fe ff ff 4d 39 fa 4d 0f 47 d7 4d 39 f2 4d 89 d5 77 34 eb 5e 48 8b 01 f6 c4 04 75 21 48 89 cf 48 89 4d d0 e8 b6 35 00 00 <48> 8b 00 a8 04 75 0e 48 8b 4d d0 e9 c2 00 00 00 48 83 45 c8 01 > > [ 0.000000] RIP [] init_page_owner+0x12a/0x240 > > [ 0.000000] RSP > > [ 0.000000] CR2: 0000000000000000 > > [ 0.000000] ---[ end trace 19e05592f03a690f ]--- > > > > Note that this is different backtrace than in b8f1a75d61d8 log. > > > > Still, backporting b8f1a75d61d8 fixes this: > > > > [ 1.538379] allocated 738197504 bytes of page_ext > > [ 1.539340] Node 0, zone DMA: page owner found early allocated 0 pages > > [ 1.540179] Node 0, zone DMA32: page owner found early allocated 33 pages > > [ 1.611173] Node 0, zone Normal: page owner found early allocated 96755 pages > > [ 1.683167] Node 1, zone Normal: page owner found early allocated 96575 pages > > > > No panic, notice how it allocated more for page_ext, and found smaller number of > > early allocated pages. > > > > Now backporting fe53ca54270a on top: > > > > [ 0.000000] allocated 738197504 bytes of page_ext > > [ 0.000000] Node 0, zone DMA: page owner found early allocated 0 pages > > [ 0.000000] Node 0, zone DMA32: page owner found early allocated 33 pages > > [ 0.000000] Node 0, zone Normal: page owner found early allocated 2842622 pages > > [ 0.000000] Node 1, zone Normal: page owner found early allocated 3694362 pages > > > > Again no panic, and same amount of page_ext usage. But the "early allocated" numbers > > seem bogus to me. I think it's because init_pages_in_zone() is running and inspecting > > struct pages that have not been yet initialized. It doesn't end up crashing, but > > still doesn't seem correct? > > Hmm, and for the record, this is recent mainline. Wonder why the "early Hmm... I don't know the reason. > allocated" looks much more sane there. But there's a warning nevertheless. Warning would comes from the fact that drain_all_pages() is called before mm_percpu_wq is initialised. We could remove WARN_ON_ONCE() and add drain_local_page(zone) to fix the problem. Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f197.google.com (mail-wr0-f197.google.com [209.85.128.197]) by kanga.kvack.org (Postfix) with ESMTP id 51ACC6B0279 for ; Tue, 4 Jul 2017 02:51:17 -0400 (EDT) Received: by mail-wr0-f197.google.com with SMTP id v60so44627415wrc.7 for ; Mon, 03 Jul 2017 23:51:17 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id u143si6598741wmu.44.2017.07.03.23.51.15 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 03 Jul 2017 23:51:16 -0700 (PDT) Date: Tue, 4 Jul 2017 08:51:12 +0200 From: Michal Hocko Subject: Re: "mm: use early_pfn_to_nid in page_ext_init" broken on some configurations? Message-ID: <20170704065112.GA12068@dhcp22.suse.cz> References: <20170630141847.GN22917@dhcp22.suse.cz> <20170630154224.GA9714@dhcp22.suse.cz> <20170630154416.GB9714@dhcp22.suse.cz> <20170704051138.GA28589@js1304-desktop> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170704051138.GA28589@js1304-desktop> Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim Cc: Yang Shi , Mel Gorman , Andrew Morton , Vlastimil Babka , linux-mm@kvack.org, LKML On Tue 04-07-17 14:11:41, Joonsoo Kim wrote: > On Fri, Jun 30, 2017 at 05:44:16PM +0200, Michal Hocko wrote: > > On Fri 30-06-17 17:42:24, Michal Hocko wrote: > > [...] > > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > > > index 16532fa0bb64..894697c1e6f5 100644 > > > --- a/include/linux/mmzone.h > > > +++ b/include/linux/mmzone.h > > > @@ -1055,6 +1055,7 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist, > > > !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) > > > static inline unsigned long early_pfn_to_nid(unsigned long pfn) > > > { > > > + BUILD_BUG_ON(!IS_ENABLED(CONFIG_NUMA)); > > > > Err, this should read BUILD_BUG_ON(IS_ENABLED(CONFIG_NUMA)) of course > > Agreed. > > However, AFAIK, ARM can set CONFIG_NUMA but it doesn't have > CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID and CONFIG_HAVE_MEMBLOCK_NODE_MAP. $ git grep "config NUMA\|select NUMA" arch/arm $ Did you mean arch64? If yes this one looks ok $ git grep "HAVE_MEMBLOCK_NODE_MAP\|HAVE_ARCH_EARLY_PFN_TO_NID" arch/arm64/ arch/arm64/Kconfig: select HAVE_MEMBLOCK_NODE_MAP if NUMA > If page_ext uses early_pfn_to_nid(), it will cause build error in ARM. Which would be intentional if it doesn't provide a proper implementation of the function. > Therefore, I suggest following change. > CONFIG_DEFERRED_STRUCT_PAGE_INIT depends on proper early_pfn_to_nid(). > So, following code will always work as long as > CONFIG_DEFERRED_STRUCT_PAGE_INIT works. I haven't checked all other callers of early_pfn_to_nid yet but I have run my original patch (with !IS_ENABLED...) just to see whether anybody actually uses this function from an innvalid context and it hasn't blown up. So I suspect that all current users simply use the function from the proper context. So if nobody objects I would just post the patch for inclusion. If the compilation breaks we can think of a proper implementation. > > Thanks. > > ----------->8--------------- > diff --git a/mm/page_ext.c b/mm/page_ext.c > index 88ccc044..e3db259 100644 > --- a/mm/page_ext.c > +++ b/mm/page_ext.c > @@ -384,6 +384,7 @@ void __init page_ext_init(void) > > for_each_node_state(nid, N_MEMORY) { > unsigned long start_pfn, end_pfn; > + int page_nid; > > start_pfn = node_start_pfn(nid); > end_pfn = node_end_pfn(nid); > @@ -405,8 +406,15 @@ void __init page_ext_init(void) > * > * Take into account DEFERRED_STRUCT_PAGE_INIT. > */ > - if (early_pfn_to_nid(pfn) != nid) > +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT > + page_nid = early_pfn_to_nid(pfn); > +#else > + page_nid = pfn_to_nid(pfn); > +#endif I cannot say I would be happy about this ifdefery. Especially when there is no existing user which would need it. > + > + if (page_nid != nid) > continue; > + > if (init_section_page_ext(pfn, nid)) > goto oom; > } -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f199.google.com (mail-wr0-f199.google.com [209.85.128.199]) by kanga.kvack.org (Postfix) with ESMTP id 915B36B0313 for ; Tue, 4 Jul 2017 05:39:59 -0400 (EDT) Received: by mail-wr0-f199.google.com with SMTP id v60so45013269wrc.7 for ; Tue, 04 Jul 2017 02:39:59 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id z50si13966344wrz.25.2017.07.04.02.39.58 for (version=TLS1 cipher=AES128-SHA bits=128/128); Tue, 04 Jul 2017 02:39:58 -0700 (PDT) Subject: Re: "mm: use early_pfn_to_nid in page_ext_init" broken on some configurations? References: <20170630141847.GN22917@dhcp22.suse.cz> <54336b9a-6dc7-890f-1900-c4188fb6cf1a@suse.cz> <20170704052304.GC28589@js1304-desktop> From: Vlastimil Babka Message-ID: <45268b53-88dc-4129-725c-df5849e494db@suse.cz> Date: Tue, 4 Jul 2017 11:39:57 +0200 MIME-Version: 1.0 In-Reply-To: <20170704052304.GC28589@js1304-desktop> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim Cc: Michal Hocko , Yang Shi , Mel Gorman , Andrew Morton , linux-mm@kvack.org, LKML On 07/04/2017 07:23 AM, Joonsoo Kim wrote: > On Mon, Jul 03, 2017 at 04:18:01PM +0200, Vlastimil Babka wrote: >> allocated" looks much more sane there. But there's a warning nevertheless. > > Warning would comes from the fact that drain_all_pages() is called > before mm_percpu_wq is initialised. We could remove WARN_ON_ONCE() and add > drain_local_page(zone) to fix the problem. Wouldn't that still leave some period during boot where kernel already runs on multiple CPU's, but mm_percpu_wq is not yet initialized and somebody tries to use it? We want to catch such cases, right? > Thanks. > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f200.google.com (mail-wr0-f200.google.com [209.85.128.200]) by kanga.kvack.org (Postfix) with ESMTP id 227B46B0279 for ; Tue, 4 Jul 2017 06:53:52 -0400 (EDT) Received: by mail-wr0-f200.google.com with SMTP id v60so45174630wrc.7 for ; Tue, 04 Jul 2017 03:53:52 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id k186si20367249wmg.36.2017.07.04.03.53.44 for (version=TLS1 cipher=AES128-SHA bits=128/128); Tue, 04 Jul 2017 03:53:45 -0700 (PDT) Date: Tue, 4 Jul 2017 12:53:42 +0200 From: Michal Hocko Subject: Re: "mm: use early_pfn_to_nid in page_ext_init" broken on some configurations? Message-ID: <20170704105342.GJ14722@dhcp22.suse.cz> References: <20170630141847.GN22917@dhcp22.suse.cz> <54336b9a-6dc7-890f-1900-c4188fb6cf1a@suse.cz> <20170704052304.GC28589@js1304-desktop> <45268b53-88dc-4129-725c-df5849e494db@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <45268b53-88dc-4129-725c-df5849e494db@suse.cz> Sender: owner-linux-mm@kvack.org List-ID: To: Vlastimil Babka Cc: Joonsoo Kim , Yang Shi , Mel Gorman , Andrew Morton , linux-mm@kvack.org, LKML On Tue 04-07-17 11:39:57, Vlastimil Babka wrote: > On 07/04/2017 07:23 AM, Joonsoo Kim wrote: > > On Mon, Jul 03, 2017 at 04:18:01PM +0200, Vlastimil Babka wrote: > >> allocated" looks much more sane there. But there's a warning nevertheless. > > > > Warning would comes from the fact that drain_all_pages() is called > > before mm_percpu_wq is initialised. We could remove WARN_ON_ONCE() and add > > drain_local_page(zone) to fix the problem. > > Wouldn't that still leave some period during boot where kernel already > runs on multiple CPU's, but mm_percpu_wq is not yet initialized and > somebody tries to use it? We want to catch such cases, right? I haven't checked the boot sequence but if we know that we need mm_percpu_wq initialized earlier than moving it should be not a big deal I guess. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f199.google.com (mail-wr0-f199.google.com [209.85.128.199]) by kanga.kvack.org (Postfix) with ESMTP id 5B4076B0279 for ; Fri, 7 Jul 2017 08:00:07 -0400 (EDT) Received: by mail-wr0-f199.google.com with SMTP id g46so7430103wrd.3 for ; Fri, 07 Jul 2017 05:00:07 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id p91si2035920wrc.257.2017.07.07.05.00.05 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 07 Jul 2017 05:00:05 -0700 (PDT) Subject: Re: "mm: use early_pfn_to_nid in page_ext_init" broken on some configurations? References: <20170630141847.GN22917@dhcp22.suse.cz> <54336b9a-6dc7-890f-1900-c4188fb6cf1a@suse.cz> <20170704051713.GB28589@js1304-desktop> From: Vlastimil Babka Message-ID: <31ca76ee-fd1a-236b-2b9d-fa205202c1ac@suse.cz> Date: Fri, 7 Jul 2017 14:00:03 +0200 MIME-Version: 1.0 In-Reply-To: <20170704051713.GB28589@js1304-desktop> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim Cc: Michal Hocko , Yang Shi , Mel Gorman , Andrew Morton , linux-mm@kvack.org, LKML On 07/04/2017 07:17 AM, Joonsoo Kim wrote: >> >> Still, backporting b8f1a75d61d8 fixes this: >> >> [ 1.538379] allocated 738197504 bytes of page_ext >> [ 1.539340] Node 0, zone DMA: page owner found early allocated 0 pages >> [ 1.540179] Node 0, zone DMA32: page owner found early allocated 33 pages >> [ 1.611173] Node 0, zone Normal: page owner found early allocated 96755 pages >> [ 1.683167] Node 1, zone Normal: page owner found early allocated 96575 pages >> >> No panic, notice how it allocated more for page_ext, and found smaller number of >> early allocated pages. >> >> Now backporting fe53ca54270a on top: >> >> [ 0.000000] allocated 738197504 bytes of page_ext >> [ 0.000000] Node 0, zone DMA: page owner found early allocated 0 pages >> [ 0.000000] Node 0, zone DMA32: page owner found early allocated 33 pages >> [ 0.000000] Node 0, zone Normal: page owner found early allocated 2842622 pages >> [ 0.000000] Node 1, zone Normal: page owner found early allocated 3694362 pages >> >> Again no panic, and same amount of page_ext usage. But the "early allocated" numbers >> seem bogus to me. I think it's because init_pages_in_zone() is running and inspecting >> struct pages that have not been yet initialized. It doesn't end up crashing, but >> still doesn't seem correct? > > Numbers looks sane to me. fe53ca54270a makes init_pages_in_zone() > called before page_alloc_init_late(). So, there would be many > uninitialized pages with PageReserved(). Page owner regarded these > PageReserved() page as allocated page. That seems incorrect for two reasons: - init_pages_in_zone() actually skips PageReserved() pages - the pages don't have PageReserved() flag, until the deferred struct page init thread processes them via deferred_init_memmap() -> __init_single_page() AFAICS Now I've found out why upstream reports much less early allocated pages than our kernel. We're missing 9d43f5aec950 ("mm/page_owner: add zone range overlapping check") which adds a "page_zone(page) != zone" check. I think this only works because the pages are not initialized and thus have no nid/zone links. Probably page_zone() only doesn't break because it's all zeroed. I don't think it's safe to rely on this? > We can change the message to "page owner found early reserved N pages" > > Thanks. > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f72.google.com (mail-wm0-f72.google.com [74.125.82.72]) by kanga.kvack.org (Postfix) with ESMTP id 7CFC9440905 for ; Fri, 14 Jul 2017 05:13:08 -0400 (EDT) Received: by mail-wm0-f72.google.com with SMTP id j85so8301037wmj.2 for ; Fri, 14 Jul 2017 02:13:08 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id h87si1807341wmi.137.2017.07.14.02.13.06 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 14 Jul 2017 02:13:07 -0700 (PDT) Date: Fri, 14 Jul 2017 11:13:04 +0200 From: Michal Hocko Subject: Re: "mm: use early_pfn_to_nid in page_ext_init" broken on some configurations? Message-ID: <20170714091304.GC2618@dhcp22.suse.cz> References: <20170630141847.GN22917@dhcp22.suse.cz> <54336b9a-6dc7-890f-1900-c4188fb6cf1a@suse.cz> <20170704051713.GB28589@js1304-desktop> <31ca76ee-fd1a-236b-2b9d-fa205202c1ac@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <31ca76ee-fd1a-236b-2b9d-fa205202c1ac@suse.cz> Sender: owner-linux-mm@kvack.org List-ID: To: Vlastimil Babka Cc: Joonsoo Kim , Yang Shi , Mel Gorman , Andrew Morton , linux-mm@kvack.org, LKML On Fri 07-07-17 14:00:03, Vlastimil Babka wrote: > On 07/04/2017 07:17 AM, Joonsoo Kim wrote: > >> > >> Still, backporting b8f1a75d61d8 fixes this: > >> > >> [ 1.538379] allocated 738197504 bytes of page_ext > >> [ 1.539340] Node 0, zone DMA: page owner found early allocated 0 pages > >> [ 1.540179] Node 0, zone DMA32: page owner found early allocated 33 pages > >> [ 1.611173] Node 0, zone Normal: page owner found early allocated 96755 pages > >> [ 1.683167] Node 1, zone Normal: page owner found early allocated 96575 pages > >> > >> No panic, notice how it allocated more for page_ext, and found smaller number of > >> early allocated pages. > >> > >> Now backporting fe53ca54270a on top: > >> > >> [ 0.000000] allocated 738197504 bytes of page_ext > >> [ 0.000000] Node 0, zone DMA: page owner found early allocated 0 pages > >> [ 0.000000] Node 0, zone DMA32: page owner found early allocated 33 pages > >> [ 0.000000] Node 0, zone Normal: page owner found early allocated 2842622 pages > >> [ 0.000000] Node 1, zone Normal: page owner found early allocated 3694362 pages > >> > >> Again no panic, and same amount of page_ext usage. But the "early allocated" numbers > >> seem bogus to me. I think it's because init_pages_in_zone() is running and inspecting > >> struct pages that have not been yet initialized. It doesn't end up crashing, but > >> still doesn't seem correct? > > > > Numbers looks sane to me. fe53ca54270a makes init_pages_in_zone() > > called before page_alloc_init_late(). So, there would be many > > uninitialized pages with PageReserved(). Page owner regarded these > > PageReserved() page as allocated page. > > That seems incorrect for two reasons: > - init_pages_in_zone() actually skips PageReserved() pages > - the pages don't have PageReserved() flag, until the deferred struct page init > thread processes them via deferred_init_memmap() -> __init_single_page() AFAICS > > Now I've found out why upstream reports much less early allocated pages than our > kernel. We're missing 9d43f5aec950 ("mm/page_owner: add zone range overlapping > check") which adds a "page_zone(page) != zone" check. I think this only works > because the pages are not initialized and thus have no nid/zone links. Probably > page_zone() only doesn't break because it's all zeroed. I don't think it's safe > to rely on this? Yes, if anything PageReserved should be checked before the zone check. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f71.google.com (mail-wm0-f71.google.com [74.125.82.71]) by kanga.kvack.org (Postfix) with ESMTP id D2F92440905 for ; Fri, 14 Jul 2017 05:34:34 -0400 (EDT) Received: by mail-wm0-f71.google.com with SMTP id b11so8382345wmh.0 for ; Fri, 14 Jul 2017 02:34:34 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id u62si5961251wrc.312.2017.07.14.02.34.33 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 14 Jul 2017 02:34:33 -0700 (PDT) Subject: Re: "mm: use early_pfn_to_nid in page_ext_init" broken on some configurations? References: <20170630141847.GN22917@dhcp22.suse.cz> <54336b9a-6dc7-890f-1900-c4188fb6cf1a@suse.cz> <20170704051713.GB28589@js1304-desktop> <31ca76ee-fd1a-236b-2b9d-fa205202c1ac@suse.cz> <20170714091304.GC2618@dhcp22.suse.cz> From: Vlastimil Babka Message-ID: <18f28347-af10-0726-5a62-0dd1afdbd2a9@suse.cz> Date: Fri, 14 Jul 2017 11:34:31 +0200 MIME-Version: 1.0 In-Reply-To: <20170714091304.GC2618@dhcp22.suse.cz> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: Joonsoo Kim , Yang Shi , Mel Gorman , Andrew Morton , linux-mm@kvack.org, LKML On 07/14/2017 11:13 AM, Michal Hocko wrote: > On Fri 07-07-17 14:00:03, Vlastimil Babka wrote: >> On 07/04/2017 07:17 AM, Joonsoo Kim wrote: >>>> >>>> Still, backporting b8f1a75d61d8 fixes this: >>>> >>>> [ 1.538379] allocated 738197504 bytes of page_ext >>>> [ 1.539340] Node 0, zone DMA: page owner found early allocated 0 pages >>>> [ 1.540179] Node 0, zone DMA32: page owner found early allocated 33 pages >>>> [ 1.611173] Node 0, zone Normal: page owner found early allocated 96755 pages >>>> [ 1.683167] Node 1, zone Normal: page owner found early allocated 96575 pages >>>> >>>> No panic, notice how it allocated more for page_ext, and found smaller number of >>>> early allocated pages. >>>> >>>> Now backporting fe53ca54270a on top: >>>> >>>> [ 0.000000] allocated 738197504 bytes of page_ext >>>> [ 0.000000] Node 0, zone DMA: page owner found early allocated 0 pages >>>> [ 0.000000] Node 0, zone DMA32: page owner found early allocated 33 pages >>>> [ 0.000000] Node 0, zone Normal: page owner found early allocated 2842622 pages >>>> [ 0.000000] Node 1, zone Normal: page owner found early allocated 3694362 pages >>>> >>>> Again no panic, and same amount of page_ext usage. But the "early allocated" numbers >>>> seem bogus to me. I think it's because init_pages_in_zone() is running and inspecting >>>> struct pages that have not been yet initialized. It doesn't end up crashing, but >>>> still doesn't seem correct? >>> >>> Numbers looks sane to me. fe53ca54270a makes init_pages_in_zone() >>> called before page_alloc_init_late(). So, there would be many >>> uninitialized pages with PageReserved(). Page owner regarded these >>> PageReserved() page as allocated page. >> >> That seems incorrect for two reasons: >> - init_pages_in_zone() actually skips PageReserved() pages >> - the pages don't have PageReserved() flag, until the deferred struct page init >> thread processes them via deferred_init_memmap() -> __init_single_page() AFAICS >> >> Now I've found out why upstream reports much less early allocated pages than our >> kernel. We're missing 9d43f5aec950 ("mm/page_owner: add zone range overlapping >> check") which adds a "page_zone(page) != zone" check. I think this only works >> because the pages are not initialized and thus have no nid/zone links. Probably >> page_zone() only doesn't break because it's all zeroed. I don't think it's safe >> to rely on this? > > Yes, if anything PageReserved should be checked before the zone check. That wouldn't change anything, because we skip PageReserved and it's not set. Perhaps we could skip pages that have the raw page flags value zero, but then a) we should make sure that the allocation of the struct page array zeroes the range, and b) the first modification of struct page in the initialization is setting the PageReserved flag. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f71.google.com (mail-wm0-f71.google.com [74.125.82.71]) by kanga.kvack.org (Postfix) with ESMTP id 869EE440905 for ; Fri, 14 Jul 2017 07:35:15 -0400 (EDT) Received: by mail-wm0-f71.google.com with SMTP id t3so8801540wme.9 for ; Fri, 14 Jul 2017 04:35:15 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id h84si2031675wme.104.2017.07.14.04.35.13 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 14 Jul 2017 04:35:14 -0700 (PDT) Date: Fri, 14 Jul 2017 13:35:08 +0200 From: Michal Hocko Subject: Re: "mm: use early_pfn_to_nid in page_ext_init" broken on some configurations? Message-ID: <20170714113508.GH2618@dhcp22.suse.cz> References: <20170630141847.GN22917@dhcp22.suse.cz> <54336b9a-6dc7-890f-1900-c4188fb6cf1a@suse.cz> <20170704051713.GB28589@js1304-desktop> <31ca76ee-fd1a-236b-2b9d-fa205202c1ac@suse.cz> <20170714091304.GC2618@dhcp22.suse.cz> <18f28347-af10-0726-5a62-0dd1afdbd2a9@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <18f28347-af10-0726-5a62-0dd1afdbd2a9@suse.cz> Sender: owner-linux-mm@kvack.org List-ID: To: Vlastimil Babka Cc: Joonsoo Kim , Yang Shi , Mel Gorman , Andrew Morton , linux-mm@kvack.org, LKML On Fri 14-07-17 11:34:31, Vlastimil Babka wrote: > On 07/14/2017 11:13 AM, Michal Hocko wrote: > > On Fri 07-07-17 14:00:03, Vlastimil Babka wrote: > >> On 07/04/2017 07:17 AM, Joonsoo Kim wrote: > >>>> > >>>> Still, backporting b8f1a75d61d8 fixes this: > >>>> > >>>> [ 1.538379] allocated 738197504 bytes of page_ext > >>>> [ 1.539340] Node 0, zone DMA: page owner found early allocated 0 pages > >>>> [ 1.540179] Node 0, zone DMA32: page owner found early allocated 33 pages > >>>> [ 1.611173] Node 0, zone Normal: page owner found early allocated 96755 pages > >>>> [ 1.683167] Node 1, zone Normal: page owner found early allocated 96575 pages > >>>> > >>>> No panic, notice how it allocated more for page_ext, and found smaller number of > >>>> early allocated pages. > >>>> > >>>> Now backporting fe53ca54270a on top: > >>>> > >>>> [ 0.000000] allocated 738197504 bytes of page_ext > >>>> [ 0.000000] Node 0, zone DMA: page owner found early allocated 0 pages > >>>> [ 0.000000] Node 0, zone DMA32: page owner found early allocated 33 pages > >>>> [ 0.000000] Node 0, zone Normal: page owner found early allocated 2842622 pages > >>>> [ 0.000000] Node 1, zone Normal: page owner found early allocated 3694362 pages > >>>> > >>>> Again no panic, and same amount of page_ext usage. But the "early allocated" numbers > >>>> seem bogus to me. I think it's because init_pages_in_zone() is running and inspecting > >>>> struct pages that have not been yet initialized. It doesn't end up crashing, but > >>>> still doesn't seem correct? > >>> > >>> Numbers looks sane to me. fe53ca54270a makes init_pages_in_zone() > >>> called before page_alloc_init_late(). So, there would be many > >>> uninitialized pages with PageReserved(). Page owner regarded these > >>> PageReserved() page as allocated page. > >> > >> That seems incorrect for two reasons: > >> - init_pages_in_zone() actually skips PageReserved() pages > >> - the pages don't have PageReserved() flag, until the deferred struct page init > >> thread processes them via deferred_init_memmap() -> __init_single_page() AFAICS > >> > >> Now I've found out why upstream reports much less early allocated pages than our > >> kernel. We're missing 9d43f5aec950 ("mm/page_owner: add zone range overlapping > >> check") which adds a "page_zone(page) != zone" check. I think this only works > >> because the pages are not initialized and thus have no nid/zone links. Probably > >> page_zone() only doesn't break because it's all zeroed. I don't think it's safe > >> to rely on this? > > > > Yes, if anything PageReserved should be checked before the zone check. > > That wouldn't change anything, because we skip PageReserved and it's not > set. I thought they were still marked reserved from the bootmem allocator I would have to go through the initialization code again to be sure. > Perhaps we could skip pages that have the raw page flags value > zero, but then a) we should make sure that the allocation of the struct > page array zeroes the range, and b) the first modification of struct > page in the initialization is setting the PageReserved flag. I would rather not depend on the page state. There are plans to not initialize the struct page (even to 0 during memmap init) until __init_single_page. Either the page is fully initialized or we are touching invalid pfn range. end_pfn = pfn + zone->spanned_pages but I guess we should in fact consider first_deferred_pfn as well (calculate_node_totalpages is not deffered initialization aware). -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org