From: Vlastimil Babka <>
	Joonsoo Kim <>,
	Andrew Morton <>,
	Michal Hocko <>,
	Mel Gorman <>,
	Yang Shi <>, Laura Abbott <>,
	Vinayak Menon <>,
	zhong jiang <>,
	Vlastimil Babka <>
Subject: [RFC PATCH 4/4] mm, page_ext: move page_ext_init() after page_alloc_init_late()
Date: Thu, 20 Jul 2017 15:40:29 +0200
Message-ID: <>
In-Reply-To: <>

Commit b8f1a75d61d8 ("mm: call page_ext_init() after all struct pages are
initialized") has avoided a NULL pointer dereference due to
DEFERRED_STRUCT_PAGE_INIT clashing with page_ext, by calling page_ext_init()
only after the deferred struct page init has finished. Later commit
fe53ca54270a ("mm: use early_pfn_to_nid in page_ext_init") avoided the
underlying issue differently and moved the page_ext_init() call back to where
it was before.

However, there are two problems with the current code:
- on very large machines, page_ext_init() may fail to allocate the page_ext
  structures, because deferred struct page init hasn't yet started, and the
  pre-inited part might be too small.
  This has been observed with a 3TB machine with page_owner=on. Although it
  was an older kernel where page_owner hasn't yet been converted to stack depot,
  thus page_ext was larger, the fundamental problem is still in mainline.
- page_owner's init_pages_in_zone() is called before deferred struct page init
  has started, so it will encounter uninitialized struct pages. This currently
  happens to cause no harm, because the memmap array is pre-zeroed on
  allocation and thus the "if (page_zone(page) != zone)" check is negative, but
  that pre-zeroing guarantee might change soon.
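
To give a feel for the first problem, the arithmetic can be sketched as
below. This is only an illustration: page_ext_total_bytes() and its
entry_size parameter are hypothetical names, and the entry size is not the
real sizeof(struct page_ext), which depends on the kernel version and on
which page_ext users are enabled.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: total bytes needed when every
 * page frame gets one metadata entry of entry_size bytes.
 * entry_size is an assumed value supplied by the caller.
 */
static uint64_t page_ext_total_bytes(uint64_t mem_bytes, uint64_t page_size,
				     uint64_t entry_size)
{
	uint64_t nr_pages = mem_bytes / page_size;

	return nr_pages * entry_size;
}
```

Assuming "3TB" means 3 TiB, 4 KiB pages and a 64-byte entry,
page_ext_total_bytes(3ULL << 40, 4096, 64) gives 805306368 pages times 64
bytes, i.e. 48 GiB in total, all of which has to come from memory that is
already usable at the time page_ext_init() runs.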

The second problem could also be solved by limiting init_pages_in_zone() by
pgdat->first_deferred_pfn, but fixing the first issue would be more
problematic. So this patch again moves page_ext_init() to wait for deferred
struct page init to finish. This has some performance implications for boot
time, which should be acceptable when enabling debugging functionality.
However, we keep the benefits of parallel initialization (one kthread per
node), so it's better than e.g. disabling DEFERRED_STRUCT_PAGE_INIT completely
when page_ext is being used.
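
For comparison, the alternative mentioned above (limiting the scan by
pgdat->first_deferred_pfn) could look roughly like the following userspace
sketch. It is not what this patch does. struct toy_zone, struct toy_pgdat
and toy_init_pages_in_zone() are made-up stand-ins for illustration; only
the idea of stopping at first_deferred_pfn comes from the text above.

```c
/* Toy stand-ins for illustration only; not the kernel's structures. */
struct toy_zone {
	unsigned long start_pfn;
	unsigned long end_pfn;		/* exclusive */
};

struct toy_pgdat {
	/* first pfn whose struct page has not been initialized yet */
	unsigned long first_deferred_pfn;
};

/*
 * Walk the zone, but stop at first_deferred_pfn so that no uninitialized
 * struct page is ever looked at. Returns the number of pfns visited.
 */
static unsigned long toy_init_pages_in_zone(const struct toy_pgdat *pgdat,
					    const struct toy_zone *zone)
{
	unsigned long end = zone->end_pfn;
	unsigned long pfn, visited = 0;

	if (pgdat->first_deferred_pfn < end)
		end = pgdat->first_deferred_pfn;

	for (pfn = zone->start_pfn; pfn < end; pfn++)
		visited++;	/* real code would set up page_owner state here */

	return visited;
}
```

With a zone covering pfns 0..4095 and first_deferred_pfn == 1000, only the
first 1000 pfns are visited. This would address the second problem only; the
allocation failure from the first problem would remain.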

This effectively reverts commit fe53ca54270a757f0a28ee6bf3a54d952b550ed0.

Signed-off-by: Vlastimil Babka <>
 init/main.c   | 3 ++-
 mm/page_ext.c | 4 +---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/init/main.c b/init/main.c
index f866510472d7..7b6517fe0980 100644
--- a/init/main.c
+++ b/init/main.c
@@ -628,7 +628,6 @@ asmlinkage __visible void __init start_kernel(void)
 		initrd_start = 0;
-	page_ext_init();
@@ -1035,6 +1034,8 @@ static noinline void __init kernel_init_freeable(void)
+	/* Initialize page ext after all struct pages are initialized */
+	page_ext_init();
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 24cf8abefc8d..8522ebd784ac 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -402,10 +402,8 @@ void __init page_ext_init(void)
 			 * We know some arch can have a nodes layout such as
 			 * -------------pfn-------------->
 			 * N0 | N1 | N2 | N0 | N1 | N2|....
-			 *
-			 * Take into account DEFERRED_STRUCT_PAGE_INIT.
-			if (early_pfn_to_nid(pfn) != nid)
+			if (pfn_to_nid(pfn) != nid)
 			if (init_section_page_ext(pfn, nid))
 				goto oom;


