* [PATCH v2 0/2] optimize memory hotplug
@ 2018-01-31 21:02 ` Pavel Tatashin
  0 siblings, 0 replies; 19+ messages in thread
From: Pavel Tatashin @ 2018-01-31 21:02 UTC (permalink / raw)
  To: steven.sistare, daniel.m.jordan, akpm, mgorman, mhocko, linux-mm,
	linux-kernel, gregkh, vbabka, bharata

Changelog:
	v1 - v2
	Added struct page poisoning checks to verify that struct
	pages are never accessed before they are initialized during
	memory hotplug

Pavel Tatashin (2):
  mm: uninitialized struct page poisoning sanity checking
  mm, memory_hotplug: optimize memory hotplug

 drivers/base/memory.c          | 38 +++++++++++++++++++++-----------------
 drivers/base/node.c            | 17 ++++++++++-------
 include/linux/memory_hotplug.h |  2 ++
 include/linux/mm.h             |  4 +++-
 include/linux/node.h           |  4 ++--
 include/linux/page-flags.h     | 22 +++++++++++++++++-----
 mm/memblock.c                  |  2 +-
 mm/memory_hotplug.c            | 21 ++-------------------
 mm/page_alloc.c                | 28 ++++++++++------------------
 mm/sparse.c                    | 29 ++++++++++++++++++++++++++---
 10 files changed, 94 insertions(+), 73 deletions(-)

-- 
2.16.1

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking
  2018-01-31 21:02 ` Pavel Tatashin
@ 2018-01-31 21:02   ` Pavel Tatashin
  -1 siblings, 0 replies; 19+ messages in thread
From: Pavel Tatashin @ 2018-01-31 21:02 UTC (permalink / raw)
  To: steven.sistare, daniel.m.jordan, akpm, mgorman, mhocko, linux-mm,
	linux-kernel, gregkh, vbabka, bharata

During boot we poison struct page memory in order to ensure that no one is
accessing this memory until the struct pages are initialized in
__init_single_page().

This patch adds more scrutiny to this checking by making sure that flags
do not equal the poison pattern when they are accessed. The pattern is all
ones.

Since the node id is also stored in struct page and may be accessed quite
early, we add the enforcement to the page_to_nid() function as well.
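
To illustrate the idea outside the kernel, here is a minimal, hypothetical
userspace sketch of the poison check (fake_page and the helper names are
illustrative stand-ins, not the kernel implementation; they only mirror
PagePoisoned() and __init_single_page() conceptually):

```c
#include <assert.h>
#include <string.h>

/* All-ones poison pattern, matching PAGE_POISON_PATTERN in the patch. */
#define PAGE_POISON_PATTERN	(~0ul)

/* Simplified stand-in for struct page; the real one has many more fields. */
struct fake_page {
	unsigned long flags;
};

/*
 * Uninitialized struct page memory was memset to 0xff, so flags reads
 * as all ones until the page is initialized.
 */
static int fake_page_poisoned(const struct fake_page *page)
{
	return page->flags == PAGE_POISON_PATTERN;
}

/* Stand-in for __init_single_page(): initializing clears the poison. */
static void fake_init_single_page(struct fake_page *page)
{
	memset(page, 0, sizeof(*page));
}
```

Any accessor that reads page->flags can then assert that the page is not
poisoned before use, which is what PF_POISONED_CHECK() does in the patch
via VM_BUG_ON_PGFLAGS().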

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 include/linux/mm.h         |  4 +++-
 include/linux/page-flags.h | 22 +++++++++++++++++-----
 mm/memblock.c              |  2 +-
 3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 42b7154291e4..e0281de2cb13 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -896,7 +896,9 @@ extern int page_to_nid(const struct page *page);
 #else
 static inline int page_to_nid(const struct page *page)
 {
-	return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
+	struct page *p = (struct page *)page;
+
+	return (PF_POISONED_CHECK(p)->flags >> NODES_PGSHIFT) & NODES_MASK;
 }
 #endif
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 3ec44e27aa9d..743644b73359 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -161,9 +161,18 @@ static __always_inline int PageCompound(struct page *page)
 	return test_bit(PG_head, &page->flags) || PageTail(page);
 }
 
+#define	PAGE_POISON_PATTERN	~0ul
+static inline int PagePoisoned(const struct page *page)
+{
+	return page->flags == PAGE_POISON_PATTERN;
+}
+
 /*
  * Page flags policies wrt compound pages
  *
+ * PF_POISONED_CHECK
+ *     check if this struct page poisoned/uninitialized
+ *
  * PF_ANY:
  *     the page flag is relevant for small, head and tail pages.
  *
@@ -181,17 +190,20 @@ static __always_inline int PageCompound(struct page *page)
  * PF_NO_COMPOUND:
  *     the page flag is not relevant for compound pages.
  */
-#define PF_ANY(page, enforce)	page
-#define PF_HEAD(page, enforce)	compound_head(page)
+#define PF_POISONED_CHECK(page) ({					\
+		VM_BUG_ON_PGFLAGS(PagePoisoned(page), page);		\
+		page; })
+#define PF_ANY(page, enforce)	PF_POISONED_CHECK(page)
+#define PF_HEAD(page, enforce)	PF_POISONED_CHECK(compound_head(page))
 #define PF_ONLY_HEAD(page, enforce) ({					\
 		VM_BUG_ON_PGFLAGS(PageTail(page), page);		\
-		page;})
+		PF_POISONED_CHECK(page); })
 #define PF_NO_TAIL(page, enforce) ({					\
 		VM_BUG_ON_PGFLAGS(enforce && PageTail(page), page);	\
-		compound_head(page);})
+		PF_POISONED_CHECK(compound_head(page)); })
 #define PF_NO_COMPOUND(page, enforce) ({				\
 		VM_BUG_ON_PGFLAGS(enforce && PageCompound(page), page);	\
-		page;})
+		PF_POISONED_CHECK(page); })
 
 /*
  * Macros to create function definitions for page flags
diff --git a/mm/memblock.c b/mm/memblock.c
index 5a9ca2a1751b..d85c8754e0ce 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1373,7 +1373,7 @@ void * __init memblock_virt_alloc_try_nid_raw(
 					   min_addr, max_addr, nid);
 #ifdef CONFIG_DEBUG_VM
 	if (ptr && size > 0)
-		memset(ptr, 0xff, size);
+		memset(ptr, PAGE_POISON_PATTERN, size);
 #endif
 	return ptr;
 }
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 2/2] mm, memory_hotplug: optimize memory hotplug
  2018-01-31 21:02 ` Pavel Tatashin
@ 2018-01-31 21:03   ` Pavel Tatashin
  -1 siblings, 0 replies; 19+ messages in thread
From: Pavel Tatashin @ 2018-01-31 21:03 UTC (permalink / raw)
  To: steven.sistare, daniel.m.jordan, akpm, mgorman, mhocko, linux-mm,
	linux-kernel, gregkh, vbabka, bharata

This patch was inspired by the discussion of this problem:
http://lkml.kernel.org/r/20180130083006.GB1245@in.ibm.com

Currently, during memory hotplugging we traverse struct pages several
times:

1. memset(0) in sparse_add_one_section()
2. loop in __add_section() to call set_page_node(page, nid) and
   SetPageReserved(page)
3. loop in pages_correctly_reserved() to check that the Reserved flag is set
4. loop in memmap_init_zone() to call __init_single_pfn()
4. loop in memmap_init_zone() to call __init_single_pfn()

This patch removes loops 1, 2, and 3, and leaves only loop 4, where all
struct page fields are initialized in one go, the same as is now done
during boot.

The benefits:
- Memory hotplug performance improves because we no longer evict the cache
  several times, and loop branching overhead is reduced.

- A condition is removed from the hot path in __init_single_pfn() that was
  added to fix the problem Bharata reported in the email thread above,
  which also improves performance during normal boot.

- Memory hotplug becomes more similar to the boot memory initialization
  path, because struct pages are zeroed and initialized in only one
  function.

- The memory hotplug struct page initialization code is simplified, which
  enables future improvements, such as multi-threading the initialization
  of struct pages to improve hotplug performance even further on larger
  machines.
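
As a rough illustration of why collapsing the traversals helps, here is a
hypothetical userspace sketch (fake_page and both init functions are
illustrative only, not kernel code): the multi-pass version walks the whole
memmap once per field group, while the single-pass version touches each
struct exactly once.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct page. */
struct fake_page {
	unsigned long flags;
	int nid;
	int refcount;
};

/* Before: several passes, each walking the entire array. */
static void init_multi_pass(struct fake_page *map, size_t n)
{
	memset(map, 0, n * sizeof(*map));		/* pass 1: zero     */
	for (size_t i = 0; i < n; i++)
		map[i].nid = 0;				/* pass 2: node id  */
	for (size_t i = 0; i < n; i++)
		map[i].refcount = 1;			/* pass 3: refcount */
}

/* After: one pass initializes every field together. */
static void init_single_pass(struct fake_page *map, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		map[i].flags = 0;
		map[i].nid = 0;
		map[i].refcount = 1;
	}
}
```

Both versions produce the same end state; the single-pass variant simply
pulls each cache line in once instead of once per pass.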

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 drivers/base/memory.c          | 38 +++++++++++++++++++++-----------------
 drivers/base/node.c            | 17 ++++++++++-------
 include/linux/memory_hotplug.h |  2 ++
 include/linux/node.h           |  4 ++--
 mm/memory_hotplug.c            | 21 ++-------------------
 mm/page_alloc.c                | 28 ++++++++++------------------
 mm/sparse.c                    | 29 ++++++++++++++++++++++++++---
 7 files changed, 73 insertions(+), 66 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index fe4b24f05f6a..a14fb0cd424a 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -187,13 +187,14 @@ int memory_isolate_notify(unsigned long val, void *v)
 }
 
 /*
- * The probe routines leave the pages reserved, just as the bootmem code does.
- * Make sure they're still that way.
+ * The probe routines leave the pages uninitialized, just as the bootmem code
+ * does. Make sure we do not access them, but instead use only information from
+ * within sections.
  */
-static bool pages_correctly_reserved(unsigned long start_pfn)
+static bool pages_correctly_probed(unsigned long start_pfn)
 {
-	int i, j;
-	struct page *page;
+	unsigned long section_nr = pfn_to_section_nr(start_pfn);
+	unsigned long section_nr_end = section_nr + sections_per_block;
 	unsigned long pfn = start_pfn;
 
 	/*
@@ -201,21 +202,24 @@ static bool pages_correctly_reserved(unsigned long start_pfn)
 	 * SPARSEMEM_VMEMMAP. We lookup the page once per section
 	 * and assume memmap is contiguous within each section
 	 */
-	for (i = 0; i < sections_per_block; i++, pfn += PAGES_PER_SECTION) {
+	for (; section_nr < section_nr_end; section_nr++) {
 		if (WARN_ON_ONCE(!pfn_valid(pfn)))
 			return false;
-		page = pfn_to_page(pfn);
-
-		for (j = 0; j < PAGES_PER_SECTION; j++) {
-			if (PageReserved(page + j))
-				continue;
-
-			printk(KERN_WARNING "section number %ld page number %d "
-				"not reserved, was it already online?\n",
-				pfn_to_section_nr(pfn), j);
 
+		if (!present_section_nr(section_nr)) {
+			pr_warn("section %ld pfn[%lx, %lx) not present",
+				section_nr, pfn, pfn + PAGES_PER_SECTION);
+			return false;
+		} else if (!valid_section_nr(section_nr)) {
+			pr_warn("section %ld pfn[%lx, %lx) no valid memmap",
+				section_nr, pfn, pfn + PAGES_PER_SECTION);
+			return false;
+		} else if (online_section_nr(section_nr)) {
+			pr_warn("section %ld pfn[%lx, %lx) is already online",
+				section_nr, pfn, pfn + PAGES_PER_SECTION);
 			return false;
 		}
+		pfn += PAGES_PER_SECTION;
 	}
 
 	return true;
@@ -237,7 +241,7 @@ memory_block_action(unsigned long phys_index, unsigned long action, int online_t
 
 	switch (action) {
 	case MEM_ONLINE:
-		if (!pages_correctly_reserved(start_pfn))
+		if (!pages_correctly_probed(start_pfn))
 			return -EBUSY;
 
 		ret = online_pages(start_pfn, nr_pages, online_type);
@@ -727,7 +731,7 @@ int register_new_memory(int nid, struct mem_section *section)
 	}
 
 	if (mem->section_count == sections_per_block)
-		ret = register_mem_sect_under_node(mem, nid);
+		ret = register_mem_sect_under_node(mem, nid, false);
 out:
 	mutex_unlock(&mem_sysfs_mutex);
 	return ret;
diff --git a/drivers/base/node.c b/drivers/base/node.c
index ee090ab9171c..f1c0c130ac88 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -397,7 +397,8 @@ static int __ref get_nid_for_pfn(unsigned long pfn)
 }
 
 /* register memory section under specified node if it spans that node */
-int register_mem_sect_under_node(struct memory_block *mem_blk, int nid)
+int register_mem_sect_under_node(struct memory_block *mem_blk, int nid,
+				 bool check_nid)
 {
 	int ret;
 	unsigned long pfn, sect_start_pfn, sect_end_pfn;
@@ -423,11 +424,13 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, int nid)
 			continue;
 		}
 
-		page_nid = get_nid_for_pfn(pfn);
-		if (page_nid < 0)
-			continue;
-		if (page_nid != nid)
-			continue;
+		if (check_nid) {
+			page_nid = get_nid_for_pfn(pfn);
+			if (page_nid < 0)
+				continue;
+			if (page_nid != nid)
+				continue;
+		}
 		ret = sysfs_create_link_nowarn(&node_devices[nid]->dev.kobj,
 					&mem_blk->dev.kobj,
 					kobject_name(&mem_blk->dev.kobj));
@@ -502,7 +505,7 @@ int link_mem_sections(int nid, unsigned long start_pfn, unsigned long nr_pages)
 
 		mem_blk = find_memory_block_hinted(mem_sect, mem_blk);
 
-		ret = register_mem_sect_under_node(mem_blk, nid);
+		ret = register_mem_sect_under_node(mem_blk, nid, true);
 		if (!err)
 			err = ret;
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 787411bdeadb..13acf181fb4b 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -234,6 +234,8 @@ void put_online_mems(void);
 void mem_hotplug_begin(void);
 void mem_hotplug_done(void);
 
+int get_section_nid(unsigned long section_nr);
+
 #else /* ! CONFIG_MEMORY_HOTPLUG */
 #define pfn_to_online_page(pfn)			\
 ({						\
diff --git a/include/linux/node.h b/include/linux/node.h
index 4ece0fee0ffc..41f171861dcc 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -67,7 +67,7 @@ extern void unregister_one_node(int nid);
 extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int register_mem_sect_under_node(struct memory_block *mem_blk,
-						int nid);
+						int nid, bool check_nid);
 extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
 					   unsigned long phys_index);
 
@@ -97,7 +97,7 @@ static inline int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
 	return 0;
 }
 static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
-							int nid)
+							int nid, bool check_nid)
 {
 	return 0;
 }
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6a9bee33ffa7..e7d11a14b1d1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -250,7 +250,6 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
 		struct vmem_altmap *altmap, bool want_memblock)
 {
 	int ret;
-	int i;
 
 	if (pfn_valid(phys_start_pfn))
 		return -EEXIST;
@@ -259,23 +258,6 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
 	if (ret < 0)
 		return ret;
 
-	/*
-	 * Make all the pages reserved so that nobody will stumble over half
-	 * initialized state.
-	 * FIXME: We also have to associate it with a node because page_to_nid
-	 * relies on having page with the proper node.
-	 */
-	for (i = 0; i < PAGES_PER_SECTION; i++) {
-		unsigned long pfn = phys_start_pfn + i;
-		struct page *page;
-		if (!pfn_valid(pfn))
-			continue;
-
-		page = pfn_to_page(pfn);
-		set_page_node(page, nid);
-		SetPageReserved(page);
-	}
-
 	if (!want_memblock)
 		return 0;
 
@@ -913,7 +895,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 	int ret;
 	struct memory_notify arg;
 
-	nid = pfn_to_nid(pfn);
+	nid = get_section_nid(pfn_to_section_nr(pfn));
+
 	/* associate pfn range with the zone */
 	zone = move_pfn_range(online_type, nid, pfn, nr_pages);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e2b42f603b1a..6af4c2cb88f6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1178,10 +1178,9 @@ static void free_one_page(struct zone *zone,
 }
 
 static void __meminit __init_single_page(struct page *page, unsigned long pfn,
-				unsigned long zone, int nid, bool zero)
+				unsigned long zone, int nid)
 {
-	if (zero)
-		mm_zero_struct_page(page);
+	mm_zero_struct_page(page);
 	set_page_links(page, zone, nid, pfn);
 	init_page_count(page);
 	page_mapcount_reset(page);
@@ -1195,12 +1194,6 @@ static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 #endif
 }
 
-static void __meminit __init_single_pfn(unsigned long pfn, unsigned long zone,
-					int nid, bool zero)
-{
-	return __init_single_page(pfn_to_page(pfn), pfn, zone, nid, zero);
-}
-
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 static void __meminit init_reserved_page(unsigned long pfn)
 {
@@ -1219,7 +1212,7 @@ static void __meminit init_reserved_page(unsigned long pfn)
 		if (pfn >= zone->zone_start_pfn && pfn < zone_end_pfn(zone))
 			break;
 	}
-	__init_single_pfn(pfn, zid, nid, true);
+	__init_single_page(pfn_to_page(pfn), pfn, zid, nid);
 }
 #else
 static inline void init_reserved_page(unsigned long pfn)
@@ -1536,7 +1529,7 @@ static unsigned long  __init deferred_init_pages(int nid, int zid,
 		} else {
 			page++;
 		}
-		__init_single_page(page, pfn, zid, nid, true);
+		__init_single_page(page, pfn, zid, nid);
 		nr_pages++;
 	}
 	return (nr_pages);
@@ -5329,6 +5322,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	pg_data_t *pgdat = NODE_DATA(nid);
 	unsigned long pfn;
 	unsigned long nr_initialised = 0;
+	struct page *page;
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 	struct memblock_region *r = NULL, *tmp;
 #endif
@@ -5390,6 +5384,11 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 #endif
 
 not_early:
+		page = pfn_to_page(pfn);
+		__init_single_page(page, pfn, zone, nid);
+		if (context == MEMMAP_HOTPLUG)
+			SetPageReserved(page);
+
 		/*
 		 * Mark the block movable so that blocks are reserved for
 		 * movable at startup. This will force kernel allocations
@@ -5406,15 +5405,8 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		 * because this is done early in sparse_add_one_section
 		 */
 		if (!(pfn & (pageblock_nr_pages - 1))) {
-			struct page *page = pfn_to_page(pfn);
-
-			__init_single_page(page, pfn, zone, nid,
-					context != MEMMAP_HOTPLUG);
 			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 			cond_resched();
-		} else {
-			__init_single_pfn(pfn, zone, nid,
-					context != MEMMAP_HOTPLUG);
 		}
 	}
 }
diff --git a/mm/sparse.c b/mm/sparse.c
index 7af5e7a92528..d7808307023b 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -30,11 +30,14 @@ struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]
 #endif
 EXPORT_SYMBOL(mem_section);
 
-#ifdef NODE_NOT_IN_PAGE_FLAGS
+#if defined(NODE_NOT_IN_PAGE_FLAGS) || defined(CONFIG_MEMORY_HOTPLUG)
 /*
  * If we did not store the node number in the page then we have to
  * do a lookup in the section_to_node_table in order to find which
  * node the page belongs to.
+ *
+ * We also use this data in case memory hotplugging is enabled to be
+ * able to determine nid while struct pages are not yet initialized.
  */
 #if MAX_NUMNODES <= 256
 static u8 section_to_node_table[NR_MEM_SECTIONS] __cacheline_aligned;
@@ -42,17 +45,28 @@ static u8 section_to_node_table[NR_MEM_SECTIONS] __cacheline_aligned;
 static u16 section_to_node_table[NR_MEM_SECTIONS] __cacheline_aligned;
 #endif
 
+#ifdef NODE_NOT_IN_PAGE_FLAGS
 int page_to_nid(const struct page *page)
 {
 	return section_to_node_table[page_to_section(page)];
 }
 EXPORT_SYMBOL(page_to_nid);
+#endif /* NODE_NOT_IN_PAGE_FLAGS */
 
 static void set_section_nid(unsigned long section_nr, int nid)
 {
 	section_to_node_table[section_nr] = nid;
 }
-#else /* !NODE_NOT_IN_PAGE_FLAGS */
+
+/* Return NID for given section number */
+int get_section_nid(unsigned long section_nr)
+{
+	if (WARN_ON(section_nr >= NR_MEM_SECTIONS))
+		return 0;
+	return section_to_node_table[section_nr];
+}
+EXPORT_SYMBOL(get_section_nid);
+#else /* ! (NODE_NOT_IN_PAGE_FLAGS || CONFIG_MEMORY_HOTPLUG) */
 static inline void set_section_nid(unsigned long section_nr, int nid)
 {
 }
@@ -816,7 +830,14 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat,
 		goto out;
 	}
 
-	memset(memmap, 0, sizeof(struct page) * PAGES_PER_SECTION);
+#ifdef CONFIG_DEBUG_VM
+	/*
+	 * poison uninitialized struct pages in order to catch invalid flags
+	 * combinations.
+	 */
+	memset(memmap, PAGE_POISON_PATTERN,
+	       sizeof(struct page) * PAGES_PER_SECTION);
+#endif
 
 	section_mark_present(ms);
 
@@ -827,6 +848,8 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat,
 	if (ret <= 0) {
 		kfree(usemap);
 		__kfree_section_memmap(memmap, altmap);
+	} else {
+		set_section_nid(section_nr, pgdat->node_id);
 	}
 	return ret;
 }
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

 {
@@ -1219,7 +1212,7 @@ static void __meminit init_reserved_page(unsigned long pfn)
 		if (pfn >= zone->zone_start_pfn && pfn < zone_end_pfn(zone))
 			break;
 	}
-	__init_single_pfn(pfn, zid, nid, true);
+	__init_single_page(pfn_to_page(pfn), pfn, zid, nid);
 }
 #else
 static inline void init_reserved_page(unsigned long pfn)
@@ -1536,7 +1529,7 @@ static unsigned long  __init deferred_init_pages(int nid, int zid,
 		} else {
 			page++;
 		}
-		__init_single_page(page, pfn, zid, nid, true);
+		__init_single_page(page, pfn, zid, nid);
 		nr_pages++;
 	}
 	return (nr_pages);
@@ -5329,6 +5322,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	pg_data_t *pgdat = NODE_DATA(nid);
 	unsigned long pfn;
 	unsigned long nr_initialised = 0;
+	struct page *page;
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 	struct memblock_region *r = NULL, *tmp;
 #endif
@@ -5390,6 +5384,11 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 #endif
 
 not_early:
+		page = pfn_to_page(pfn);
+		__init_single_page(page, pfn, zone, nid);
+		if (context == MEMMAP_HOTPLUG)
+			SetPageReserved(page);
+
 		/*
 		 * Mark the block movable so that blocks are reserved for
 		 * movable at startup. This will force kernel allocations
@@ -5406,15 +5405,8 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		 * because this is done early in sparse_add_one_section
 		 */
 		if (!(pfn & (pageblock_nr_pages - 1))) {
-			struct page *page = pfn_to_page(pfn);
-
-			__init_single_page(page, pfn, zone, nid,
-					context != MEMMAP_HOTPLUG);
 			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 			cond_resched();
-		} else {
-			__init_single_pfn(pfn, zone, nid,
-					context != MEMMAP_HOTPLUG);
 		}
 	}
 }
diff --git a/mm/sparse.c b/mm/sparse.c
index 7af5e7a92528..d7808307023b 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -30,11 +30,14 @@ struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]
 #endif
 EXPORT_SYMBOL(mem_section);
 
-#ifdef NODE_NOT_IN_PAGE_FLAGS
+#if defined(NODE_NOT_IN_PAGE_FLAGS) || defined(CONFIG_MEMORY_HOTPLUG)
 /*
  * If we did not store the node number in the page then we have to
  * do a lookup in the section_to_node_table in order to find which
  * node the page belongs to.
+ *
+ * We also use this data in case memory hotplugging is enabled to be
+ * able to determine nid while struct pages are not yet initialized.
  */
 #if MAX_NUMNODES <= 256
 static u8 section_to_node_table[NR_MEM_SECTIONS] __cacheline_aligned;
@@ -42,17 +45,28 @@ static u8 section_to_node_table[NR_MEM_SECTIONS] __cacheline_aligned;
 static u16 section_to_node_table[NR_MEM_SECTIONS] __cacheline_aligned;
 #endif
 
+#ifdef NODE_NOT_IN_PAGE_FLAGS
 int page_to_nid(const struct page *page)
 {
 	return section_to_node_table[page_to_section(page)];
 }
 EXPORT_SYMBOL(page_to_nid);
+#endif /* NODE_NOT_IN_PAGE_FLAGS */
 
 static void set_section_nid(unsigned long section_nr, int nid)
 {
 	section_to_node_table[section_nr] = nid;
 }
-#else /* !NODE_NOT_IN_PAGE_FLAGS */
+
+/* Return NID for given section number */
+int get_section_nid(unsigned long section_nr)
+{
+	if (WARN_ON(section_nr >= NR_MEM_SECTIONS))
+		return 0;
+	return section_to_node_table[section_nr];
+}
+EXPORT_SYMBOL(get_section_nid);
+#else /* ! (NODE_NOT_IN_PAGE_FLAGS || CONFIG_MEMORY_HOTPLUG) */
 static inline void set_section_nid(unsigned long section_nr, int nid)
 {
 }
@@ -816,7 +830,14 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat,
 		goto out;
 	}
 
-	memset(memmap, 0, sizeof(struct page) * PAGES_PER_SECTION);
+#ifdef CONFIG_DEBUG_VM
+	/*
+	 * poison uninitialized struct pages in order to catch invalid flags
+	 * combinations.
+	 */
+	memset(memmap, PAGE_POISON_PATTERN,
+	       sizeof(struct page) * PAGES_PER_SECTION);
+#endif
 
 	section_mark_present(ms);
 
@@ -827,6 +848,8 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat,
 	if (ret <= 0) {
 		kfree(usemap);
 		__kfree_section_memmap(memmap, altmap);
+	} else {
+		set_section_nid(section_nr, pgdat->node_id);
 	}
 	return ret;
 }
-- 
2.16.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking
  2018-01-31 21:02   ` Pavel Tatashin
  (?)
@ 2018-02-02  4:12   ` kbuild test robot
  -1 siblings, 0 replies; 19+ messages in thread
From: kbuild test robot @ 2018-02-02  4:12 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: kbuild-all, steven.sistare, daniel.m.jordan, akpm, mgorman,
	mhocko, linux-mm, linux-kernel, gregkh, vbabka, bharata

[-- Attachment #1: Type: text/plain, Size: 1674 bytes --]

Hi Pavel,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on mmotm/master]
[also build test WARNING on v4.15 next-20180201]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pavel-Tatashin/mm-uninitialized-struct-page-poisoning-sanity-checking/20180202-105827
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: x86_64-randconfig-x013-201804 (attached as .config)
compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from include/linux/page_ref.h:7:0,
                    from include/linux/mm.h:26,
                    from include/linux/memblock.h:18,
                    from mm/memblock.c:21:
   mm/memblock.c: In function 'memblock_virt_alloc_try_nid_raw':
>> include/linux/page-flags.h:159:29: warning: overflow in implicit constant conversion [-Woverflow]
    #define PAGE_POISON_PATTERN ~0ul
                                ^
>> mm/memblock.c:1376:15: note: in expansion of macro 'PAGE_POISON_PATTERN'
      memset(ptr, PAGE_POISON_PATTERN, size);
                  ^~~~~~~~~~~~~~~~~~~

vim +159 include/linux/page-flags.h

   158	
 > 159	#define	PAGE_POISON_PATTERN	~0ul
   160	static inline int PagePoisoned(const struct page *page)
   161	{
   162		return page->flags == PAGE_POISON_PATTERN;
   163	}
   164	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34919 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/2] mm, memory_hotplug: optimize memory hotplug
  2018-01-31 21:03   ` Pavel Tatashin
  (?)
@ 2018-02-02  4:48   ` kbuild test robot
  -1 siblings, 0 replies; 19+ messages in thread
From: kbuild test robot @ 2018-02-02  4:48 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: kbuild-all, steven.sistare, daniel.m.jordan, akpm, mgorman,
	mhocko, linux-mm, linux-kernel, gregkh, vbabka, bharata

[-- Attachment #1: Type: text/plain, Size: 7923 bytes --]

Hi Pavel,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on mmotm/master]
[also build test WARNING on next-20180201]
[cannot apply to v4.15]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pavel-Tatashin/mm-uninitialized-struct-page-poisoning-sanity-checking/20180202-105827
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: x86_64-randconfig-x019-201804 (attached as .config)
compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from include/linux/page_ref.h:7:0,
                    from include/linux/mm.h:26,
                    from mm/sparse.c:5:
   mm/sparse.c: In function 'sparse_add_one_section':
   include/linux/page-flags.h:159:29: warning: overflow in implicit constant conversion [-Woverflow]
    #define PAGE_POISON_PATTERN ~0ul
                                ^
>> mm/sparse.c:838:17: note: in expansion of macro 'PAGE_POISON_PATTERN'
     memset(memmap, PAGE_POISON_PATTERN,
                    ^~~~~~~~~~~~~~~~~~~
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:fls64
   Cyclomatic Complexity 1 include/linux/log2.h:__ilog2_u64
   Cyclomatic Complexity 3 include/linux/log2.h:is_power_of_2
   Cyclomatic Complexity 1 include/asm-generic/getorder.h:__get_order
   Cyclomatic Complexity 3 include/linux/string.h:memset
   Cyclomatic Complexity 1 include/linux/spinlock.h:spinlock_check
   Cyclomatic Complexity 1 include/linux/spinlock.h:spin_unlock_irqrestore
   Cyclomatic Complexity 1 include/linux/nodemask.h:node_state
   Cyclomatic Complexity 1 include/linux/memory_hotplug.h:pgdat_resize_lock
   Cyclomatic Complexity 1 include/linux/memory_hotplug.h:pgdat_resize_unlock
   Cyclomatic Complexity 1 include/linux/mmzone.h:pfn_to_section_nr
   Cyclomatic Complexity 1 include/linux/mmzone.h:section_nr_to_pfn
   Cyclomatic Complexity 3 include/linux/mmzone.h:__nr_to_section
   Cyclomatic Complexity 1 include/linux/mmzone.h:__section_mem_map_addr
   Cyclomatic Complexity 3 include/linux/mmzone.h:present_section
   Cyclomatic Complexity 1 include/linux/mmzone.h:present_section_nr
   Cyclomatic Complexity 3 include/linux/mmzone.h:valid_section
   Cyclomatic Complexity 1 include/linux/mmzone.h:valid_section_nr
   Cyclomatic Complexity 1 include/linux/mmzone.h:__pfn_to_section
   Cyclomatic Complexity 2 include/linux/mmzone.h:pfn_present
   Cyclomatic Complexity 1 arch/x86/include/asm/topology.h:numa_node_id
   Cyclomatic Complexity 1 include/linux/topology.h:numa_mem_id
   Cyclomatic Complexity 1 include/linux/mm.h:is_vmalloc_addr
   Cyclomatic Complexity 1 include/linux/mm.h:page_to_section
   Cyclomatic Complexity 28 include/linux/slab.h:kmalloc_index
   Cyclomatic Complexity 1 include/linux/slab.h:__kmalloc_node
   Cyclomatic Complexity 1 include/linux/slab.h:kmem_cache_alloc_node_trace
   Cyclomatic Complexity 68 include/linux/slab.h:kmalloc_large
   Cyclomatic Complexity 5 include/linux/slab.h:kmalloc
   Cyclomatic Complexity 5 include/linux/slab.h:kmalloc_node
   Cyclomatic Complexity 1 include/linux/slab.h:kzalloc_node
   Cyclomatic Complexity 1 include/linux/bootmem.h:alloc_remap
   Cyclomatic Complexity 1 mm/sparse.c:set_section_nid
   Cyclomatic Complexity 1 mm/sparse.c:sparse_encode_early_nid
   Cyclomatic Complexity 1 mm/sparse.c:sparse_early_nid
   Cyclomatic Complexity 4 mm/sparse.c:next_present_section_nr
   Cyclomatic Complexity 1 mm/sparse.c:check_usemap_section_nr
   Cyclomatic Complexity 6 mm/sparse.c:alloc_usemap_and_memmap
   Cyclomatic Complexity 1 include/linux/bootmem.h:memblock_virt_alloc
   Cyclomatic Complexity 1 include/linux/bootmem.h:memblock_virt_alloc_node
   Cyclomatic Complexity 2 mm/sparse.c:sparse_index_alloc
   Cyclomatic Complexity 3 mm/sparse.c:sparse_index_init
   Cyclomatic Complexity 1 include/linux/bootmem.h:memblock_virt_alloc_node_nopanic
   Cyclomatic Complexity 1 mm/sparse.c:sparse_early_usemaps_alloc_pgdat_section
   Cyclomatic Complexity 2 mm/sparse.c:sparse_encode_mem_map
   Cyclomatic Complexity 2 mm/sparse.c:sparse_init_one_section
   Cyclomatic Complexity 1 include/linux/bootmem.h:memblock_free_early
   Cyclomatic Complexity 1 include/linux/gfp.h:__alloc_pages
   Cyclomatic Complexity 2 include/linux/gfp.h:__alloc_pages_node
   Cyclomatic Complexity 2 include/linux/gfp.h:alloc_pages_node
   Cyclomatic Complexity 70 mm/sparse.c:__kmalloc_section_memmap
   Cyclomatic Complexity 1 mm/sparse.c:kmalloc_section_memmap
   Cyclomatic Complexity 2 mm/sparse.c:__kfree_section_memmap
   Cyclomatic Complexity 3 mm/sparse.c:get_section_nid
   Cyclomatic Complexity 5 mm/sparse.c:__section_nr
   Cyclomatic Complexity 2 mm/sparse.c:section_mark_present
   Cyclomatic Complexity 7 mm/sparse.c:mminit_validate_memmodel_limits
   Cyclomatic Complexity 4 mm/sparse.c:node_memmap_size_bytes
   Cyclomatic Complexity 4 mm/sparse.c:memory_present
   Cyclomatic Complexity 1 mm/sparse.c:sparse_decode_mem_map
   Cyclomatic Complexity 1 mm/sparse.c:usemap_size
   Cyclomatic Complexity 4 mm/sparse.c:sparse_early_usemaps_alloc_node
   Cyclomatic Complexity 1 mm/sparse.c:__kmalloc_section_usemap
   Cyclomatic Complexity 2 mm/sparse.c:sparse_mem_map_populate
   Cyclomatic Complexity 10 mm/sparse.c:sparse_mem_maps_populate_node
   Cyclomatic Complexity 1 mm/sparse.c:sparse_early_mem_maps_alloc_node
   Cyclomatic Complexity 1 mm/sparse.c:vmemmap_populate_print_last
   Cyclomatic Complexity 6 mm/sparse.c:sparse_init
   Cyclomatic Complexity 4 mm/sparse.c:online_mem_sections
   Cyclomatic Complexity 6 mm/sparse.c:sparse_add_one_section

vim +/PAGE_POISON_PATTERN +838 mm/sparse.c

   793	
   794	/*
   795	 * returns the number of sections whose mem_maps were properly
   796	 * set.  If this is <=0, then that means that the passed-in
   797	 * map was not consumed and must be freed.
   798	 */
   799	int __meminit sparse_add_one_section(struct pglist_data *pgdat,
   800			unsigned long start_pfn, struct vmem_altmap *altmap)
   801	{
   802		unsigned long section_nr = pfn_to_section_nr(start_pfn);
   803		struct mem_section *ms;
   804		struct page *memmap;
   805		unsigned long *usemap;
   806		unsigned long flags;
   807		int ret;
   808	
   809		/*
   810		 * no locking for this, because it does its own
   811		 * plus, it does a kmalloc
   812		 */
   813		ret = sparse_index_init(section_nr, pgdat->node_id);
   814		if (ret < 0 && ret != -EEXIST)
   815			return ret;
   816		memmap = kmalloc_section_memmap(section_nr, pgdat->node_id, altmap);
   817		if (!memmap)
   818			return -ENOMEM;
   819		usemap = __kmalloc_section_usemap();
   820		if (!usemap) {
   821			__kfree_section_memmap(memmap, altmap);
   822			return -ENOMEM;
   823		}
   824	
   825		pgdat_resize_lock(pgdat, &flags);
   826	
   827		ms = __pfn_to_section(start_pfn);
   828		if (ms->section_mem_map & SECTION_MARKED_PRESENT) {
   829			ret = -EEXIST;
   830			goto out;
   831		}
   832	
   833	#ifdef CONFIG_DEBUG_VM
   834		/*
   835		 * poison uninitialized struct pages in order to catch invalid flags
   836		 * combinations.
   837		 */
 > 838		memset(memmap, PAGE_POISON_PATTERN,
   839		       sizeof(struct page) * PAGES_PER_SECTION);
   840	#endif
   841	
   842		section_mark_present(ms);
   843	
   844		ret = sparse_init_one_section(ms, section_nr, memmap, usemap);
   845	
   846	out:
   847		pgdat_resize_unlock(pgdat, &flags);
   848		if (ret <= 0) {
   849			kfree(usemap);
   850			__kfree_section_memmap(memmap, altmap);
   851		} else {
   852			set_section_nid(section_nr, pgdat->node_id);
   853		}
   854		return ret;
   855	}
   856	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29871 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking
  2018-01-31 21:02   ` Pavel Tatashin
  (?)
  (?)
@ 2018-03-13 23:43   ` Sasha Levin
  2018-03-14  0:38     ` Pavel Tatashin
  -1 siblings, 1 reply; 19+ messages in thread
From: Sasha Levin @ 2018-03-13 23:43 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: steven.sistare, daniel.m.jordan, akpm, mgorman, mhocko, linux-mm,
	linux-kernel, gregkh, vbabka, bharata

On Wed, Jan 31, 2018 at 04:02:59PM -0500, Pavel Tatashin wrote:
>During boot we poison struct page memory in order to ensure that no one is
>accessing this memory until the struct pages are initialized in
>__init_single_page().
>
>This patch adds more scrutiny to this checking by making sure that flags
>do not equal the poison pattern when they are accessed. The pattern is all
>ones.
>
>Since the node id is also stored in struct page, and may be accessed quite
>early, we add the enforcement into the page_to_nid() function as well.
>
>Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
>---

Hey Pasha,

This patch is causing the following on boot:

[    1.253732] BUG: unable to handle kernel paging request at fffffffffffffffe
[    1.254000] PGD 2284e19067 P4D 2284e19067 PUD 2284e1b067 PMD 0
[    1.254000] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[    1.254000] Modules linked in:
[    1.254000] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc5-next-20180313 #10
[    1.254000] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
[    1.254000] RIP: 0010:__dump_page (??:?)
[    1.254000] RSP: 0000:ffff881c63c17810 EFLAGS: 00010246
[    1.254000] RAX: dffffc0000000000 RBX: ffffea0084000000 RCX: 1ffff1038c782f2b
[    1.254000] RDX: 1fffffffffffffff RSI: ffffffff9e160640 RDI: ffffea0084000000
[    1.254000] RBP: ffff881c63c17c00 R08: ffff8840107e8880 R09: ffffed0802167a4d
[    1.254000] R10: 0000000000000001 R11: ffffed0802167a4c R12: 1ffff1038c782f07
[    1.254000] R13: ffffea0084000020 R14: fffffffffffffffe R15: ffff881c63c17bd8
[    1.254000] FS:  0000000000000000(0000) GS:ffff881c6ac00000(0000) knlGS:0000000000000000
[    1.254000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.254000] CR2: fffffffffffffffe CR3: 0000002284e16000 CR4: 00000000003406e0
[    1.254000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    1.254000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    1.254000] Call Trace:
[    1.254000] dump_page (/mm/debug.c:80)
[    1.254000] get_nid_for_pfn (/./include/linux/mm.h:900 /drivers/base/node.c:396)
[    1.254000] register_mem_sect_under_node (/drivers/base/node.c:438)
[    1.254000] link_mem_sections (/drivers/base/node.c:517)
[    1.254000] topology_init (/./include/linux/nodemask.h:271 /arch/x86/kernel/topology.c:164)
[    1.254000] do_one_initcall (/init/main.c:835)
[    1.254000] kernel_init_freeable (/init/main.c:901 /init/main.c:909 /init/main.c:927 /init/main.c:1076)
[    1.254000] kernel_init (/init/main.c:1004)
[    1.254000] ret_from_fork (/arch/x86/entry/entry_64.S:417)
[ 1.254000] Code: ff a8 01 4c 0f 44 f3 4d 85 f6 0f 84 31 0e 00 00 4c 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 2d 11 00 00 <49> 83 3e ff 0f 84 a9 06 00 00 4d 8d b7 c0 fd ff ff 48 b8 00 00
All code
========
   0:   ff a8 01 4c 0f 44       ljmp   *0x440f4c01(%rax)
   6:   f3 4d 85 f6             repz test %r14,%r14
   a:   0f 84 31 0e 00 00       je     0xe41
  10:   4c 89 f2                mov    %r14,%rdx
  13:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
  1a:   fc ff df
  1d:   48 c1 ea 03             shr    $0x3,%rdx
  21:   80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)
  25:   0f 85 2d 11 00 00       jne    0x1158
  2b:*  49 83 3e ff             cmpq   $0xffffffffffffffff,(%r14)               <-- trapping instruction
  2f:   0f 84 a9 06 00 00       je     0x6de
  35:   4d 8d b7 c0 fd ff ff    lea    -0x240(%r15),%r14
  3c:   48                      rex.W
  3d:   b8                      .byte 0xb8
        ...

Code starting with the faulting instruction
===========================================
   0:   49 83 3e ff             cmpq   $0xffffffffffffffff,(%r14)
   4:   0f 84 a9 06 00 00       je     0x6b3
   a:   4d 8d b7 c0 fd ff ff    lea    -0x240(%r15),%r14
  11:   48                      rex.W
  12:   b8                      .byte 0xb8
        ...
[    1.254000] RIP: __dump_page+0x1c8/0x13c0 RSP: ffff881c63c17810 (/./include/asm-generic/sections.h:42)
[    1.254000] CR2: fffffffffffffffe
[    1.254000] ---[ end trace e643dfbc44b562ca ]---

-- 

Thanks,
Sasha

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking
  2018-03-13 23:43   ` Sasha Levin
@ 2018-03-14  0:38     ` Pavel Tatashin
  2018-03-14  0:53       ` Sasha Levin
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Tatashin @ 2018-03-14  0:38 UTC (permalink / raw)
  To: Sasha Levin
  Cc: steven.sistare, daniel.m.jordan, akpm, mgorman, mhocko, linux-mm,
	linux-kernel, gregkh, vbabka, bharata

Hi Sasha,

It seems the patch is doing the right thing, and it catches bugs: here
we access an uninitialized struct page. The question is why this happens.

register_mem_sect_under_node(struct memory_block *mem_blk, int nid)
   page_nid = get_nid_for_pfn(pfn);

The node id is stored in page flags; since the struct page is poisoned,
the pattern is recognized and the panic is triggered.

Do you have a config file? Also, instructions on how to reproduce it?

Thank you,
Pasha


On Tue, Mar 13, 2018 at 7:43 PM, Sasha Levin
<Alexander.Levin@microsoft.com> wrote:
> On Wed, Jan 31, 2018 at 04:02:59PM -0500, Pavel Tatashin wrote:
>>During boot we poison struct page memory in order to ensure that no one is
>>accessing this memory until the struct pages are initialized in
>>__init_single_page().
>>
>>This patch adds more scrutiny to this checking by making sure that flags
>>do not equal the poison pattern when they are accessed. The pattern is all
>>ones.
>>
>>Since the node id is also stored in struct page, and may be accessed quite
>>early, we add the enforcement into the page_to_nid() function as well.
>>
>>Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
>>---
>
> Hey Pasha,
>
> This patch is causing the following on boot:
>
> [    1.253732] BUG: unable to handle kernel paging request at fffffffffffffffe
> [    1.254000] PGD 2284e19067 P4D 2284e19067 PUD 2284e1b067 PMD 0
> [    1.254000] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
> [    1.254000] Modules linked in:
> [    1.254000] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc5-next-20180313 #10
> [    1.254000] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
> [    1.254000] RIP: 0010:__dump_page (??:?)
> [    1.254000] RSP: 0000:ffff881c63c17810 EFLAGS: 00010246
> [    1.254000] RAX: dffffc0000000000 RBX: ffffea0084000000 RCX: 1ffff1038c782f2b
> [    1.254000] RDX: 1fffffffffffffff RSI: ffffffff9e160640 RDI: ffffea0084000000
> [    1.254000] RBP: ffff881c63c17c00 R08: ffff8840107e8880 R09: ffffed0802167a4d
> [    1.254000] R10: 0000000000000001 R11: ffffed0802167a4c R12: 1ffff1038c782f07
> [    1.254000] R13: ffffea0084000020 R14: fffffffffffffffe R15: ffff881c63c17bd8
> [    1.254000] FS:  0000000000000000(0000) GS:ffff881c6ac00000(0000) knlGS:0000000000000000
> [    1.254000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    1.254000] CR2: fffffffffffffffe CR3: 0000002284e16000 CR4: 00000000003406e0
> [    1.254000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    1.254000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    1.254000] Call Trace:
> [    1.254000] dump_page (/mm/debug.c:80)
> [    1.254000] get_nid_for_pfn (/./include/linux/mm.h:900 /drivers/base/node.c:396)
> [    1.254000] register_mem_sect_under_node (/drivers/base/node.c:438)
> [    1.254000] link_mem_sections (/drivers/base/node.c:517)
> [    1.254000] topology_init (/./include/linux/nodemask.h:271 /arch/x86/kernel/topology.c:164)
> [    1.254000] do_one_initcall (/init/main.c:835)
> [    1.254000] kernel_init_freeable (/init/main.c:901 /init/main.c:909 /init/main.c:927 /init/main.c:1076)
> [    1.254000] kernel_init (/init/main.c:1004)
> [    1.254000] ret_from_fork (/arch/x86/entry/entry_64.S:417)
> [ 1.254000] Code: ff a8 01 4c 0f 44 f3 4d 85 f6 0f 84 31 0e 00 00 4c 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 2d 11 00 00 <49> 83 3e ff 0f 84 a9 06 00 00 4d 8d b7 c0 fd ff ff 48 b8 00 00
> All code
> ========
>    0:   ff a8 01 4c 0f 44       ljmp   *0x440f4c01(%rax)
>    6:   f3 4d 85 f6             repz test %r14,%r14
>    a:   0f 84 31 0e 00 00       je     0xe41
>   10:   4c 89 f2                mov    %r14,%rdx
>   13:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
>   1a:   fc ff df
>   1d:   48 c1 ea 03             shr    $0x3,%rdx
>   21:   80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)
>   25:   0f 85 2d 11 00 00       jne    0x1158
>   2b:*  49 83 3e ff             cmpq   $0xffffffffffffffff,(%r14)               <-- trapping instruction
>   2f:   0f 84 a9 06 00 00       je     0x6de
>   35:   4d 8d b7 c0 fd ff ff    lea    -0x240(%r15),%r14
>   3c:   48                      rex.W
>   3d:   b8                      .byte 0xb8
>         ...
>
> Code starting with the faulting instruction
> ===========================================
>    0:   49 83 3e ff             cmpq   $0xffffffffffffffff,(%r14)
>    4:   0f 84 a9 06 00 00       je     0x6b3
>    a:   4d 8d b7 c0 fd ff ff    lea    -0x240(%r15),%r14
>   11:   48                      rex.W
>   12:   b8                      .byte 0xb8
>         ...
> [    1.254000] RIP: __dump_page+0x1c8/0x13c0 RSP: ffff881c63c17810 (/./include/asm-generic/sections.h:42)
> [    1.254000] CR2: fffffffffffffffe
> [    1.254000] ---[ end trace e643dfbc44b562ca ]---
>
> --
>
> Thanks,
> Sasha

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking
  2018-03-14  0:38     ` Pavel Tatashin
@ 2018-03-14  0:53       ` Sasha Levin
  2018-03-14  1:02         ` Pavel Tatashin
  2018-03-15 19:04         ` Pavel Tatashin
  0 siblings, 2 replies; 19+ messages in thread
From: Sasha Levin @ 2018-03-14  0:53 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: steven.sistare, daniel.m.jordan, akpm, mgorman, mhocko, linux-mm,
	linux-kernel, gregkh, vbabka, bharata

[-- Attachment #1: Type: text/plain, Size: 5300 bytes --]

On Tue, Mar 13, 2018 at 08:38:57PM -0400, Pavel Tatashin wrote:
>Hi Sasha,
>
>It seems the patch is doing the right thing, and it catches bugs: here
>we access an uninitialized struct page. The question is why this happens.

Not completely; note that we die on an invalid reference rather than
assertion failure.

>register_mem_sect_under_node(struct memory_block *mem_blk, int nid)
>   page_nid = get_nid_for_pfn(pfn);
>
>The node id is stored in page flags; since the struct page is poisoned,
>the pattern is recognized and the panic is triggered.
>
>Do you have a config file? Also, instructions on how to reproduce it?

Attached the config. It just happens on boot.

>Thank you,
>Pasha
>
>
>On Tue, Mar 13, 2018 at 7:43 PM, Sasha Levin
><Alexander.Levin@microsoft.com> wrote:
>> On Wed, Jan 31, 2018 at 04:02:59PM -0500, Pavel Tatashin wrote:
>>>During boot we poison struct page memory in order to ensure that no one is
>>>accessing this memory until the struct pages are initialized in
>>>__init_single_page().
>>>
>>>This patch adds more scrutiny to this checking by making sure that flags
>>>do not equal the poison pattern when they are accessed. The pattern is all
>>>ones.
>>>
>>>Since the node id is also stored in struct page, and may be accessed quite
>>>early, we add the enforcement into the page_to_nid() function as well.
>>>
>>>Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
>>>---
>>
>> Hey Pasha,
>>
>> This patch is causing the following on boot:
>>
>> [    1.253732] BUG: unable to handle kernel paging request at fffffffffffffffe
>> [    1.254000] PGD 2284e19067 P4D 2284e19067 PUD 2284e1b067 PMD 0
>> [    1.254000] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
>> [    1.254000] Modules linked in:
>> [    1.254000] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc5-next-20180313 #10
>> [    1.254000] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
>> [    1.254000] RIP: 0010:__dump_page (??:?)
>> [    1.254000] RSP: 0000:ffff881c63c17810 EFLAGS: 00010246
>> [    1.254000] RAX: dffffc0000000000 RBX: ffffea0084000000 RCX: 1ffff1038c782f2b
>> [    1.254000] RDX: 1fffffffffffffff RSI: ffffffff9e160640 RDI: ffffea0084000000
>> [    1.254000] RBP: ffff881c63c17c00 R08: ffff8840107e8880 R09: ffffed0802167a4d
>> [    1.254000] R10: 0000000000000001 R11: ffffed0802167a4c R12: 1ffff1038c782f07
>> [    1.254000] R13: ffffea0084000020 R14: fffffffffffffffe R15: ffff881c63c17bd8
>> [    1.254000] FS:  0000000000000000(0000) GS:ffff881c6ac00000(0000) knlGS:0000000000000000
>> [    1.254000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [    1.254000] CR2: fffffffffffffffe CR3: 0000002284e16000 CR4: 00000000003406e0
>> [    1.254000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [    1.254000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [    1.254000] Call Trace:
>> [    1.254000] dump_page (/mm/debug.c:80)
>> [    1.254000] get_nid_for_pfn (/./include/linux/mm.h:900 /drivers/base/node.c:396)
>> [    1.254000] register_mem_sect_under_node (/drivers/base/node.c:438)
>> [    1.254000] link_mem_sections (/drivers/base/node.c:517)
>> [    1.254000] topology_init (/./include/linux/nodemask.h:271 /arch/x86/kernel/topology.c:164)
>> [    1.254000] do_one_initcall (/init/main.c:835)
>> [    1.254000] kernel_init_freeable (/init/main.c:901 /init/main.c:909 /init/main.c:927 /init/main.c:1076)
>> [    1.254000] kernel_init (/init/main.c:1004)
>> [    1.254000] ret_from_fork (/arch/x86/entry/entry_64.S:417)
>> [ 1.254000] Code: ff a8 01 4c 0f 44 f3 4d 85 f6 0f 84 31 0e 00 00 4c 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 2d 11 00 00 <49> 83 3e ff 0f 84 a9 06 00 00 4d 8d b7 c0 fd ff ff 48 b8 00 00
>> All code
>> ========
>>    0:   ff a8 01 4c 0f 44       ljmp   *0x440f4c01(%rax)
>>    6:   f3 4d 85 f6             repz test %r14,%r14
>>    a:   0f 84 31 0e 00 00       je     0xe41
>>   10:   4c 89 f2                mov    %r14,%rdx
>>   13:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
>>   1a:   fc ff df
>>   1d:   48 c1 ea 03             shr    $0x3,%rdx
>>   21:   80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)
>>   25:   0f 85 2d 11 00 00       jne    0x1158
>>   2b:*  49 83 3e ff             cmpq   $0xffffffffffffffff,(%r14)               <-- trapping instruction
>>   2f:   0f 84 a9 06 00 00       je     0x6de
>>   35:   4d 8d b7 c0 fd ff ff    lea    -0x240(%r15),%r14
>>   3c:   48                      rex.W
>>   3d:   b8                      .byte 0xb8
>>         ...
>>
>> Code starting with the faulting instruction
>> ===========================================
>>    0:   49 83 3e ff             cmpq   $0xffffffffffffffff,(%r14)
>>    4:   0f 84 a9 06 00 00       je     0x6b3
>>    a:   4d 8d b7 c0 fd ff ff    lea    -0x240(%r15),%r14
>>   11:   48                      rex.W
>>   12:   b8                      .byte 0xb8
>>         ...
>> [    1.254000] RIP: __dump_page+0x1c8/0x13c0 RSP: ffff881c63c17810 (/./include/asm-generic/sections.h:42)
>> [    1.254000] CR2: fffffffffffffffe
>> [    1.254000] ---[ end trace e643dfbc44b562ca ]---
>>
>> --
>>
>> Thanks,
>> Sasha

-- 

Thanks,
Sasha

[-- Attachment #2: config-sasha.gz --]
[-- Type: application/gzip, Size: 33407 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking
  2018-03-14  0:53       ` Sasha Levin
@ 2018-03-14  1:02         ` Pavel Tatashin
  2018-03-15 19:04         ` Pavel Tatashin
  1 sibling, 0 replies; 19+ messages in thread
From: Pavel Tatashin @ 2018-03-14  1:02 UTC (permalink / raw)
  To: Sasha Levin
  Cc: steven.sistare, daniel.m.jordan, akpm, mgorman, mhocko, linux-mm,
	linux-kernel, gregkh, vbabka, bharata

On Tue, Mar 13, 2018 at 8:53 PM, Sasha Levin
<Alexander.Levin@microsoft.com> wrote:
> On Tue, Mar 13, 2018 at 08:38:57PM -0400, Pavel Tatashin wrote:
>>Hi Sasha,
>>
>>It seems the patch is doing the right thing, and it catches bugs. Here
>>we access uninitialized struct page. The question is why this happens?
>
> Not completely; note that we die on an invalid reference rather than
> assertion failure.

I think the invalid reference happens within the assertion failure: as
far as I can tell, it is dump_page() where we get the invalid reference,
but to reach dump_page() from get_nid_for_pfn() we must already have
triggered the assertion.

>
>>register_mem_sect_under_node(struct memory_block *mem_blk, int nid)
>>   page_nid = get_nid_for_pfn(pfn);
>>
>>The node id is stored in the page flags, and since the struct page is
>>poisoned and the pattern is recognized, the panic is triggered.
>>
>>Do you have a config file? Also, instructions on how to reproduce it?
>
> Attached the config. It just happens on boot.

Thanks, I will try in qemu.

Pasha


* Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking
  2018-03-14  0:53       ` Sasha Levin
  2018-03-14  1:02         ` Pavel Tatashin
@ 2018-03-15 19:04         ` Pavel Tatashin
  2018-03-15 20:43           ` Sasha Levin
  1 sibling, 1 reply; 19+ messages in thread
From: Pavel Tatashin @ 2018-03-15 19:04 UTC (permalink / raw)
  To: Sasha Levin
  Cc: steven.sistare, daniel.m.jordan, akpm, mgorman, mhocko, linux-mm,
	linux-kernel, gregkh, vbabka, bharata

> 
> Attached the config. It just happens on boot.

Hi Sasha,

I have tried, unsuccessfully, to reproduce the bug in qemu with 20G of
RAM and 8 CPUs.

The patch "mm: uninitialized struct page poisoning sanity checking" should be
improved to make dump_page() detect a poisoned struct page and simply print
it as hex in that case. I will send an updated patch later.

How do you run this on the Microsoft hypervisor? Do I need Windows 10 for
that?

BTW, I am going to be on vacation for the next two weeks (going to Israel),
so I may not be able to respond quickly.

Thank you,
Pasha


* Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking
  2018-03-15 19:04         ` Pavel Tatashin
@ 2018-03-15 20:43           ` Sasha Levin
  2018-04-04  2:17             ` Pavel Tatashin
  0 siblings, 1 reply; 19+ messages in thread
From: Sasha Levin @ 2018-03-15 20:43 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: steven.sistare, daniel.m.jordan, akpm, mgorman, mhocko, linux-mm,
	linux-kernel, gregkh, vbabka, bharata

On Thu, Mar 15, 2018 at 03:04:30PM -0400, Pavel Tatashin wrote:
>>
>> Attached the config. It just happens on boot.
>
>Hi Sasha,
>
>I have tried unsuccessfully to reproduce the bug in qemu with 20G RAM,
>and 8 CPUs.
>
>The patch "mm: uninitialized struct page poisoning sanity checking" should be
>improved to make dump_page() detect a poisoned struct page and simply print
>it as hex in that case. I will send an updated patch later.
>
>How do you run this on the Microsoft hypervisor? Do I need Windows 10 for
>that?

Booting a Linux VM on Azure would be the easiest, and free too :)

>BTW, I am going to be on vacation for the next two weeks (going to Israel),
>so I may not be able to respond quickly.

Have fun!

We may need to hold off on getting this patch merged for the time being.

-- 

Thanks,
Sasha


* Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking
  2018-03-15 20:43           ` Sasha Levin
@ 2018-04-04  2:17             ` Pavel Tatashin
  2018-04-05 13:49               ` Pavel Tatashin
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Tatashin @ 2018-04-04  2:17 UTC (permalink / raw)
  To: Sasha Levin
  Cc: steven.sistare, daniel.m.jordan, akpm, mgorman, mhocko, linux-mm,
	linux-kernel, gregkh, vbabka, bharata

On 18-03-15 20:43:14, Sasha Levin wrote:
> On Thu, Mar 15, 2018 at 03:04:30PM -0400, Pavel Tatashin wrote:
> >>
> >> Attached the config. It just happens on boot.
> >
> >Hi Sasha,
> >
> >I have tried unsuccessfully to reproduce the bug in qemu with 20G RAM,
> >and 8 CPUs.
> >
> >The patch "mm: uninitialized struct page poisoning sanity checking" should be
> >improved to make dump_page() detect a poisoned struct page and simply print
> >it as hex in that case. I will send an updated patch later.
> >
> >How do you run this on the Microsoft hypervisor? Do I need Windows 10 for
> >that?
> 
> Booting a Linux VM on Azure would be the easiest, and free too :)

Hi Sasha,

I have registered on Azure's portal, and created a VM with 4 CPUs and 16G
of RAM. However, I still was not able to reproduce the boot bug you found.

Could you please try the updated patch that I sent out today? The panic
message should be improved:

https://lkml.org/lkml/2018/4/3/583

Thank you,
Pasha


* Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking
  2018-04-04  2:17             ` Pavel Tatashin
@ 2018-04-05 13:49               ` Pavel Tatashin
  2018-04-05 19:22                 ` Sasha Levin
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Tatashin @ 2018-04-05 13:49 UTC (permalink / raw)
  To: Sasha Levin
  Cc: steven.sistare, daniel.m.jordan, akpm, mgorman, mhocko, linux-mm,
	linux-kernel, gregkh, vbabka, bharata

> Hi Sasha,
> 
> I have registered on Azure's portal, and created a VM with 4 CPUs and 16G
> of RAM. However, I still was not able to reproduce the boot bug you found.

I have also tried to reproduce this issue on Windows 10 + Hyper-V,
still unsuccessfully.

> 
> Could you please try the updated patch that I sent out today? The panic
> message should be improved:
> 
> https://lkml.org/lkml/2018/4/3/583
> 
> Thank you,
> Pasha
> 


* Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking
  2018-04-05 13:49               ` Pavel Tatashin
@ 2018-04-05 19:22                 ` Sasha Levin
  2018-04-06 12:45                   ` Pavel Tatashin
  0 siblings, 1 reply; 19+ messages in thread
From: Sasha Levin @ 2018-04-05 19:22 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: steven.sistare, daniel.m.jordan, akpm, mgorman, mhocko, linux-mm,
	linux-kernel, gregkh, vbabka, bharata

On Thu, Apr 05, 2018 at 09:49:40AM -0400, Pavel Tatashin wrote:
>> Hi Sasha,
>>
>> I have registered on Azure's portal, and created a VM with 4 CPUs and 16G
>> of RAM. However, I still was not able to reproduce the boot bug you found.
>
>I have also tried to reproduce this issue on Windows 10 + Hyper-V, still
>unsuccessful.

I'm not sure why you can't reproduce it. I built a 4.16 kernel + your 6
patches on top, and booting on a D64s_v3 instance gives me this:

[    1.205726] page:ffffea0084000000 is uninitialized and poisoned
[    1.205737] raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[    1.207016] raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[    1.208014] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
[    1.209087] ------------[ cut here ]------------
[    1.210000] kernel BUG at ./include/linux/mm.h:901!
[    1.210015] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[    1.211000] Modules linked in:
[    1.211000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0+ #10
[    1.211000] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
[    1.211000] RIP: 0010:get_nid_for_pfn+0x6e/0xa0
[    1.211000] RSP: 0000:ffff881c63cbfc28 EFLAGS: 00010246
[    1.211000] RAX: 0000000000000000 RBX: ffffea0084000000 RCX: 0000000000000000
[    1.211000] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffed038c797f78
[    1.211000] RBP: ffff881c63cbfc30 R08: ffff88401174a480 R09: 0000000000000000
[    1.211000] R10: ffff8840e00d6040 R11: 0000000000000000 R12: 0000000002107fff
[    1.211000] R13: fffffbfff4648234 R14: 0000000000000001 R15: 0000000000000001
[    1.211000] FS:  0000000000000000(0000) GS:ffff881c6aa00000(0000) knlGS:0000000000000000
[    1.211000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.211000] CR2: 0000000000000000 CR3: 0000002814216000 CR4: 00000000003406f0
[    1.211000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    1.211000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    1.211000] Call Trace:
[    1.211000]  register_mem_sect_under_node+0x1a2/0x530
[    1.211000]  link_mem_sections+0x12d/0x200
[    1.211000]  topology_init+0xe6/0x178
[    1.211000]  ? enable_cpu0_hotplug+0x1a/0x1a
[    1.211000]  do_one_initcall+0xb0/0x31f
[    1.211000]  ? initcall_blacklisted+0x220/0x220
[    1.211000]  ? up_write+0x78/0x140
[    1.211000]  ? up_read+0x40/0x40
[    1.211000]  ? __asan_register_globals+0x30/0xa0
[    1.211000]  ? kasan_unpoison_shadow+0x35/0x50
[    1.211000]  kernel_init_freeable+0x69d/0x764
[    1.211000]  ? start_kernel+0x8fd/0x8fd
[    1.211000]  ? finish_task_switch+0x1b6/0x9c0
[    1.211000]  ? rest_init+0x120/0x120
[    1.211000]  kernel_init+0x13/0x150
[    1.211000]  ? rest_init+0x120/0x120
[    1.211000]  ret_from_fork+0x3a/0x50
[    1.211000] Code: ff df 48 c1 ea 03 80 3c 02 00 75 34 48 8b 03 48 83 f8 ff 74 07 48 c1 e8 36 5b 5d c3 48 c7 c6 00 ca f5 9e 48 89 df e8 82 13 d5 fd <0f> 0b 48 c7 c7 00 24 2e a1 e8 05 36 c1 fe e8 af 07 ea fd eb ac
[    1.211000] RIP: get_nid_for_pfn+0x6e/0xa0 RSP: ffff881c63cbfc28
[    1.211017] ---[ end trace d86a03841f7ef229 ]---
[    1.212020] ==================================================================
[    1.213000] BUG: KASAN: stack-out-of-bounds in update_stack_state+0x64c/0x810
[    1.213000] Read of size 8 at addr ffff881c63cbfaf8 by task swapper/0/1
[    1.213000]
[    1.213000] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G      D          4.16.0+ #10
[    1.213000] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
[    1.213000] Call Trace:
[    1.213000]  dump_stack+0xe3/0x196
[    1.213000]  ? _atomic_dec_and_lock+0x31a/0x31a
[    1.213000]  ? vprintk_func+0x27/0x60
[    1.213000]  ? printk+0x9c/0xc3
[    1.213000]  ? show_regs_print_info+0x10/0x10
[    1.213000]  ? lock_acquire+0x760/0x760
[    1.213000]  ? update_stack_state+0x64c/0x810
[    1.213000]  print_address_description+0xe4/0x480
[    1.213000]  ? update_stack_state+0x64c/0x810
[    1.213000]  kasan_report+0x1d7/0x460
[    1.213000]  ? console_unlock+0x652/0xe90
[    1.213000]  ? update_stack_state+0x64c/0x810
[    1.213000]  __asan_report_load8_noabort+0x19/0x20
[    1.213000]  update_stack_state+0x64c/0x810
[    1.213000]  ? __read_once_size_nocheck.constprop.2+0x50/0x50
[    1.213000]  ? put_files_struct+0x2a4/0x390
[    1.213000]  ? unwind_next_frame+0x202/0x1230
[    1.213000]  unwind_next_frame+0x202/0x1230
[    1.213000]  ? unwind_dump+0x590/0x590
[    1.213000]  ? get_stack_info+0x42/0x3b0
[    1.213000]  ? debug_check_no_locks_freed+0x300/0x300
[    1.213000]  ? __unwind_start+0x170/0x380
[    1.213000]  __save_stack_trace+0x82/0x140
[    1.213000]  ? put_files_struct+0x2a4/0x390
[    1.213000]  save_stack_trace+0x39/0x70
[    1.213000]  save_stack+0x43/0xd0
[    1.213000]  ? save_stack+0x43/0xd0
[    1.213000]  ? __kasan_slab_free+0x11f/0x170
[    1.213000]  ? kasan_slab_free+0xe/0x10
[    1.213000]  ? kmem_cache_free+0xe6/0x560
[    1.213000]  ? put_files_struct+0x2a4/0x390
[    1.213000]  ? _get_random_bytes+0x162/0x5a0
[    1.213000]  ? trace_hardirqs_off+0xd/0x10
[    1.213000]  ? lock_acquire+0x212/0x760
[    1.213000]  ? rcuwait_wake_up+0x15e/0x2c0
[    1.213000]  ? lock_acquire+0x212/0x760
[    1.213000]  ? free_obj_work+0x8a0/0x8a0
[    1.213000]  ? lock_acquire+0x212/0x760
[    1.213000]  ? acct_collect+0x776/0xe80
[    1.213000]  ? acct_collect+0x2e4/0xe80
[    1.213000]  ? acct_collect+0x2e4/0xe80
[    1.213000]  ? lock_acquire+0x760/0x760
[    1.213000]  ? lock_downgrade+0x910/0x910
[    1.213000]  __kasan_slab_free+0x11f/0x170
[    1.213000]  ? put_files_struct+0x2a4/0x390
[    1.213000]  kasan_slab_free+0xe/0x10
[    1.213000]  kmem_cache_free+0xe6/0x560
[    1.213000]  put_files_struct+0x2a4/0x390
[    1.213000]  ? get_files_struct+0x80/0x80
[    1.213000]  ? do_raw_spin_trylock+0x1f0/0x1f0
[    1.213000]  exit_files+0x83/0xc0
[    1.213000]  do_exit+0x9be/0x2190
[    1.213000]  ? do_invalid_op+0x20/0x30
[    1.213000]  ? mm_update_next_owner+0x1200/0x1200
[    1.213000]  ? get_nid_for_pfn+0x6e/0xa0
[    1.213000]  ? get_nid_for_pfn+0x6e/0xa0
[    1.213000]  ? register_mem_sect_under_node+0x1a2/0x530
[    1.213000]  ? link_mem_sections+0x12d/0x200
[    1.213000]  ? topology_init+0xe6/0x178
[    1.213000]  ? enable_cpu0_hotplug+0x1a/0x1a
[    1.213000]  ? do_one_initcall+0xb0/0x31f
[    1.213000]  ? initcall_blacklisted+0x220/0x220
[    1.213000]  ? up_write+0x78/0x140
[    1.213000]  ? up_read+0x40/0x40
[    1.213000]  ? __asan_register_globals+0x30/0xa0
[    1.213000]  ? kasan_unpoison_shadow+0x35/0x50
[    1.213000]  ? kernel_init_freeable+0x69d/0x764
[    1.213000]  ? start_kernel+0x8fd/0x8fd
[    1.213000]  ? finish_task_switch+0x1b6/0x9c0
[    1.213000]  ? rest_init+0x120/0x120
[    1.213000]  rewind_stack_do_exit+0x17/0x20
[    1.213000]
[    1.213000] The buggy address belongs to the page:
[    1.213000] page:ffffea00718f2fc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[    1.213000] flags: 0x17ffffc0000000()
[    1.213000] raw: 0017ffffc0000000 0000000000000000 0000000000000000 00000000ffffffff
[    1.213000] raw: ffffea00718f2fe0 ffffea00718f2fe0 0000000000000000 0000000000000000
[    1.213000] page dumped because: kasan: bad access detected
[    1.213000]
[    1.213000] Memory state around the buggy address:
[    1.213000]  ffff881c63cbf980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    1.213000]  ffff881c63cbfa00: 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1
[    1.213000] >ffff881c63cbfa80: f1 f8 f2 f2 f2 00 00 00 00 00 00 00 00 00 f3 f3
[    1.213000]                                                                 ^
[    1.213000]  ffff881c63cbfb00: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    1.213000]  ffff881c63cbfb80: f1 f1 f1 f1 00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2
[    1.213000] ==================================================================
[    1.213033] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    1.213033]
[    1.214000] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b


* Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking
  2018-04-05 19:22                 ` Sasha Levin
@ 2018-04-06 12:45                   ` Pavel Tatashin
  2018-04-07 14:45                     ` Pavel Tatashin
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Tatashin @ 2018-04-06 12:45 UTC (permalink / raw)
  To: Sasha Levin
  Cc: steven.sistare, daniel.m.jordan, akpm, mgorman, mhocko, linux-mm,
	linux-kernel, gregkh, vbabka, bharata

On 18-04-05 19:22:58, Sasha Levin wrote:
> On Thu, Apr 05, 2018 at 09:49:40AM -0400, Pavel Tatashin wrote:
> >> Hi Sasha,
> >>
> >> I have registered on Azure's portal, and created a VM with 4 CPUs and 16G
> >> of RAM. However, I still was not able to reproduce the boot bug you found.
> >
> >I have also tried to reproduce this issue on Windows 10 + Hyper-V, still
> >unsuccessful.
> 
> I'm not sure why you can't reproduce it. I built a 4.16 kernel + your 6
> patches on top, and booting on a D64s_v3 instance gives me this:

Hi Sasha,

Thank you for running it again. The new trace is cleaner, as we do not
get nested panics within dump_page().

Perhaps NUMA is required to reproduce this issue. I have tried,
unsuccessfully, on D4S_v3, which is the largest VM allowed with the free
trial, as 4 CPUs is the limit. D64s_v3 has 64 CPUs and costs over $2K a
month! :)

Let me study your trace; perhaps I will be able to figure out the issue
without reproducing it.

Thank you,
Pasha


* Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking
  2018-04-06 12:45                   ` Pavel Tatashin
@ 2018-04-07 14:45                     ` Pavel Tatashin
  0 siblings, 0 replies; 19+ messages in thread
From: Pavel Tatashin @ 2018-04-07 14:45 UTC (permalink / raw)
  To: Sasha Levin
  Cc: steven.sistare, daniel.m.jordan, akpm, mgorman, mhocko, linux-mm,
	linux-kernel, gregkh, vbabka, bharata

> Let me study your trace; perhaps I will be able to figure out the issue
> without reproducing it.

Hi Sasha,

I've been studying this problem more. The issue happens in this stack:

...subsys_init...
 topology_init()
  register_one_node(nid)
   link_mem_sections(nid, pgdat->node_start_pfn, pgdat->node_spanned_pages)
    register_mem_sect_under_node(mem_blk, nid)
     get_nid_for_pfn(pfn)
      pfn_to_nid(pfn)
       page_to_nid(page)
        PF_POISONED_CHECK(page)

We are trying to get the nid from a struct page which has not been
initialized. My patches add this extra scrutiny to make sure that we
never get an invalid nid from a "struct page", by adding
PF_POISONED_CHECK() to page_to_nid(). So, the bug already exists in
Linux, where an incorrect nid is read. The question is why this happens.
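
Schematically, the check the patches add behaves like this toy userspace
model (the flags layout and bit positions here are invented purely for
illustration; they are not the real page->flags encoding):

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the PF_POISONED_CHECK() added to page_to_nid(): memblock
 * fills uninitialized struct pages with all-ones, and any accessor that
 * decodes fields from such a page should trip a bug check instead of
 * returning garbage. The NID_SHIFT value is made up for this sketch. */
#define PAGE_POISON_PATTERN ((uint64_t)-1)
#define NID_SHIFT 54   /* pretend the nid lives in the top bits of flags */

struct page { uint64_t flags; };

static int PagePoisoned(const struct page *p)
{
	return p->flags == PAGE_POISON_PATTERN;
}

static int page_to_nid(const struct page *p)
{
	/* Stands in for VM_BUG_ON_PAGE(PagePoisoned(p)): refuse to
	 * decode fields of a struct page that was never initialized. */
	assert(!PagePoisoned(p));
	return (int)(p->flags >> NID_SHIFT);
}
```

In the real crash, get_nid_for_pfn() reached the equivalent of this
assert with a still-poisoned page, which is exactly what the sanity
check is meant to catch.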

First, I thought that perhaps the struct page is not yet initialized.
But the initcalls run after deferred pages are initialized, and thus
every struct page must be initialized by now. Also, if deferred page
initialization were enabled, we would take a slightly different path and
avoid this bug by getting the nid from memblock instead of struct page:

get_nid_for_pfn(pfn)
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 if (system_state < SYSTEM_RUNNING)
  return early_pfn_to_nid(pfn);
#endif

I also verified in your config that CONFIG_DEFERRED_STRUCT_PAGE_INIT
is not set. So, one way to fix this issue is to remove this "#ifdef"
(I have not checked for dependencies), but that would simply address the
symptom, not fix the actual issue.

Thus, we have a "struct page" backing memory for this pfn, but we have
not initialized it. For some reason memmap_init_zone() decided to skip
it, and I am not sure why. Looking at the code, we skip initialization
if !early_pfn_valid(pfn) (aka !pfn_valid(pfn)) or if
!early_pfn_in_nid(pfn, nid).
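
The skip logic boils down to something like this toy model (the
predicate tables are made-up stand-ins; the real checks consult the
memory map and memblock):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the skip logic in memmap_init_zone(): pfns failing
 * either early_pfn_valid() or early_pfn_in_nid() never reach
 * __init_single_page(), so their struct pages keep the poison
 * pattern and later blow up in page_to_nid(). */
enum { NPFN = 8 };
static bool pfn_valid_tbl[NPFN]  = {1, 1, 0, 1, 1, 1, 1, 1}; /* pfn 2: hole        */
static bool pfn_in_nid_tbl[NPFN] = {1, 1, 1, 1, 0, 1, 1, 1}; /* pfn 4: other node  */
static bool page_initialized[NPFN];

static void memmap_init_zone_model(void)
{
	for (int pfn = 0; pfn < NPFN; pfn++) {
		if (!pfn_valid_tbl[pfn])	/* !early_pfn_valid(pfn) */
			continue;
		if (!pfn_in_nid_tbl[pfn])	/* !early_pfn_in_nid(pfn, nid) */
			continue;
		page_initialized[pfn] = true;	/* __init_single_page(...) */
	}
}
```

If a pfn that gets skipped here is later handed to get_nid_for_pfn(),
we would see exactly the poisoned-page assertion from the trace.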

I suspect this has something to do with !pfn_valid(pfn). But without
a machine on which I can reproduce this problem, I cannot study it
further to determine exactly why the pfn is not valid.

Please replace !pfn_valid_within() with !pfn_valid() in
get_nid_for_pfn() and see if the problem still happens. If it does not,
let's study the memory map, pgdat's start and end, and the value of
this pfn.

Thank you,
Pasha


end of thread, other threads:[~2018-04-07 14:45 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-31 21:02 [PATCH v2 0/2] optimize memory hotplug Pavel Tatashin
2018-01-31 21:02 ` Pavel Tatashin
2018-01-31 21:02 ` [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking Pavel Tatashin
2018-01-31 21:02   ` Pavel Tatashin
2018-02-02  4:12   ` kbuild test robot
2018-03-13 23:43   ` Sasha Levin
2018-03-14  0:38     ` Pavel Tatashin
2018-03-14  0:53       ` Sasha Levin
2018-03-14  1:02         ` Pavel Tatashin
2018-03-15 19:04         ` Pavel Tatashin
2018-03-15 20:43           ` Sasha Levin
2018-04-04  2:17             ` Pavel Tatashin
2018-04-05 13:49               ` Pavel Tatashin
2018-04-05 19:22                 ` Sasha Levin
2018-04-06 12:45                   ` Pavel Tatashin
2018-04-07 14:45                     ` Pavel Tatashin
2018-01-31 21:03 ` [PATCH v2 2/2] mm, memory_hotplug: optimize memory hotplug Pavel Tatashin
2018-01-31 21:03   ` Pavel Tatashin
2018-02-02  4:48   ` kbuild test robot
