linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/1] memory_hotplug: fix the panic when memory end is not
@ 2018-11-05 15:04 Mikhail Zaslonko
  2018-11-05 15:04 ` [PATCH v2 1/1] memory_hotplug: fix the panic when memory end is not on the section boundary Mikhail Zaslonko
  2018-11-05 18:35 ` [PATCH v2 0/1] memory_hotplug: fix the panic when memory end is not Michal Hocko
  0 siblings, 2 replies; 3+ messages in thread
From: Mikhail Zaslonko @ 2018-11-05 15:04 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, linux-mm, mhocko, Pavel.Tatashin, schwidefsky,
	heiko.carstens, gerald.schaefer, zaslonko

This patch refers to the older thread:
https://marc.info/?t=153658306400001&r=1&w=2

I have tried to take the approaches suggested in the discussion like
simply ignoring unaligned memory to section memory much earlier or
initializing struct pages beyond the "end" but both had issues.

First I tried to ignore unaligned memory early by adjusting memory_end
value. But the thing is that kernel mem parameter parsing and memory_end
calculation take place in the architecture code and adjusting it afterwards
in common code might be too late in my view. Also with this approach we
might lose the memory up to the entire section(256Mb on s390) just because
of unfortunate alignment.

Another approach was to fix memmap_init() and initialize struct pages
beyond the end. Since struct pages are allocated section-wise we can try to
round the size parameter passed to the memmap_init() function up to the
section boundary thus forcing the mapping initialization for the entire
section. But then it leads to another VM_BUG_ON panic due to
zone_spans_pfn() sanity check triggered for the first page of each page
block from set_pageblock_migratetype() function:
 page dumped because: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn))
      Call Trace:
      ([<00000000003013f8>] set_pfnblock_flags_mask+0xe8/0x140)
       [<00000000003014aa>] set_pageblock_migratetype+0x5a/0x70
       [<0000000000bef706>] memmap_init_zone+0x25e/0x2e0
       [<00000000010fc3d8>] free_area_init_node+0x530/0x558
       [<00000000010fcf02>] free_area_init_nodes+0x81a/0x8f0
       [<00000000010e7fdc>] paging_init+0x124/0x130
       [<00000000010e4dfa>] setup_arch+0xbf2/0xcc8
       [<00000000010de9e6>] start_kernel+0x7e/0x588
       [<000000000010007c>] startup_continue+0x7c/0x300
      Last Breaking-Event-Address:
       [<00000000003013f8>] set_pfnblock_flags_mask+0xe8/0x1401
We might ignore this check for the struct pages beyond the "end" but I'm not
sure about further implications.
For now I suggest to stay with my original proposal fixing specific
functions for memory hotplug sysfs handlers.

Changes v1 -> v2:
* Expanded commit message to show both failing scenarious.
* Use 'pfn + i' instead of 'pfn' for zone_spans_pfn() check within
test_pages_in_a_zone() function thus taking CONFIG_HOLES_IN_ZONE into
consideration.

Mikhail Zaslonko (1):
  memory_hotplug: fix the panic when memory end is not on the section
    boundary

 mm/memory_hotplug.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

-- 
2.16.4


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH v2 1/1] memory_hotplug: fix the panic when memory end is not on the section boundary
  2018-11-05 15:04 [PATCH v2 0/1] memory_hotplug: fix the panic when memory end is not Mikhail Zaslonko
@ 2018-11-05 15:04 ` Mikhail Zaslonko
  2018-11-05 18:35 ` [PATCH v2 0/1] memory_hotplug: fix the panic when memory end is not Michal Hocko
  1 sibling, 0 replies; 3+ messages in thread
From: Mikhail Zaslonko @ 2018-11-05 15:04 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, linux-mm, mhocko, Pavel.Tatashin, schwidefsky,
	heiko.carstens, gerald.schaefer, zaslonko

If memory end is not aligned with the sparse memory section boundary, the
mapping of such a section is only partly initialized. This may lead to
VM_BUG_ON due to uninitialized struct pages access from
is_mem_section_removable() or test_pages_in_a_zone() function triggered by
memory_hotplug sysfs handlers.

Here are the the panic examples:
 CONFIG_DEBUG_VM_PGFLAGS=y
 kernel parameter mem=2050M
 --------------------------
 page:000003d082008000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
 ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
  [<00000000008f15c4>] show_valid_zones+0x5c/0x190
  [<00000000008cf9c4>] dev_attr_show+0x34/0x70
  [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
  [<00000000003e4194>] seq_read+0x204/0x480
  [<00000000003b53ea>] __vfs_read+0x32/0x178
  [<00000000003b55b2>] vfs_read+0x82/0x138
  [<00000000003b5be2>] ksys_read+0x5a/0xb0
  [<0000000000b86ba0>] system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
  [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
 Kernel panic - not syncing: Fatal exception: panic_on_oops

 CONFIG_DEBUG_VM_PGFLAGS=y
 kernel parameter mem=3075M
 --------------------------
 page:000003d08300c000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
 ([<000000000038596c>] is_mem_section_removable+0xb4/0x190)
  [<00000000008f12fa>] show_mem_removable+0x9a/0xd8
  [<00000000008cf9c4>] dev_attr_show+0x34/0x70
  [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
  [<00000000003e4194>] seq_read+0x204/0x480
  [<00000000003b53ea>] __vfs_read+0x32/0x178
  [<00000000003b55b2>] vfs_read+0x82/0x138
  [<00000000003b5be2>] ksys_read+0x5a/0xb0
  [<0000000000b86ba0>] system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
  [<000000000038596c>] is_mem_section_removable+0xb4/0x190
 Kernel panic - not syncing: Fatal exception: panic_on_oops

This fix checks if the page lies within the zone boundaries before
accessing the struct page data. The check is added to both functions.

Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: <stable@vger.kernel.org>
---
 mm/memory_hotplug.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 38d94b703e9d..8402e70f74c2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1229,9 +1229,8 @@ static struct page *next_active_pageblock(struct page *page)
 	return page + pageblock_nr_pages;
 }
 
-static bool is_pageblock_removable_nolock(struct page *page)
+static bool is_pageblock_removable_nolock(struct page *page, struct zone **zone)
 {
-	struct zone *zone;
 	unsigned long pfn;
 
 	/*
@@ -1241,15 +1240,14 @@ static bool is_pageblock_removable_nolock(struct page *page)
 	 * We have to take care about the node as well. If the node is offline
 	 * its NODE_DATA will be NULL - see page_zone.
 	 */
-	if (!node_online(page_to_nid(page)))
-		return false;
-
-	zone = page_zone(page);
 	pfn = page_to_pfn(page);
-	if (!zone_spans_pfn(zone, pfn))
+	if (*zone && !zone_spans_pfn(*zone, pfn))
 		return false;
+	if (!node_online(page_to_nid(page)))
+		return false;
+	*zone = page_zone(page);
 
-	return !has_unmovable_pages(zone, page, 0, MIGRATE_MOVABLE, true);
+	return !has_unmovable_pages(*zone, page, 0, MIGRATE_MOVABLE, true);
 }
 
 /* Checks if this range of memory is likely to be hot-removable. */
@@ -1257,10 +1255,11 @@ bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
 {
 	struct page *page = pfn_to_page(start_pfn);
 	struct page *end_page = page + nr_pages;
+	struct zone *zone = NULL;
 
 	/* Check the starting page of each pageblock within the range */
 	for (; page < end_page; page = next_active_pageblock(page)) {
-		if (!is_pageblock_removable_nolock(page))
+		if (!is_pageblock_removable_nolock(page, &zone))
 			return false;
 		cond_resched();
 	}
@@ -1296,6 +1295,9 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn,
 				i++;
 			if (i == MAX_ORDER_NR_PAGES || pfn + i >= end_pfn)
 				continue;
+			/* Check if we got outside of the zone */
+			if (zone && !zone_spans_pfn(zone, pfn + i))
+				return 0;
 			page = pfn_to_page(pfn + i);
 			if (zone && page_zone(page) != zone)
 				return 0;
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2 0/1] memory_hotplug: fix the panic when memory end is not
  2018-11-05 15:04 [PATCH v2 0/1] memory_hotplug: fix the panic when memory end is not Mikhail Zaslonko
  2018-11-05 15:04 ` [PATCH v2 1/1] memory_hotplug: fix the panic when memory end is not on the section boundary Mikhail Zaslonko
@ 2018-11-05 18:35 ` Michal Hocko
  1 sibling, 0 replies; 3+ messages in thread
From: Michal Hocko @ 2018-11-05 18:35 UTC (permalink / raw)
  To: Mikhail Zaslonko
  Cc: akpm, linux-kernel, linux-mm, Pavel.Tatashin, schwidefsky,
	heiko.carstens, gerald.schaefer

On Mon 05-11-18 16:04:00, Mikhail Zaslonko wrote:
[...]
> Another approach was to fix memmap_init() and initialize struct pages
> beyond the end.

Yes I still do not want to give up at least this option. We do have
struct pages for the full section. Leaving som of them uninitialized is
just asking for problems. And adding special cases to some hotplug paths
just makes the code harder to follow and maintain.

So

> Since struct pages are allocated section-wise we can try to
> round the size parameter passed to the memmap_init() function up to the
> section boundary thus forcing the mapping initialization for the entire
> section. But then it leads to another VM_BUG_ON panic due to
> zone_spans_pfn() sanity check triggered for the first page of each page
> block from set_pageblock_migratetype() function:
>  page dumped because: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn))
>       Call Trace:
>       ([<00000000003013f8>] set_pfnblock_flags_mask+0xe8/0x140)
>        [<00000000003014aa>] set_pageblock_migratetype+0x5a/0x70
>        [<0000000000bef706>] memmap_init_zone+0x25e/0x2e0
>        [<00000000010fc3d8>] free_area_init_node+0x530/0x558
>        [<00000000010fcf02>] free_area_init_nodes+0x81a/0x8f0
>        [<00000000010e7fdc>] paging_init+0x124/0x130
>        [<00000000010e4dfa>] setup_arch+0xbf2/0xcc8
>        [<00000000010de9e6>] start_kernel+0x7e/0x588
>        [<000000000010007c>] startup_continue+0x7c/0x300
>       Last Breaking-Event-Address:
>        [<00000000003013f8>] set_pfnblock_flags_mask+0xe8/0x1401
> We might ignore this check for the struct pages beyond the "end" but I'm not
> sure about further implications.

find out all these implictions or do something like below (untested)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a919ba5cb3c8..a3f9ad8e40ee 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5544,6 +5544,21 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			cond_resched();
 		}
 	}
+
+#ifdef CONFIG_SPARSEMEM
+	/*
+	 * If we do not have a zone which doesn't span the rest of the
+	 * section then we should at least initialize those pages. We
+	 * could blow up on a poisoned page in some paths which depend
+	 * on full pageblocks being allocated (e.g. memory hotplug).
+	 */
+	while (end_pfn % PAGES_PER_SECTION) {
+		__init_single_page(pfn_to_page(end_pfn), end_pfn, zone, nid);
+		end_pfn++
+	}
+
+#endif
+
 }
 
 #ifdef CONFIG_ZONE_DEVICE
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-11-05 18:36 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-05 15:04 [PATCH v2 0/1] memory_hotplug: fix the panic when memory end is not Mikhail Zaslonko
2018-11-05 15:04 ` [PATCH v2 1/1] memory_hotplug: fix the panic when memory end is not on the section boundary Mikhail Zaslonko
2018-11-05 18:35 ` [PATCH v2 0/1] memory_hotplug: fix the panic when memory end is not Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).