* [PATCH v2 1/1] memory_hotplug: fix the panic when memory end is not on the section boundary
2018-11-05 15:04 [PATCH v2 0/1] memory_hotplug: fix the panic when memory end is not Mikhail Zaslonko
@ 2018-11-05 15:04 ` Mikhail Zaslonko
2019-06-14 21:56 ` Sasha Levin
2018-11-05 18:35 ` [PATCH v2 0/1] memory_hotplug: fix the panic when memory end is not Michal Hocko
1 sibling, 1 reply; 4+ messages in thread
From: Mikhail Zaslonko @ 2018-11-05 15:04 UTC (permalink / raw)
To: akpm
Cc: linux-kernel, linux-mm, mhocko, Pavel.Tatashin, schwidefsky,
heiko.carstens, gerald.schaefer, zaslonko
If memory end is not aligned with the sparse memory section boundary, the
mapping of such a section is only partly initialized. This may lead to
VM_BUG_ON due to uninitialized struct pages access from
is_mem_section_removable() or test_pages_in_a_zone() function triggered by
memory_hotplug sysfs handlers.
Here are the panic examples:
CONFIG_DEBUG_VM_PGFLAGS=y
kernel parameter mem=2050M
--------------------------
page:000003d082008000 is uninitialized and poisoned
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
Call Trace:
([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
[<00000000008f15c4>] show_valid_zones+0x5c/0x190
[<00000000008cf9c4>] dev_attr_show+0x34/0x70
[<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
[<00000000003e4194>] seq_read+0x204/0x480
[<00000000003b53ea>] __vfs_read+0x32/0x178
[<00000000003b55b2>] vfs_read+0x82/0x138
[<00000000003b5be2>] ksys_read+0x5a/0xb0
[<0000000000b86ba0>] system_call+0xdc/0x2d8
Last Breaking-Event-Address:
[<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
Kernel panic - not syncing: Fatal exception: panic_on_oops
CONFIG_DEBUG_VM_PGFLAGS=y
kernel parameter mem=3075M
--------------------------
page:000003d08300c000 is uninitialized and poisoned
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
Call Trace:
([<000000000038596c>] is_mem_section_removable+0xb4/0x190)
[<00000000008f12fa>] show_mem_removable+0x9a/0xd8
[<00000000008cf9c4>] dev_attr_show+0x34/0x70
[<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
[<00000000003e4194>] seq_read+0x204/0x480
[<00000000003b53ea>] __vfs_read+0x32/0x178
[<00000000003b55b2>] vfs_read+0x82/0x138
[<00000000003b5be2>] ksys_read+0x5a/0xb0
[<0000000000b86ba0>] system_call+0xdc/0x2d8
Last Breaking-Event-Address:
[<000000000038596c>] is_mem_section_removable+0xb4/0x190
Kernel panic - not syncing: Fatal exception: panic_on_oops
This fix checks if the page lies within the zone boundaries before
accessing the struct page data. The check is added to both functions.
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: <stable@vger.kernel.org>
---
mm/memory_hotplug.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 38d94b703e9d..8402e70f74c2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1229,9 +1229,8 @@ static struct page *next_active_pageblock(struct page *page)
return page + pageblock_nr_pages;
}
-static bool is_pageblock_removable_nolock(struct page *page)
+static bool is_pageblock_removable_nolock(struct page *page, struct zone **zone)
{
- struct zone *zone;
unsigned long pfn;
/*
@@ -1241,15 +1240,14 @@ static bool is_pageblock_removable_nolock(struct page *page)
* We have to take care about the node as well. If the node is offline
* its NODE_DATA will be NULL - see page_zone.
*/
- if (!node_online(page_to_nid(page)))
- return false;
-
- zone = page_zone(page);
pfn = page_to_pfn(page);
- if (!zone_spans_pfn(zone, pfn))
+ if (*zone && !zone_spans_pfn(*zone, pfn))
return false;
+ if (!node_online(page_to_nid(page)))
+ return false;
+ *zone = page_zone(page);
- return !has_unmovable_pages(zone, page, 0, MIGRATE_MOVABLE, true);
+ return !has_unmovable_pages(*zone, page, 0, MIGRATE_MOVABLE, true);
}
/* Checks if this range of memory is likely to be hot-removable. */
@@ -1257,10 +1255,11 @@ bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
{
struct page *page = pfn_to_page(start_pfn);
struct page *end_page = page + nr_pages;
+ struct zone *zone = NULL;
/* Check the starting page of each pageblock within the range */
for (; page < end_page; page = next_active_pageblock(page)) {
- if (!is_pageblock_removable_nolock(page))
+ if (!is_pageblock_removable_nolock(page, &zone))
return false;
cond_resched();
}
@@ -1296,6 +1295,9 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn,
i++;
if (i == MAX_ORDER_NR_PAGES || pfn + i >= end_pfn)
continue;
+ /* Check if we got outside of the zone */
+ if (zone && !zone_spans_pfn(zone, pfn + i))
+ return 0;
page = pfn_to_page(pfn + i);
if (zone && page_zone(page) != zone)
return 0;
--
2.16.4
* Re: [PATCH v2 0/1] memory_hotplug: fix the panic when memory end is not
2018-11-05 15:04 [PATCH v2 0/1] memory_hotplug: fix the panic when memory end is not Mikhail Zaslonko
2018-11-05 15:04 ` [PATCH v2 1/1] memory_hotplug: fix the panic when memory end is not on the section boundary Mikhail Zaslonko
@ 2018-11-05 18:35 ` Michal Hocko
1 sibling, 0 replies; 4+ messages in thread
From: Michal Hocko @ 2018-11-05 18:35 UTC (permalink / raw)
To: Mikhail Zaslonko
Cc: akpm, linux-kernel, linux-mm, Pavel.Tatashin, schwidefsky,
heiko.carstens, gerald.schaefer
On Mon 05-11-18 16:04:00, Mikhail Zaslonko wrote:
[...]
> Another approach was to fix memmap_init() and initialize struct pages
> beyond the end.
Yes I still do not want to give up at least this option. We do have
struct pages for the full section. Leaving some of them uninitialized is
just asking for problems. And adding special cases to some hotplug paths
just makes the code harder to follow and maintain.
So
> Since struct pages are allocated section-wise we can try to
> round the size parameter passed to the memmap_init() function up to the
> section boundary thus forcing the mapping initialization for the entire
> section. But then it leads to another VM_BUG_ON panic due to
> zone_spans_pfn() sanity check triggered for the first page of each page
> block from set_pageblock_migratetype() function:
> page dumped because: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn))
> Call Trace:
> ([<00000000003013f8>] set_pfnblock_flags_mask+0xe8/0x140)
> [<00000000003014aa>] set_pageblock_migratetype+0x5a/0x70
> [<0000000000bef706>] memmap_init_zone+0x25e/0x2e0
> [<00000000010fc3d8>] free_area_init_node+0x530/0x558
> [<00000000010fcf02>] free_area_init_nodes+0x81a/0x8f0
> [<00000000010e7fdc>] paging_init+0x124/0x130
> [<00000000010e4dfa>] setup_arch+0xbf2/0xcc8
> [<00000000010de9e6>] start_kernel+0x7e/0x588
> [<000000000010007c>] startup_continue+0x7c/0x300
> Last Breaking-Event-Address:
> [<00000000003013f8>] set_pfnblock_flags_mask+0xe8/0x1401
> We might ignore this check for the struct pages beyond the "end" but I'm not
> sure about further implications.
find out all these implications or do something like below (untested)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a919ba5cb3c8..a3f9ad8e40ee 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5544,6 +5544,21 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
cond_resched();
}
}
+
+#ifdef CONFIG_SPARSEMEM
+ /*
+ * If the zone does not span the rest of the section then we
+ * should at least initialize those pages. Otherwise we
+ * could blow up on a poisoned page in some paths which depend
+ * on full pageblocks being initialized (e.g. memory hotplug).
+ */
+ while (end_pfn % PAGES_PER_SECTION) {
+ __init_single_page(pfn_to_page(end_pfn), end_pfn, zone, nid);
+ end_pfn++;
+ }
+
+#endif
+
}
#ifdef CONFIG_ZONE_DEVICE
--
Michal Hocko
SUSE Labs