Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v2 0/1] memory_hotplug: fix the panic when memory end is not
@ 2018-11-05 15:04 Mikhail Zaslonko
  2018-11-05 15:04 ` [PATCH v2 1/1] memory_hotplug: fix the panic when memory end is not on the section boundary Mikhail Zaslonko
  2018-11-05 18:35 ` [PATCH v2 0/1] memory_hotplug: fix the panic when memory end is not Michal Hocko
  0 siblings, 2 replies; 4+ messages in thread
From: Mikhail Zaslonko @ 2018-11-05 15:04 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, linux-mm, mhocko, Pavel.Tatashin, schwidefsky,
	heiko.carstens, gerald.schaefer, zaslonko

This patch refers to the older thread:
https://marc.info/?t=153658306400001&r=1&w=2

I have tried to take the approaches suggested in the discussion like
simply ignoring unaligned memory to section memory much earlier or
initializing struct pages beyond the "end" but both had issues.

First I tried to ignore unaligned memory early by adjusting memory_end
value. But the thing is that kernel mem parameter parsing and memory_end
calculation take place in the architecture code and adjusting it afterwards
in common code might be too late in my view. Also with this approach we
might lose the memory up to the entire section(256Mb on s390) just because
of unfortunate alignment.

Another approach was to fix memmap_init() and initialize struct pages
beyond the end. Since struct pages are allocated section-wise we can try to
round the size parameter passed to the memmap_init() function up to the
section boundary thus forcing the mapping initialization for the entire
section. But then it leads to another VM_BUG_ON panic due to
zone_spans_pfn() sanity check triggered for the first page of each page
block from set_pageblock_migratetype() function:
 page dumped because: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn))
      Call Trace:
      ([<00000000003013f8>] set_pfnblock_flags_mask+0xe8/0x140)
       [<00000000003014aa>] set_pageblock_migratetype+0x5a/0x70
       [<0000000000bef706>] memmap_init_zone+0x25e/0x2e0
       [<00000000010fc3d8>] free_area_init_node+0x530/0x558
       [<00000000010fcf02>] free_area_init_nodes+0x81a/0x8f0
       [<00000000010e7fdc>] paging_init+0x124/0x130
       [<00000000010e4dfa>] setup_arch+0xbf2/0xcc8
       [<00000000010de9e6>] start_kernel+0x7e/0x588
       [<000000000010007c>] startup_continue+0x7c/0x300
      Last Breaking-Event-Address:
       [<00000000003013f8>] set_pfnblock_flags_mask+0xe8/0x1401
We might ignore this check for the struct pages beyond the "end" but I'm not
sure about further implications.
For now I suggest to stay with my original proposal fixing specific
functions for memory hotplug sysfs handlers.

Changes v1 -> v2:
* Expanded commit message to show both failing scenarious.
* Use 'pfn + i' instead of 'pfn' for zone_spans_pfn() check within
test_pages_in_a_zone() function thus taking CONFIG_HOLES_IN_ZONE into
consideration.

Mikhail Zaslonko (1):
  memory_hotplug: fix the panic when memory end is not on the section
    boundary

 mm/memory_hotplug.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

-- 
2.16.4

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v2 1/1] memory_hotplug: fix the panic when memory end is not on the section boundary
  2018-11-05 15:04 [PATCH v2 0/1] memory_hotplug: fix the panic when memory end is not Mikhail Zaslonko
@ 2018-11-05 15:04 ` Mikhail Zaslonko
  2019-06-14 21:56   ` Sasha Levin
  2018-11-05 18:35 ` [PATCH v2 0/1] memory_hotplug: fix the panic when memory end is not Michal Hocko
  1 sibling, 1 reply; 4+ messages in thread
From: Mikhail Zaslonko @ 2018-11-05 15:04 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, linux-mm, mhocko, Pavel.Tatashin, schwidefsky,
	heiko.carstens, gerald.schaefer, zaslonko

If memory end is not aligned with the sparse memory section boundary, the
mapping of such a section is only partly initialized. This may lead to
VM_BUG_ON due to uninitialized struct pages access from
is_mem_section_removable() or test_pages_in_a_zone() function triggered by
memory_hotplug sysfs handlers.

Here are the the panic examples:
 CONFIG_DEBUG_VM_PGFLAGS=y
 kernel parameter mem=2050M
 --------------------------
 page:000003d082008000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
 ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
  [<00000000008f15c4>] show_valid_zones+0x5c/0x190
  [<00000000008cf9c4>] dev_attr_show+0x34/0x70
  [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
  [<00000000003e4194>] seq_read+0x204/0x480
  [<00000000003b53ea>] __vfs_read+0x32/0x178
  [<00000000003b55b2>] vfs_read+0x82/0x138
  [<00000000003b5be2>] ksys_read+0x5a/0xb0
  [<0000000000b86ba0>] system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
  [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
 Kernel panic - not syncing: Fatal exception: panic_on_oops

 CONFIG_DEBUG_VM_PGFLAGS=y
 kernel parameter mem=3075M
 --------------------------
 page:000003d08300c000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
 ([<000000000038596c>] is_mem_section_removable+0xb4/0x190)
  [<00000000008f12fa>] show_mem_removable+0x9a/0xd8
  [<00000000008cf9c4>] dev_attr_show+0x34/0x70
  [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
  [<00000000003e4194>] seq_read+0x204/0x480
  [<00000000003b53ea>] __vfs_read+0x32/0x178
  [<00000000003b55b2>] vfs_read+0x82/0x138
  [<00000000003b5be2>] ksys_read+0x5a/0xb0
  [<0000000000b86ba0>] system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
  [<000000000038596c>] is_mem_section_removable+0xb4/0x190
 Kernel panic - not syncing: Fatal exception: panic_on_oops

This fix checks if the page lies within the zone boundaries before
accessing the struct page data. The check is added to both functions.

Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: <stable@vger.kernel.org>
---
 mm/memory_hotplug.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 38d94b703e9d..8402e70f74c2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1229,9 +1229,8 @@ static struct page *next_active_pageblock(struct page *page)
 	return page + pageblock_nr_pages;
 }
 
-static bool is_pageblock_removable_nolock(struct page *page)
+static bool is_pageblock_removable_nolock(struct page *page, struct zone **zone)
 {
-	struct zone *zone;
 	unsigned long pfn;
 
 	/*
@@ -1241,15 +1240,14 @@ static bool is_pageblock_removable_nolock(struct page *page)
 	 * We have to take care about the node as well. If the node is offline
 	 * its NODE_DATA will be NULL - see page_zone.
 	 */
-	if (!node_online(page_to_nid(page)))
-		return false;
-
-	zone = page_zone(page);
 	pfn = page_to_pfn(page);
-	if (!zone_spans_pfn(zone, pfn))
+	if (*zone && !zone_spans_pfn(*zone, pfn))
 		return false;
+	if (!node_online(page_to_nid(page)))
+		return false;
+	*zone = page_zone(page);
 
-	return !has_unmovable_pages(zone, page, 0, MIGRATE_MOVABLE, true);
+	return !has_unmovable_pages(*zone, page, 0, MIGRATE_MOVABLE, true);
 }
 
 /* Checks if this range of memory is likely to be hot-removable. */
@@ -1257,10 +1255,11 @@ bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
 {
 	struct page *page = pfn_to_page(start_pfn);
 	struct page *end_page = page + nr_pages;
+	struct zone *zone = NULL;
 
 	/* Check the starting page of each pageblock within the range */
 	for (; page < end_page; page = next_active_pageblock(page)) {
-		if (!is_pageblock_removable_nolock(page))
+		if (!is_pageblock_removable_nolock(page, &zone))
 			return false;
 		cond_resched();
 	}
@@ -1296,6 +1295,9 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn,
 				i++;
 			if (i == MAX_ORDER_NR_PAGES || pfn + i >= end_pfn)
 				continue;
+			/* Check if we got outside of the zone */
+			if (zone && !zone_spans_pfn(zone, pfn + i))
+				return 0;
 			page = pfn_to_page(pfn + i);
 			if (zone && page_zone(page) != zone)
 				return 0;
-- 
2.16.4

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 0/1] memory_hotplug: fix the panic when memory end is not
  2018-11-05 15:04 [PATCH v2 0/1] memory_hotplug: fix the panic when memory end is not Mikhail Zaslonko
  2018-11-05 15:04 ` [PATCH v2 1/1] memory_hotplug: fix the panic when memory end is not on the section boundary Mikhail Zaslonko
@ 2018-11-05 18:35 ` Michal Hocko
  1 sibling, 0 replies; 4+ messages in thread
From: Michal Hocko @ 2018-11-05 18:35 UTC (permalink / raw)
  To: Mikhail Zaslonko
  Cc: akpm, linux-kernel, linux-mm, Pavel.Tatashin, schwidefsky,
	heiko.carstens, gerald.schaefer

On Mon 05-11-18 16:04:00, Mikhail Zaslonko wrote:
[...]
> Another approach was to fix memmap_init() and initialize struct pages
> beyond the end.

Yes I still do not want to give up at least this option. We do have
struct pages for the full section. Leaving som of them uninitialized is
just asking for problems. And adding special cases to some hotplug paths
just makes the code harder to follow and maintain.

So

> Since struct pages are allocated section-wise we can try to
> round the size parameter passed to the memmap_init() function up to the
> section boundary thus forcing the mapping initialization for the entire
> section. But then it leads to another VM_BUG_ON panic due to
> zone_spans_pfn() sanity check triggered for the first page of each page
> block from set_pageblock_migratetype() function:
>  page dumped because: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn))
>       Call Trace:
>       ([<00000000003013f8>] set_pfnblock_flags_mask+0xe8/0x140)
>        [<00000000003014aa>] set_pageblock_migratetype+0x5a/0x70
>        [<0000000000bef706>] memmap_init_zone+0x25e/0x2e0
>        [<00000000010fc3d8>] free_area_init_node+0x530/0x558
>        [<00000000010fcf02>] free_area_init_nodes+0x81a/0x8f0
>        [<00000000010e7fdc>] paging_init+0x124/0x130
>        [<00000000010e4dfa>] setup_arch+0xbf2/0xcc8
>        [<00000000010de9e6>] start_kernel+0x7e/0x588
>        [<000000000010007c>] startup_continue+0x7c/0x300
>       Last Breaking-Event-Address:
>        [<00000000003013f8>] set_pfnblock_flags_mask+0xe8/0x1401
> We might ignore this check for the struct pages beyond the "end" but I'm not
> sure about further implications.

find out all these implictions or do something like below (untested)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a919ba5cb3c8..a3f9ad8e40ee 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5544,6 +5544,21 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			cond_resched();
 		}
 	}
+
+#ifdef CONFIG_SPARSEMEM
+	/*
+	 * If we do not have a zone which doesn't span the rest of the
+	 * section then we should at least initialize those pages. We
+	 * could blow up on a poisoned page in some paths which depend
+	 * on full pageblocks being allocated (e.g. memory hotplug).
+	 */
+	while (end_pfn % PAGES_PER_SECTION) {
+		__init_single_page(pfn_to_page(end_pfn), end_pfn, zone, nid);
+		end_pfn++
+	}
+
+#endif
+
 }
 
 #ifdef CONFIG_ZONE_DEVICE
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 1/1] memory_hotplug: fix the panic when memory end is not on the section boundary
  2018-11-05 15:04 ` [PATCH v2 1/1] memory_hotplug: fix the panic when memory end is not on the section boundary Mikhail Zaslonko
@ 2019-06-14 21:56   ` Sasha Levin
  0 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2019-06-14 21:56 UTC (permalink / raw)
  To: Sasha Levin; +Cc: linux-kernel, linux-mm

Hi,

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v5.1.9, v4.19.50, v4.14.125, v4.9.181, v4.4.181.

v5.1.9: Failed to apply! Possible dependencies:
    Unable to calculate

v4.19.50: Failed to apply! Possible dependencies:
    Unable to calculate

v4.14.125: Failed to apply! Possible dependencies:
    4da2ce250f98 ("mm: distinguish CMA and MOVABLE isolation in has_unmovable_pages()")
    da024512a1fa ("mm: pass the vmem_altmap to arch_remove_memory and __remove_pages")
    fb52bbaee598 ("mm: move is_pageblock_removable_nolock() to mm/memory_hotplug.c")

v4.9.181: Failed to apply! Possible dependencies:
    133ff0eac95b ("mm/hmm: heterogeneous memory management (HMM for short)")
    3859a271a003 ("randstruct: Mark various structs for randomization")
    4ef589dc9b10 ("mm/hmm/devmem: device memory hotplug using ZONE_DEVICE")
    5613fda9a503 ("sched/cputime: Convert task/group cputime to nsecs")
    60f3e00d25b4 ("sysv,ipc: cacheline align kern_ipc_perm")
    8c8b73c4811f ("sched/cputime, powerpc: Prepare accounting structure for cputime flush on tick")
    a19ff1a2cc92 ("sched/cputime, powerpc/vtime: Accumulate cputime and account only on tick/task switch")
    b18b6a9cef7f ("timers: Omit POSIX timer stuff from task_struct when disabled")
    b584c2544041 ("powerpc/vmemmap: Add altmap support")
    baa73d9e478f ("posix-timers: Make them configurable")
    c3edc4010e9d ("sched/headers: Move task_struct::signal and task_struct::sighand types and accessors into <linux/sched/signal.h>")
    d3df0a423397 ("mm/hmm: add new helper to hotplug CDM memory region")
    d69dece5f5b6 ("LSM: Add /sys/kernel/security/lsm")
    d7d9b612f1b0 ("powerpc/vmemmap: Reshuffle vmemmap_free()")
    da024512a1fa ("mm: pass the vmem_altmap to arch_remove_memory and __remove_pages")
    fb52bbaee598 ("mm: move is_pageblock_removable_nolock() to mm/memory_hotplug.c")

v4.4.181: Failed to apply! Possible dependencies:
    11a6f6abd74a ("powerpc/mm: Move radix/hash common data structures to book3s64 headers")
    15b1624b7807 ("powerpc: Use defines for __init_tlb_power[78]")
    18569c1f134e ("powerpc/64: Don't try to use radix MMU under a hypervisor")
    1a01dc87e09b ("powerpc/mm: Add mmu_early_init_devtree()")
    26b6a3d9bb48 ("powerpc/mm: move pte headers to book3s directory")
    2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early init routines")
    3dfcb315d81e ("powerpc/mm: make a separate copy for book3s")
    5c3c7ede2bdc ("powerpc/mm: Split hash page table sizing heuristic into a helper")
    756d08d1ba16 ("powerpc/mm: Abstract early MMU init in preparation for radix")
    b275bfb26963 ("powerpc/mm/radix: Add a kernel command line to disable radix")
    b584c2544041 ("powerpc/vmemmap: Add altmap support")
    c3ab300ea555 ("powerpc: Add POWER9 cputable entry")
    c610ec60ed63 ("powerpc/mm: Move disable_radix handling into mmu_early_init_devtree()")
    da024512a1fa ("mm: pass the vmem_altmap to arch_remove_memory and __remove_pages")
    f64e8084c94b ("powerpc/mm: Move hash related mmu-*.h headers to book3s/")
    fb52bbaee598 ("mm: move is_pageblock_removable_nolock() to mm/memory_hotplug.c")


How should we proceed with this patch?

--
Thanks,
Sasha


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, back to index

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-05 15:04 [PATCH v2 0/1] memory_hotplug: fix the panic when memory end is not Mikhail Zaslonko
2018-11-05 15:04 ` [PATCH v2 1/1] memory_hotplug: fix the panic when memory end is not on the section boundary Mikhail Zaslonko
2019-06-14 21:56   ` Sasha Levin
2018-11-05 18:35 ` [PATCH v2 0/1] memory_hotplug: fix the panic when memory end is not Michal Hocko

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org linux-mm@archiver.kernel.org
	public-inbox-index linux-mm


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/ public-inbox