Linux-mm Archive on lore.kernel.org
* [PATCH v6 00/10] mm/memory_hotplug: Shrink zones before removing memory
@ 2019-10-06  8:56 David Hildenbrand
  2019-10-06  8:56 ` [PATCH v6 01/10] mm/memunmap: Don't access uninitialized memmap in memunmap_pages() David Hildenbrand
                   ` (10 more replies)
  0 siblings, 11 replies; 29+ messages in thread
From: David Hildenbrand @ 2019-10-06  8:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-arm-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, x86, David Hildenbrand, Aneesh Kumar K . V,
	Andrew Morton, Dan Williams, Michal Hocko, Alexander Duyck,
	Alexander Potapenko, Andy Lutomirski, Anshuman Khandual,
	Benjamin Herrenschmidt, Borislav Petkov, Catalin Marinas,
	Christian Borntraeger, Christophe Leroy, Dave Hansen, Fenghua Yu,
	Gerald Schaefer, Greg Kroah-Hartman, Halil Pasic, Heiko Carstens,
	H. Peter Anvin, Ingo Molnar, Ira Weiny, Jason Gunthorpe, Jun Yao,
	Logan Gunthorpe, Mark Rutland, Masahiro Yamada,
	Matthew Wilcox (Oracle),
	Mel Gorman, Michael Ellerman, Mike Rapoport, Oscar Salvador,
	Pankaj Gupta, Paul Mackerras, Pavel Tatashin, Pavel Tatashin,
	Peter Zijlstra, Qian Cai, Rich Felker, Robin Murphy,
	Steve Capper, Thomas Gleixner, Tom Lendacky, Tony Luck,
	Vasily Gorbik, Vlastimil Babka, Wei Yang, Wei Yang, Will Deacon,
	Yoshinori Sato, Yu Zhao

This series fixes the access of uninitialized memmaps when shrinking
zones/nodes and when removing memory. It also contains all fixes for
crashes that can be triggered when removing certain namespaces using
memunmap_pages() (ZONE_DEVICE), as reported by Aneesh.

We stop trying to shrink ZONE_DEVICE zones, as shrinking them is buggy,
fixing it would be more involved (we don't have SECTION_IS_ONLINE as an
indicator), and shrinking is only of limited use (set_zone_contiguous()
cannot detect ZONE_DEVICE zones as contiguous anyway).

We continue shrinking !ZONE_DEVICE zones; however, I reduced the amount of
code to a minimum. Shrinking is especially necessary to keep
zone->contiguous set where possible, in particular on memory unplug of
DIMMs at zone boundaries.

--------------------------------------------------------------------------

Zones are now properly shrunk when offlining memory blocks or when
onlining failed. This allows us to properly shrink zones on memory unplug
even if the separate memory blocks of a DIMM were onlined to different
zones or re-onlined to a different zone after offlining.

Example:

:/# cat /proc/zoneinfo
Node 1, zone  Movable
        spanned  0
        present  0
        managed  0
:/# echo "online_movable" > /sys/devices/system/memory/memory41/state
:/# echo "online_movable" > /sys/devices/system/memory/memory43/state
:/# cat /proc/zoneinfo
Node 1, zone  Movable
        spanned  98304
        present  65536
        managed  65536
:/# echo 0 > /sys/devices/system/memory/memory43/online
:/# cat /proc/zoneinfo
Node 1, zone  Movable
        spanned  32768
        present  32768
        managed  32768
:/# echo 0 > /sys/devices/system/memory/memory41/online
:/# cat /proc/zoneinfo
Node 1, zone  Movable
        spanned  0
        present  0
        managed  0

--------------------------------------------------------------------------

I tested this with DIMMs on x86. I didn't test the ZONE_DEVICE part.

v4 -> v6:
- "mm/memunmap: Don't access uninitialized memmap in memunmap_pages()"
-- Minimize code changes, rephrase subject and description
- "mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span()"
-- Add ifdef to make it compile without ZONE_DEVICE

v4 -> v5:
- "mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span()"
-- Add more details why ZONE_DEVICE is special
- Include two patches from Aneesh
-- "mm/memunmap: Use the correct start and end pfn when removing pages
    from zone"
-- "mm/memmap_init: Update variable name in memmap_init_zone"

v3 -> v4:
- Drop "mm/memremap: Get rid of memmap_init_zone_device()"
-- As Alexander noticed, it was messy either way
- Drop "mm/memory_hotplug: Exit early in __remove_pages() on BUGs"
- Drop "mm: Exit early in set_zone_contiguous() if already contiguous"
- Drop "mm/memory_hotplug: Optimize zone shrinking code when checking for
  holes"
- Merged "mm/memory_hotplug: Remove pages from a zone before removing
  memory" and "mm/memory_hotplug: Remove zone parameter from
  __remove_pages()" into "mm/memory_hotplug: Shrink zones when offlining
  memory"
- Added "mm/memory_hotplug: Poison memmap in remove_pfn_range_from_zone()"
- Stop shrinking ZONE_DEVICE
- Reshuffle patches, moving all fixes to the front. Add Fixes: tags.
- Change subject/description of various patches
- Minor changes (too many to mention)

Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@suse.com>

Aneesh Kumar K.V (2):
  mm/memunmap: Don't access uninitialized memmap in memunmap_pages()
  mm/memmap_init: Update variable name in memmap_init_zone

David Hildenbrand (8):
  mm/memory_hotplug: Don't access uninitialized memmaps in
    shrink_pgdat_span()
  mm/memory_hotplug: Don't access uninitialized memmaps in
    shrink_zone_span()
  mm/memory_hotplug: Shrink zones when offlining memory
  mm/memory_hotplug: Poison memmap in remove_pfn_range_from_zone()
  mm/memory_hotplug: We always have a zone in
    find_(smallest|biggest)_section_pfn
  mm/memory_hotplug: Don't check for "all holes" in shrink_zone_span()
  mm/memory_hotplug: Drop local variables in shrink_zone_span()
  mm/memory_hotplug: Cleanup __remove_pages()

 arch/arm64/mm/mmu.c            |   4 +-
 arch/ia64/mm/init.c            |   4 +-
 arch/powerpc/mm/mem.c          |   3 +-
 arch/s390/mm/init.c            |   4 +-
 arch/sh/mm/init.c              |   4 +-
 arch/x86/mm/init_32.c          |   4 +-
 arch/x86/mm/init_64.c          |   4 +-
 include/linux/memory_hotplug.h |   7 +-
 mm/memory_hotplug.c            | 186 ++++++++++++---------------------
 mm/memremap.c                  |  13 ++-
 mm/page_alloc.c                |   8 +-
 11 files changed, 90 insertions(+), 151 deletions(-)

-- 
2.21.0



^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v6 01/10] mm/memunmap: Don't access uninitialized memmap in memunmap_pages()
  2019-10-06  8:56 [PATCH v6 00/10] mm/memory_hotplug: Shrink zones before removing memory David Hildenbrand
@ 2019-10-06  8:56 ` David Hildenbrand
  2019-10-06 19:58   ` Damian Tometzki
  2019-10-14  9:05   ` David Hildenbrand
  2019-10-06  8:56 ` [PATCH v6 02/10] mm/memmap_init: Update variable name in memmap_init_zone David Hildenbrand
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 29+ messages in thread
From: David Hildenbrand @ 2019-10-06  8:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-arm-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, x86, Aneesh Kumar K.V, Dan Williams, Andrew Morton,
	Jason Gunthorpe, Logan Gunthorpe, Ira Weiny, David Hildenbrand

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>

With an altmap, the memmaps falling into the reserved altmap space are
not initialized and, therefore, contain a garbage NID and a garbage
zone. Make sure to read the NID/zone from a memmap that was initialized.

This fixes a kernel crash that is observed when destroying a namespace:

[   81.356173] kernel BUG at include/linux/mm.h:1107!
cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
    pc: c0000000004b9728: memunmap_pages+0x238/0x340
    lr: c0000000004b9724: memunmap_pages+0x234/0x340
...
    pid   = 3669, comm = ndctl
kernel BUG at include/linux/mm.h:1107!
[c000000274087ba0] c0000000009e3500 devm_action_release+0x30/0x50
[c000000274087bc0] c0000000009e4758 release_nodes+0x268/0x2d0
[c000000274087c30] c0000000009dd144 device_release_driver_internal+0x174/0x240
[c000000274087c70] c0000000009d9dfc unbind_store+0x13c/0x190
[c000000274087cb0] c0000000009d8a24 drv_attr_store+0x44/0x60
[c000000274087cd0] c0000000005a7470 sysfs_kf_write+0x70/0xa0
[c000000274087d10] c0000000005a5cac kernfs_fop_write+0x1ac/0x290
[c000000274087d60] c0000000004be45c __vfs_write+0x3c/0x70
[c000000274087d80] c0000000004c26e4 vfs_write+0xe4/0x200
[c000000274087dd0] c0000000004c2a6c ksys_write+0x7c/0x140
[c000000274087e20] c00000000000bbd0 system_call+0x5c/0x68

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[ minimize code changes, rephrase description ]
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memremap.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/mm/memremap.c b/mm/memremap.c
index 557e53c6fb46..8c2fb44c3b4d 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -123,6 +123,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
 void memunmap_pages(struct dev_pagemap *pgmap)
 {
 	struct resource *res = &pgmap->res;
+	struct page *first_page;
 	unsigned long pfn;
 	int nid;
 
@@ -131,14 +132,16 @@ void memunmap_pages(struct dev_pagemap *pgmap)
 		put_page(pfn_to_page(pfn));
 	dev_pagemap_cleanup(pgmap);
 
+	/* make sure to access a memmap that was actually initialized */
+	first_page = pfn_to_page(pfn_first(pgmap));
+
 	/* pages are dead and unused, undo the arch mapping */
-	nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
+	nid = page_to_nid(first_page);
 
 	mem_hotplug_begin();
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
-		pfn = PHYS_PFN(res->start);
-		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
-				 PHYS_PFN(resource_size(res)), NULL);
+		__remove_pages(page_zone(first_page), PHYS_PFN(res->start),
+			       PHYS_PFN(resource_size(res)), NULL);
 	} else {
 		arch_remove_memory(nid, res->start, resource_size(res),
 				pgmap_altmap(pgmap));
-- 
2.21.0




* [PATCH v6 02/10] mm/memmap_init: Update variable name in memmap_init_zone
  2019-10-06  8:56 [PATCH v6 00/10] mm/memory_hotplug: Shrink zones before removing memory David Hildenbrand
  2019-10-06  8:56 ` [PATCH v6 01/10] mm/memunmap: Don't access uninitialized memmap in memunmap_pages() David Hildenbrand
@ 2019-10-06  8:56 ` David Hildenbrand
  2019-10-06  8:56 ` [PATCH v6 03/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_pgdat_span() David Hildenbrand
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2019-10-06  8:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-arm-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, x86, Aneesh Kumar K.V, Andrew Morton, Michal Hocko,
	Vlastimil Babka, Oscar Salvador, Mel Gorman, Mike Rapoport,
	Dan Williams, Alexander Duyck, Pavel Tatashin,
	Alexander Potapenko, Pankaj Gupta, David Hildenbrand

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>

The third argument is actually the number of pages. Change the variable
name from size to nr_pages to indicate this better.

No functional change in this patch.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Alexander Potapenko <glider@google.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/page_alloc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 15c2050c629b..b0b2d5464000 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5936,10 +5936,10 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 #ifdef CONFIG_ZONE_DEVICE
 void __ref memmap_init_zone_device(struct zone *zone,
 				   unsigned long start_pfn,
-				   unsigned long size,
+				   unsigned long nr_pages,
 				   struct dev_pagemap *pgmap)
 {
-	unsigned long pfn, end_pfn = start_pfn + size;
+	unsigned long pfn, end_pfn = start_pfn + nr_pages;
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	struct vmem_altmap *altmap = pgmap_altmap(pgmap);
 	unsigned long zone_idx = zone_idx(zone);
@@ -5956,7 +5956,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 	 */
 	if (altmap) {
 		start_pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
-		size = end_pfn - start_pfn;
+		nr_pages = end_pfn - start_pfn;
 	}
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
@@ -6003,7 +6003,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 	}
 
 	pr_info("%s initialised %lu pages in %ums\n", __func__,
-		size, jiffies_to_msecs(jiffies - start));
+		nr_pages, jiffies_to_msecs(jiffies - start));
 }
 
 #endif
-- 
2.21.0




* [PATCH v6 03/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_pgdat_span()
  2019-10-06  8:56 [PATCH v6 00/10] mm/memory_hotplug: Shrink zones before removing memory David Hildenbrand
  2019-10-06  8:56 ` [PATCH v6 01/10] mm/memunmap: Don't access uninitialized memmap in memunmap_pages() David Hildenbrand
  2019-10-06  8:56 ` [PATCH v6 02/10] mm/memmap_init: Update variable name in memmap_init_zone David Hildenbrand
@ 2019-10-06  8:56 ` David Hildenbrand
  2019-10-14  9:31   ` David Hildenbrand
  2019-10-06  8:56 ` [PATCH v6 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span() David Hildenbrand
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2019-10-06  8:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-arm-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, x86, David Hildenbrand, Andrew Morton, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Dan Williams, Wei Yang,
	Aneesh Kumar K . V

We might use the nid of memmaps that were never initialized. For
example, if the memmap was poisoned, we will crash the kernel in
pfn_to_nid() right now. Let's use the calculated boundaries of the separate
zones instead. This now also avoids having to iterate over a whole bunch of
subsections again after shrinking one zone.

Before commit d0dc12e86b31 ("mm/memory_hotplug: optimize memory
hotplug"), the memmap was initialized to 0 and the node was set to the
right value. After that commit, the node might be garbage.

We'll have to fix shrink_zone_span() next.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 72 ++++++++++-----------------------------------
 1 file changed, 15 insertions(+), 57 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 680b4b3e57d9..86b4dc18e831 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -436,67 +436,25 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 	zone_span_writeunlock(zone);
 }
 
-static void shrink_pgdat_span(struct pglist_data *pgdat,
-			      unsigned long start_pfn, unsigned long end_pfn)
+static void update_pgdat_span(struct pglist_data *pgdat)
 {
-	unsigned long pgdat_start_pfn = pgdat->node_start_pfn;
-	unsigned long p = pgdat_end_pfn(pgdat); /* pgdat_end_pfn namespace clash */
-	unsigned long pgdat_end_pfn = p;
-	unsigned long pfn;
-	int nid = pgdat->node_id;
-
-	if (pgdat_start_pfn == start_pfn) {
-		/*
-		 * If the section is smallest section in the pgdat, it need
-		 * shrink pgdat->node_start_pfn and pgdat->node_spanned_pages.
-		 * In this case, we find second smallest valid mem_section
-		 * for shrinking zone.
-		 */
-		pfn = find_smallest_section_pfn(nid, NULL, end_pfn,
-						pgdat_end_pfn);
-		if (pfn) {
-			pgdat->node_start_pfn = pfn;
-			pgdat->node_spanned_pages = pgdat_end_pfn - pfn;
-		}
-	} else if (pgdat_end_pfn == end_pfn) {
-		/*
-		 * If the section is biggest section in the pgdat, it need
-		 * shrink pgdat->node_spanned_pages.
-		 * In this case, we find second biggest valid mem_section for
-		 * shrinking zone.
-		 */
-		pfn = find_biggest_section_pfn(nid, NULL, pgdat_start_pfn,
-					       start_pfn);
-		if (pfn)
-			pgdat->node_spanned_pages = pfn - pgdat_start_pfn + 1;
-	}
-
-	/*
-	 * If the section is not biggest or smallest mem_section in the pgdat,
-	 * it only creates a hole in the pgdat. So in this case, we need not
-	 * change the pgdat.
-	 * But perhaps, the pgdat has only hole data. Thus it check the pgdat
-	 * has only hole or not.
-	 */
-	pfn = pgdat_start_pfn;
-	for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SUBSECTION) {
-		if (unlikely(!pfn_valid(pfn)))
-			continue;
-
-		if (pfn_to_nid(pfn) != nid)
-			continue;
+	unsigned long node_start_pfn = 0, node_end_pfn = 0;
+	struct zone *zone;
 
-		/* Skip range to be removed */
-		if (pfn >= start_pfn && pfn < end_pfn)
-			continue;
+	for (zone = pgdat->node_zones;
+	     zone < pgdat->node_zones + MAX_NR_ZONES; zone++) {
+		unsigned long zone_end_pfn = zone->zone_start_pfn +
+					     zone->spanned_pages;
 
-		/* If we find valid section, we have nothing to do */
-		return;
+		/* No need to lock the zones, they can't change. */
+		if (zone_end_pfn > node_end_pfn)
+			node_end_pfn = zone_end_pfn;
+		if (zone->zone_start_pfn < node_start_pfn)
+			node_start_pfn = zone->zone_start_pfn;
 	}
 
-	/* The pgdat has no valid section */
-	pgdat->node_start_pfn = 0;
-	pgdat->node_spanned_pages = 0;
+	pgdat->node_start_pfn = node_start_pfn;
+	pgdat->node_spanned_pages = node_end_pfn - node_start_pfn;
 }
 
 static void __remove_zone(struct zone *zone, unsigned long start_pfn,
@@ -507,7 +465,7 @@ static void __remove_zone(struct zone *zone, unsigned long start_pfn,
 
 	pgdat_resize_lock(zone->zone_pgdat, &flags);
 	shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
-	shrink_pgdat_span(pgdat, start_pfn, start_pfn + nr_pages);
+	update_pgdat_span(pgdat);
 	pgdat_resize_unlock(zone->zone_pgdat, &flags);
 }
 
-- 
2.21.0




* [PATCH v6 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span()
  2019-10-06  8:56 [PATCH v6 00/10] mm/memory_hotplug: Shrink zones before removing memory David Hildenbrand
                   ` (2 preceding siblings ...)
  2019-10-06  8:56 ` [PATCH v6 03/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_pgdat_span() David Hildenbrand
@ 2019-10-06  8:56 ` David Hildenbrand
  2019-10-14  9:32   ` David Hildenbrand
  2019-10-06  8:56 ` [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory David Hildenbrand
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2019-10-06  8:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-arm-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, x86, David Hildenbrand, Andrew Morton, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Dan Williams, Aneesh Kumar K . V

Let's limit shrinking to !ZONE_DEVICE so we can fix the current code. We
should never try to touch the memmap of offline sections where we could
have uninitialized memmaps and could trigger BUGs when calling
page_to_nid() on poisoned pages.

There is no reliable way to distinguish an uninitialized memmap from an
initialized memmap that belongs to ZONE_DEVICE, as we don't have
anything like SECTION_IS_ONLINE that we could use, similar to
pfn_to_online_section() for !ZONE_DEVICE memory. E.g.,
set_zone_contiguous() similarly relies on pfn_to_online_section() and
will therefore never mark a ZONE_DEVICE zone as contiguous. No longer
shrinking ZONE_DEVICE zones therefore results in no observable changes,
besides /proc/zoneinfo indicating different boundaries - something we
can totally live with.

Before commit d0dc12e86b31 ("mm/memory_hotplug: optimize memory
hotplug"), the memmap was initialized with 0 and the node with the
right value. So the zone might be wrong but not garbage. After that
commit, both the zone and the node will be garbage when touching
uninitialized memmaps.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 86b4dc18e831..f96608d24f6a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -331,7 +331,7 @@ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
 				     unsigned long end_pfn)
 {
 	for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SUBSECTION) {
-		if (unlikely(!pfn_valid(start_pfn)))
+		if (unlikely(!pfn_to_online_page(start_pfn)))
 			continue;
 
 		if (unlikely(pfn_to_nid(start_pfn) != nid))
@@ -356,7 +356,7 @@ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone,
 	/* pfn is the end pfn of a memory section. */
 	pfn = end_pfn - 1;
 	for (; pfn >= start_pfn; pfn -= PAGES_PER_SUBSECTION) {
-		if (unlikely(!pfn_valid(pfn)))
+		if (unlikely(!pfn_to_online_page(pfn)))
 			continue;
 
 		if (unlikely(pfn_to_nid(pfn) != nid))
@@ -415,7 +415,7 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 	 */
 	pfn = zone_start_pfn;
 	for (; pfn < zone_end_pfn; pfn += PAGES_PER_SUBSECTION) {
-		if (unlikely(!pfn_valid(pfn)))
+		if (unlikely(!pfn_to_online_page(pfn)))
 			continue;
 
 		if (page_zone(pfn_to_page(pfn)) != zone)
@@ -463,6 +463,16 @@ static void __remove_zone(struct zone *zone, unsigned long start_pfn,
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	unsigned long flags;
 
+#ifdef CONFIG_ZONE_DEVICE
+	/*
+	 * Zone shrinking code cannot properly deal with ZONE_DEVICE. So
+	 * we will not try to shrink the zones - which is okay as
+	 * set_zone_contiguous() cannot deal with ZONE_DEVICE either way.
+	 */
+	if (zone_idx(zone) == ZONE_DEVICE)
+		return;
+#endif
+
 	pgdat_resize_lock(zone->zone_pgdat, &flags);
 	shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
 	update_pgdat_span(pgdat);
-- 
2.21.0




* [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory
  2019-10-06  8:56 [PATCH v6 00/10] mm/memory_hotplug: Shrink zones before removing memory David Hildenbrand
                   ` (3 preceding siblings ...)
  2019-10-06  8:56 ` [PATCH v6 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span() David Hildenbrand
@ 2019-10-06  8:56 ` David Hildenbrand
  2019-10-14  9:39   ` David Hildenbrand
                     ` (2 more replies)
  2019-10-06  8:56 ` [PATCH v6 06/10] mm/memory_hotplug: Poison memmap in remove_pfn_range_from_zone() David Hildenbrand
                   ` (5 subsequent siblings)
  10 siblings, 3 replies; 29+ messages in thread
From: David Hildenbrand @ 2019-10-06  8:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-arm-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, x86, David Hildenbrand, Catalin Marinas, Will Deacon,
	Tony Luck, Fenghua Yu, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Yoshinori Sato, Rich Felker, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Andrew Morton, Mark Rutland,
	Steve Capper, Mike Rapoport, Anshuman Khandual, Yu Zhao, Jun Yao,
	Robin Murphy, Michal Hocko, Oscar Salvador,
	Matthew Wilcox (Oracle),
	Christophe Leroy, Aneesh Kumar K.V, Pavel Tatashin,
	Gerald Schaefer, Halil Pasic, Tom Lendacky, Greg Kroah-Hartman,
	Masahiro Yamada, Dan Williams, Wei Yang, Qian Cai,
	Jason Gunthorpe, Logan Gunthorpe, Ira Weiny

We currently try to shrink a single zone when removing memory. We use the
zone of the first page of the memory we are removing. If that memmap was
never initialized (e.g., memory was never onlined), we will read garbage
and can trigger kernel BUGs (due to a stale pointer):

:/# [   23.912993] BUG: unable to handle page fault for address: 000000000000353d
[   23.914219] #PF: supervisor write access in kernel mode
[   23.915199] #PF: error_code(0x0002) - not-present page
[   23.916160] PGD 0 P4D 0
[   23.916627] Oops: 0002 [#1] SMP PTI
[   23.917256] CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317
[   23.918900] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
[   23.921194] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[   23.922249] RIP: 0010:clear_zone_contiguous+0x5/0x10
[   23.923173] Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840
[   23.926876] RSP: 0018:ffffad2400043c98 EFLAGS: 00010246
[   23.927928] RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000
[   23.929458] RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40
[   23.930899] RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001
[   23.932362] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
[   23.933603] R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680
[   23.934913] FS:  0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000
[   23.936294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   23.937481] CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0
[   23.938687] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   23.939889] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   23.941168] Call Trace:
[   23.941580]  __remove_pages+0x4b/0x640
[   23.942303]  ? mark_held_locks+0x49/0x70
[   23.943149]  arch_remove_memory+0x63/0x8d
[   23.943921]  try_remove_memory+0xdb/0x130
[   23.944766]  ? walk_memory_blocks+0x7f/0x9e
[   23.945616]  __remove_memory+0xa/0x11
[   23.946274]  acpi_memory_device_remove+0x70/0x100
[   23.947308]  acpi_bus_trim+0x55/0x90
[   23.947914]  acpi_device_hotplug+0x227/0x3a0
[   23.948714]  acpi_hotplug_work_fn+0x1a/0x30
[   23.949433]  process_one_work+0x221/0x550
[   23.950190]  worker_thread+0x50/0x3b0
[   23.950993]  kthread+0x105/0x140
[   23.951644]  ? process_one_work+0x550/0x550
[   23.952508]  ? kthread_park+0x80/0x80
[   23.953367]  ret_from_fork+0x3a/0x50
[   23.954025] Modules linked in:
[   23.954613] CR2: 000000000000353d
[   23.955248] ---[ end trace 93d982b1fb3e1a69 ]---

Instead, shrink the zones when offlining memory or when onlining failed.
Introduce and use remove_pfn_range_from_zone() for that. We now properly
shrink the zones, even if we have DIMMs where
- Some memory blocks fall into no zone (never onlined)
- Some memory blocks fall into multiple zones (offlined+re-onlined)
- Multiple memory blocks fall into different zones

Drop the zone parameter (with a potential dubious value) from
__remove_pages() and __remove_section().

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Jun Yao <yaojun8558363@gmail.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-ia64@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/arm64/mm/mmu.c            |  4 +---
 arch/ia64/mm/init.c            |  4 +---
 arch/powerpc/mm/mem.c          |  3 +--
 arch/s390/mm/init.c            |  4 +---
 arch/sh/mm/init.c              |  4 +---
 arch/x86/mm/init_32.c          |  4 +---
 arch/x86/mm/init_64.c          |  4 +---
 include/linux/memory_hotplug.h |  7 +++++--
 mm/memory_hotplug.c            | 31 ++++++++++++++++---------------
 mm/memremap.c                  |  2 +-
 10 files changed, 29 insertions(+), 38 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 60c929f3683b..d10247fab0fd 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1069,7 +1069,6 @@ void arch_remove_memory(int nid, u64 start, u64 size,
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
-	struct zone *zone;
 
 	/*
 	 * FIXME: Cleanup page tables (also in arch_add_memory() in case
@@ -1078,7 +1077,6 @@ void arch_remove_memory(int nid, u64 start, u64 size,
 	 * unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be
 	 * unlocked yet.
 	 */
-	zone = page_zone(pfn_to_page(start_pfn));
-	__remove_pages(zone, start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap);
 }
 #endif
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index bf9df2625bc8..a6dd80a2c939 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -689,9 +689,7 @@ void arch_remove_memory(int nid, u64 start, u64 size,
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
-	struct zone *zone;
 
-	zone = page_zone(pfn_to_page(start_pfn));
-	__remove_pages(zone, start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap);
 }
 #endif
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index be941d382c8d..97e5922cb52e 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -130,10 +130,9 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
-	struct page *page = pfn_to_page(start_pfn) + vmem_altmap_offset(altmap);
 	int ret;
 
-	__remove_pages(page_zone(page), start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap);
 
 	/* Remove htab bolted mappings for this section of memory */
 	start = (unsigned long)__va(start);
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index a124f19f7b3c..c1d96e588152 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -291,10 +291,8 @@ void arch_remove_memory(int nid, u64 start, u64 size,
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
-	struct zone *zone;
 
-	zone = page_zone(pfn_to_page(start_pfn));
-	__remove_pages(zone, start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap);
 	vmem_remove_mapping(start, size);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index dfdbaa50946e..d1b1ff2be17a 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -434,9 +434,7 @@ void arch_remove_memory(int nid, u64 start, u64 size,
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long nr_pages = size >> PAGE_SHIFT;
-	struct zone *zone;
 
-	zone = page_zone(pfn_to_page(start_pfn));
-	__remove_pages(zone, start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 930edeb41ec3..0a74407ef92e 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -865,10 +865,8 @@ void arch_remove_memory(int nid, u64 start, u64 size,
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
-	struct zone *zone;
 
-	zone = page_zone(pfn_to_page(start_pfn));
-	__remove_pages(zone, start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap);
 }
 #endif
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a6b5c653727b..b8541d77452c 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1212,10 +1212,8 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
-	struct page *page = pfn_to_page(start_pfn) + vmem_altmap_offset(altmap);
-	struct zone *zone = page_zone(page);
 
-	__remove_pages(zone, start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap);
 	kernel_physical_mapping_remove(start, start + size);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index bc477e98a310..517b70943732 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -126,8 +126,8 @@ static inline bool movable_node_is_enabled(void)
 
 extern void arch_remove_memory(int nid, u64 start, u64 size,
 			       struct vmem_altmap *altmap);
-extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
-			   unsigned long nr_pages, struct vmem_altmap *altmap);
+extern void __remove_pages(unsigned long start_pfn, unsigned long nr_pages,
+			   struct vmem_altmap *altmap);
 
 /* reasonably generic interface to expand the physical pages */
 extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
@@ -346,6 +346,9 @@ extern int add_memory(int nid, u64 start, u64 size);
 extern int add_memory_resource(int nid, struct resource *resource);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap);
+extern void remove_pfn_range_from_zone(struct zone *zone,
+				       unsigned long start_pfn,
+				       unsigned long nr_pages);
 extern bool is_memblock_offlined(struct memory_block *mem);
 extern int sparse_add_section(int nid, unsigned long pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f96608d24f6a..5b003ffa5dc9 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -457,8 +457,9 @@ static void update_pgdat_span(struct pglist_data *pgdat)
 	pgdat->node_spanned_pages = node_end_pfn - node_start_pfn;
 }
 
-static void __remove_zone(struct zone *zone, unsigned long start_pfn,
-		unsigned long nr_pages)
+void __ref remove_pfn_range_from_zone(struct zone *zone,
+				      unsigned long start_pfn,
+				      unsigned long nr_pages)
 {
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	unsigned long flags;
@@ -473,28 +474,30 @@ static void __remove_zone(struct zone *zone, unsigned long start_pfn,
 		return;
 #endif
 
+	clear_zone_contiguous(zone);
+
 	pgdat_resize_lock(zone->zone_pgdat, &flags);
 	shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
 	update_pgdat_span(pgdat);
 	pgdat_resize_unlock(zone->zone_pgdat, &flags);
+
+	set_zone_contiguous(zone);
 }
 
-static void __remove_section(struct zone *zone, unsigned long pfn,
-		unsigned long nr_pages, unsigned long map_offset,
-		struct vmem_altmap *altmap)
+static void __remove_section(unsigned long pfn, unsigned long nr_pages,
+			     unsigned long map_offset,
+			     struct vmem_altmap *altmap)
 {
 	struct mem_section *ms = __nr_to_section(pfn_to_section_nr(pfn));
 
 	if (WARN_ON_ONCE(!valid_section(ms)))
 		return;
 
-	__remove_zone(zone, pfn, nr_pages);
 	sparse_remove_section(ms, pfn, nr_pages, map_offset, altmap);
 }
 
 /**
- * __remove_pages() - remove sections of pages from a zone
- * @zone: zone from which pages need to be removed
+ * __remove_pages() - remove sections of pages
  * @pfn: starting pageframe (must be aligned to start of a section)
  * @nr_pages: number of pages to remove (must be multiple of section size)
  * @altmap: alternative device page map or %NULL if default memmap is used
@@ -504,16 +507,14 @@ static void __remove_section(struct zone *zone, unsigned long pfn,
  * sure that pages are marked reserved and zones are adjust properly by
  * calling offline_pages().
  */
-void __remove_pages(struct zone *zone, unsigned long pfn,
-		    unsigned long nr_pages, struct vmem_altmap *altmap)
+void __remove_pages(unsigned long pfn, unsigned long nr_pages,
+		    struct vmem_altmap *altmap)
 {
 	unsigned long map_offset = 0;
 	unsigned long nr, start_sec, end_sec;
 
 	map_offset = vmem_altmap_offset(altmap);
 
-	clear_zone_contiguous(zone);
-
 	if (check_pfn_span(pfn, nr_pages, "remove"))
 		return;
 
@@ -525,13 +526,11 @@ void __remove_pages(struct zone *zone, unsigned long pfn,
 		cond_resched();
 		pfns = min(nr_pages, PAGES_PER_SECTION
 				- (pfn & ~PAGE_SECTION_MASK));
-		__remove_section(zone, pfn, pfns, map_offset, altmap);
+		__remove_section(pfn, pfns, map_offset, altmap);
 		pfn += pfns;
 		nr_pages -= pfns;
 		map_offset = 0;
 	}
-
-	set_zone_contiguous(zone);
 }
 
 int set_online_page_callback(online_page_callback_t callback)
@@ -859,6 +858,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 		 (unsigned long long) pfn << PAGE_SHIFT,
 		 (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
 	memory_notify(MEM_CANCEL_ONLINE, &arg);
+	remove_pfn_range_from_zone(zone, pfn, nr_pages);
 	mem_hotplug_done();
 	return ret;
 }
@@ -1605,6 +1605,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	writeback_set_ratelimit();
 
 	memory_notify(MEM_OFFLINE, &arg);
+	remove_pfn_range_from_zone(zone, start_pfn, nr_pages);
 	mem_hotplug_done();
 	return 0;
 
diff --git a/mm/memremap.c b/mm/memremap.c
index 8c2fb44c3b4d..70263e6f093e 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -140,7 +140,7 @@ void memunmap_pages(struct dev_pagemap *pgmap)
 
 	mem_hotplug_begin();
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
-		__remove_pages(page_zone(first_page), PHYS_PFN(res->start),
+		__remove_pages(PHYS_PFN(res->start),
 			       PHYS_PFN(resource_size(res)), NULL);
 	} else {
 		arch_remove_memory(nid, res->start, resource_size(res),
-- 
2.21.0



^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v6 06/10] mm/memory_hotplug: Poison memmap in remove_pfn_range_from_zone()
  2019-10-06  8:56 [PATCH v6 00/10] mm/memory_hotplug: Shrink zones before removing memory David Hildenbrand
                   ` (4 preceding siblings ...)
  2019-10-06  8:56 ` [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory David Hildenbrand
@ 2019-10-06  8:56 ` David Hildenbrand
  2019-10-16 14:01   ` David Hildenbrand
  2019-10-06  8:56 ` [PATCH v6 07/10] mm/memory_hotplug: We always have a zone in find_(smallest|biggest)_section_pfn David Hildenbrand
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2019-10-06  8:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-arm-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, x86, David Hildenbrand, Andrew Morton, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Dan Williams

Let's poison the pages similar to when adding new memory in
sparse_add_section(). Also call remove_pfn_range_from_zone() from
memunmap_pages(), so we can poison the memmap from there as well.

While at it, calculate the pfn in memunmap_pages() only once.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 3 +++
 mm/memremap.c       | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 5b003ffa5dc9..bf5173e7913d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -464,6 +464,9 @@ void __ref remove_pfn_range_from_zone(struct zone *zone,
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	unsigned long flags;
 
+	/* Poison struct pages because they are now uninitialized again. */
+	page_init_poison(pfn_to_page(start_pfn), sizeof(struct page) * nr_pages);
+
 #ifdef CONFIG_ZONE_DEVICE
 	/*
 	 * Zone shrinking code cannot properly deal with ZONE_DEVICE. So
diff --git a/mm/memremap.c b/mm/memremap.c
index 70263e6f093e..7fed8bd32a18 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -139,6 +139,8 @@ void memunmap_pages(struct dev_pagemap *pgmap)
 	nid = page_to_nid(first_page);
 
 	mem_hotplug_begin();
+	remove_pfn_range_from_zone(page_zone(first_page), PHYS_PFN(res->start),
+				   PHYS_PFN(resource_size(res)));
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
 		__remove_pages(PHYS_PFN(res->start),
 			       PHYS_PFN(resource_size(res)), NULL);
-- 
2.21.0




* [PATCH v6 07/10] mm/memory_hotplug: We always have a zone in find_(smallest|biggest)_section_pfn
  2019-10-06  8:56 [PATCH v6 00/10] mm/memory_hotplug: Shrink zones before removing memory David Hildenbrand
                   ` (5 preceding siblings ...)
  2019-10-06  8:56 ` [PATCH v6 06/10] mm/memory_hotplug: Poison memmap in remove_pfn_range_from_zone() David Hildenbrand
@ 2019-10-06  8:56 ` David Hildenbrand
  2019-10-06  8:56 ` [PATCH v6 08/10] mm/memory_hotplug: Don't check for "all holes" in shrink_zone_span() David Hildenbrand
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2019-10-06  8:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-arm-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, x86, David Hildenbrand, Andrew Morton, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Dan Williams, Wei Yang

With shrink_pgdat_span() out of the way, we now always have a valid
zone.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index bf5173e7913d..f294918f7211 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -337,7 +337,7 @@ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
 		if (unlikely(pfn_to_nid(start_pfn) != nid))
 			continue;
 
-		if (zone && zone != page_zone(pfn_to_page(start_pfn)))
+		if (zone != page_zone(pfn_to_page(start_pfn)))
 			continue;
 
 		return start_pfn;
@@ -362,7 +362,7 @@ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone,
 		if (unlikely(pfn_to_nid(pfn) != nid))
 			continue;
 
-		if (zone && zone != page_zone(pfn_to_page(pfn)))
+		if (zone != page_zone(pfn_to_page(pfn)))
 			continue;
 
 		return pfn;
-- 
2.21.0




* [PATCH v6 08/10] mm/memory_hotplug: Don't check for "all holes" in shrink_zone_span()
  2019-10-06  8:56 [PATCH v6 00/10] mm/memory_hotplug: Shrink zones before removing memory David Hildenbrand
                   ` (6 preceding siblings ...)
  2019-10-06  8:56 ` [PATCH v6 07/10] mm/memory_hotplug: We always have a zone in find_(smallest|biggest)_section_pfn David Hildenbrand
@ 2019-10-06  8:56 ` David Hildenbrand
  2019-10-06  8:56 ` [PATCH v6 09/10] mm/memory_hotplug: Drop local variables " David Hildenbrand
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2019-10-06  8:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-arm-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, x86, David Hildenbrand, Andrew Morton, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Dan Williams, Wei Yang

If we have holes, the holes will automatically get detected and removed
once we remove the next bigger/smaller section. The extra checks can
go.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 34 +++++++---------------------------
 1 file changed, 7 insertions(+), 27 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f294918f7211..8dafa1ba8d9f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -393,6 +393,9 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 		if (pfn) {
 			zone->zone_start_pfn = pfn;
 			zone->spanned_pages = zone_end_pfn - pfn;
+		} else {
+			zone->zone_start_pfn = 0;
+			zone->spanned_pages = 0;
 		}
 	} else if (zone_end_pfn == end_pfn) {
 		/*
@@ -405,34 +408,11 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 					       start_pfn);
 		if (pfn)
 			zone->spanned_pages = pfn - zone_start_pfn + 1;
+		else {
+			zone->zone_start_pfn = 0;
+			zone->spanned_pages = 0;
+		}
 	}
-
-	/*
-	 * The section is not biggest or smallest mem_section in the zone, it
-	 * only creates a hole in the zone. So in this case, we need not
-	 * change the zone. But perhaps, the zone has only hole data. Thus
-	 * it check the zone has only hole or not.
-	 */
-	pfn = zone_start_pfn;
-	for (; pfn < zone_end_pfn; pfn += PAGES_PER_SUBSECTION) {
-		if (unlikely(!pfn_to_online_page(pfn)))
-			continue;
-
-		if (page_zone(pfn_to_page(pfn)) != zone)
-			continue;
-
-		/* Skip range to be removed */
-		if (pfn >= start_pfn && pfn < end_pfn)
-			continue;
-
-		/* If we find valid section, we have nothing to do */
-		zone_span_writeunlock(zone);
-		return;
-	}
-
-	/* The zone has no valid section */
-	zone->zone_start_pfn = 0;
-	zone->spanned_pages = 0;
 	zone_span_writeunlock(zone);
 }
 
-- 
2.21.0




* [PATCH v6 09/10] mm/memory_hotplug: Drop local variables in shrink_zone_span()
  2019-10-06  8:56 [PATCH v6 00/10] mm/memory_hotplug: Shrink zones before removing memory David Hildenbrand
                   ` (7 preceding siblings ...)
  2019-10-06  8:56 ` [PATCH v6 08/10] mm/memory_hotplug: Don't check for "all holes" in shrink_zone_span() David Hildenbrand
@ 2019-10-06  8:56 ` " David Hildenbrand
  2019-10-06  8:56 ` [PATCH v6 10/10] mm/memory_hotplug: Cleanup __remove_pages() David Hildenbrand
  2019-12-02  9:09 ` [PATCH v6 00/10] mm/memory_hotplug: Shrink zones before removing memory David Hildenbrand
  10 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2019-10-06  8:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-arm-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, x86, David Hildenbrand, Andrew Morton, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Dan Williams, Wei Yang

Get rid of the unnecessary local variables.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 8dafa1ba8d9f..843481bd507d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -374,14 +374,11 @@ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone,
 static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 			     unsigned long end_pfn)
 {
-	unsigned long zone_start_pfn = zone->zone_start_pfn;
-	unsigned long z = zone_end_pfn(zone); /* zone_end_pfn namespace clash */
-	unsigned long zone_end_pfn = z;
 	unsigned long pfn;
 	int nid = zone_to_nid(zone);
 
 	zone_span_writelock(zone);
-	if (zone_start_pfn == start_pfn) {
+	if (zone->zone_start_pfn == start_pfn) {
 		/*
 		 * If the section is smallest section in the zone, it need
 		 * shrink zone->zone_start_pfn and zone->zone_spanned_pages.
@@ -389,25 +386,25 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 		 * for shrinking zone.
 		 */
 		pfn = find_smallest_section_pfn(nid, zone, end_pfn,
-						zone_end_pfn);
+						zone_end_pfn(zone));
 		if (pfn) {
+			zone->spanned_pages = zone_end_pfn(zone) - pfn;
 			zone->zone_start_pfn = pfn;
-			zone->spanned_pages = zone_end_pfn - pfn;
 		} else {
 			zone->zone_start_pfn = 0;
 			zone->spanned_pages = 0;
 		}
-	} else if (zone_end_pfn == end_pfn) {
+	} else if (zone_end_pfn(zone) == end_pfn) {
 		/*
 		 * If the section is biggest section in the zone, it need
 		 * shrink zone->spanned_pages.
 		 * In this case, we find second biggest valid mem_section for
 		 * shrinking zone.
 		 */
-		pfn = find_biggest_section_pfn(nid, zone, zone_start_pfn,
+		pfn = find_biggest_section_pfn(nid, zone, zone->zone_start_pfn,
 					       start_pfn);
 		if (pfn)
-			zone->spanned_pages = pfn - zone_start_pfn + 1;
+			zone->spanned_pages = pfn - zone->zone_start_pfn + 1;
 		else {
 			zone->zone_start_pfn = 0;
 			zone->spanned_pages = 0;
-- 
2.21.0




* [PATCH v6 10/10] mm/memory_hotplug: Cleanup __remove_pages()
  2019-10-06  8:56 [PATCH v6 00/10] mm/memory_hotplug: Shrink zones before removing memory David Hildenbrand
                   ` (8 preceding siblings ...)
  2019-10-06  8:56 ` [PATCH v6 09/10] mm/memory_hotplug: Drop local variables " David Hildenbrand
@ 2019-10-06  8:56 ` David Hildenbrand
  2019-12-02  9:09 ` [PATCH v6 00/10] mm/memory_hotplug: Shrink zones before removing memory David Hildenbrand
  10 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2019-10-06  8:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-arm-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, x86, David Hildenbrand, Andrew Morton, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Dan Williams, Wei Yang

Let's drop the basically unused section stuff and simplify.

Also, let's use a shorter variant to calculate the number of pages to
the next section boundary.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 843481bd507d..2275240cfa10 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -490,25 +490,20 @@ static void __remove_section(unsigned long pfn, unsigned long nr_pages,
 void __remove_pages(unsigned long pfn, unsigned long nr_pages,
 		    struct vmem_altmap *altmap)
 {
+	const unsigned long end_pfn = pfn + nr_pages;
+	unsigned long cur_nr_pages;
 	unsigned long map_offset = 0;
-	unsigned long nr, start_sec, end_sec;
 
 	map_offset = vmem_altmap_offset(altmap);
 
 	if (check_pfn_span(pfn, nr_pages, "remove"))
 		return;
 
-	start_sec = pfn_to_section_nr(pfn);
-	end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
-	for (nr = start_sec; nr <= end_sec; nr++) {
-		unsigned long pfns;
-
+	for (; pfn < end_pfn; pfn += cur_nr_pages) {
 		cond_resched();
-		pfns = min(nr_pages, PAGES_PER_SECTION
-				- (pfn & ~PAGE_SECTION_MASK));
-		__remove_section(pfn, pfns, map_offset, altmap);
-		pfn += pfns;
-		nr_pages -= pfns;
+		/* Select all remaining pages up to the next section boundary */
+		cur_nr_pages = min(end_pfn - pfn, -(pfn | PAGE_SECTION_MASK));
+		__remove_section(pfn, cur_nr_pages, map_offset, altmap);
 		map_offset = 0;
 	}
 }
-- 
2.21.0




* Re: [PATCH v6 01/10] mm/memunmap: Don't access uninitialized memmap in memunmap_pages()
  2019-10-06  8:56 ` [PATCH v6 01/10] mm/memunmap: Don't access uninitialized memmap in memunmap_pages() David Hildenbrand
@ 2019-10-06 19:58   ` Damian Tometzki
  2019-10-06 20:13     ` David Hildenbrand
  2019-10-14  9:05   ` David Hildenbrand
  1 sibling, 1 reply; 29+ messages in thread
From: Damian Tometzki @ 2019-10-06 19:58 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linux-arm-kernel, linux-ia64,
	linuxppc-dev, linux-s390, linux-sh, x86, Aneesh Kumar K.V,
	Dan Williams, Andrew Morton, Jason Gunthorpe, Logan Gunthorpe,
	Ira Weiny

Hello David,

patch 05/10 is missing in the patch series. 


On Sun, 06. Oct 10:56, David Hildenbrand wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> 
> With an altmap, the memmap falling into the reserved altmap space are
> not initialized and, therefore, contain a garbage NID and a garbage
> zone. Make sure to read the NID/zone from a memmap that was initialized.
> 
> This fixes a kernel crash that is observed when destroying a namespace:
> 
> [   81.356173] kernel BUG at include/linux/mm.h:1107!
> cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
>     pc: c0000000004b9728: memunmap_pages+0x238/0x340
>     lr: c0000000004b9724: memunmap_pages+0x234/0x340
> ...
>     pid   = 3669, comm = ndctl
> kernel BUG at include/linux/mm.h:1107!
> [c000000274087ba0] c0000000009e3500 devm_action_release+0x30/0x50
> [c000000274087bc0] c0000000009e4758 release_nodes+0x268/0x2d0
> [c000000274087c30] c0000000009dd144 device_release_driver_internal+0x174/0x240
> [c000000274087c70] c0000000009d9dfc unbind_store+0x13c/0x190
> [c000000274087cb0] c0000000009d8a24 drv_attr_store+0x44/0x60
> [c000000274087cd0] c0000000005a7470 sysfs_kf_write+0x70/0xa0
> [c000000274087d10] c0000000005a5cac kernfs_fop_write+0x1ac/0x290
> [c000000274087d60] c0000000004be45c __vfs_write+0x3c/0x70
> [c000000274087d80] c0000000004c26e4 vfs_write+0xe4/0x200
> [c000000274087dd0] c0000000004c2a6c ksys_write+0x7c/0x140
> [c000000274087e20] c00000000000bbd0 system_call+0x5c/0x68
> 
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> [ minimize code changes, rephrase description ]
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/memremap.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 557e53c6fb46..8c2fb44c3b4d 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -123,6 +123,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
>  void memunmap_pages(struct dev_pagemap *pgmap)
>  {
>  	struct resource *res = &pgmap->res;
> +	struct page *first_page;
>  	unsigned long pfn;
>  	int nid;
>  
> @@ -131,14 +132,16 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>  		put_page(pfn_to_page(pfn));
>  	dev_pagemap_cleanup(pgmap);
>  
> +	/* make sure to access a memmap that was actually initialized */
> +	first_page = pfn_to_page(pfn_first(pgmap));
> +
>  	/* pages are dead and unused, undo the arch mapping */
> -	nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
> +	nid = page_to_nid(first_page);

Why do we need 'nid = page_to_nid(first_page)'? We don't use it anymore in this function, do we?

>  
>  	mem_hotplug_begin();
>  	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
> -		pfn = PHYS_PFN(res->start);
> -		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
> -				 PHYS_PFN(resource_size(res)), NULL);
> +		__remove_pages(page_zone(first_page), PHYS_PFN(res->start),
> +			       PHYS_PFN(resource_size(res)), NULL);
>  	} else {
>  		arch_remove_memory(nid, res->start, resource_size(res),
>  				pgmap_altmap(pgmap));
> -- 
> 2.21.0
>
Best regards
Damian
 



* Re: [PATCH v6 01/10] mm/memunmap: Don't access uninitialized memmap in memunmap_pages()
  2019-10-06 19:58   ` Damian Tometzki
@ 2019-10-06 20:13     ` David Hildenbrand
  0 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2019-10-06 20:13 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linux-arm-kernel, linux-ia64,
	linuxppc-dev, linux-s390, linux-sh, x86, Aneesh Kumar K.V,
	Dan Williams, Andrew Morton, Jason Gunthorpe, Logan Gunthorpe,
	Ira Weiny

On 06.10.19 21:58, Damian Tometzki wrote:
> Hello David,
> 
> patch 05/10 is missing in the patch series. 
> 

Hi Damian,

not really. Could be that lkml is slow today. E.g., check

https://marc.info/?l=linux-mm&m=157035222620403&w=2

and especially

https://marc.info/?l=linux-mm&m=157035225120440&w=2

All mails popped up on the mm list.

> 
> On Sun, 06. Oct 10:56, David Hildenbrand wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
>>
>> With an altmap, the memmap falling into the reserved altmap space are
>> not initialized and, therefore, contain a garbage NID and a garbage
>> zone. Make sure to read the NID/zone from a memmap that was initialized.
>>
>> This fixes a kernel crash that is observed when destroying a namespace:
>>
>> [   81.356173] kernel BUG at include/linux/mm.h:1107!
>> cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
>>     pc: c0000000004b9728: memunmap_pages+0x238/0x340
>>     lr: c0000000004b9724: memunmap_pages+0x234/0x340
>> ...
>>     pid   = 3669, comm = ndctl
>> kernel BUG at include/linux/mm.h:1107!
>> [c000000274087ba0] c0000000009e3500 devm_action_release+0x30/0x50
>> [c000000274087bc0] c0000000009e4758 release_nodes+0x268/0x2d0
>> [c000000274087c30] c0000000009dd144 device_release_driver_internal+0x174/0x240
>> [c000000274087c70] c0000000009d9dfc unbind_store+0x13c/0x190
>> [c000000274087cb0] c0000000009d8a24 drv_attr_store+0x44/0x60
>> [c000000274087cd0] c0000000005a7470 sysfs_kf_write+0x70/0xa0
>> [c000000274087d10] c0000000005a5cac kernfs_fop_write+0x1ac/0x290
>> [c000000274087d60] c0000000004be45c __vfs_write+0x3c/0x70
>> [c000000274087d80] c0000000004c26e4 vfs_write+0xe4/0x200
>> [c000000274087dd0] c0000000004c2a6c ksys_write+0x7c/0x140
>> [c000000274087e20] c00000000000bbd0 system_call+0x5c/0x68
>>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
>> Cc: Logan Gunthorpe <logang@deltatee.com>
>> Cc: Ira Weiny <ira.weiny@intel.com>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> [ minimize code changes, rephrase description ]
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  mm/memremap.c | 11 +++++++----
>>  1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/memremap.c b/mm/memremap.c
>> index 557e53c6fb46..8c2fb44c3b4d 100644
>> --- a/mm/memremap.c
>> +++ b/mm/memremap.c
>> @@ -123,6 +123,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
>>  void memunmap_pages(struct dev_pagemap *pgmap)
>>  {
>>  	struct resource *res = &pgmap->res;
>> +	struct page *first_page;
>>  	unsigned long pfn;
>>  	int nid;
>>  
>> @@ -131,14 +132,16 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>>  		put_page(pfn_to_page(pfn));
>>  	dev_pagemap_cleanup(pgmap);
>>  
>> +	/* make sure to access a memmap that was actually initialized */
>> +	first_page = pfn_to_page(pfn_first(pgmap));
>> +
>>  	/* pages are dead and unused, undo the arch mapping */
>> -	nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
>> +	nid = page_to_nid(first_page);
> 
> Why we need 'nid = page_to_nid(first_page)' we didnt use it anymore in this function ?

Please see ...

> 
>>  
>>  	mem_hotplug_begin();
>>  	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
>> -		pfn = PHYS_PFN(res->start);
>> -		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
>> -				 PHYS_PFN(resource_size(res)), NULL);
>> +		__remove_pages(page_zone(first_page), PHYS_PFN(res->start),
>> +			       PHYS_PFN(resource_size(res)), NULL);
>>  	} else {
>>  		arch_remove_memory(nid, res->start, resource_size(res),
                                   ^ here

:)

>>  				pgmap_altmap(pgmap));



-- 

Thanks,

David / dhildenb



* Re: [PATCH v6 01/10] mm/memunmap: Don't access uninitialized memmap in memunmap_pages()
  2019-10-06  8:56 ` [PATCH v6 01/10] mm/memunmap: Don't access uninitialized memmap in memunmap_pages() David Hildenbrand
  2019-10-06 19:58   ` Damian Tometzki
@ 2019-10-14  9:05   ` David Hildenbrand
  1 sibling, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2019-10-14  9:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-arm-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, x86, Aneesh Kumar K.V, Dan Williams, Andrew Morton,
	Jason Gunthorpe, Logan Gunthorpe, Ira Weiny

On 06.10.19 10:56, David Hildenbrand wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> 
> With an altmap, the memmap falling into the reserved altmap space are
> not initialized and, therefore, contain a garbage NID and a garbage
> zone. Make sure to read the NID/zone from a memmap that was initialized.
> 
> This fixes a kernel crash that is observed when destroying a namespace:
> 
> [   81.356173] kernel BUG at include/linux/mm.h:1107!
> cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
>      pc: c0000000004b9728: memunmap_pages+0x238/0x340
>      lr: c0000000004b9724: memunmap_pages+0x234/0x340
> ...
>      pid   = 3669, comm = ndctl
> kernel BUG at include/linux/mm.h:1107!
> [c000000274087ba0] c0000000009e3500 devm_action_release+0x30/0x50
> [c000000274087bc0] c0000000009e4758 release_nodes+0x268/0x2d0
> [c000000274087c30] c0000000009dd144 device_release_driver_internal+0x174/0x240
> [c000000274087c70] c0000000009d9dfc unbind_store+0x13c/0x190
> [c000000274087cb0] c0000000009d8a24 drv_attr_store+0x44/0x60
> [c000000274087cd0] c0000000005a7470 sysfs_kf_write+0x70/0xa0
> [c000000274087d10] c0000000005a5cac kernfs_fop_write+0x1ac/0x290
> [c000000274087d60] c0000000004be45c __vfs_write+0x3c/0x70
> [c000000274087d80] c0000000004c26e4 vfs_write+0xe4/0x200
> [c000000274087dd0] c0000000004c2a6c ksys_write+0x7c/0x140
> [c000000274087e20] c00000000000bbd0 system_call+0x5c/0x68
> 
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> [ minimize code changes, rephrase description ]
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>   mm/memremap.c | 11 +++++++----
>   1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 557e53c6fb46..8c2fb44c3b4d 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -123,6 +123,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
>   void memunmap_pages(struct dev_pagemap *pgmap)
>   {
>   	struct resource *res = &pgmap->res;
> +	struct page *first_page;
>   	unsigned long pfn;
>   	int nid;
>   
> @@ -131,14 +132,16 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>   		put_page(pfn_to_page(pfn));
>   	dev_pagemap_cleanup(pgmap);
>   
> +	/* make sure to access a memmap that was actually initialized */
> +	first_page = pfn_to_page(pfn_first(pgmap));
> +
>   	/* pages are dead and unused, undo the arch mapping */
> -	nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
> +	nid = page_to_nid(first_page);
>   
>   	mem_hotplug_begin();
>   	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
> -		pfn = PHYS_PFN(res->start);
> -		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
> -				 PHYS_PFN(resource_size(res)), NULL);
> +		__remove_pages(page_zone(first_page), PHYS_PFN(res->start),
> +			       PHYS_PFN(resource_size(res)), NULL);
>   	} else {
>   		arch_remove_memory(nid, res->start, resource_size(res),
>   				pgmap_altmap(pgmap));
> 

@Andrew, can you add

Fixes: 2c2a5af6fed2 ("mm, memory_hotplug: add nid parameter to 
arch_remove_memory")

(which basically introduced the nid = page_to_nid(first_page))

The "page_zone(pfn_to_page(pfn))" was introduced by 69324b8f4833 ("mm, 
devm_memremap_pages: add MEMORY_DEVICE_PRIVATE support"); however, I 
think we will never have driver-reserved memory with 
MEMORY_DEVICE_PRIVATE (no altmap AFAIKS).

Also, I think

Cc: stable@vger.kernel.org # v5.0+

makes sense.

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 03/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_pgdat_span()
  2019-10-06  8:56 ` [PATCH v6 03/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_pgdat_span() David Hildenbrand
@ 2019-10-14  9:31   ` David Hildenbrand
  0 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2019-10-14  9:31 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton
  Cc: linux-mm, linux-arm-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, x86, Oscar Salvador, Michal Hocko, Pavel Tatashin,
	Dan Williams, Wei Yang, Aneesh Kumar K . V

On 06.10.19 10:56, David Hildenbrand wrote:
> We might use the nid of memmaps that were never initialized. For
> example, if the memmap was poisoned, we will crash the kernel in
> pfn_to_nid() right now. Let's use the calculated boundaries of the separate
> zones instead. This now also avoids having to iterate over a whole bunch of
> subsections again, after shrinking one zone.
> 
> Before commit d0dc12e86b31 ("mm/memory_hotplug: optimize memory
> hotplug"), the memmap was initialized to 0 and the node was set to the
> right value. After that commit, the node might be garbage.
> 
> We'll have to fix shrink_zone_span() next.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Wei Yang <richardw.yang@linux.intel.com>
> Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")

@Andrew, can you convert that to

Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b319

and add

Cc: stable@vger.kernel.org # v4.13+

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span()
  2019-10-06  8:56 ` [PATCH v6 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span() David Hildenbrand
@ 2019-10-14  9:32   ` David Hildenbrand
  2019-10-14 19:17     ` Andrew Morton
  0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2019-10-14  9:32 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton
  Cc: linux-mm, linux-arm-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, x86, Oscar Salvador, Michal Hocko, Pavel Tatashin,
	Dan Williams, Aneesh Kumar K . V

On 06.10.19 10:56, David Hildenbrand wrote:
> Let's limit shrinking to !ZONE_DEVICE so we can fix the current code. We
> should never try to touch the memmap of offline sections where we could
> have uninitialized memmaps and could trigger BUGs when calling
> page_to_nid() on poisoned pages.
> 
> There is no reliable way to distinguish an uninitialized memmap from an
> initialized memmap that belongs to ZONE_DEVICE, as we don't have
> anything like SECTION_IS_ONLINE that we could use, similar to
> pfn_to_online_section() for !ZONE_DEVICE memory. E.g.,
> set_zone_contiguous() similarly relies on pfn_to_online_section() and
> will therefore never mark a ZONE_DEVICE zone as contiguous. No longer
> shrinking ZONE_DEVICE therefore results in no observable changes,
> besides /proc/zoneinfo indicating different boundaries - something we
> can totally live with.
> 
> Before commit d0dc12e86b31 ("mm/memory_hotplug: optimize memory
> hotplug"), the memmap was initialized with 0 and the node with the
> right value. So the zone might be wrong but not garbage. After that
> commit, both the zone and the node will be garbage when touching
> uninitialized memmaps.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")

@Andrew, can you convert that to

Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded 
memory to zones until online") # visible after d0dc12e86b319

and add

Cc: stable@vger.kernel.org # v4.13+


-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory
  2019-10-06  8:56 ` [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory David Hildenbrand
@ 2019-10-14  9:39   ` David Hildenbrand
  2019-10-14 19:16     ` Andrew Morton
  2019-10-27 22:45   ` David Hildenbrand
  2019-12-03 15:10   ` Oscar Salvador
  2 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2019-10-14  9:39 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton
  Cc: linux-mm, linux-arm-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, x86, Catalin Marinas, Will Deacon, Tony Luck,
	Fenghua Yu, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Yoshinori Sato, Rich Felker, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Mark Rutland, Steve Capper,
	Mike Rapoport, Anshuman Khandual, Yu Zhao, Jun Yao, Robin Murphy,
	Michal Hocko, Oscar Salvador, Matthew Wilcox (Oracle),
	Christophe Leroy, Aneesh Kumar K.V, Pavel Tatashin,
	Gerald Schaefer, Halil Pasic, Tom Lendacky, Greg Kroah-Hartman,
	Masahiro Yamada, Dan Williams, Wei Yang, Qian Cai,
	Jason Gunthorpe, Logan Gunthorpe, Ira Weiny

On 06.10.19 10:56, David Hildenbrand wrote:
> We currently try to shrink a single zone when removing memory. We use the
> zone of the first page of the memory we are removing. If that memmap was
> never initialized (e.g., memory was never onlined), we will read garbage
> and can trigger kernel BUGs (due to a stale pointer):
> 
> :/# [   23.912993] BUG: unable to handle page fault for address: 000000000000353d
> [   23.914219] #PF: supervisor write access in kernel mode
> [   23.915199] #PF: error_code(0x0002) - not-present page
> [   23.916160] PGD 0 P4D 0
> [   23.916627] Oops: 0002 [#1] SMP PTI
> [   23.917256] CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317
> [   23.918900] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
> [   23.921194] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> [   23.922249] RIP: 0010:clear_zone_contiguous+0x5/0x10
> [   23.923173] Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840
> [   23.926876] RSP: 0018:ffffad2400043c98 EFLAGS: 00010246
> [   23.927928] RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000
> [   23.929458] RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40
> [   23.930899] RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001
> [   23.932362] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
> [   23.933603] R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680
> [   23.934913] FS:  0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000
> [   23.936294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   23.937481] CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0
> [   23.938687] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   23.939889] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   23.941168] Call Trace:
> [   23.941580]  __remove_pages+0x4b/0x640
> [   23.942303]  ? mark_held_locks+0x49/0x70
> [   23.943149]  arch_remove_memory+0x63/0x8d
> [   23.943921]  try_remove_memory+0xdb/0x130
> [   23.944766]  ? walk_memory_blocks+0x7f/0x9e
> [   23.945616]  __remove_memory+0xa/0x11
> [   23.946274]  acpi_memory_device_remove+0x70/0x100
> [   23.947308]  acpi_bus_trim+0x55/0x90
> [   23.947914]  acpi_device_hotplug+0x227/0x3a0
> [   23.948714]  acpi_hotplug_work_fn+0x1a/0x30
> [   23.949433]  process_one_work+0x221/0x550
> [   23.950190]  worker_thread+0x50/0x3b0
> [   23.950993]  kthread+0x105/0x140
> [   23.951644]  ? process_one_work+0x550/0x550
> [   23.952508]  ? kthread_park+0x80/0x80
> [   23.953367]  ret_from_fork+0x3a/0x50
> [   23.954025] Modules linked in:
> [   23.954613] CR2: 000000000000353d
> [   23.955248] ---[ end trace 93d982b1fb3e1a69 ]---
> 
> Instead, shrink the zones when offlining memory or when onlining failed.
> Introduce and use remove_pfn_range_from_zone() for that. We now properly
> shrink the zones, even if we have DIMMs where
> - Some memory blocks fall into no zone (never onlined)
> - Some memory blocks fall into multiple zones (offlined+re-onlined)
> - Multiple memory blocks that fall into different zones
> 
> Drop the zone parameter (with a potential dubious value) from
> __remove_pages() and __remove_section().
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
> Cc: Rich Felker <dalias@libc.org>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Steve Capper <steve.capper@arm.com>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Yu Zhao <yuzhao@google.com>
> Cc: Jun Yao <yaojun8558363@gmail.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-ia64@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-s390@vger.kernel.org
> Cc: linux-sh@vger.kernel.org
> Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")

@Andrew, can you convert that to

Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b319

While cc'ing stable@vger.kernel.org # v4.13+ would be nice,
I doubt it will be easily possible to backport, as we are missing
some prereq patches (e.g., from Oscar like 2c2a5af6fed2 ("mm,
memory_hotplug: add nid parameter to arch_remove_memory")). But it
could be done with some work.

I think "Cc: stable@vger.kernel.org # v5.0+" could be done more
easily. Maybe it's okay to not cc:stable this one. We usually
online all memory (except on s390x); however, s390x never removes
that memory again. Devmem with driver-reserved memory would,
however, be worth backporting this for.

Thoughts?


-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory
  2019-10-14  9:39   ` David Hildenbrand
@ 2019-10-14 19:16     ` Andrew Morton
  0 siblings, 0 replies; 29+ messages in thread
From: Andrew Morton @ 2019-10-14 19:16 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linux-arm-kernel, linux-ia64,
	linuxppc-dev, linux-s390, linux-sh, x86, Catalin Marinas,
	Will Deacon, Tony Luck, Fenghua Yu, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Yoshinori Sato, Rich Felker, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Mark Rutland, Steve Capper,
	Mike Rapoport, Anshuman Khandual, Yu Zhao, Jun Yao, Robin Murphy,
	Michal Hocko, Oscar Salvador, Matthew Wilcox (Oracle),
	Christophe Leroy, Aneesh Kumar K.V, Pavel Tatashin,
	Gerald Schaefer, Halil Pasic, Tom Lendacky, Greg Kroah-Hartman,
	Masahiro Yamada, Dan Williams, Wei Yang, Qian Cai,
	Jason Gunthorpe, Logan Gunthorpe, Ira Weiny

On Mon, 14 Oct 2019 11:39:13 +0200 David Hildenbrand <david@redhat.com> wrote:

> > Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
> 
> @Andrew, can you convert that to
> 
> Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b319

Done.

> While cc'ing stable@vger.kernel.org # v4.13+ would be nice,
> I doubt it will be easily possible to backport, as we are missing
> some prereq patches (e.g., from Oscar like 2c2a5af6fed2 ("mm,
> memory_hotplug: add nid parameter to arch_remove_memory")). But it
> could be done with some work.
> 
> I think "Cc: stable@vger.kernel.org # v5.0+" could be done more
> easily. Maybe it's okay to not cc:stable this one. We usually
> online all memory (except on s390x); however, s390x never removes
> that memory again. Devmem with driver-reserved memory would,
> however, be worth backporting this for.

I added 

Cc: <stable@vger.kernel.org>	[5.0+]


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span()
  2019-10-14  9:32   ` David Hildenbrand
@ 2019-10-14 19:17     ` Andrew Morton
  2019-11-19 14:16       ` David Hildenbrand
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Morton @ 2019-10-14 19:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linux-arm-kernel, linux-ia64,
	linuxppc-dev, linux-s390, linux-sh, x86, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Dan Williams, Aneesh Kumar K . V

On Mon, 14 Oct 2019 11:32:13 +0200 David Hildenbrand <david@redhat.com> wrote:

> > Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
> 
> @Andrew, can you convert that to
> 
> Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded 
> memory to zones until online") # visible after d0dc12e86b319
> 
> and add
> 
> Cc: stable@vger.kernel.org # v4.13+

Done, thanks.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 06/10] mm/memory_hotplug: Poison memmap in remove_pfn_range_from_zone()
  2019-10-06  8:56 ` [PATCH v6 06/10] mm/memory_hotplug: Poison memmap in remove_pfn_range_from_zone() David Hildenbrand
@ 2019-10-16 14:01   ` David Hildenbrand
  0 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2019-10-16 14:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-arm-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, x86, Andrew Morton, Oscar Salvador, Michal Hocko,
	Pavel Tatashin, Dan Williams

On 06.10.19 10:56, David Hildenbrand wrote:
> Let's poison the pages similar to when adding new memory in
> sparse_add_section(). Also call remove_pfn_range_from_zone() from
> memunmap_pages(), so we can poison the memmap from there as well.
> 
> While at it, calculate the pfn in memunmap_pages() only once.

FWIW, this comment is stale and could be dropped :)

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory
  2019-10-06  8:56 ` [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory David Hildenbrand
  2019-10-14  9:39   ` David Hildenbrand
@ 2019-10-27 22:45   ` David Hildenbrand
  2019-11-30 23:21     ` Andrew Morton
  2019-12-03 15:10   ` Oscar Salvador
  2 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2019-10-27 22:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linux-arm-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, x86, Catalin Marinas, Will Deacon, Tony Luck,
	Fenghua Yu, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Yoshinori Sato, Rich Felker, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Andrew Morton, Mark Rutland,
	Steve Capper, Mike Rapoport, Anshuman Khandual, Yu Zhao, Jun Yao,
	Robin Murphy, Michal Hocko, Oscar Salvador,
	Matthew Wilcox (Oracle),
	Christophe Leroy, Aneesh Kumar K.V, Pavel Tatashin,
	Gerald Schaefer, Halil Pasic, Tom Lendacky, Greg Kroah-Hartman,
	Masahiro Yamada, Dan Williams, Wei Yang, Qian Cai,
	Jason Gunthorpe, Logan Gunthorpe, Ira Weiny

On 06.10.19 10:56, David Hildenbrand wrote:
> We currently try to shrink a single zone when removing memory. We use the
> zone of the first page of the memory we are removing. If that memmap was
> never initialized (e.g., memory was never onlined), we will read garbage
> and can trigger kernel BUGs (due to a stale pointer):
> 
> :/# [   23.912993] BUG: unable to handle page fault for address: 000000000000353d
> [   23.914219] #PF: supervisor write access in kernel mode
> [   23.915199] #PF: error_code(0x0002) - not-present page
> [   23.916160] PGD 0 P4D 0
> [   23.916627] Oops: 0002 [#1] SMP PTI
> [   23.917256] CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317
> [   23.918900] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
> [   23.921194] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> [   23.922249] RIP: 0010:clear_zone_contiguous+0x5/0x10
> [   23.923173] Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840
> [   23.926876] RSP: 0018:ffffad2400043c98 EFLAGS: 00010246
> [   23.927928] RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000
> [   23.929458] RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40
> [   23.930899] RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001
> [   23.932362] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
> [   23.933603] R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680
> [   23.934913] FS:  0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000
> [   23.936294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   23.937481] CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0
> [   23.938687] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   23.939889] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   23.941168] Call Trace:
> [   23.941580]  __remove_pages+0x4b/0x640
> [   23.942303]  ? mark_held_locks+0x49/0x70
> [   23.943149]  arch_remove_memory+0x63/0x8d
> [   23.943921]  try_remove_memory+0xdb/0x130
> [   23.944766]  ? walk_memory_blocks+0x7f/0x9e
> [   23.945616]  __remove_memory+0xa/0x11
> [   23.946274]  acpi_memory_device_remove+0x70/0x100
> [   23.947308]  acpi_bus_trim+0x55/0x90
> [   23.947914]  acpi_device_hotplug+0x227/0x3a0
> [   23.948714]  acpi_hotplug_work_fn+0x1a/0x30
> [   23.949433]  process_one_work+0x221/0x550
> [   23.950190]  worker_thread+0x50/0x3b0
> [   23.950993]  kthread+0x105/0x140
> [   23.951644]  ? process_one_work+0x550/0x550
> [   23.952508]  ? kthread_park+0x80/0x80
> [   23.953367]  ret_from_fork+0x3a/0x50
> [   23.954025] Modules linked in:
> [   23.954613] CR2: 000000000000353d
> [   23.955248] ---[ end trace 93d982b1fb3e1a69 ]---
> 
> Instead, shrink the zones when offlining memory or when onlining failed.
> Introduce and use remove_pfn_range_from_zone() for that. We now properly
> shrink the zones, even if we have DIMMs where
> - Some memory blocks fall into no zone (never onlined)
> - Some memory blocks fall into multiple zones (offlined+re-onlined)
> - Multiple memory blocks that fall into different zones
> 
> Drop the zone parameter (with a potential dubious value) from
> __remove_pages() and __remove_section().
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
> Cc: Rich Felker <dalias@libc.org>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Steve Capper <steve.capper@arm.com>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Yu Zhao <yuzhao@google.com>
> Cc: Jun Yao <yaojun8558363@gmail.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-ia64@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-s390@vger.kernel.org
> Cc: linux-sh@vger.kernel.org
> Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>   arch/arm64/mm/mmu.c            |  4 +---
>   arch/ia64/mm/init.c            |  4 +---
>   arch/powerpc/mm/mem.c          |  3 +--
>   arch/s390/mm/init.c            |  4 +---
>   arch/sh/mm/init.c              |  4 +---
>   arch/x86/mm/init_32.c          |  4 +---
>   arch/x86/mm/init_64.c          |  4 +---
>   include/linux/memory_hotplug.h |  7 +++++--
>   mm/memory_hotplug.c            | 31 ++++++++++++++++---------------
>   mm/memremap.c                  |  2 +-
>   10 files changed, 29 insertions(+), 38 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 60c929f3683b..d10247fab0fd 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1069,7 +1069,6 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>   {
>   	unsigned long start_pfn = start >> PAGE_SHIFT;
>   	unsigned long nr_pages = size >> PAGE_SHIFT;
> -	struct zone *zone;
>   
>   	/*
>   	 * FIXME: Cleanup page tables (also in arch_add_memory() in case
> @@ -1078,7 +1077,6 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>   	 * unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be
>   	 * unlocked yet.
>   	 */
> -	zone = page_zone(pfn_to_page(start_pfn));
> -	__remove_pages(zone, start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap);
>   }
>   #endif
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index bf9df2625bc8..a6dd80a2c939 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -689,9 +689,7 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>   {
>   	unsigned long start_pfn = start >> PAGE_SHIFT;
>   	unsigned long nr_pages = size >> PAGE_SHIFT;
> -	struct zone *zone;
>   
> -	zone = page_zone(pfn_to_page(start_pfn));
> -	__remove_pages(zone, start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap);
>   }
>   #endif
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index be941d382c8d..97e5922cb52e 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -130,10 +130,9 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
>   {
>   	unsigned long start_pfn = start >> PAGE_SHIFT;
>   	unsigned long nr_pages = size >> PAGE_SHIFT;
> -	struct page *page = pfn_to_page(start_pfn) + vmem_altmap_offset(altmap);
>   	int ret;
>   
> -	__remove_pages(page_zone(page), start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap);
>   
>   	/* Remove htab bolted mappings for this section of memory */
>   	start = (unsigned long)__va(start);
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index a124f19f7b3c..c1d96e588152 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -291,10 +291,8 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>   {
>   	unsigned long start_pfn = start >> PAGE_SHIFT;
>   	unsigned long nr_pages = size >> PAGE_SHIFT;
> -	struct zone *zone;
>   
> -	zone = page_zone(pfn_to_page(start_pfn));
> -	__remove_pages(zone, start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap);
>   	vmem_remove_mapping(start, size);
>   }
>   #endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
> index dfdbaa50946e..d1b1ff2be17a 100644
> --- a/arch/sh/mm/init.c
> +++ b/arch/sh/mm/init.c
> @@ -434,9 +434,7 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>   {
>   	unsigned long start_pfn = PFN_DOWN(start);
>   	unsigned long nr_pages = size >> PAGE_SHIFT;
> -	struct zone *zone;
>   
> -	zone = page_zone(pfn_to_page(start_pfn));
> -	__remove_pages(zone, start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap);
>   }
>   #endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index 930edeb41ec3..0a74407ef92e 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -865,10 +865,8 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>   {
>   	unsigned long start_pfn = start >> PAGE_SHIFT;
>   	unsigned long nr_pages = size >> PAGE_SHIFT;
> -	struct zone *zone;
>   
> -	zone = page_zone(pfn_to_page(start_pfn));
> -	__remove_pages(zone, start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap);
>   }
>   #endif
>   
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index a6b5c653727b..b8541d77452c 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1212,10 +1212,8 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
>   {
>   	unsigned long start_pfn = start >> PAGE_SHIFT;
>   	unsigned long nr_pages = size >> PAGE_SHIFT;
> -	struct page *page = pfn_to_page(start_pfn) + vmem_altmap_offset(altmap);
> -	struct zone *zone = page_zone(page);
>   
> -	__remove_pages(zone, start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap);
>   	kernel_physical_mapping_remove(start, start + size);
>   }
>   #endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index bc477e98a310..517b70943732 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -126,8 +126,8 @@ static inline bool movable_node_is_enabled(void)
>   
>   extern void arch_remove_memory(int nid, u64 start, u64 size,
>   			       struct vmem_altmap *altmap);
> -extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
> -			   unsigned long nr_pages, struct vmem_altmap *altmap);
> +extern void __remove_pages(unsigned long start_pfn, unsigned long nr_pages,
> +			   struct vmem_altmap *altmap);
>   
>   /* reasonably generic interface to expand the physical pages */
>   extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
> @@ -346,6 +346,9 @@ extern int add_memory(int nid, u64 start, u64 size);
>   extern int add_memory_resource(int nid, struct resource *resource);
>   extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>   		unsigned long nr_pages, struct vmem_altmap *altmap);
> +extern void remove_pfn_range_from_zone(struct zone *zone,
> +				       unsigned long start_pfn,
> +				       unsigned long nr_pages);
>   extern bool is_memblock_offlined(struct memory_block *mem);
>   extern int sparse_add_section(int nid, unsigned long pfn,
>   		unsigned long nr_pages, struct vmem_altmap *altmap);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index f96608d24f6a..5b003ffa5dc9 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -457,8 +457,9 @@ static void update_pgdat_span(struct pglist_data *pgdat)
>   	pgdat->node_spanned_pages = node_end_pfn - node_start_pfn;
>   }
>   
> -static void __remove_zone(struct zone *zone, unsigned long start_pfn,
> -		unsigned long nr_pages)
> +void __ref remove_pfn_range_from_zone(struct zone *zone,
> +				      unsigned long start_pfn,
> +				      unsigned long nr_pages)
>   {
>   	struct pglist_data *pgdat = zone->zone_pgdat;
>   	unsigned long flags;
> @@ -473,28 +474,30 @@ static void __remove_zone(struct zone *zone, unsigned long start_pfn,
>   		return;
>   #endif
>   
> +	clear_zone_contiguous(zone);
> +
>   	pgdat_resize_lock(zone->zone_pgdat, &flags);
>   	shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
>   	update_pgdat_span(pgdat);
>   	pgdat_resize_unlock(zone->zone_pgdat, &flags);
> +
> +	set_zone_contiguous(zone);
>   }
>   
> -static void __remove_section(struct zone *zone, unsigned long pfn,
> -		unsigned long nr_pages, unsigned long map_offset,
> -		struct vmem_altmap *altmap)
> +static void __remove_section(unsigned long pfn, unsigned long nr_pages,
> +			     unsigned long map_offset,
> +			     struct vmem_altmap *altmap)
>   {
>   	struct mem_section *ms = __nr_to_section(pfn_to_section_nr(pfn));
>   
>   	if (WARN_ON_ONCE(!valid_section(ms)))
>   		return;
>   
> -	__remove_zone(zone, pfn, nr_pages);
>   	sparse_remove_section(ms, pfn, nr_pages, map_offset, altmap);
>   }
>   
>   /**
> - * __remove_pages() - remove sections of pages from a zone
> - * @zone: zone from which pages need to be removed
> + * __remove_pages() - remove sections of pages
>    * @pfn: starting pageframe (must be aligned to start of a section)
>    * @nr_pages: number of pages to remove (must be multiple of section size)
>    * @altmap: alternative device page map or %NULL if default memmap is used
> @@ -504,16 +507,14 @@ static void __remove_section(struct zone *zone, unsigned long pfn,
>    * sure that pages are marked reserved and zones are adjust properly by
>    * calling offline_pages().
>    */
> -void __remove_pages(struct zone *zone, unsigned long pfn,
> -		    unsigned long nr_pages, struct vmem_altmap *altmap)
> +void __remove_pages(unsigned long pfn, unsigned long nr_pages,
> +		    struct vmem_altmap *altmap)
>   {
>   	unsigned long map_offset = 0;
>   	unsigned long nr, start_sec, end_sec;
>   
>   	map_offset = vmem_altmap_offset(altmap);
>   
> -	clear_zone_contiguous(zone);
> -
>   	if (check_pfn_span(pfn, nr_pages, "remove"))
>   		return;
>   
> @@ -525,13 +526,11 @@ void __remove_pages(struct zone *zone, unsigned long pfn,
>   		cond_resched();
>   		pfns = min(nr_pages, PAGES_PER_SECTION
>   				- (pfn & ~PAGE_SECTION_MASK));
> -		__remove_section(zone, pfn, pfns, map_offset, altmap);
> +		__remove_section(pfn, pfns, map_offset, altmap);
>   		pfn += pfns;
>   		nr_pages -= pfns;
>   		map_offset = 0;
>   	}
> -
> -	set_zone_contiguous(zone);
>   }
>   
>   int set_online_page_callback(online_page_callback_t callback)
> @@ -859,6 +858,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
>   		 (unsigned long long) pfn << PAGE_SHIFT,
>   		 (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
>   	memory_notify(MEM_CANCEL_ONLINE, &arg);
> +	remove_pfn_range_from_zone(zone, pfn, nr_pages);
>   	mem_hotplug_done();
>   	return ret;
>   }
> @@ -1605,6 +1605,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
>   	writeback_set_ratelimit();
>   
>   	memory_notify(MEM_OFFLINE, &arg);
> +	remove_pfn_range_from_zone(zone, start_pfn, nr_pages);
>   	mem_hotplug_done();
>   	return 0;
>   
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 8c2fb44c3b4d..70263e6f093e 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -140,7 +140,7 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>   
>   	mem_hotplug_begin();
>   	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
> -		__remove_pages(page_zone(first_page), PHYS_PFN(res->start),
> +		__remove_pages(PHYS_PFN(res->start),
>   			       PHYS_PFN(resource_size(res)), NULL);
>   	} else {
>   		arch_remove_memory(nid, res->start, resource_size(res),
> 

I think I just found an issue with try_offline_node(). 
try_offline_node() is pretty much broken already (it touches garbage 
memmaps and does not consider mixed NIDs within sections); however, it 
relies on the node span to look for memory sections to probe. So it 
seems to rely on the nodes getting shrunk when removing memory, not when 
offlining.

As we shrink the node span when offlining now and not when removing, 
this can go wrong once we offline the last memory block of the node and 
offline the last CPU. We could still have memory around that we could 
re-online, however, the node would already be offline. Unlikely, but 
possible.

Note that the same is also broken without this patch in case memory is 
never onlined. The "pfn_to_nid(pfn) != nid" check can easily succeed on the 
garbage memmap, resulting in no memory being detected as belonging to 
the node. Also, resize_pgdat_range() is called when onlining memory, not 
when adding it. :/ Oh this is so broken :)

The right fix is probably to walk over all memory blocks that could 
exist and test if they belong to the nid (if offline, check the 
block->nid, if online check all pageblocks). A fix we can then move in 
front of this patch.

Will look into this this week.

-- 

Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span()
  2019-10-14 19:17     ` Andrew Morton
@ 2019-11-19 14:16       ` David Hildenbrand
  2019-11-19 20:44         ` Andrew Morton
  0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2019-11-19 14:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, linux-arm-kernel, linux-ia64,
	linuxppc-dev, linux-s390, linux-sh, x86, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Dan Williams, Aneesh Kumar K . V,
	Toshiki Fukasawa, Alexander Duyck

On 14.10.19 21:17, Andrew Morton wrote:
> On Mon, 14 Oct 2019 11:32:13 +0200 David Hildenbrand <david@redhat.com> wrote:
> 
>>> Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
>>
>> @Andrew, can you convert that to
>>
>> Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded
>> memory to zones until online") # visible after d0dc12e86b319
>>
>> and add
>>
>> Cc: stable@vger.kernel.org # v4.13+
> 
> Done, thanks.
> 

Just a note that Toshiki reported a BUG (race between delayed
initialization of ZONE_DEVICE memmaps without holding the memory
hotplug lock and concurrent zone shrinking).

https://lkml.org/lkml/2019/11/14/1040

"Iteration of create and destroy namespace causes the panic as below:

[   41.207694] kernel BUG at mm/page_alloc.c:535!
[   41.208109] invalid opcode: 0000 [#1] SMP PTI
[   41.208508] CPU: 7 PID: 2766 Comm: ndctl Not tainted 5.4.0-rc4 #6
[   41.209064] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
[   41.210175] RIP: 0010:set_pfnblock_flags_mask+0x95/0xf0
[   41.210643] Code: 04 41 83 e2 3c 48 8d 04 a8 48 c1 e0 07 48 03 04 dd e0 59 55 bb 48 8b 58 68 48 39 da 73 0e 48 c7 c6 70 ac 11 bb e8 1b b2 fd ff <0f> 0b 48 03 58 78 48 39 da 73 e9 49 01 ca b9 3f 00 00 00 4f 8d 0c
[   41.212354] RSP: 0018:ffffac0d41557c80 EFLAGS: 00010246
[   41.212821] RAX: 000000000000004a RBX: 0000000000244a00 RCX: 0000000000000000
[   41.213459] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffbb1197dc
[   41.214100] RBP: 000000000000000c R08: 0000000000000439 R09: 0000000000000059
[   41.214736] R10: 0000000000000000 R11: ffffac0d41557b08 R12: ffff8be475ea72b0
[   41.215376] R13: 000000000000fa00 R14: 0000000000250000 R15: 00000000fffc0bb5
[   41.216008] FS:  00007f30862ab600(0000) GS:ffff8be57bc40000(0000) knlGS:0000000000000000
[   41.216771] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   41.217299] CR2: 000055e824d0d508 CR3: 0000000231dac000 CR4: 00000000000006e0
[   41.217934] Call Trace:
[   41.218225]  memmap_init_zone_device+0x165/0x17c
[   41.218642]  memremap_pages+0x4c1/0x540
[   41.218989]  devm_memremap_pages+0x1d/0x60
[   41.219367]  pmem_attach_disk+0x16b/0x600 [nd_pmem]
[   41.219804]  ? devm_nsio_enable+0xb8/0xe0
[   41.220172]  nvdimm_bus_probe+0x69/0x1c0
[   41.220526]  really_probe+0x1c2/0x3e0
[   41.220856]  driver_probe_device+0xb4/0x100
[   41.221238]  device_driver_attach+0x4f/0x60
[   41.221611]  bind_store+0xc9/0x110
[   41.221919]  kernfs_fop_write+0x116/0x190
[   41.222326]  vfs_write+0xa5/0x1a0
[   41.222626]  ksys_write+0x59/0xd0
[   41.222927]  do_syscall_64+0x5b/0x180
[   41.223264]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   41.223714] RIP: 0033:0x7f30865d0ed8
[   41.224037] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 45 78 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[   41.225920] RSP: 002b:00007fffe5d30a78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   41.226608] RAX: ffffffffffffffda RBX: 000055e824d07f40 RCX: 00007f30865d0ed8
[   41.227242] RDX: 0000000000000007 RSI: 000055e824d07f40 RDI: 0000000000000004
[   41.227870] RBP: 0000000000000007 R08: 0000000000000007 R09: 0000000000000006
[   41.228753] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
[   41.229419] R13: 00007f30862ab528 R14: 0000000000000001 R15: 000055e824d07f40

While creating a namespace and initializing memmap, if you destroy the namespace
and shrink the zone, it will initialize the memmap outside the zone and
trigger VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page) in
set_pfnblock_flags_mask()."


This BUG is also mitigated by this commit, where we stop shrinking
the ZONE_DEVICE zone for now, until we can do it in a safe and clean
way.

-- 

Thanks,

David / dhildenb




* Re: [PATCH v6 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span()
  2019-11-19 14:16       ` David Hildenbrand
@ 2019-11-19 20:44         ` Andrew Morton
  0 siblings, 0 replies; 29+ messages in thread
From: Andrew Morton @ 2019-11-19 20:44 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linux-arm-kernel, linux-ia64,
	linuxppc-dev, linux-s390, linux-sh, x86, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Dan Williams, Aneesh Kumar K . V,
	Toshiki Fukasawa, Alexander Duyck

On Tue, 19 Nov 2019 15:16:22 +0100 David Hildenbrand <david@redhat.com> wrote:

> On 14.10.19 21:17, Andrew Morton wrote:
> > On Mon, 14 Oct 2019 11:32:13 +0200 David Hildenbrand <david@redhat.com> wrote:
> > 
> >>> Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
> >>
> >> @Andrew, can you convert that to
> >>
> >> Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded
> >> memory to zones until online") # visible after d0dc12e86b319
> >>
> >> and add
> >>
> >> Cc: stable@vger.kernel.org # v4.13+
> > 
> > Done, thanks.
> > 
> 
> Just a note that Toshiki reported a BUG (race between delayed
> initialization of ZONE_DEVICE memmaps without holding the memory
> hotplug lock and concurrent zone shrinking).
> 
> https://lkml.org/lkml/2019/11/14/1040
> 
> "Iteration of create and destroy namespace causes the panic as below:
> 
> [   41.207694] kernel BUG at mm/page_alloc.c:535!
> [   41.208109] invalid opcode: 0000 [#1] SMP PTI
> [   41.208508] CPU: 7 PID: 2766 Comm: ndctl Not tainted 5.4.0-rc4 #6
> [   41.209064] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
> [   41.210175] RIP: 0010:set_pfnblock_flags_mask+0x95/0xf0
> [   41.210643] Code: 04 41 83 e2 3c 48 8d 04 a8 48 c1 e0 07 48 03 04 dd e0 59 55 bb 48 8b 58 68 48 39 da 73 0e 48 c7 c6 70 ac 11 bb e8 1b b2 fd ff <0f> 0b 48 03 58 78 48 39 da 73 e9 49 01 ca b9 3f 00 00 00 4f 8d 0c
> [   41.212354] RSP: 0018:ffffac0d41557c80 EFLAGS: 00010246
> [   41.212821] RAX: 000000000000004a RBX: 0000000000244a00 RCX: 0000000000000000
> [   41.213459] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffbb1197dc
> [   41.214100] RBP: 000000000000000c R08: 0000000000000439 R09: 0000000000000059
> [   41.214736] R10: 0000000000000000 R11: ffffac0d41557b08 R12: ffff8be475ea72b0
> [   41.215376] R13: 000000000000fa00 R14: 0000000000250000 R15: 00000000fffc0bb5
> [   41.216008] FS:  00007f30862ab600(0000) GS:ffff8be57bc40000(0000) knlGS:0000000000000000
> [   41.216771] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   41.217299] CR2: 000055e824d0d508 CR3: 0000000231dac000 CR4: 00000000000006e0
> [   41.217934] Call Trace:
> [   41.218225]  memmap_init_zone_device+0x165/0x17c
> [   41.218642]  memremap_pages+0x4c1/0x540
> [   41.218989]  devm_memremap_pages+0x1d/0x60
> [   41.219367]  pmem_attach_disk+0x16b/0x600 [nd_pmem]
> [   41.219804]  ? devm_nsio_enable+0xb8/0xe0
> [   41.220172]  nvdimm_bus_probe+0x69/0x1c0
> [   41.220526]  really_probe+0x1c2/0x3e0
> [   41.220856]  driver_probe_device+0xb4/0x100
> [   41.221238]  device_driver_attach+0x4f/0x60
> [   41.221611]  bind_store+0xc9/0x110
> [   41.221919]  kernfs_fop_write+0x116/0x190
> [   41.222326]  vfs_write+0xa5/0x1a0
> [   41.222626]  ksys_write+0x59/0xd0
> [   41.222927]  do_syscall_64+0x5b/0x180
> [   41.223264]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   41.223714] RIP: 0033:0x7f30865d0ed8
> [   41.224037] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 45 78 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
> [   41.225920] RSP: 002b:00007fffe5d30a78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [   41.226608] RAX: ffffffffffffffda RBX: 000055e824d07f40 RCX: 00007f30865d0ed8
> [   41.227242] RDX: 0000000000000007 RSI: 000055e824d07f40 RDI: 0000000000000004
> [   41.227870] RBP: 0000000000000007 R08: 0000000000000007 R09: 0000000000000006
> [   41.228753] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
> [   41.229419] R13: 00007f30862ab528 R14: 0000000000000001 R15: 000055e824d07f40
> 
> While creating a namespace and initializing memmap, if you destroy the namespace
> and shrink the zone, it will initialize the memmap outside the zone and
> trigger VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page) in
> set_pfnblock_flags_mask()."
> 
> 
> This BUG is also mitigated by this commit, where we stop shrinking
> the ZONE_DEVICE zone for now, until we can do it in a safe and clean
> way.
> 

OK, thanks.  I updated the changelog, added Reported-by: Toshiki and
shall squeeze this fix into 5.4.




* Re: [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory
  2019-10-27 22:45   ` David Hildenbrand
@ 2019-11-30 23:21     ` Andrew Morton
  2019-11-30 23:43       ` David Hildenbrand
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Morton @ 2019-11-30 23:21 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linux-arm-kernel, linux-ia64,
	linuxppc-dev, linux-s390, linux-sh, x86, Catalin Marinas,
	Will Deacon, Tony Luck, Fenghua Yu, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Yoshinori Sato, Rich Felker, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Mark Rutland, Steve Capper,
	Mike Rapoport, Anshuman Khandual, Yu Zhao, Jun Yao, Robin Murphy,
	Michal Hocko, Oscar Salvador, Matthew Wilcox (Oracle),
	Christophe Leroy, Aneesh Kumar K.V, Pavel Tatashin,
	Gerald Schaefer, Halil Pasic, Tom Lendacky, Greg Kroah-Hartman,
	Masahiro Yamada, Dan Williams, Wei Yang, Qian Cai,
	Jason Gunthorpe, Logan Gunthorpe, Ira Weiny

On Sun, 27 Oct 2019 23:45:52 +0100 David Hildenbrand <david@redhat.com> wrote:

> I think I just found an issue with try_offline_node(). 
> try_offline_node() is pretty much broken already (it touches garbage 
> memmaps and does not consider mixed NIDs within sections); however, it 
> relies on the node span to look for memory sections to probe. So it 
> seems to rely on the nodes getting shrunk when removing memory, not when 
> offlining.
> 
> As we shrink the node span when offlining now and not when removing, 
> this can go wrong once we offline the last memory block of the node and 
> offline the last CPU. We could still have memory around that we could 
> re-online, however, the node would already be offline. Unlikely, but 
> possible.
> 
> Note that the same is also broken without this patch in case memory is 
> never onlined. The "pfn_to_nid(pfn) != nid" check can easily succeed on the 
> garbage memmap, resulting in no memory being detected as belonging to 
> the node. Also, resize_pgdat_range() is called when onlining memory, not 
> when adding it. :/ Oh this is so broken :)
> 
> The right fix is probably to walk over all memory blocks that could 
> exist and test if they belong to the nid (if offline, check the 
> block->nid, if online check all pageblocks). A fix we can then move in 
> front of this patch.
> 
> Will look into this this week.

And this series shows almost no sign of having been reviewed.  I'll hold
it over for 5.6.




* Re: [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory
  2019-11-30 23:21     ` Andrew Morton
@ 2019-11-30 23:43       ` David Hildenbrand
  0 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2019-11-30 23:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, linux-kernel, linux-mm, linux-arm-kernel,
	linux-ia64, linuxppc-dev, linux-s390, linux-sh, x86,
	Catalin Marinas, Will Deacon, Tony Luck, Fenghua Yu,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
	Yoshinori Sato, Rich Felker, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Mark Rutland, Steve Capper, Mike Rapoport,
	Anshuman Khandual, Yu Zhao, Jun Yao, Robin Murphy, Michal Hocko,
	Oscar Salvador, Matthew Wilcox (Oracle),
	Christophe Leroy, Aneesh Kumar K.V, Pavel Tatashin,
	Gerald Schaefer, Halil Pasic, Tom Lendacky, Greg Kroah-Hartman,
	Masahiro Yamada, Dan Williams, Wei Yang, Qian Cai,
	Jason Gunthorpe, Logan Gunthorpe, Ira Weiny



> Am 01.12.2019 um 00:22 schrieb Andrew Morton <akpm@linux-foundation.org>:
> 
> On Sun, 27 Oct 2019 23:45:52 +0100 David Hildenbrand <david@redhat.com> wrote:
> 
>> I think I just found an issue with try_offline_node(). 
>> try_offline_node() is pretty much broken already (it touches garbage 
>> memmaps and does not consider mixed NIDs within sections); however, it 
>> relies on the node span to look for memory sections to probe. So it 
>> seems to rely on the nodes getting shrunk when removing memory, not when 
>> offlining.
>> 
>> As we shrink the node span when offlining now and not when removing, 
>> this can go wrong once we offline the last memory block of the node and 
>> offline the last CPU. We could still have memory around that we could 
>> re-online, however, the node would already be offline. Unlikely, but 
>> possible.
>> 
>> Note that the same is also broken without this patch in case memory is 
>> never onlined. The "pfn_to_nid(pfn) != nid" check can easily succeed on the 
>> garbage memmap, resulting in no memory being detected as belonging to 
>> the node. Also, resize_pgdat_range() is called when onlining memory, not 
>> when adding it. :/ Oh this is so broken :)
>> 
>> The right fix is probably to walk over all memory blocks that could 
>> exist and test if they belong to the nid (if offline, check the 
>> block->nid, if online check all pageblocks). A fix we can then move in 
>> front of this patch.
>> 
>> Will look into this this week.
> 
> And this series shows almost no sign of having been reviewed.  I'll hold
> it over for 5.6.
> 

Makes sense, can't do anything about it. Btw, this one is the last stable patch to fix access of uninitialized memmaps that is not upstream yet... so it has to remain broken for somewhat longer.




* Re: [PATCH v6 00/10] mm/memory_hotplug: Shrink zones before removing memory
  2019-10-06  8:56 [PATCH v6 00/10] mm/memory_hotplug: Shrink zones before removing memory David Hildenbrand
                   ` (9 preceding siblings ...)
  2019-10-06  8:56 ` [PATCH v6 10/10] mm/memory_hotplug: Cleanup __remove_pages() David Hildenbrand
@ 2019-12-02  9:09 ` David Hildenbrand
  2019-12-03 13:36   ` Oscar Salvador
  10 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2019-12-02  9:09 UTC (permalink / raw)
  To: linux-kernel, Michal Hocko, Oscar Salvador
  Cc: linux-mm, linux-arm-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, x86, Aneesh Kumar K . V, Andrew Morton, Dan Williams,
	Alexander Duyck, Alexander Potapenko, Andy Lutomirski,
	Anshuman Khandual, Benjamin Herrenschmidt, Borislav Petkov,
	Catalin Marinas, Christian Borntraeger, Christophe Leroy,
	Dave Hansen, Fenghua Yu, Gerald Schaefer, Greg Kroah-Hartman,
	Halil Pasic, Heiko Carstens, H. Peter Anvin, Ingo Molnar,
	Ira Weiny, Jason Gunthorpe, Jun Yao, Logan Gunthorpe,
	Mark Rutland, Masahiro Yamada, Matthew Wilcox (Oracle),
	Mel Gorman, Michael Ellerman, Mike Rapoport, Pankaj Gupta,
	Paul Mackerras, Pavel Tatashin, Pavel Tatashin, Peter Zijlstra,
	Qian Cai, Rich Felker, Robin Murphy, Steve Capper,
	Thomas Gleixner, Tom Lendacky, Tony Luck, Vasily Gorbik,
	Vlastimil Babka, Wei Yang, Wei Yang, Will Deacon, Yoshinori Sato,
	Yu Zhao

On 06.10.19 10:56, David Hildenbrand wrote:
> This series fixes the access of uninitialized memmaps when shrinking
> zones/nodes and when removing memory. Also, it contains all fixes for
> crashes that can be triggered when removing certain namespace using
> memunmap_pages() - ZONE_DEVICE, reported by Aneesh.
> 
> We stop trying to shrink ZONE_DEVICE, as it's buggy, fixing it would be
> more involved (we don't have SECTION_IS_ONLINE as an indicator), and
> shrinking is only of limited use (set_zone_contiguous() cannot detect
> the ZONE_DEVICE as contiguous).
> 
> We continue shrinking !ZONE_DEVICE zones, however, I reduced the amount of
> code to a minimum. Shrinking is especially necessary to keep
> zone->contiguous set where possible, especially, on memory unplug of
> DIMMs at zone boundaries.
> 
> --------------------------------------------------------------------------
> 
> Zones are now properly shrunk when offlining memory blocks or when
> onlining failed. This allows to properly shrink zones on memory unplug
> even if the separate memory blocks of a DIMM were onlined to different
> zones or re-onlined to a different zone after offlining.
> 
> Example:
> 
> :/# cat /proc/zoneinfo
> Node 1, zone  Movable
>         spanned  0
>         present  0
>         managed  0
> :/# echo "online_movable" > /sys/devices/system/memory/memory41/state
> :/# echo "online_movable" > /sys/devices/system/memory/memory43/state
> :/# cat /proc/zoneinfo
> Node 1, zone  Movable
>         spanned  98304
>         present  65536
>         managed  65536
> :/# echo 0 > /sys/devices/system/memory/memory43/online
> :/# cat /proc/zoneinfo
> Node 1, zone  Movable
>         spanned  32768
>         present  32768
>         managed  32768
> :/# echo 0 > /sys/devices/system/memory/memory41/online
> :/# cat /proc/zoneinfo
> Node 1, zone  Movable
>         spanned  0
>         present  0
>         managed  0
> 
> --------------------------------------------------------------------------
> 
> I tested this with DIMMs on x86. I didn't test the ZONE_DEVICE part.
> 
> v4 -> v6:
> - "mm/memunmap: Don't access uninitialized memmap in memunmap_pages()"
> -- Minimize code changes, rephrase subject and description
> - "mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span()"
> -- Add ifdef to make it compile without ZONE_DEVICE
> 
> v4 -> v5:
> - "mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span()"
> -- Add more details why ZONE_DEVICE is special
> - Include two patches from Aneesh
> -- "mm/memunmap: Use the correct start and end pfn when removing pages
>     from zone"
> -- "mm/memmap_init: Update variable name in memmap_init_zone"
> 
> v3 -> v4:
> - Drop "mm/memremap: Get rid of memmap_init_zone_device()"
> -- As Alexander noticed, it was messy either way
> - Drop "mm/memory_hotplug: Exit early in __remove_pages() on BUGs"
> - Drop "mm: Exit early in set_zone_contiguous() if already contiguous"
> - Drop "mm/memory_hotplug: Optimize zone shrinking code when checking for
>   holes"
> - Merged "mm/memory_hotplug: Remove pages from a zone before removing
>   memory" and "mm/memory_hotplug: Remove zone parameter from
>   __remove_pages()" into "mm/memory_hotplug: Shrink zones when offlining
>   memory"
> - Added "mm/memory_hotplug: Poison memmap in remove_pfn_range_from_zone()"
> - Stop shrinking ZONE_DEVICE
> - Reshuffle patches, moving all fixes to the front. Add Fixes: tags.
> - Change subject/description of various patches
> - Minor changes (too many to mention)
> 
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Michal Hocko <mhocko@suse.com>
> 

So, patches #1-#4 are already upstream. The other patches have been in
-next for quite a long time, and I (+other RH people) ran extensive
tests on them.

Especially patch #5 is a BUG fix I want to see upstream rather sooner
than later (the last known "uninitialized memmap" access).

Andrew decided not to send these for 5.5 due to lack of ack/review -
which is unfortunate, but the right thing to do.

@Michal, @Oscar, can some of you at least have a look at patch #5 now so
we can proceed with that? (the other patches can stay in -next some time longer)

-- 
Thanks,

David / dhildenb




* Re: [PATCH v6 00/10] mm/memory_hotplug: Shrink zones before removing memory
  2019-12-02  9:09 ` [PATCH v6 00/10] mm/memory_hotplug: Shrink zones before removing memory David Hildenbrand
@ 2019-12-03 13:36   ` Oscar Salvador
  0 siblings, 0 replies; 29+ messages in thread
From: Oscar Salvador @ 2019-12-03 13:36 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, Michal Hocko, linux-mm, linux-arm-kernel,
	linux-ia64, linuxppc-dev, linux-s390, linux-sh, x86,
	Aneesh Kumar K . V, Andrew Morton, Dan Williams, Alexander Duyck,
	Alexander Potapenko, Andy Lutomirski, Anshuman Khandual,
	Benjamin Herrenschmidt, Borislav Petkov, Catalin Marinas,
	Christian Borntraeger, Christophe Leroy, Dave Hansen, Fenghua Yu,
	Gerald Schaefer, Greg Kroah-Hartman, Halil Pasic, Heiko Carstens,
	H. Peter Anvin, Ingo Molnar, Ira Weiny, Jason Gunthorpe, Jun Yao,
	Logan Gunthorpe, Mark Rutland, Masahiro Yamada,
	Matthew Wilcox (Oracle),
	Mel Gorman, Michael Ellerman, Mike Rapoport, Pankaj Gupta,
	Paul Mackerras, Pavel Tatashin, Pavel Tatashin, Peter Zijlstra,
	Qian Cai, Rich Felker, Robin Murphy, Steve Capper,
	Thomas Gleixner, Tom Lendacky, Tony Luck, Vasily Gorbik,
	Vlastimil Babka, Wei Yang, Wei Yang, Will Deacon, Yoshinori Sato,
	Yu Zhao

On Mon, Dec 02, 2019 at 10:09:51AM +0100, David Hildenbrand wrote:
> @Michal, @Oscar, can some of you at least have a patch #5 now so we can
> proceed with that? (the other patches can stay in -next some time longer)

Hi, 

I will be having a look at patch#5 shortly.

Thanks for the reminder

-- 
Oscar Salvador
SUSE L3



* Re: [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory
  2019-10-06  8:56 ` [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory David Hildenbrand
  2019-10-14  9:39   ` David Hildenbrand
  2019-10-27 22:45   ` David Hildenbrand
@ 2019-12-03 15:10   ` Oscar Salvador
  2019-12-03 15:27     ` David Hildenbrand
  2 siblings, 1 reply; 29+ messages in thread
From: Oscar Salvador @ 2019-12-03 15:10 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linux-arm-kernel, linux-ia64,
	linuxppc-dev, linux-s390, linux-sh, x86, Catalin Marinas,
	Will Deacon, Tony Luck, Fenghua Yu, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Yoshinori Sato, Rich Felker, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Andrew Morton, Mark Rutland,
	Steve Capper, Mike Rapoport, Anshuman Khandual, Yu Zhao, Jun Yao,
	Robin Murphy, Michal Hocko, Matthew Wilcox (Oracle),
	Christophe Leroy, Aneesh Kumar K.V, Pavel Tatashin,
	Gerald Schaefer, Halil Pasic, Tom Lendacky, Greg Kroah-Hartman,
	Masahiro Yamada, Dan Williams, Wei Yang, Qian Cai,
	Jason Gunthorpe, Logan Gunthorpe, Ira Weiny

On Sun, Oct 06, 2019 at 10:56:41AM +0200, David Hildenbrand wrote:
> Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
> Signed-off-by: David Hildenbrand <david@redhat.com>

I did not see anything wrong with the approach taken, and it makes sense to me.
The only thing that puzzles me is that we no longer seem to balance
spanned_pages for ZONE_DEVICE.
memremap_pages() increments them via move_pfn_range_to_zone, but we skip
ZONE_DEVICE in remove_pfn_range_from_zone.

That is not really related to this patch, so I might be missing something,
but it caught my eye while reviewing this.

Anyway, for this one:

Reviewed-by: Oscar Salvador <osalvador@suse.de>


off-topic: I __think__ we really need to trim the CC list.

> ---
>  arch/arm64/mm/mmu.c            |  4 +---
>  arch/ia64/mm/init.c            |  4 +---
>  arch/powerpc/mm/mem.c          |  3 +--
>  arch/s390/mm/init.c            |  4 +---
>  arch/sh/mm/init.c              |  4 +---
>  arch/x86/mm/init_32.c          |  4 +---
>  arch/x86/mm/init_64.c          |  4 +---
>  include/linux/memory_hotplug.h |  7 +++++--
>  mm/memory_hotplug.c            | 31 ++++++++++++++++---------------
>  mm/memremap.c                  |  2 +-
>  10 files changed, 29 insertions(+), 38 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 60c929f3683b..d10247fab0fd 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1069,7 +1069,6 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>  {
>  	unsigned long start_pfn = start >> PAGE_SHIFT;
>  	unsigned long nr_pages = size >> PAGE_SHIFT;
> -	struct zone *zone;
>  
>  	/*
>  	 * FIXME: Cleanup page tables (also in arch_add_memory() in case
> @@ -1078,7 +1077,6 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>  	 * unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be
>  	 * unlocked yet.
>  	 */
> -	zone = page_zone(pfn_to_page(start_pfn));
> -	__remove_pages(zone, start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap);
>  }
>  #endif
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index bf9df2625bc8..a6dd80a2c939 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -689,9 +689,7 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>  {
>  	unsigned long start_pfn = start >> PAGE_SHIFT;
>  	unsigned long nr_pages = size >> PAGE_SHIFT;
> -	struct zone *zone;
>  
> -	zone = page_zone(pfn_to_page(start_pfn));
> -	__remove_pages(zone, start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap);
>  }
>  #endif
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index be941d382c8d..97e5922cb52e 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -130,10 +130,9 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
>  {
>  	unsigned long start_pfn = start >> PAGE_SHIFT;
>  	unsigned long nr_pages = size >> PAGE_SHIFT;
> -	struct page *page = pfn_to_page(start_pfn) + vmem_altmap_offset(altmap);
>  	int ret;
>  
> -	__remove_pages(page_zone(page), start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap);
>  
>  	/* Remove htab bolted mappings for this section of memory */
>  	start = (unsigned long)__va(start);
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index a124f19f7b3c..c1d96e588152 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -291,10 +291,8 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>  {
>  	unsigned long start_pfn = start >> PAGE_SHIFT;
>  	unsigned long nr_pages = size >> PAGE_SHIFT;
> -	struct zone *zone;
>  
> -	zone = page_zone(pfn_to_page(start_pfn));
> -	__remove_pages(zone, start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap);
>  	vmem_remove_mapping(start, size);
>  }
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
> index dfdbaa50946e..d1b1ff2be17a 100644
> --- a/arch/sh/mm/init.c
> +++ b/arch/sh/mm/init.c
> @@ -434,9 +434,7 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>  {
>  	unsigned long start_pfn = PFN_DOWN(start);
>  	unsigned long nr_pages = size >> PAGE_SHIFT;
> -	struct zone *zone;
>  
> -	zone = page_zone(pfn_to_page(start_pfn));
> -	__remove_pages(zone, start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap);
>  }
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index 930edeb41ec3..0a74407ef92e 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -865,10 +865,8 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>  {
>  	unsigned long start_pfn = start >> PAGE_SHIFT;
>  	unsigned long nr_pages = size >> PAGE_SHIFT;
> -	struct zone *zone;
>  
> -	zone = page_zone(pfn_to_page(start_pfn));
> -	__remove_pages(zone, start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap);
>  }
>  #endif
>  
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index a6b5c653727b..b8541d77452c 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1212,10 +1212,8 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
>  {
>  	unsigned long start_pfn = start >> PAGE_SHIFT;
>  	unsigned long nr_pages = size >> PAGE_SHIFT;
> -	struct page *page = pfn_to_page(start_pfn) + vmem_altmap_offset(altmap);
> -	struct zone *zone = page_zone(page);
>  
> -	__remove_pages(zone, start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap);
>  	kernel_physical_mapping_remove(start, start + size);
>  }
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index bc477e98a310..517b70943732 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -126,8 +126,8 @@ static inline bool movable_node_is_enabled(void)
>  
>  extern void arch_remove_memory(int nid, u64 start, u64 size,
>  			       struct vmem_altmap *altmap);
> -extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
> -			   unsigned long nr_pages, struct vmem_altmap *altmap);
> +extern void __remove_pages(unsigned long start_pfn, unsigned long nr_pages,
> +			   struct vmem_altmap *altmap);
>  
>  /* reasonably generic interface to expand the physical pages */
>  extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
> @@ -346,6 +346,9 @@ extern int add_memory(int nid, u64 start, u64 size);
>  extern int add_memory_resource(int nid, struct resource *resource);
>  extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>  		unsigned long nr_pages, struct vmem_altmap *altmap);
> +extern void remove_pfn_range_from_zone(struct zone *zone,
> +				       unsigned long start_pfn,
> +				       unsigned long nr_pages);
>  extern bool is_memblock_offlined(struct memory_block *mem);
>  extern int sparse_add_section(int nid, unsigned long pfn,
>  		unsigned long nr_pages, struct vmem_altmap *altmap);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index f96608d24f6a..5b003ffa5dc9 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -457,8 +457,9 @@ static void update_pgdat_span(struct pglist_data *pgdat)
>  	pgdat->node_spanned_pages = node_end_pfn - node_start_pfn;
>  }
>  
> -static void __remove_zone(struct zone *zone, unsigned long start_pfn,
> -		unsigned long nr_pages)
> +void __ref remove_pfn_range_from_zone(struct zone *zone,
> +				      unsigned long start_pfn,
> +				      unsigned long nr_pages)
>  {
>  	struct pglist_data *pgdat = zone->zone_pgdat;
>  	unsigned long flags;
> @@ -473,28 +474,30 @@ static void __remove_zone(struct zone *zone, unsigned long start_pfn,
>  		return;
>  #endif
>  
> +	clear_zone_contiguous(zone);
> +
>  	pgdat_resize_lock(zone->zone_pgdat, &flags);
>  	shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
>  	update_pgdat_span(pgdat);
>  	pgdat_resize_unlock(zone->zone_pgdat, &flags);
> +
> +	set_zone_contiguous(zone);
>  }
>  
> -static void __remove_section(struct zone *zone, unsigned long pfn,
> -		unsigned long nr_pages, unsigned long map_offset,
> -		struct vmem_altmap *altmap)
> +static void __remove_section(unsigned long pfn, unsigned long nr_pages,
> +			     unsigned long map_offset,
> +			     struct vmem_altmap *altmap)
>  {
>  	struct mem_section *ms = __nr_to_section(pfn_to_section_nr(pfn));
>  
>  	if (WARN_ON_ONCE(!valid_section(ms)))
>  		return;
>  
> -	__remove_zone(zone, pfn, nr_pages);
>  	sparse_remove_section(ms, pfn, nr_pages, map_offset, altmap);
>  }
>  
>  /**
> - * __remove_pages() - remove sections of pages from a zone
> - * @zone: zone from which pages need to be removed
> + * __remove_pages() - remove sections of pages
>   * @pfn: starting pageframe (must be aligned to start of a section)
>   * @nr_pages: number of pages to remove (must be multiple of section size)
>   * @altmap: alternative device page map or %NULL if default memmap is used
> @@ -504,16 +507,14 @@ static void __remove_section(struct zone *zone, unsigned long pfn,
>   * sure that pages are marked reserved and zones are adjust properly by
>   * calling offline_pages().
>   */
> -void __remove_pages(struct zone *zone, unsigned long pfn,
> -		    unsigned long nr_pages, struct vmem_altmap *altmap)
> +void __remove_pages(unsigned long pfn, unsigned long nr_pages,
> +		    struct vmem_altmap *altmap)
>  {
>  	unsigned long map_offset = 0;
>  	unsigned long nr, start_sec, end_sec;
>  
>  	map_offset = vmem_altmap_offset(altmap);
>  
> -	clear_zone_contiguous(zone);
> -
>  	if (check_pfn_span(pfn, nr_pages, "remove"))
>  		return;
>  
> @@ -525,13 +526,11 @@ void __remove_pages(struct zone *zone, unsigned long pfn,
>  		cond_resched();
>  		pfns = min(nr_pages, PAGES_PER_SECTION
>  				- (pfn & ~PAGE_SECTION_MASK));
> -		__remove_section(zone, pfn, pfns, map_offset, altmap);
> +		__remove_section(pfn, pfns, map_offset, altmap);
>  		pfn += pfns;
>  		nr_pages -= pfns;
>  		map_offset = 0;
>  	}
> -
> -	set_zone_contiguous(zone);
>  }
>  
>  int set_online_page_callback(online_page_callback_t callback)
> @@ -859,6 +858,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
>  		 (unsigned long long) pfn << PAGE_SHIFT,
>  		 (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
>  	memory_notify(MEM_CANCEL_ONLINE, &arg);
> +	remove_pfn_range_from_zone(zone, pfn, nr_pages);
>  	mem_hotplug_done();
>  	return ret;
>  }
> @@ -1605,6 +1605,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  	writeback_set_ratelimit();
>  
>  	memory_notify(MEM_OFFLINE, &arg);
> +	remove_pfn_range_from_zone(zone, start_pfn, nr_pages);
>  	mem_hotplug_done();
>  	return 0;
>  
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 8c2fb44c3b4d..70263e6f093e 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -140,7 +140,7 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>  
>  	mem_hotplug_begin();
>  	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
> -		__remove_pages(page_zone(first_page), PHYS_PFN(res->start),
> +		__remove_pages(PHYS_PFN(res->start),
>  			       PHYS_PFN(resource_size(res)), NULL);
>  	} else {
>  		arch_remove_memory(nid, res->start, resource_size(res),
> -- 
> 2.21.0
> 

-- 
Oscar Salvador
SUSE L3



* Re: [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory
  2019-12-03 15:10   ` Oscar Salvador
@ 2019-12-03 15:27     ` David Hildenbrand
  0 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2019-12-03 15:27 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: linux-kernel, linux-mm, linux-arm-kernel, linux-ia64,
	linuxppc-dev, linux-s390, linux-sh, x86, Andrew Morton,
	Michal Hocko, Matthew Wilcox (Oracle),
	Aneesh Kumar K.V, Pavel Tatashin, Greg Kroah-Hartman,
	Dan Williams, Logan Gunthorpe

On 03.12.19 16:10, Oscar Salvador wrote:
> On Sun, Oct 06, 2019 at 10:56:41AM +0200, David Hildenbrand wrote:
>> Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> I did not see anything wrong with the approach taken, and it makes sense to me.
> The only thing that puzzles me is that we no longer seem to balance
> spanned_pages for ZONE_DEVICE.
> memremap_pages() increments them via move_pfn_range_to_zone(), but we skip
> ZONE_DEVICE in remove_pfn_range_from_zone().

Yes, documented e.g. in

commit 7ce700bf11b5e2cb84e4352bbdf2123a7a239c84
Author: David Hildenbrand <david@redhat.com>
Date:   Thu Nov 21 17:53:56 2019 -0800

    mm/memory_hotplug: don't access uninitialized memmaps in shrink_zone_span()

Needs some more thought - but is definitely not urgent (well, now it's
at least no longer completely broken).

> 
> That is not really related to this patch, so I might be missing something,
> but it caught my eye while reviewing this.
> 
> Anyway, for this one:
> 
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
> 

Thanks!

> 
> off-topic: I __think__ we really need to trim the CC list.

Yes we should :) - done.

-- 
Thanks,

David / dhildenb





Thread overview: 29+ messages
2019-10-06  8:56 [PATCH v6 00/10] mm/memory_hotplug: Shrink zones before removing memory David Hildenbrand
2019-10-06  8:56 ` [PATCH v6 01/10] mm/memunmap: Don't access uninitialized memmap in memunmap_pages() David Hildenbrand
2019-10-06 19:58   ` Damian Tometzki
2019-10-06 20:13     ` David Hildenbrand
2019-10-14  9:05   ` David Hildenbrand
2019-10-06  8:56 ` [PATCH v6 02/10] mm/memmap_init: Update variable name in memmap_init_zone David Hildenbrand
2019-10-06  8:56 ` [PATCH v6 03/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_pgdat_span() David Hildenbrand
2019-10-14  9:31   ` David Hildenbrand
2019-10-06  8:56 ` [PATCH v6 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span() David Hildenbrand
2019-10-14  9:32   ` David Hildenbrand
2019-10-14 19:17     ` Andrew Morton
2019-11-19 14:16       ` David Hildenbrand
2019-11-19 20:44         ` Andrew Morton
2019-10-06  8:56 ` [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory David Hildenbrand
2019-10-14  9:39   ` David Hildenbrand
2019-10-14 19:16     ` Andrew Morton
2019-10-27 22:45   ` David Hildenbrand
2019-11-30 23:21     ` Andrew Morton
2019-11-30 23:43       ` David Hildenbrand
2019-12-03 15:10   ` Oscar Salvador
2019-12-03 15:27     ` David Hildenbrand
2019-10-06  8:56 ` [PATCH v6 06/10] mm/memory_hotplug: Poison memmap in remove_pfn_range_from_zone() David Hildenbrand
2019-10-16 14:01   ` David Hildenbrand
2019-10-06  8:56 ` [PATCH v6 07/10] mm/memory_hotplug: We always have a zone in find_(smallest|biggest)_section_pfn David Hildenbrand
2019-10-06  8:56 ` [PATCH v6 08/10] mm/memory_hotplug: Don't check for "all holes" in shrink_zone_span() David Hildenbrand
2019-10-06  8:56 ` [PATCH v6 09/10] mm/memory_hotplug: Drop local variables " David Hildenbrand
2019-10-06  8:56 ` [PATCH v6 10/10] mm/memory_hotplug: Cleanup __remove_pages() David Hildenbrand
2019-12-02  9:09 ` [PATCH v6 00/10] mm/memory_hotplug: Shrink zones before removing memory David Hildenbrand
2019-12-03 13:36   ` Oscar Salvador
