linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 01/11] mm/memremap: Get rid of memmap_init_zone_device()
       [not found] <20190829070019.12714-1-david@redhat.com>
@ 2019-08-29  7:00 ` David Hildenbrand
  2019-08-29 16:39   ` Alexander Duyck
  2019-08-29  7:00 ` [PATCH v3 02/11] mm/memory_hotplug: Simplify shrink_pgdat_span() David Hildenbrand
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2019-08-29  7:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Dan Williams, Wei Yang, Qian Cai,
	Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Stephen Rothwell, Dave Airlie,
	Andrey Konovalov, Alexander Duyck, Ira Weiny, John Hubbard,
	Arun KS, Souptick Joarder, Robin Murphy, Yang Shi,
	Jason Gunthorpe, Logan Gunthorpe, Vlastimil Babka, Mel Gorman,
	Mike Rapoport, Alexander Potapenko

As far as I can see, the original split wanted to speed up a duplicate
initialization. We now only initialize once - and we really should
initialize under the lock, when resizing the zones.

As soon as we drop the lock we might have memory unplug running, trying
to shrink the zone again. In case the memmap was not properly initialized,
the zone/node shrinking code might get false negatives when searching for
the new zone/node boundaries - bad. We suddenly could no longer span the
memory we just added.

Also, once we want to fix set_zone_contiguous(zone) inside
move_pfn_range_to_zone() to actually work with ZONE_DEVICE (instead of
always immediately stopping and never setting zone->contiguous) we have
to have the whole memmap initialized at that point. (not sure if we
want that in the future, just a note)

Let's just keep things simple and initialize the memmap when resizing
the zones under the lock.

If this is a real performance issue, we have to watch out for
alternatives.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/memory_hotplug.h |  2 +-
 include/linux/mm.h             |  4 +---
 mm/memory_hotplug.c            |  4 ++--
 mm/memremap.c                  |  9 +-------
 mm/page_alloc.c                | 42 ++++++++++++----------------------
 5 files changed, 20 insertions(+), 41 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index f46ea71b4ffd..235530cdface 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -344,7 +344,7 @@ extern int __add_memory(int nid, u64 start, u64 size);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int add_memory_resource(int nid, struct resource *resource);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap);
+		unsigned long nr_pages, struct dev_pagemap *pgmap);
 extern bool is_memblock_offlined(struct memory_block *mem);
 extern int sparse_add_section(int nid, unsigned long pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ad6766a08f9b..2bd445c4d3b4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -962,8 +962,6 @@ static inline bool is_zone_device_page(const struct page *page)
 {
 	return page_zonenum(page) == ZONE_DEVICE;
 }
-extern void memmap_init_zone_device(struct zone *, unsigned long,
-				    unsigned long, struct dev_pagemap *);
 #else
 static inline bool is_zone_device_page(const struct page *page)
 {
@@ -2243,7 +2241,7 @@ static inline void zero_resv_unavail(void) {}
 
 extern void set_dma_reserve(unsigned long new_dma_reserve);
 extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
-		enum memmap_context, struct vmem_altmap *);
+		enum memmap_context, struct dev_pagemap *);
 extern void setup_per_zone_wmarks(void);
 extern int __meminit init_per_zone_wmark_min(void);
 extern void mem_init(void);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 49f7bf91c25a..35a395d195c6 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -719,7 +719,7 @@ static void __meminit resize_pgdat_range(struct pglist_data *pgdat, unsigned lon
  * call, all affected pages are PG_reserved.
  */
 void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap)
+		unsigned long nr_pages, struct dev_pagemap *pgmap)
 {
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	int nid = pgdat->node_id;
@@ -744,7 +744,7 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 	 * are reserved so nobody should be touching them so we should be safe
 	 */
 	memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn,
-			MEMMAP_HOTPLUG, altmap);
+			 MEMMAP_HOTPLUG, pgmap);
 
 	set_zone_contiguous(zone);
 }
diff --git a/mm/memremap.c b/mm/memremap.c
index f6c17339cd0d..9ee23374e6da 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -308,20 +308,13 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 
 		zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE];
 		move_pfn_range_to_zone(zone, PHYS_PFN(res->start),
-				PHYS_PFN(resource_size(res)), restrictions.altmap);
+				PHYS_PFN(resource_size(res)), pgmap);
 	}
 
 	mem_hotplug_done();
 	if (error)
 		goto err_add_memory;
 
-	/*
-	 * Initialization of the pages has been deferred until now in order
-	 * to allow us to do the work while not holding the hotplug lock.
-	 */
-	memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
-				PHYS_PFN(res->start),
-				PHYS_PFN(resource_size(res)), pgmap);
 	percpu_ref_get_many(pgmap->ref, pfn_end(pgmap) - pfn_first(pgmap));
 	return __va(res->start);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c5d62f1c2851..44038665fe8e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5845,7 +5845,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
  */
 void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		unsigned long start_pfn, enum memmap_context context,
-		struct vmem_altmap *altmap)
+		struct dev_pagemap *pgmap)
 {
 	unsigned long pfn, end_pfn = start_pfn + size;
 	struct page *page;
@@ -5853,24 +5853,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	if (highest_memmap_pfn < end_pfn - 1)
 		highest_memmap_pfn = end_pfn - 1;
 
-#ifdef CONFIG_ZONE_DEVICE
-	/*
-	 * Honor reservation requested by the driver for this ZONE_DEVICE
-	 * memory. We limit the total number of pages to initialize to just
-	 * those that might contain the memory mapping. We will defer the
-	 * ZONE_DEVICE page initialization until after we have released
-	 * the hotplug lock.
-	 */
-	if (zone == ZONE_DEVICE) {
-		if (!altmap)
-			return;
-
-		if (start_pfn == altmap->base_pfn)
-			start_pfn += altmap->reserve;
-		end_pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
-	}
-#endif
-
 	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
 		/*
 		 * There can be holes in boot-time mem_map[]s handed to this
@@ -5892,6 +5874,20 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		if (context == MEMMAP_HOTPLUG)
 			__SetPageReserved(page);
 
+#ifdef CONFIG_ZONE_DEVICE
+		if (zone == ZONE_DEVICE) {
+			WARN_ON_ONCE(!pgmap);
+			/*
+			 * ZONE_DEVICE pages union ->lru with a ->pgmap back
+			 * pointer and zone_device_data. It is a bug if a
+			 * ZONE_DEVICE page is ever freed or placed on a driver
+			 * private list.
+			 */
+			page->pgmap = pgmap;
+			page->zone_device_data = NULL;
+		}
+#endif
+
 		/*
 		 * Mark the block movable so that blocks are reserved for
 		 * movable at startup. This will force kernel allocations
@@ -5951,14 +5947,6 @@ void __ref memmap_init_zone_device(struct zone *zone,
 		 */
 		__SetPageReserved(page);
 
-		/*
-		 * ZONE_DEVICE pages union ->lru with a ->pgmap back pointer
-		 * and zone_device_data.  It is a bug if a ZONE_DEVICE page is
-		 * ever freed or placed on a driver-private list.
-		 */
-		page->pgmap = pgmap;
-		page->zone_device_data = NULL;
-
 		/*
 		 * Mark the block movable so that blocks are reserved for
 		 * movable at startup. This will force kernel allocations
-- 
2.21.0



* [PATCH v3 02/11] mm/memory_hotplug: Simplify shrink_pgdat_span()
       [not found] <20190829070019.12714-1-david@redhat.com>
  2019-08-29  7:00 ` [PATCH v3 01/11] mm/memremap: Get rid of memmap_init_zone_device() David Hildenbrand
@ 2019-08-29  7:00 ` David Hildenbrand
  2019-08-29  7:00 ` [PATCH v3 03/11] mm/memory_hotplug: We always have a zone in find_(smallest|biggest)_section_pfn David Hildenbrand
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2019-08-29  7:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Dan Williams, Wei Yang

Let's update it based on all the zones, which is much easier. No need to
iterate over all subsections again.
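
To make the idea concrete, here is a minimal userspace sketch (not the
kernel code itself) of deriving the node span as the union of the spans
of its zones. The ULONG_MAX seed for the start and the explicit skipping
of empty zones are assumptions made for this illustration only:

#include <limits.h>
#include <stdio.h>

struct zone_span { unsigned long start_pfn, spanned_pages; };

static void update_node_span(const struct zone_span *zones, int nr_zones,
			     unsigned long *node_start_pfn,
			     unsigned long *node_spanned_pages)
{
	unsigned long start = ULONG_MAX, end = 0;
	int i;

	for (i = 0; i < nr_zones; i++) {
		/* Empty zones don't contribute to the node span. */
		if (!zones[i].spanned_pages)
			continue;
		if (zones[i].start_pfn < start)
			start = zones[i].start_pfn;
		if (zones[i].start_pfn + zones[i].spanned_pages > end)
			end = zones[i].start_pfn + zones[i].spanned_pages;
	}

	if (end) {
		*node_start_pfn = start;
		*node_spanned_pages = end - start;
	} else {
		/* All zones are empty - the node spans nothing. */
		*node_start_pfn = 0;
		*node_spanned_pages = 0;
	}
}

int main(void)
{
	struct zone_span zones[] = { { 0x1000, 0x800 }, { 0, 0 }, { 0x4000, 0x1000 } };
	unsigned long start, pages;

	update_node_span(zones, 3, &start, &pages);
	/* Prints: node span: start=0x1000 spanned=0x4000 */
	printf("node span: start=%#lx spanned=%#lx\n", start, pages);
	return 0;
}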

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 72 ++++++++++-----------------------------------
 1 file changed, 15 insertions(+), 57 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 35a395d195c6..70cd6c59c168 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -436,67 +436,25 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 	zone_span_writeunlock(zone);
 }
 
-static void shrink_pgdat_span(struct pglist_data *pgdat,
-			      unsigned long start_pfn, unsigned long end_pfn)
+static void update_pgdat_span(struct pglist_data *pgdat)
 {
-	unsigned long pgdat_start_pfn = pgdat->node_start_pfn;
-	unsigned long p = pgdat_end_pfn(pgdat); /* pgdat_end_pfn namespace clash */
-	unsigned long pgdat_end_pfn = p;
-	unsigned long pfn;
-	int nid = pgdat->node_id;
-
-	if (pgdat_start_pfn == start_pfn) {
-		/*
-		 * If the section is smallest section in the pgdat, it need
-		 * shrink pgdat->node_start_pfn and pgdat->node_spanned_pages.
-		 * In this case, we find second smallest valid mem_section
-		 * for shrinking zone.
-		 */
-		pfn = find_smallest_section_pfn(nid, NULL, end_pfn,
-						pgdat_end_pfn);
-		if (pfn) {
-			pgdat->node_start_pfn = pfn;
-			pgdat->node_spanned_pages = pgdat_end_pfn - pfn;
-		}
-	} else if (pgdat_end_pfn == end_pfn) {
-		/*
-		 * If the section is biggest section in the pgdat, it need
-		 * shrink pgdat->node_spanned_pages.
-		 * In this case, we find second biggest valid mem_section for
-		 * shrinking zone.
-		 */
-		pfn = find_biggest_section_pfn(nid, NULL, pgdat_start_pfn,
-					       start_pfn);
-		if (pfn)
-			pgdat->node_spanned_pages = pfn - pgdat_start_pfn + 1;
-	}
-
-	/*
-	 * If the section is not biggest or smallest mem_section in the pgdat,
-	 * it only creates a hole in the pgdat. So in this case, we need not
-	 * change the pgdat.
-	 * But perhaps, the pgdat has only hole data. Thus it check the pgdat
-	 * has only hole or not.
-	 */
-	pfn = pgdat_start_pfn;
-	for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SUBSECTION) {
-		if (unlikely(!pfn_valid(pfn)))
-			continue;
-
-		if (pfn_to_nid(pfn) != nid)
-			continue;
+	unsigned long node_start_pfn = 0, node_end_pfn = 0;
+	struct zone *zone;
 
-		/* Skip range to be removed */
-		if (pfn >= start_pfn && pfn < end_pfn)
-			continue;
+	for (zone = pgdat->node_zones;
+	     zone < pgdat->node_zones + MAX_NR_ZONES; zone++) {
+		unsigned long zone_end_pfn = zone->zone_start_pfn +
+					     zone->spanned_pages;
 
-		/* If we find valid section, we have nothing to do */
-		return;
+		/* No need to lock the zones, they can't change. */
+		if (zone_end_pfn > node_end_pfn)
+			node_end_pfn = zone_end_pfn;
+		if (zone->zone_start_pfn < node_start_pfn)
+			node_start_pfn = zone->zone_start_pfn;
 	}
 
-	/* The pgdat has no valid section */
-	pgdat->node_start_pfn = 0;
-	pgdat->node_spanned_pages = 0;
+	pgdat->node_start_pfn = node_start_pfn;
+	pgdat->node_spanned_pages = node_end_pfn - node_start_pfn;
 }
 
 static void __remove_zone(struct zone *zone, unsigned long start_pfn,
@@ -507,7 +465,7 @@ static void __remove_zone(struct zone *zone, unsigned long start_pfn,
 
 	pgdat_resize_lock(zone->zone_pgdat, &flags);
 	shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
-	shrink_pgdat_span(pgdat, start_pfn, start_pfn + nr_pages);
+	update_pgdat_span(pgdat);
 	pgdat_resize_unlock(zone->zone_pgdat, &flags);
 }
 
-- 
2.21.0



* [PATCH v3 03/11] mm/memory_hotplug: We always have a zone in find_(smallest|biggest)_section_pfn
       [not found] <20190829070019.12714-1-david@redhat.com>
  2019-08-29  7:00 ` [PATCH v3 01/11] mm/memremap: Get rid of memmap_init_zone_device() David Hildenbrand
  2019-08-29  7:00 ` [PATCH v3 02/11] mm/memory_hotplug: Simplify shrink_pgdat_span() David Hildenbrand
@ 2019-08-29  7:00 ` David Hildenbrand
  2019-08-29  7:00 ` [PATCH v3 04/11] mm/memory_hotplug: Drop local variables in shrink_zone_span() David Hildenbrand
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2019-08-29  7:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Dan Williams, Wei Yang

With shrink_pgdat_span() out of the way, we now always have a valid
zone.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 70cd6c59c168..0a21f6f99753 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -337,7 +337,7 @@ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
 		if (unlikely(pfn_to_nid(start_pfn) != nid))
 			continue;
 
-		if (zone && zone != page_zone(pfn_to_page(start_pfn)))
+		if (zone != page_zone(pfn_to_page(start_pfn)))
 			continue;
 
 		return start_pfn;
@@ -362,7 +362,7 @@ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone,
 		if (unlikely(pfn_to_nid(pfn) != nid))
 			continue;
 
-		if (zone && zone != page_zone(pfn_to_page(pfn)))
+		if (zone != page_zone(pfn_to_page(pfn)))
 			continue;
 
 		return pfn;
-- 
2.21.0



* [PATCH v3 04/11] mm/memory_hotplug: Drop local variables in shrink_zone_span()
       [not found] <20190829070019.12714-1-david@redhat.com>
                   ` (2 preceding siblings ...)
  2019-08-29  7:00 ` [PATCH v3 03/11] mm/memory_hotplug: We always have a zone in find_(smallest|biggest)_section_pfn David Hildenbrand
@ 2019-08-29  7:00 ` David Hildenbrand
  2019-08-29  7:00 ` [PATCH v3 05/11] mm/memory_hotplug: Optimize zone shrinking code when checking for holes David Hildenbrand
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2019-08-29  7:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Dan Williams, Wei Yang

In particular, when checking for "all holes", we can avoid rechecking
pieces we already processed (the ones we are removing) - use the updated
zone data instead (which might already be zero).

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0a21f6f99753..d3c34bbeb36d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -374,14 +374,11 @@ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone,
 static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 			     unsigned long end_pfn)
 {
-	unsigned long zone_start_pfn = zone->zone_start_pfn;
-	unsigned long z = zone_end_pfn(zone); /* zone_end_pfn namespace clash */
-	unsigned long zone_end_pfn = z;
 	unsigned long pfn;
 	int nid = zone_to_nid(zone);
 
 	zone_span_writelock(zone);
-	if (zone_start_pfn == start_pfn) {
+	if (zone->zone_start_pfn == start_pfn) {
 		/*
 		 * If the section is smallest section in the zone, it need
 		 * shrink zone->zone_start_pfn and zone->zone_spanned_pages.
@@ -389,22 +386,29 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 		 * for shrinking zone.
 		 */
 		pfn = find_smallest_section_pfn(nid, zone, end_pfn,
-						zone_end_pfn);
+						zone_end_pfn(zone));
 		if (pfn) {
+			zone->spanned_pages = zone_end_pfn(zone) - pfn;
 			zone->zone_start_pfn = pfn;
-			zone->spanned_pages = zone_end_pfn - pfn;
+		} else {
+			zone->zone_start_pfn = 0;
+			zone->spanned_pages = 0;
 		}
-	} else if (zone_end_pfn == end_pfn) {
+	} else if (zone_end_pfn(zone) == end_pfn) {
 		/*
 		 * If the section is biggest section in the zone, it need
 		 * shrink zone->spanned_pages.
 		 * In this case, we find second biggest valid mem_section for
 		 * shrinking zone.
 		 */
-		pfn = find_biggest_section_pfn(nid, zone, zone_start_pfn,
+		pfn = find_biggest_section_pfn(nid, zone, zone->zone_start_pfn,
 					       start_pfn);
 		if (pfn)
-			zone->spanned_pages = pfn - zone_start_pfn + 1;
+			zone->spanned_pages = pfn - zone->zone_start_pfn + 1;
+		else {
+			zone->zone_start_pfn = 0;
+			zone->spanned_pages = 0;
+		}
 	}
 
 	/*
@@ -413,8 +417,8 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 	 * change the zone. But perhaps, the zone has only hole data. Thus
 	 * it check the zone has only hole or not.
 	 */
-	pfn = zone_start_pfn;
-	for (; pfn < zone_end_pfn; pfn += PAGES_PER_SUBSECTION) {
+	for (pfn = zone->zone_start_pfn;
+	     pfn < zone_end_pfn(zone); pfn += PAGES_PER_SUBSECTION) {
 		if (unlikely(!pfn_valid(pfn)))
 			continue;
 
-- 
2.21.0



* [PATCH v3 05/11] mm/memory_hotplug: Optimize zone shrinking code when checking for holes
       [not found] <20190829070019.12714-1-david@redhat.com>
                   ` (3 preceding siblings ...)
  2019-08-29  7:00 ` [PATCH v3 04/11] mm/memory_hotplug: Drop local variables in shrink_zone_span() David Hildenbrand
@ 2019-08-29  7:00 ` David Hildenbrand
  2019-08-29  7:00 ` [PATCH v3 06/11] mm/memory_hotplug: Fix crashes in shrink_zone_span() David Hildenbrand
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2019-08-29  7:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Dan Williams, Wei Yang

... and clarify why this is needed at all right now. It all boils down
to false positives. We will try to remove the false positives for
!ZONE_DEVICE memory soon; for ZONE_DEVICE memory, however, we won't be
able to get rid of them easily.

Don't only detect "all holes" but try to shrink using the existing
functions we have.
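
For illustration only, here is a minimal userspace sketch of the
shrink-from-both-ends idea (a toy model, not the kernel's
find_(smallest|biggest)_section_pfn() helpers): tighten a span to the
first and last PFN that still pass a validity check, and clear it when
nothing valid remains:

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for "does this PFN still belong to the zone?". */
static bool pfn_in_zone(const bool *valid, unsigned long pfn)
{
	return valid[pfn];
}

/* Tighten [*start, *start + *spanned) to the first/last valid PFN. */
static void shrink_span(const bool *valid, unsigned long *start,
			unsigned long *spanned)
{
	unsigned long pfn, first = 0, last = 0;
	bool found = false;

	for (pfn = *start; pfn < *start + *spanned; pfn++) {
		if (!pfn_in_zone(valid, pfn))
			continue;
		if (!found)
			first = pfn;
		last = pfn;
		found = true;
	}

	if (found) {
		*start = first;
		*spanned = last - first + 1;
	} else {
		/* Nothing valid left - the span collapses to zero. */
		*start = 0;
		*spanned = 0;
	}
}

int main(void)
{
	/* Only PFNs 3..6 are still valid; the span starts out as [0, 10). */
	bool valid[10] = { 0, 0, 0, 1, 1, 1, 1, 0, 0, 0 };
	unsigned long start = 0, spanned = 10;

	shrink_span(valid, &start, &spanned);
	printf("start=%lu spanned=%lu\n", start, spanned);	/* prints: start=3 spanned=4 */
	return 0;
}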

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 45 +++++++++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d3c34bbeb36d..663853bf97ed 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -411,32 +411,33 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 		}
 	}
 
-	/*
-	 * The section is not biggest or smallest mem_section in the zone, it
-	 * only creates a hole in the zone. So in this case, we need not
-	 * change the zone. But perhaps, the zone has only hole data. Thus
-	 * it check the zone has only hole or not.
-	 */
-	for (pfn = zone->zone_start_pfn;
-	     pfn < zone_end_pfn(zone); pfn += PAGES_PER_SUBSECTION) {
-		if (unlikely(!pfn_valid(pfn)))
-			continue;
-
-		if (page_zone(pfn_to_page(pfn)) != zone)
-			continue;
-
-		/* Skip range to be removed */
-		if (pfn >= start_pfn && pfn < end_pfn)
-			continue;
-
-		/* If we find valid section, we have nothing to do */
+	if (!zone->spanned_pages) {
 		zone_span_writeunlock(zone);
 		return;
 	}
 
-	/* The zone has no valid section */
-	zone->zone_start_pfn = 0;
-	zone->spanned_pages = 0;
+	/*
+	 * Due to false positives in previous shrink attempts, it can happen
+	 * that we can shrink the zones further (possibly to zero). Once we
+	 * can reliably detect which PFNs actually belong to a zone
+	 * (especially for ZONE_DEVICE memory where we don't have online
+	 * sections), this can go.
+	 */
+	pfn = find_smallest_section_pfn(nid, zone, zone->zone_start_pfn,
+					zone_end_pfn(zone));
+	if (pfn) {
+		zone->spanned_pages = zone_end_pfn(zone) - pfn;
+		zone->zone_start_pfn = pfn;
+
+		pfn = find_biggest_section_pfn(nid, zone, zone->zone_start_pfn,
+					       zone_end_pfn(zone));
+		if (pfn)
+			zone->spanned_pages = pfn - zone->zone_start_pfn + 1;
+	}
+	if (!pfn) {
+		zone->zone_start_pfn = 0;
+		zone->spanned_pages = 0;
+	}
 	zone_span_writeunlock(zone);
 }
 
-- 
2.21.0



* [PATCH v3 06/11] mm/memory_hotplug: Fix crashes in shrink_zone_span()
       [not found] <20190829070019.12714-1-david@redhat.com>
                   ` (4 preceding siblings ...)
  2019-08-29  7:00 ` [PATCH v3 05/11] mm/memory_hotplug: Optimize zone shrinking code when checking for holes David Hildenbrand
@ 2019-08-29  7:00 ` David Hildenbrand
  2019-08-29  7:00 ` [PATCH v3 07/11] mm/memory_hotplug: Exit early in __remove_pages() on BUGs David Hildenbrand
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2019-08-29  7:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Dan Williams, Wei Yang,
	Aneesh Kumar K . V

We can currently crash in shrink_zone_span() in case we access an
uninitialized memmap (via page_to_nid()). The root issue is that we
cannot always identify which parts of the memmap were actually
initialized.

Let's improve the situation by looking only at online PFNs for
!ZONE_DEVICE memory. This is now very reliable - similar to
set_zone_contiguous(). (Side note: set_zone_contiguous() will never
succeed on ZONE_DEVICE memory right now as we have no online PFNs ...).

For ZONE_DEVICE memory, make sure we don't crash by special-casing
poisoned pages and always checking that the NID has a sane value. We
might still read garbage and get false positives, but it certainly
improves the situation.

Note: Especially subsections make it very hard to detect which parts of
a ZONE_DEVICE memmap were actually initialized - otherwise we could just
have reused SECTION_IS_ONLINE. This needs more thought.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 663853bf97ed..65b3fdf7f838 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -334,6 +334,17 @@ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
 		if (unlikely(!pfn_valid(start_pfn)))
 			continue;
 
+		/*
+		 * TODO: There is no way we can identify whether the memmap
+		 * of ZONE_DEVICE memory was initialized. We might get
+		 * false positives when reading garbage.
+		 */
+		if (zone_idx(zone) == ZONE_DEVICE) {
+			if (PagePoisoned(pfn_to_page(start_pfn)))
+				continue;
+		} else if (!pfn_to_online_page(start_pfn))
+			continue;
+
 		if (unlikely(pfn_to_nid(start_pfn) != nid))
 			continue;
 
@@ -359,6 +370,17 @@ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone,
 		if (unlikely(!pfn_valid(pfn)))
 			continue;
 
+		/*
+		 * TODO: There is no way we can identify whether the memmap
+		 * of ZONE_DEVICE memory was initialized. We might get
+		 * false positives when reading garbage.
+		 */
+		if (zone_idx(zone) == ZONE_DEVICE) {
+			if (PagePoisoned(pfn_to_page(pfn)))
+				continue;
+		} else if (!pfn_to_online_page(pfn))
+			continue;
+
 		if (unlikely(pfn_to_nid(pfn) != nid))
 			continue;
 
-- 
2.21.0



* [PATCH v3 07/11] mm/memory_hotplug: Exit early in __remove_pages() on BUGs
       [not found] <20190829070019.12714-1-david@redhat.com>
                   ` (5 preceding siblings ...)
  2019-08-29  7:00 ` [PATCH v3 06/11] mm/memory_hotplug: Fix crashes in shrink_zone_span() David Hildenbrand
@ 2019-08-29  7:00 ` David Hildenbrand
  2019-08-29  7:00 ` [PATCH v3 08/11] mm: Exit early in set_zone_contiguous() if already contiguous David Hildenbrand
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2019-08-29  7:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Dan Williams, Wei Yang

The error path should never happen in practice (unless bringing up a new
device driver, or on BUGs). However, it's clearer to not touch anything
in case we are going to return right away. Move the check/return.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 65b3fdf7f838..56eabd22cbae 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -527,13 +527,13 @@ void __remove_pages(struct zone *zone, unsigned long pfn,
 	unsigned long map_offset = 0;
 	unsigned long nr, start_sec, end_sec;
 
+	if (check_pfn_span(pfn, nr_pages, "remove"))
+		return;
+
 	map_offset = vmem_altmap_offset(altmap);
 
 	clear_zone_contiguous(zone);
 
-	if (check_pfn_span(pfn, nr_pages, "remove"))
-		return;
-
 	start_sec = pfn_to_section_nr(pfn);
 	end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
 	for (nr = start_sec; nr <= end_sec; nr++) {
-- 
2.21.0



* [PATCH v3 08/11] mm: Exit early in set_zone_contiguous() if already contiguous
       [not found] <20190829070019.12714-1-david@redhat.com>
                   ` (6 preceding siblings ...)
  2019-08-29  7:00 ` [PATCH v3 07/11] mm/memory_hotplug: Exit early in __remove_pages() on BUGs David Hildenbrand
@ 2019-08-29  7:00 ` David Hildenbrand
  2019-08-29  7:00 ` [PATCH v3 09/11] mm/memory_hotplug: Remove pages from a zone before removing memory David Hildenbrand
  2019-08-29  7:00 ` [PATCH v3 11/11] mm/memory_hotplug: Cleanup __remove_pages() David Hildenbrand
  9 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2019-08-29  7:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Michal Hocko,
	Vlastimil Babka, Oscar Salvador, Pavel Tatashin, Mel Gorman,
	Mike Rapoport, Dan Williams, Alexander Duyck

No need to recompute in case the zone is already marked contiguous.
We will soon exploit this on the memory removal path, where we will only
clear zone->contiguous on zones that intersect with the memory to be
removed.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/page_alloc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 44038665fe8e..a9935dc19f5b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1546,6 +1546,9 @@ void set_zone_contiguous(struct zone *zone)
 	unsigned long block_start_pfn = zone->zone_start_pfn;
 	unsigned long block_end_pfn;
 
+	if (zone->contiguous)
+		return;
+
 	block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
 	for (; block_start_pfn < zone_end_pfn(zone);
 			block_start_pfn = block_end_pfn,
-- 
2.21.0



* [PATCH v3 09/11] mm/memory_hotplug: Remove pages from a zone before removing memory
       [not found] <20190829070019.12714-1-david@redhat.com>
                   ` (7 preceding siblings ...)
  2019-08-29  7:00 ` [PATCH v3 08/11] mm: Exit early in set_zone_contiguous() if already contiguous David Hildenbrand
@ 2019-08-29  7:00 ` David Hildenbrand
  2019-08-29  7:00 ` [PATCH v3 11/11] mm/memory_hotplug: Cleanup __remove_pages() David Hildenbrand
  9 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2019-08-29  7:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Dan Williams, Wei Yang, Qian Cai,
	Jason Gunthorpe, Logan Gunthorpe, Ira Weiny

For !ZONE_DEVICE memory, remove the memory from the zone when offlining
and when onlining failed (in both cases we only have a single zone). Do
the same for ZONE_DEVICE memory before removing the memory. Introduce
remove_pfn_range_from_zone() for that purpose.

This fixes a whole bunch of BUGs we currently have in our code when
removing memory in the following cases:
- single memory blocks that fall into no zone (never onlined)
- single memory blocks that fall into multiple zones (offlined+re-onlined)
- multiple memory blocks that fall into different zones
Right now, the zones don't get updated properly in these cases, and we
can trigger kernel BUGs when removing memory that was never onlined:

:/# [   23.912993] BUG: unable to handle page fault for address: 000000000000353d
[   23.914219] #PF: supervisor write access in kernel mode
[   23.915199] #PF: error_code(0x0002) - not-present page
[   23.916160] PGD 0 P4D 0
[   23.916627] Oops: 0002 [#1] SMP PTI
[   23.917256] CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317
[   23.918900] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
[   23.921194] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[   23.922249] RIP: 0010:clear_zone_contiguous+0x5/0x10
[   23.923173] Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840
[   23.926876] RSP: 0018:ffffad2400043c98 EFLAGS: 00010246
[   23.927928] RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000
[   23.929458] RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40
[   23.930899] RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001
[   23.932362] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
[   23.933603] R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680
[   23.934913] FS:  0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000
[   23.936294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   23.937481] CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0
[   23.938687] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   23.939889] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   23.941168] Call Trace:
[   23.941580]  __remove_pages+0x4b/0x640
[   23.942303]  ? mark_held_locks+0x49/0x70
[   23.943149]  arch_remove_memory+0x63/0x8d
[   23.943921]  try_remove_memory+0xdb/0x130
[   23.944766]  ? walk_memory_blocks+0x7f/0x9e
[   23.945616]  __remove_memory+0xa/0x11
[   23.946274]  acpi_memory_device_remove+0x70/0x100
[   23.947308]  acpi_bus_trim+0x55/0x90
[   23.947914]  acpi_device_hotplug+0x227/0x3a0
[   23.948714]  acpi_hotplug_work_fn+0x1a/0x30
[   23.949433]  process_one_work+0x221/0x550
[   23.950190]  worker_thread+0x50/0x3b0
[   23.950993]  kthread+0x105/0x140
[   23.951644]  ? process_one_work+0x550/0x550
[   23.952508]  ? kthread_park+0x80/0x80
[   23.953367]  ret_from_fork+0x3a/0x50
[   23.954025] Modules linked in:
[   23.954613] CR2: 000000000000353d
[   23.955248] ---[ end trace 93d982b1fb3e1a69 ]---

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/memory_hotplug.h |  3 +++
 mm/memory_hotplug.c            | 16 +++++++++-------
 mm/memremap.c                  |  7 ++++---
 3 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 235530cdface..f27559f11b64 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -345,6 +345,9 @@ extern int add_memory(int nid, u64 start, u64 size);
 extern int add_memory_resource(int nid, struct resource *resource);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages, struct dev_pagemap *pgmap);
+extern void remove_pfn_range_from_zone(struct zone *zone,
+				       unsigned long start_pfn,
+				       unsigned long nr_pages);
 extern bool is_memblock_offlined(struct memory_block *mem);
 extern int sparse_add_section(int nid, unsigned long pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 56eabd22cbae..75859a57ecda 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -484,16 +484,21 @@ static void update_pgdat_span(struct pglist_data *pgdat)
 	pgdat->node_spanned_pages = node_end_pfn - node_start_pfn;
 }
 
-static void __remove_zone(struct zone *zone, unsigned long start_pfn,
-		unsigned long nr_pages)
+void __ref remove_pfn_range_from_zone(struct zone *zone,
+				      unsigned long start_pfn,
+				      unsigned long nr_pages)
 {
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	unsigned long flags;
 
+	clear_zone_contiguous(zone);
+
 	pgdat_resize_lock(zone->zone_pgdat, &flags);
 	shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
 	update_pgdat_span(pgdat);
 	pgdat_resize_unlock(zone->zone_pgdat, &flags);
+
+	set_zone_contiguous(zone);
 }
 
 static void __remove_section(struct zone *zone, unsigned long pfn,
@@ -505,7 +510,6 @@ static void __remove_section(struct zone *zone, unsigned long pfn,
 	if (WARN_ON_ONCE(!valid_section(ms)))
 		return;
 
-	__remove_zone(zone, pfn, nr_pages);
 	sparse_remove_section(ms, pfn, nr_pages, map_offset, altmap);
 }
 
@@ -532,8 +536,6 @@ void __remove_pages(struct zone *zone, unsigned long pfn,
 
 	map_offset = vmem_altmap_offset(altmap);
 
-	clear_zone_contiguous(zone);
-
 	start_sec = pfn_to_section_nr(pfn);
 	end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
 	for (nr = start_sec; nr <= end_sec; nr++) {
@@ -547,8 +549,6 @@ void __remove_pages(struct zone *zone, unsigned long pfn,
 		nr_pages -= pfns;
 		map_offset = 0;
 	}
-
-	set_zone_contiguous(zone);
 }
 
 int set_online_page_callback(online_page_callback_t callback)
@@ -875,6 +875,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 	pr_debug("online_pages [mem %#010llx-%#010llx] failed\n",
 		 (unsigned long long) pfn << PAGE_SHIFT,
 		 (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
+	remove_pfn_range_from_zone(zone, pfn, nr_pages);
 	memory_notify(MEM_CANCEL_ONLINE, &arg);
 	mem_hotplug_done();
 	return ret;
@@ -1586,6 +1587,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	spin_unlock_irqrestore(&zone->lock, flags);
 
 	/* removal success */
+	remove_pfn_range_from_zone(zone, start_pfn, nr_pages);
 	adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages);
 	zone->present_pages -= offlined_pages;
 
diff --git a/mm/memremap.c b/mm/memremap.c
index 9ee23374e6da..7c662643a0aa 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -125,7 +125,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
 void memunmap_pages(struct dev_pagemap *pgmap)
 {
 	struct resource *res = &pgmap->res;
-	unsigned long pfn;
+	unsigned long pfn = PHYS_PFN(res->start);
 	int nid;
 
 	dev_pagemap_kill(pgmap);
@@ -134,11 +134,12 @@ void memunmap_pages(struct dev_pagemap *pgmap)
 	dev_pagemap_cleanup(pgmap);
 
 	/* pages are dead and unused, undo the arch mapping */
-	nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
+	nid = page_to_nid(pfn_to_page(pfn));
 
 	mem_hotplug_begin();
+	remove_pfn_range_from_zone(page_zone(pfn_to_page(pfn)), pfn,
+				   PHYS_PFN(resource_size(res)));
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
-		pfn = PHYS_PFN(res->start);
 		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
 				 PHYS_PFN(resource_size(res)), NULL);
 	} else {
-- 
2.21.0



* [PATCH v3 11/11] mm/memory_hotplug: Cleanup __remove_pages()
       [not found] <20190829070019.12714-1-david@redhat.com>
                   ` (8 preceding siblings ...)
  2019-08-29  7:00 ` [PATCH v3 09/11] mm/memory_hotplug: Remove pages from a zone before removing memory David Hildenbrand
@ 2019-08-29  7:00 ` David Hildenbrand
  9 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2019-08-29  7:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Dan Williams, Wei Yang

Let's drop the basically unused section stuff and simplify.

Also, let's use a shorter variant to calculate the number of pages to
the next section boundary.
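
As a sanity check of the shorter variant - a userspace sketch that
assumes PAGES_PER_SECTION is 1UL << 15, so not the real kernel
constants - note that with PAGE_SECTION_MASK = ~(PAGES_PER_SECTION - 1),
the unsigned expression -(pfn | PAGE_SECTION_MASK) equals
PAGES_PER_SECTION - (pfn & ~PAGE_SECTION_MASK), i.e. the number of
pages from pfn up to the next section boundary:

#include <stdio.h>

#define PAGES_PER_SECTION	(1UL << 15)	/* assumed value, for illustration */
#define PAGE_SECTION_MASK	(~(PAGES_PER_SECTION - 1))

int main(void)
{
	unsigned long pfn = 0x12345;
	unsigned long shorter = -(pfn | PAGE_SECTION_MASK);
	unsigned long longer = PAGES_PER_SECTION - (pfn & ~PAGE_SECTION_MASK);

	/* Both print 0x5cbb: the pages from pfn up to the next section boundary. */
	printf("shorter=%#lx longer=%#lx\n", shorter, longer);
	return 0;
}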

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index fe29c637c0a8..da56cb57a8aa 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -527,25 +527,20 @@ static void __remove_section(unsigned long pfn, unsigned long nr_pages,
 void __remove_pages(unsigned long pfn, unsigned long nr_pages,
 		    struct vmem_altmap *altmap)
 {
+	const unsigned long end_pfn = pfn + nr_pages;
+	unsigned long cur_nr_pages;
 	unsigned long map_offset = 0;
-	unsigned long nr, start_sec, end_sec;
 
 	if (check_pfn_span(pfn, nr_pages, "remove"))
 		return;
 
 	map_offset = vmem_altmap_offset(altmap);
 
-	start_sec = pfn_to_section_nr(pfn);
-	end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
-	for (nr = start_sec; nr <= end_sec; nr++) {
-		unsigned long pfns;
-
+	for (; pfn < end_pfn; pfn += cur_nr_pages) {
 		cond_resched();
-		pfns = min(nr_pages, PAGES_PER_SECTION
-				- (pfn & ~PAGE_SECTION_MASK));
-		__remove_section(pfn, pfns, map_offset, altmap);
-		pfn += pfns;
-		nr_pages -= pfns;
+		/* Select all remaining pages up to the next section boundary */
+		cur_nr_pages = min(end_pfn - pfn, -(pfn | PAGE_SECTION_MASK));
+		__remove_section(pfn, cur_nr_pages, map_offset, altmap);
 		map_offset = 0;
 	}
 }
-- 
2.21.0



* Re: [PATCH v3 01/11] mm/memremap: Get rid of memmap_init_zone_device()
  2019-08-29  7:00 ` [PATCH v3 01/11] mm/memremap: Get rid of memmap_init_zone_device() David Hildenbrand
@ 2019-08-29 16:39   ` Alexander Duyck
  2019-08-29 16:55     ` David Hildenbrand
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Duyck @ 2019-08-29 16:39 UTC (permalink / raw)
  To: David Hildenbrand, Dan Williams
  Cc: LKML, linux-mm, Andrew Morton, Oscar Salvador, Michal Hocko,
	Pavel Tatashin, Wei Yang, Qian Cai, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Stephen Rothwell, Dave Airlie,
	Andrey Konovalov, Alexander Duyck, Ira Weiny, John Hubbard,
	Arun KS, Souptick Joarder, Robin Murphy, Yang Shi,
	Jason Gunthorpe, Logan Gunthorpe, Vlastimil Babka, Mel Gorman,
	Mike Rapoport, Alexander Potapenko

On Thu, Aug 29, 2019 at 12:00 AM David Hildenbrand <david@redhat.com> wrote:
>
> As far as I can see, the original split wanted to speed up a duplicate
> initialization. We now only initialize once - and we really should
> initialize under the lock, when resizing the zones.

What do you mean by duplicate initialization? Basically the issue was
that we can have systems with massive memory footprints and I was just
trying to get the initialization time under a minute. The compromise I
made was to split the initialization so that we only initialized the
pages in the altmap and defer the rest so that they can be initialized
in parallel.

What this patch does is serialize the initialization so it will likely
take 2 to 4 minutes or more to initialize memory on a system where I
had brought the init time under about 30 seconds.

> As soon as we drop the lock we might have memory unplug running, trying
> to shrink the zone again. In case the memmap was not properly initialized,
> the zone/node shrinking code might get false negatives when searching for
> the new zone/node boundaries - bad. We suddenly could no longer span the
> memory we just added.

The issue as I see it is that we can have systems with multiple nodes
and each node can populate a fairly significant amount of data. By
pushing this all back under the hotplug lock you are serializing the
initialization for each node resulting in a 4x or 8x increase in
initialization time on some of these bigger systems.

> Also, once we want to fix set_zone_contiguous(zone) inside
> move_pfn_range_to_zone() to actually work with ZONE_DEVICE (instead of
> always immediately stopping and never setting zone->contiguous) we have
> to have the whole memmap initialized at that point. (not sure if we
> want that in the future, just a note)
>
> Let's just keep things simple and initialize the memmap when resizing
> the zones under the lock.
>
> If this is a real performance issue, we have to watch out for
> alternatives.

My concern is that this is going to become a performance issue, but I
don't have access to the hardware at the moment to test how much of
one. I'll have to check to see if I can get access to a system to test
this patch set.

> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Andrey Konovalov <andreyknvl@google.com>
> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Souptick Joarder <jrdr.linux@gmail.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Yang Shi <yang.shi@linux.alibaba.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> Cc: Alexander Potapenko <glider@google.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  include/linux/memory_hotplug.h |  2 +-
>  include/linux/mm.h             |  4 +---
>  mm/memory_hotplug.c            |  4 ++--
>  mm/memremap.c                  |  9 +-------
>  mm/page_alloc.c                | 42 ++++++++++++----------------------
>  5 files changed, 20 insertions(+), 41 deletions(-)
>
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index f46ea71b4ffd..235530cdface 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -344,7 +344,7 @@ extern int __add_memory(int nid, u64 start, u64 size);
>  extern int add_memory(int nid, u64 start, u64 size);
>  extern int add_memory_resource(int nid, struct resource *resource);
>  extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
> -               unsigned long nr_pages, struct vmem_altmap *altmap);
> +               unsigned long nr_pages, struct dev_pagemap *pgmap);
>  extern bool is_memblock_offlined(struct memory_block *mem);
>  extern int sparse_add_section(int nid, unsigned long pfn,
>                 unsigned long nr_pages, struct vmem_altmap *altmap);
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ad6766a08f9b..2bd445c4d3b4 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -962,8 +962,6 @@ static inline bool is_zone_device_page(const struct page *page)
>  {
>         return page_zonenum(page) == ZONE_DEVICE;
>  }
> -extern void memmap_init_zone_device(struct zone *, unsigned long,
> -                                   unsigned long, struct dev_pagemap *);
>  #else
>  static inline bool is_zone_device_page(const struct page *page)
>  {
> @@ -2243,7 +2241,7 @@ static inline void zero_resv_unavail(void) {}
>
>  extern void set_dma_reserve(unsigned long new_dma_reserve);
>  extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
> -               enum memmap_context, struct vmem_altmap *);
> +               enum memmap_context, struct dev_pagemap *);
>  extern void setup_per_zone_wmarks(void);
>  extern int __meminit init_per_zone_wmark_min(void);
>  extern void mem_init(void);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 49f7bf91c25a..35a395d195c6 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -719,7 +719,7 @@ static void __meminit resize_pgdat_range(struct pglist_data *pgdat, unsigned lon
>   * call, all affected pages are PG_reserved.
>   */
>  void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
> -               unsigned long nr_pages, struct vmem_altmap *altmap)
> +               unsigned long nr_pages, struct dev_pagemap *pgmap)
>  {
>         struct pglist_data *pgdat = zone->zone_pgdat;
>         int nid = pgdat->node_id;
> @@ -744,7 +744,7 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>          * are reserved so nobody should be touching them so we should be safe
>          */
>         memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn,
> -                       MEMMAP_HOTPLUG, altmap);
> +                        MEMMAP_HOTPLUG, pgmap);
>
>         set_zone_contiguous(zone);
>  }
> diff --git a/mm/memremap.c b/mm/memremap.c
> index f6c17339cd0d..9ee23374e6da 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -308,20 +308,13 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
>
>                 zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE];
>                 move_pfn_range_to_zone(zone, PHYS_PFN(res->start),
> -                               PHYS_PFN(resource_size(res)), restrictions.altmap);
> +                               PHYS_PFN(resource_size(res)), pgmap);
>         }
>
>         mem_hotplug_done();
>         if (error)
>                 goto err_add_memory;
>
> -       /*
> -        * Initialization of the pages has been deferred until now in order
> -        * to allow us to do the work while not holding the hotplug lock.
> -        */
> -       memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
> -                               PHYS_PFN(res->start),
> -                               PHYS_PFN(resource_size(res)), pgmap);
>         percpu_ref_get_many(pgmap->ref, pfn_end(pgmap) - pfn_first(pgmap));
>         return __va(res->start);
>

So if you are moving this all under the lock then this is going to
serialize initialization and it is going to be quite expensive on big
systems. I was only ever able to get the init time down to something
like ~20s with the optimized function. Since that has been torn apart
and you are back to doing things with memmap_init_zone we are probably
looking at more like 25-30s for each node, and on a 4 node system we
are looking at 2 minutes or so which may lead to issues if people are
mounting this.

Instead of forcing this all to be done under the hotplug lock is there
some way we could do this under the zone span_seqlock instead to
achieve the same result?

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c5d62f1c2851..44038665fe8e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5845,7 +5845,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
>   */
>  void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>                 unsigned long start_pfn, enum memmap_context context,
> -               struct vmem_altmap *altmap)
> +               struct dev_pagemap *pgmap)
>  {
>         unsigned long pfn, end_pfn = start_pfn + size;
>         struct page *page;
> @@ -5853,24 +5853,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>         if (highest_memmap_pfn < end_pfn - 1)
>                 highest_memmap_pfn = end_pfn - 1;
>
> -#ifdef CONFIG_ZONE_DEVICE
> -       /*
> -        * Honor reservation requested by the driver for this ZONE_DEVICE
> -        * memory. We limit the total number of pages to initialize to just
> -        * those that might contain the memory mapping. We will defer the
> -        * ZONE_DEVICE page initialization until after we have released
> -        * the hotplug lock.
> -        */
> -       if (zone == ZONE_DEVICE) {
> -               if (!altmap)
> -                       return;
> -
> -               if (start_pfn == altmap->base_pfn)
> -                       start_pfn += altmap->reserve;
> -               end_pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
> -       }
> -#endif
> -
>         for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>                 /*
>                  * There can be holes in boot-time mem_map[]s handed to this
> @@ -5892,6 +5874,20 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>                 if (context == MEMMAP_HOTPLUG)
>                         __SetPageReserved(page);
>
> +#ifdef CONFIG_ZONE_DEVICE
> +               if (zone == ZONE_DEVICE) {
> +                       WARN_ON_ONCE(!pgmap);
> +                       /*
> +                        * ZONE_DEVICE pages union ->lru with a ->pgmap back
> +                        * pointer and zone_device_data. It is a bug if a
> +                        * ZONE_DEVICE page is ever freed or placed on a driver
> +                        * private list.
> +                        */
> +                       page->pgmap = pgmap;
> +                       page->zone_device_data = NULL;
> +               }
> +#endif
> +

So I am not sure this is right. Part of the reason for the split is
that the pages that were a part of the altmap had an LRU setup, not
the pgmap/zone_device_data setup. This is changing that.

Also, I am more a fan of just testing pgmap and, if it is present,
assigning page->pgmap and resetting zone_device_data. Then you can avoid the
test for zone on every iteration and the WARN_ON_ONCE check, or at
least you could move it outside the loop so we don't incur the cost
with each page. Are there situations where you would have pgmap but
not a ZONE_DEVICE page?

>                 /*
>                  * Mark the block movable so that blocks are reserved for
>                  * movable at startup. This will force kernel allocations
> @@ -5951,14 +5947,6 @@ void __ref memmap_init_zone_device(struct zone *zone,
>                  */
>                 __SetPageReserved(page);
>
> -               /*
> -                * ZONE_DEVICE pages union ->lru with a ->pgmap back pointer
> -                * and zone_device_data.  It is a bug if a ZONE_DEVICE page is
> -                * ever freed or placed on a driver-private list.
> -                */
> -               page->pgmap = pgmap;
> -               page->zone_device_data = NULL;
> -
>                 /*
>                  * Mark the block movable so that blocks are reserved for
>                  * movable at startup. This will force kernel allocations

Shouldn't you be removing this function instead of just gutting it?
I'm kind of surprised you aren't getting warnings about unused code
since you also pulled the declaration for it from the header.


* Re: [PATCH v3 01/11] mm/memremap: Get rid of memmap_init_zone_device()
  2019-08-29 16:39   ` Alexander Duyck
@ 2019-08-29 16:55     ` David Hildenbrand
  0 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2019-08-29 16:55 UTC (permalink / raw)
  To: Alexander Duyck, Dan Williams
  Cc: LKML, linux-mm, Andrew Morton, Oscar Salvador, Michal Hocko,
	Pavel Tatashin, Wei Yang, Qian Cai, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Stephen Rothwell, Dave Airlie,
	Andrey Konovalov, Alexander Duyck, Ira Weiny, John Hubbard,
	Arun KS, Souptick Joarder, Robin Murphy, Yang Shi,
	Jason Gunthorpe, Logan Gunthorpe, Vlastimil Babka, Mel Gorman,
	Mike Rapoport, Alexander Potapenko

On 29.08.19 18:39, Alexander Duyck wrote:
> On Thu, Aug 29, 2019 at 12:00 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> As far as I can see, the original split wanted to speed up a duplicate
>> initialization. We now only initialize once - and we really should
>> initialize under the lock, when resizing the zones.
> 
> What do you mean by duplicate initialization? Basically the issue was
> that we can have systems with massive memory footprints and I was just
> trying to get the initialization time under a minute. The compromise I
> made was to split the initialization so that we only initialized the
> pages in the altmap and defer the rest so that they can be initialized
> in parallel.
> 
> What this patch does is serialize the initialization so it will likely
> take 2 to 4 minutes or more to initialize memory on a system where I
> had brought the init time under about 30 seconds.
> 
>> As soon as we drop the lock we might have memory unplug running, trying
>> to shrink the zone again. In case the memmap was not properly initialized,
>> the zone/node shrinking code might get false negatives when search for
>> the new zone/node boundaries - bad. We suddenly could no longer span the
>> memory we just added.
> 
> The issue as I see it is that we can have systems with multiple nodes
> and each node can populate a fairly significant amount of data. By
> pushing this all back under the hotplug lock you are serializing the
> initialization for each node resulting in a 4x or 8x increase in
> initialization time on some of these bigger systems.
> 
>> Also, once we want to fix set_zone_contiguous(zone) inside
>> move_pfn_range_to_zone() to actually work with ZONE_DEVICE (instead of
>> always immediately stopping and never setting zone->contiguous) we have
>> to have the whole memmap initialized at that point. (not sure if we
>> want that in the future, just a note)
>>
>> Let's just keep things simple and initialize the memmap when resizing
>> the zones under the lock.
>>
>> If this is a real performance issue, we have to watch out for
>> alternatives.
> 
> My concern is that this is going to become a performance issue, but I
> don't have access to the hardware at the moment to test how much of
> one. I'll have to check to see if I can get access to a system to test
> this patch set.
> 

Thanks for having a look - I already dropped this patch again. We will
instead stop shrinking the ZONE_DEVICE zone. So assume this patch is gone.

[...]

> So if you are moving this all under the lock then this is going to
> serialize initialization and it is going to be quite expensive on big
> systems. I was only ever able to get the init time down to something
> like ~20s with the optimized function. Since that has been torn apart
> and you are back to doing things with memmap_init_zone we are probably
> looking at more like 25-30s for each node, and on a 4 node system we
> are looking at 2 minutes or so which may lead to issues if people are
> mounting this.
> 
> Instead of forcing this all to be done under the hotplug lock is there
> some way we could do this under the zone span_seqlock instead to
> achieve the same result?

I guess the right approach really is, as Michal suggests, to not shrink
at all (at least ZONE_DEVICE) :)

> 
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index c5d62f1c2851..44038665fe8e 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5845,7 +5845,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
>>   */
>>  void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>                 unsigned long start_pfn, enum memmap_context context,
>> -               struct vmem_altmap *altmap)
>> +               struct dev_pagemap *pgmap)
>>  {
>>         unsigned long pfn, end_pfn = start_pfn + size;
>>         struct page *page;
>> @@ -5853,24 +5853,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>         if (highest_memmap_pfn < end_pfn - 1)
>>                 highest_memmap_pfn = end_pfn - 1;
>>
>> -#ifdef CONFIG_ZONE_DEVICE
>> -       /*
>> -        * Honor reservation requested by the driver for this ZONE_DEVICE
>> -        * memory. We limit the total number of pages to initialize to just
>> -        * those that might contain the memory mapping. We will defer the
>> -        * ZONE_DEVICE page initialization until after we have released
>> -        * the hotplug lock.
>> -        */
>> -       if (zone == ZONE_DEVICE) {
>> -               if (!altmap)
>> -                       return;
>> -
>> -               if (start_pfn == altmap->base_pfn)
>> -                       start_pfn += altmap->reserve;
>> -               end_pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
>> -       }
>> -#endif
>> -
>>         for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>>                 /*
>>                  * There can be holes in boot-time mem_map[]s handed to this
>> @@ -5892,6 +5874,20 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>                 if (context == MEMMAP_HOTPLUG)
>>                         __SetPageReserved(page);
>>
>> +#ifdef CONFIG_ZONE_DEVICE
>> +               if (zone == ZONE_DEVICE) {
>> +                       WARN_ON_ONCE(!pgmap);
>> +                       /*
>> +                        * ZONE_DEVICE pages union ->lru with a ->pgmap back
>> +                        * pointer and zone_device_data. It is a bug if a
>> +                        * ZONE_DEVICE page is ever freed or placed on a driver
>> +                        * private list.
>> +                        */
>> +                       page->pgmap = pgmap;
>> +                       page->zone_device_data = NULL;
>> +               }
>> +#endif
>> +
> 
> So I am not sure this is right. Part of the reason for the split is
> that the pages that were a part of the altmap had an LRU setup, not
> the pgmap/zone_device_data setup. This is changing that.

Yeah, you might be right; we would have to handle the altmap part
separately.

> 
> Also, I am more a fan of just testing pgmap and if it is present then
> assign page->pgmap and reset zone_device_data. Then you can avoid the
> test for zone on every iteration and the WARN_ON_ONCE check, or at
> least you could move it outside the loop so we don't incur the cost
> with each page. Are there situations where you would have pgmap but
> not a ZONE_DEVICE page?

Don't think so. But I am definitely not a ZONE_DEVICE expert.
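
(From a quick look at mm/memremap.c - so take this with a grain of
salt - AFAICS the only path that passes a non-NULL pgmap down here is
memremap_pages(), and that always onlines the range to ZONE_DEVICE,
roughly:

	struct zone *zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE];

	move_pfn_range_to_zone(zone, PHYS_PFN(res->start),
			       PHYS_PFN(resource_size(res)),
			       restrictions.altmap);

so a non-NULL pgmap should indeed imply ZONE_DEVICE today.)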

> 
>>                 /*
>>                  * Mark the block movable so that blocks are reserved for
>>                  * movable at startup. This will force kernel allocations
>> @@ -5951,14 +5947,6 @@ void __ref memmap_init_zone_device(struct zone *zone,
>>                  */
>>                 __SetPageReserved(page);
>>
>> -               /*
>> -                * ZONE_DEVICE pages union ->lru with a ->pgmap back pointer
>> -                * and zone_device_data.  It is a bug if a ZONE_DEVICE page is
>> -                * ever freed or placed on a driver-private list.
>> -                */
>> -               page->pgmap = pgmap;
>> -               page->zone_device_data = NULL;
>> -
>>                 /*
>>                  * Mark the block movable so that blocks are reserved for
>>                  * movable at startup. This will force kernel allocations
> 
> Shouldn't you be removing this function instead of just gutting it?
> I'm kind of surprised you aren't getting warnings about unused code
> since you also pulled the declaration for it from the header.
> 

Don't ask me what went wrong here; I guess I messed this up while rebasing.

-- 

Thanks,

David / dhildenb

end of thread, other threads:[~2019-08-29 16:55 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20190829070019.12714-1-david@redhat.com>
2019-08-29  7:00 ` [PATCH v3 01/11] mm/memremap: Get rid of memmap_init_zone_device() David Hildenbrand
2019-08-29 16:39   ` Alexander Duyck
2019-08-29 16:55     ` David Hildenbrand
2019-08-29  7:00 ` [PATCH v3 02/11] mm/memory_hotplug: Simplify shrink_pgdat_span() David Hildenbrand
2019-08-29  7:00 ` [PATCH v3 03/11] mm/memory_hotplug: We always have a zone in find_(smallest|biggest)_section_pfn David Hildenbrand
2019-08-29  7:00 ` [PATCH v3 04/11] mm/memory_hotplug: Drop local variables in shrink_zone_span() David Hildenbrand
2019-08-29  7:00 ` [PATCH v3 05/11] mm/memory_hotplug: Optimize zone shrinking code when checking for holes David Hildenbrand
2019-08-29  7:00 ` [PATCH v3 06/11] mm/memory_hotplug: Fix crashes in shrink_zone_span() David Hildenbrand
2019-08-29  7:00 ` [PATCH v3 07/11] mm/memory_hotplug: Exit early in __remove_pages() on BUGs David Hildenbrand
2019-08-29  7:00 ` [PATCH v3 08/11] mm: Exit early in set_zone_contiguous() if already contiguous David Hildenbrand
2019-08-29  7:00 ` [PATCH v3 09/11] mm/memory_hotplug: Remove pages from a zone before removing memory David Hildenbrand
2019-08-29  7:00 ` [PATCH v3 11/11] mm/memory_hotplug: Cleanup __remove_pages() David Hildenbrand
