linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 01/10] mm/memunmap: Use the correct start and end pfn when removing pages from zone
       [not found] <20191001144011.3801-1-david@redhat.com>
@ 2019-10-01 14:40 ` David Hildenbrand
  2019-10-01 14:57   ` David Hildenbrand
  2019-10-01 14:40 ` [PATCH v5 02/10] mm/memmap_init: Update variable name in memmap_init_zone David Hildenbrand
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2019-10-01 14:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-s390, linux-ia64, Ira Weiny, linux-sh, Jason Gunthorpe,
	Aneesh Kumar K.V, Pankaj Gupta, David Hildenbrand, linux-mm,
	Logan Gunthorpe, Dan Williams, linuxppc-dev, Andrew Morton,
	linux-arm-kernel

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>

With altmap, all the resource pfns are not initialized. While initializing
pfn, altmap reserve space is skipped. Hence when removing pfn from zone
skip pfns that were never initialized.

Update memunmap_pages to calculate start and end pfn based on altmap
values. This fixes a kernel crash that is observed when destroying
a namespace.

[   81.356173] kernel BUG at include/linux/mm.h:1107!
cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
    pc: c0000000004b9728: memunmap_pages+0x238/0x340
    lr: c0000000004b9724: memunmap_pages+0x234/0x340
...
    pid   = 3669, comm = ndctl
kernel BUG at include/linux/mm.h:1107!
[c000000274087ba0] c0000000009e3500 devm_action_release+0x30/0x50
[c000000274087bc0] c0000000009e4758 release_nodes+0x268/0x2d0
[c000000274087c30] c0000000009dd144 device_release_driver_internal+0x174/0x240
[c000000274087c70] c0000000009d9dfc unbind_store+0x13c/0x190
[c000000274087cb0] c0000000009d8a24 drv_attr_store+0x44/0x60
[c000000274087cd0] c0000000005a7470 sysfs_kf_write+0x70/0xa0
[c000000274087d10] c0000000005a5cac kernfs_fop_write+0x1ac/0x290
[c000000274087d60] c0000000004be45c __vfs_write+0x3c/0x70
[c000000274087d80] c0000000004c26e4 vfs_write+0xe4/0x200
[c000000274087dd0] c0000000004c2a6c ksys_write+0x7c/0x140
[c000000274087e20] c00000000000bbd0 system_call+0x5c/0x68

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[ move all pfn-realted declarations into a single line ]
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memremap.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/memremap.c b/mm/memremap.c
index 557e53c6fb46..026788b2ac69 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -123,7 +123,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
 void memunmap_pages(struct dev_pagemap *pgmap)
 {
 	struct resource *res = &pgmap->res;
-	unsigned long pfn;
+	unsigned long pfn, nr_pages, start_pfn, end_pfn;
 	int nid;
 
 	dev_pagemap_kill(pgmap);
@@ -131,14 +131,17 @@ void memunmap_pages(struct dev_pagemap *pgmap)
 		put_page(pfn_to_page(pfn));
 	dev_pagemap_cleanup(pgmap);
 
+	start_pfn = pfn_first(pgmap);
+	end_pfn = pfn_end(pgmap);
+	nr_pages = end_pfn - start_pfn;
+
 	/* pages are dead and unused, undo the arch mapping */
-	nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
+	nid = page_to_nid(pfn_to_page(start_pfn));
 
 	mem_hotplug_begin();
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
-		pfn = PHYS_PFN(res->start);
-		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
-				 PHYS_PFN(resource_size(res)), NULL);
+		__remove_pages(page_zone(pfn_to_page(start_pfn)), start_pfn,
+			       nr_pages, NULL);
 	} else {
 		arch_remove_memory(nid, res->start, resource_size(res),
 				pgmap_altmap(pgmap));
-- 
2.21.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v5 02/10] mm/memmap_init: Update variable name in memmap_init_zone
       [not found] <20191001144011.3801-1-david@redhat.com>
  2019-10-01 14:40 ` [PATCH v5 01/10] mm/memunmap: Use the correct start and end pfn when removing pages from zone David Hildenbrand
@ 2019-10-01 14:40 ` David Hildenbrand
  2019-10-01 14:40 ` [PATCH v5 03/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_pgdat_span() David Hildenbrand
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2019-10-01 14:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-s390, Michal Hocko, linux-ia64, linux-sh, Pankaj Gupta,
	Aneesh Kumar K.V, Alexander Duyck, David Hildenbrand, Mel Gorman,
	Mike Rapoport, linux-mm, Pavel Tatashin, Alexander Potapenko,
	Vlastimil Babka, Andrew Morton, linuxppc-dev, Dan Williams,
	linux-arm-kernel, Oscar Salvador

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>

The third argument is actually number of pages. Changes the variable name
from size to nr_pages to indicate this better.

No functional change in this patch.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Alexander Potapenko <glider@google.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/page_alloc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 15c2050c629b..b0b2d5464000 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5936,10 +5936,10 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 #ifdef CONFIG_ZONE_DEVICE
 void __ref memmap_init_zone_device(struct zone *zone,
 				   unsigned long start_pfn,
-				   unsigned long size,
+				   unsigned long nr_pages,
 				   struct dev_pagemap *pgmap)
 {
-	unsigned long pfn, end_pfn = start_pfn + size;
+	unsigned long pfn, end_pfn = start_pfn + nr_pages;
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	struct vmem_altmap *altmap = pgmap_altmap(pgmap);
 	unsigned long zone_idx = zone_idx(zone);
@@ -5956,7 +5956,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 	 */
 	if (altmap) {
 		start_pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
-		size = end_pfn - start_pfn;
+		nr_pages = end_pfn - start_pfn;
 	}
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
@@ -6003,7 +6003,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 	}
 
 	pr_info("%s initialised %lu pages in %ums\n", __func__,
-		size, jiffies_to_msecs(jiffies - start));
+		nr_pages, jiffies_to_msecs(jiffies - start));
 }
 
 #endif
-- 
2.21.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v5 03/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_pgdat_span()
       [not found] <20191001144011.3801-1-david@redhat.com>
  2019-10-01 14:40 ` [PATCH v5 01/10] mm/memunmap: Use the correct start and end pfn when removing pages from zone David Hildenbrand
  2019-10-01 14:40 ` [PATCH v5 02/10] mm/memmap_init: Update variable name in memmap_init_zone David Hildenbrand
@ 2019-10-01 14:40 ` David Hildenbrand
  2019-10-01 14:40 ` [PATCH v5 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span() David Hildenbrand
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2019-10-01 14:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-s390, Michal Hocko, linux-ia64, Pavel Tatashin, linux-sh,
	Aneesh Kumar K . V, David Hildenbrand, linux-mm, Wei Yang,
	Andrew Morton, linuxppc-dev, Dan Williams, linux-arm-kernel,
	Oscar Salvador

We might use the nid of memmaps that were never initialized. For
example, if the memmap was poisoned, we will crash the kernel in
pfn_to_nid() right now. Let's use the calculated boundaries of the separate
zones instead. This now also avoids having to iterate over a whole bunch of
subsections again, after shrinking one zone.

Before commit d0dc12e86b31 ("mm/memory_hotplug: optimize memory
hotplug"), the memmap was initialized to 0 and the node was set to the
right value. After that commit, the node might be garbage.

We'll have to fix shrink_zone_span() next.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 72 ++++++++++-----------------------------------
 1 file changed, 15 insertions(+), 57 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 680b4b3e57d9..86b4dc18e831 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -436,67 +436,25 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 	zone_span_writeunlock(zone);
 }
 
-static void shrink_pgdat_span(struct pglist_data *pgdat,
-			      unsigned long start_pfn, unsigned long end_pfn)
+static void update_pgdat_span(struct pglist_data *pgdat)
 {
-	unsigned long pgdat_start_pfn = pgdat->node_start_pfn;
-	unsigned long p = pgdat_end_pfn(pgdat); /* pgdat_end_pfn namespace clash */
-	unsigned long pgdat_end_pfn = p;
-	unsigned long pfn;
-	int nid = pgdat->node_id;
-
-	if (pgdat_start_pfn == start_pfn) {
-		/*
-		 * If the section is smallest section in the pgdat, it need
-		 * shrink pgdat->node_start_pfn and pgdat->node_spanned_pages.
-		 * In this case, we find second smallest valid mem_section
-		 * for shrinking zone.
-		 */
-		pfn = find_smallest_section_pfn(nid, NULL, end_pfn,
-						pgdat_end_pfn);
-		if (pfn) {
-			pgdat->node_start_pfn = pfn;
-			pgdat->node_spanned_pages = pgdat_end_pfn - pfn;
-		}
-	} else if (pgdat_end_pfn == end_pfn) {
-		/*
-		 * If the section is biggest section in the pgdat, it need
-		 * shrink pgdat->node_spanned_pages.
-		 * In this case, we find second biggest valid mem_section for
-		 * shrinking zone.
-		 */
-		pfn = find_biggest_section_pfn(nid, NULL, pgdat_start_pfn,
-					       start_pfn);
-		if (pfn)
-			pgdat->node_spanned_pages = pfn - pgdat_start_pfn + 1;
-	}
-
-	/*
-	 * If the section is not biggest or smallest mem_section in the pgdat,
-	 * it only creates a hole in the pgdat. So in this case, we need not
-	 * change the pgdat.
-	 * But perhaps, the pgdat has only hole data. Thus it check the pgdat
-	 * has only hole or not.
-	 */
-	pfn = pgdat_start_pfn;
-	for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SUBSECTION) {
-		if (unlikely(!pfn_valid(pfn)))
-			continue;
-
-		if (pfn_to_nid(pfn) != nid)
-			continue;
+	unsigned long node_start_pfn = 0, node_end_pfn = 0;
+	struct zone *zone;
 
-		/* Skip range to be removed */
-		if (pfn >= start_pfn && pfn < end_pfn)
-			continue;
+	for (zone = pgdat->node_zones;
+	     zone < pgdat->node_zones + MAX_NR_ZONES; zone++) {
+		unsigned long zone_end_pfn = zone->zone_start_pfn +
+					     zone->spanned_pages;
 
-		/* If we find valid section, we have nothing to do */
-		return;
+		/* No need to lock the zones, they can't change. */
+		if (zone_end_pfn > node_end_pfn)
+			node_end_pfn = zone_end_pfn;
+		if (zone->zone_start_pfn < node_start_pfn)
+			node_start_pfn = zone->zone_start_pfn;
 	}
 
-	/* The pgdat has no valid section */
-	pgdat->node_start_pfn = 0;
-	pgdat->node_spanned_pages = 0;
+	pgdat->node_start_pfn = node_start_pfn;
+	pgdat->node_spanned_pages = node_end_pfn - node_start_pfn;
 }
 
 static void __remove_zone(struct zone *zone, unsigned long start_pfn,
@@ -507,7 +465,7 @@ static void __remove_zone(struct zone *zone, unsigned long start_pfn,
 
 	pgdat_resize_lock(zone->zone_pgdat, &flags);
 	shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
-	shrink_pgdat_span(pgdat, start_pfn, start_pfn + nr_pages);
+	update_pgdat_span(pgdat);
 	pgdat_resize_unlock(zone->zone_pgdat, &flags);
 }
 
-- 
2.21.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v5 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span()
       [not found] <20191001144011.3801-1-david@redhat.com>
                   ` (2 preceding siblings ...)
  2019-10-01 14:40 ` [PATCH v5 03/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_pgdat_span() David Hildenbrand
@ 2019-10-01 14:40 ` David Hildenbrand
  2019-10-01 23:47   ` kbuild test robot
  2019-10-02  0:06   ` kbuild test robot
  2019-10-01 14:40 ` [PATCH v5 06/10] mm/memory_hotplug: Poison memmap in remove_pfn_range_from_zone() David Hildenbrand
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 20+ messages in thread
From: David Hildenbrand @ 2019-10-01 14:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-s390, Michal Hocko, linux-ia64, Pavel Tatashin, linux-sh,
	Aneesh Kumar K . V, David Hildenbrand, linux-mm, Andrew Morton,
	linuxppc-dev, Dan Williams, linux-arm-kernel, Oscar Salvador

Let's limit shrinking to !ZONE_DEVICE so we can fix the current code. We
should never try to touch the memmap of offline sections where we could
have uninitialized memmaps and could trigger BUGs when calling
page_to_nid() on poisoned pages.

There is no reliable way to distinguish an uninitialized memmap from an
initialized memmap that belongs to ZONE_DEVICE, as we don't have
anything like SECTION_IS_ONLINE we can use similar to
pfn_to_online_section() for !ZONE_DEVICE memory. E.g.,
set_zone_contiguous() similarly relies on pfn_to_online_section() and
will therefore never set a ZONE_DEVICE zone consecutive. Stopping to
shrink the ZONE_DEVICE therefore results in no observable changes,
besides /proc/zoneinfo indicating different boundaries - something we
can totally live with.

Before commit d0dc12e86b31 ("mm/memory_hotplug: optimize memory
hotplug"), the memmap was initialized with 0 and the node with the
right value. So the zone might be wrong but not garbage. After that
commit, both the zone and the node will be garbage when touching
uninitialized memmaps.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 86b4dc18e831..afed8331332b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -331,7 +331,7 @@ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
 				     unsigned long end_pfn)
 {
 	for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SUBSECTION) {
-		if (unlikely(!pfn_valid(start_pfn)))
+		if (unlikely(!pfn_to_online_page(start_pfn)))
 			continue;
 
 		if (unlikely(pfn_to_nid(start_pfn) != nid))
@@ -356,7 +356,7 @@ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone,
 	/* pfn is the end pfn of a memory section. */
 	pfn = end_pfn - 1;
 	for (; pfn >= start_pfn; pfn -= PAGES_PER_SUBSECTION) {
-		if (unlikely(!pfn_valid(pfn)))
+		if (unlikely(!pfn_to_online_page(pfn)))
 			continue;
 
 		if (unlikely(pfn_to_nid(pfn) != nid))
@@ -415,7 +415,7 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 	 */
 	pfn = zone_start_pfn;
 	for (; pfn < zone_end_pfn; pfn += PAGES_PER_SUBSECTION) {
-		if (unlikely(!pfn_valid(pfn)))
+		if (unlikely(!pfn_to_online_page(pfn)))
 			continue;
 
 		if (page_zone(pfn_to_page(pfn)) != zone)
@@ -463,6 +463,14 @@ static void __remove_zone(struct zone *zone, unsigned long start_pfn,
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	unsigned long flags;
 
+	/*
+	 * Zone shrinking code cannot properly deal with ZONE_DEVICE. So
+	 * we will not try to shrink the zones - which is okay as
+	 * set_zone_contiguous() cannot deal with ZONE_DEVICE either way.
+	 */
+	if (zone_idx(zone) == ZONE_DEVICE)
+		return;
+
 	pgdat_resize_lock(zone->zone_pgdat, &flags);
 	shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
 	update_pgdat_span(pgdat);
-- 
2.21.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v5 06/10] mm/memory_hotplug: Poison memmap in remove_pfn_range_from_zone()
       [not found] <20191001144011.3801-1-david@redhat.com>
                   ` (3 preceding siblings ...)
  2019-10-01 14:40 ` [PATCH v5 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span() David Hildenbrand
@ 2019-10-01 14:40 ` David Hildenbrand
  2019-10-01 14:40 ` [PATCH v5 07/10] mm/memory_hotplug: We always have a zone in find_(smallest|biggest)_section_pfn David Hildenbrand
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2019-10-01 14:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-s390, Michal Hocko, linux-ia64, Pavel Tatashin, linux-sh,
	David Hildenbrand, linux-mm, Andrew Morton, linuxppc-dev,
	Dan Williams, linux-arm-kernel, Oscar Salvador

Let's poison the pages similar to when adding new memory in
sparse_add_section(). Also call remove_pfn_range_from_zone() from
memunmap_pages(), so we can poison the memmap from there as well.

While at it, calculate the pfn in memunmap_pages() only once.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 3 +++
 mm/memremap.c       | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index cef909ebd807..640309236a58 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -464,6 +464,9 @@ void __ref remove_pfn_range_from_zone(struct zone *zone,
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	unsigned long flags;
 
+	/* Poison struct pages because they are now uninitialized again. */
+	page_init_poison(pfn_to_page(start_pfn), sizeof(struct page) * nr_pages);
+
 	/*
 	 * Zone shrinking code cannot properly deal with ZONE_DEVICE. So
 	 * we will not try to shrink the zones - which is okay as
diff --git a/mm/memremap.c b/mm/memremap.c
index 734afeaad811..371939f92b69 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -139,6 +139,8 @@ void memunmap_pages(struct dev_pagemap *pgmap)
 	nid = page_to_nid(pfn_to_page(start_pfn));
 
 	mem_hotplug_begin();
+	remove_pfn_range_from_zone(page_zone(pfn_to_page(start_pfn)),
+				   start_pfn, nr_pages);
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
 		__remove_pages(start_pfn, nr_pages, NULL);
 	} else {
-- 
2.21.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v5 07/10] mm/memory_hotplug: We always have a zone in find_(smallest|biggest)_section_pfn
       [not found] <20191001144011.3801-1-david@redhat.com>
                   ` (4 preceding siblings ...)
  2019-10-01 14:40 ` [PATCH v5 06/10] mm/memory_hotplug: Poison memmap in remove_pfn_range_from_zone() David Hildenbrand
@ 2019-10-01 14:40 ` David Hildenbrand
  2019-10-01 14:40 ` [PATCH v5 08/10] mm/memory_hotplug: Don't check for "all holes" in shrink_zone_span() David Hildenbrand
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2019-10-01 14:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-s390, Michal Hocko, linux-ia64, Pavel Tatashin, linux-sh,
	David Hildenbrand, linux-mm, Wei Yang, Andrew Morton,
	linuxppc-dev, Dan Williams, linux-arm-kernel, Oscar Salvador

With shrink_pgdat_span() out of the way, we now always have a valid
zone.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 640309236a58..273833612774 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -337,7 +337,7 @@ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
 		if (unlikely(pfn_to_nid(start_pfn) != nid))
 			continue;
 
-		if (zone && zone != page_zone(pfn_to_page(start_pfn)))
+		if (zone != page_zone(pfn_to_page(start_pfn)))
 			continue;
 
 		return start_pfn;
@@ -362,7 +362,7 @@ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone,
 		if (unlikely(pfn_to_nid(pfn) != nid))
 			continue;
 
-		if (zone && zone != page_zone(pfn_to_page(pfn)))
+		if (zone != page_zone(pfn_to_page(pfn)))
 			continue;
 
 		return pfn;
-- 
2.21.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v5 08/10] mm/memory_hotplug: Don't check for "all holes" in shrink_zone_span()
       [not found] <20191001144011.3801-1-david@redhat.com>
                   ` (5 preceding siblings ...)
  2019-10-01 14:40 ` [PATCH v5 07/10] mm/memory_hotplug: We always have a zone in find_(smallest|biggest)_section_pfn David Hildenbrand
@ 2019-10-01 14:40 ` David Hildenbrand
  2019-10-01 14:40 ` [PATCH v5 09/10] mm/memory_hotplug: Drop local variables " David Hildenbrand
  2019-10-01 14:40 ` [PATCH v5 10/10] mm/memory_hotplug: Cleanup __remove_pages() David Hildenbrand
  8 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2019-10-01 14:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-s390, Michal Hocko, linux-ia64, Pavel Tatashin, linux-sh,
	David Hildenbrand, linux-mm, Wei Yang, Andrew Morton,
	linuxppc-dev, Dan Williams, linux-arm-kernel, Oscar Salvador

If we have holes, the holes will automatically get detected and removed
once we remove the next bigger/smaller section. The extra checks can
go.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 34 +++++++---------------------------
 1 file changed, 7 insertions(+), 27 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 273833612774..22b3623768c8 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -393,6 +393,9 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 		if (pfn) {
 			zone->zone_start_pfn = pfn;
 			zone->spanned_pages = zone_end_pfn - pfn;
+		} else {
+			zone->zone_start_pfn = 0;
+			zone->spanned_pages = 0;
 		}
 	} else if (zone_end_pfn == end_pfn) {
 		/*
@@ -405,34 +408,11 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 					       start_pfn);
 		if (pfn)
 			zone->spanned_pages = pfn - zone_start_pfn + 1;
+		else {
+			zone->zone_start_pfn = 0;
+			zone->spanned_pages = 0;
+		}
 	}
-
-	/*
-	 * The section is not biggest or smallest mem_section in the zone, it
-	 * only creates a hole in the zone. So in this case, we need not
-	 * change the zone. But perhaps, the zone has only hole data. Thus
-	 * it check the zone has only hole or not.
-	 */
-	pfn = zone_start_pfn;
-	for (; pfn < zone_end_pfn; pfn += PAGES_PER_SUBSECTION) {
-		if (unlikely(!pfn_to_online_page(pfn)))
-			continue;
-
-		if (page_zone(pfn_to_page(pfn)) != zone)
-			continue;
-
-		/* Skip range to be removed */
-		if (pfn >= start_pfn && pfn < end_pfn)
-			continue;
-
-		/* If we find valid section, we have nothing to do */
-		zone_span_writeunlock(zone);
-		return;
-	}
-
-	/* The zone has no valid section */
-	zone->zone_start_pfn = 0;
-	zone->spanned_pages = 0;
 	zone_span_writeunlock(zone);
 }
 
-- 
2.21.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v5 09/10] mm/memory_hotplug: Drop local variables in shrink_zone_span()
       [not found] <20191001144011.3801-1-david@redhat.com>
                   ` (6 preceding siblings ...)
  2019-10-01 14:40 ` [PATCH v5 08/10] mm/memory_hotplug: Don't check for "all holes" in shrink_zone_span() David Hildenbrand
@ 2019-10-01 14:40 ` David Hildenbrand
  2019-10-01 14:40 ` [PATCH v5 10/10] mm/memory_hotplug: Cleanup __remove_pages() David Hildenbrand
  8 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2019-10-01 14:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-s390, Michal Hocko, linux-ia64, Pavel Tatashin, linux-sh,
	David Hildenbrand, linux-mm, Wei Yang, Andrew Morton,
	linuxppc-dev, Dan Williams, linux-arm-kernel, Oscar Salvador

Get rid of the unnecessary local variables.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 22b3623768c8..ffb514e3b090 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -374,14 +374,11 @@ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone,
 static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 			     unsigned long end_pfn)
 {
-	unsigned long zone_start_pfn = zone->zone_start_pfn;
-	unsigned long z = zone_end_pfn(zone); /* zone_end_pfn namespace clash */
-	unsigned long zone_end_pfn = z;
 	unsigned long pfn;
 	int nid = zone_to_nid(zone);
 
 	zone_span_writelock(zone);
-	if (zone_start_pfn == start_pfn) {
+	if (zone->zone_start_pfn == start_pfn) {
 		/*
 		 * If the section is smallest section in the zone, it need
 		 * shrink zone->zone_start_pfn and zone->zone_spanned_pages.
@@ -389,25 +386,25 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 		 * for shrinking zone.
 		 */
 		pfn = find_smallest_section_pfn(nid, zone, end_pfn,
-						zone_end_pfn);
+						zone_end_pfn(zone));
 		if (pfn) {
+			zone->spanned_pages = zone_end_pfn(zone) - pfn;
 			zone->zone_start_pfn = pfn;
-			zone->spanned_pages = zone_end_pfn - pfn;
 		} else {
 			zone->zone_start_pfn = 0;
 			zone->spanned_pages = 0;
 		}
-	} else if (zone_end_pfn == end_pfn) {
+	} else if (zone_end_pfn(zone) == end_pfn) {
 		/*
 		 * If the section is biggest section in the zone, it need
 		 * shrink zone->spanned_pages.
 		 * In this case, we find second biggest valid mem_section for
 		 * shrinking zone.
 		 */
-		pfn = find_biggest_section_pfn(nid, zone, zone_start_pfn,
+		pfn = find_biggest_section_pfn(nid, zone, zone->zone_start_pfn,
 					       start_pfn);
 		if (pfn)
-			zone->spanned_pages = pfn - zone_start_pfn + 1;
+			zone->spanned_pages = pfn - zone->zone_start_pfn + 1;
 		else {
 			zone->zone_start_pfn = 0;
 			zone->spanned_pages = 0;
-- 
2.21.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v5 10/10] mm/memory_hotplug: Cleanup __remove_pages()
       [not found] <20191001144011.3801-1-david@redhat.com>
                   ` (7 preceding siblings ...)
  2019-10-01 14:40 ` [PATCH v5 09/10] mm/memory_hotplug: Drop local variables " David Hildenbrand
@ 2019-10-01 14:40 ` David Hildenbrand
  8 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2019-10-01 14:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-s390, Michal Hocko, linux-ia64, Pavel Tatashin, linux-sh,
	David Hildenbrand, linux-mm, Wei Yang, Andrew Morton,
	linuxppc-dev, Dan Williams, linux-arm-kernel, Oscar Salvador

Let's drop the basically unused section stuff and simplify.

Also, let's use a shorter variant to calculate the number of pages to
the next section boundary.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ffb514e3b090..0fa99e5a657e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -488,25 +488,20 @@ static void __remove_section(unsigned long pfn, unsigned long nr_pages,
 void __remove_pages(unsigned long pfn, unsigned long nr_pages,
 		    struct vmem_altmap *altmap)
 {
+	const unsigned long end_pfn = pfn + nr_pages;
+	unsigned long cur_nr_pages;
 	unsigned long map_offset = 0;
-	unsigned long nr, start_sec, end_sec;
 
 	map_offset = vmem_altmap_offset(altmap);
 
 	if (check_pfn_span(pfn, nr_pages, "remove"))
 		return;
 
-	start_sec = pfn_to_section_nr(pfn);
-	end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
-	for (nr = start_sec; nr <= end_sec; nr++) {
-		unsigned long pfns;
-
+	for (; pfn < end_pfn; pfn += cur_nr_pages) {
 		cond_resched();
-		pfns = min(nr_pages, PAGES_PER_SECTION
-				- (pfn & ~PAGE_SECTION_MASK));
-		__remove_section(pfn, pfns, map_offset, altmap);
-		pfn += pfns;
-		nr_pages -= pfns;
+		/* Select all remaining pages up to the next section boundary */
+		cur_nr_pages = min(end_pfn - pfn, -(pfn | PAGE_SECTION_MASK));
+		__remove_section(pfn, cur_nr_pages, map_offset, altmap);
 		map_offset = 0;
 	}
 }
-- 
2.21.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 01/10] mm/memunmap: Use the correct start and end pfn when removing pages from zone
  2019-10-01 14:40 ` [PATCH v5 01/10] mm/memunmap: Use the correct start and end pfn when removing pages from zone David Hildenbrand
@ 2019-10-01 14:57   ` David Hildenbrand
  2019-10-01 15:03     ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2019-10-01 14:57 UTC (permalink / raw)
  To: linux-kernel, Dan Williams
  Cc: linux-s390, linux-ia64, Ira Weiny, linux-sh, Jason Gunthorpe,
	Aneesh Kumar K.V, Pankaj Gupta, linux-mm, Logan Gunthorpe,
	Andrew Morton, linuxppc-dev, linux-arm-kernel

On 01.10.19 16:40, David Hildenbrand wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> 
> With altmap, all the resource pfns are not initialized. While initializing
> pfn, altmap reserve space is skipped. Hence when removing pfn from zone
> skip pfns that were never initialized.
> 
> Update memunmap_pages to calculate start and end pfn based on altmap
> values. This fixes a kernel crash that is observed when destroying
> a namespace.
> 
> [   81.356173] kernel BUG at include/linux/mm.h:1107!
> cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
>     pc: c0000000004b9728: memunmap_pages+0x238/0x340
>     lr: c0000000004b9724: memunmap_pages+0x234/0x340
> ...
>     pid   = 3669, comm = ndctl
> kernel BUG at include/linux/mm.h:1107!
> [c000000274087ba0] c0000000009e3500 devm_action_release+0x30/0x50
> [c000000274087bc0] c0000000009e4758 release_nodes+0x268/0x2d0
> [c000000274087c30] c0000000009dd144 device_release_driver_internal+0x174/0x240
> [c000000274087c70] c0000000009d9dfc unbind_store+0x13c/0x190
> [c000000274087cb0] c0000000009d8a24 drv_attr_store+0x44/0x60
> [c000000274087cd0] c0000000005a7470 sysfs_kf_write+0x70/0xa0
> [c000000274087d10] c0000000005a5cac kernfs_fop_write+0x1ac/0x290
> [c000000274087d60] c0000000004be45c __vfs_write+0x3c/0x70
> [c000000274087d80] c0000000004c26e4 vfs_write+0xe4/0x200
> [c000000274087dd0] c0000000004c2a6c ksys_write+0x7c/0x140
> [c000000274087e20] c00000000000bbd0 system_call+0x5c/0x68
> 
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> [ move all pfn-realted declarations into a single line ]
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/memremap.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 557e53c6fb46..026788b2ac69 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -123,7 +123,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
>  void memunmap_pages(struct dev_pagemap *pgmap)
>  {
>  	struct resource *res = &pgmap->res;
> -	unsigned long pfn;
> +	unsigned long pfn, nr_pages, start_pfn, end_pfn;
>  	int nid;
>  
>  	dev_pagemap_kill(pgmap);
> @@ -131,14 +131,17 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>  		put_page(pfn_to_page(pfn));
>  	dev_pagemap_cleanup(pgmap);
>  
> +	start_pfn = pfn_first(pgmap);
> +	end_pfn = pfn_end(pgmap);
> +	nr_pages = end_pfn - start_pfn;
> +
>  	/* pages are dead and unused, undo the arch mapping */
> -	nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
> +	nid = page_to_nid(pfn_to_page(start_pfn));
>  
>  	mem_hotplug_begin();
>  	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
> -		pfn = PHYS_PFN(res->start);
> -		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
> -				 PHYS_PFN(resource_size(res)), NULL);
> +		__remove_pages(page_zone(pfn_to_page(start_pfn)), start_pfn,
> +			       nr_pages, NULL);
>  	} else {
>  		arch_remove_memory(nid, res->start, resource_size(res),
>  				pgmap_altmap(pgmap));
> 

Aneesh, I was wondering why the use of "res->start" is correct (and we
shouldn't also witch to start_pfn/nr_pages here. It would be good if Dan
could review.

-- 

Thanks,

David / dhildenb

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 01/10] mm/memunmap: Use the correct start and end pfn when removing pages from zone
  2019-10-01 14:57   ` David Hildenbrand
@ 2019-10-01 15:03     ` David Hildenbrand
  2019-10-03 16:48       ` Aneesh Kumar K.V
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2019-10-01 15:03 UTC (permalink / raw)
  To: linux-kernel, Dan Williams
  Cc: linux-s390, linux-ia64, Ira Weiny, linux-sh, Jason Gunthorpe,
	Aneesh Kumar K.V, Pankaj Gupta, linux-mm, Logan Gunthorpe,
	Andrew Morton, linuxppc-dev, linux-arm-kernel

On 01.10.19 16:57, David Hildenbrand wrote:
> On 01.10.19 16:40, David Hildenbrand wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
>>
>> With altmap, all the resource pfns are not initialized. While initializing
>> pfn, altmap reserve space is skipped. Hence when removing pfn from zone
>> skip pfns that were never initialized.
>>
>> Update memunmap_pages to calculate start and end pfn based on altmap
>> values. This fixes a kernel crash that is observed when destroying
>> a namespace.
>>
>> [   81.356173] kernel BUG at include/linux/mm.h:1107!
>> cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
>>     pc: c0000000004b9728: memunmap_pages+0x238/0x340
>>     lr: c0000000004b9724: memunmap_pages+0x234/0x340
>> ...
>>     pid   = 3669, comm = ndctl
>> kernel BUG at include/linux/mm.h:1107!
>> [c000000274087ba0] c0000000009e3500 devm_action_release+0x30/0x50
>> [c000000274087bc0] c0000000009e4758 release_nodes+0x268/0x2d0
>> [c000000274087c30] c0000000009dd144 device_release_driver_internal+0x174/0x240
>> [c000000274087c70] c0000000009d9dfc unbind_store+0x13c/0x190
>> [c000000274087cb0] c0000000009d8a24 drv_attr_store+0x44/0x60
>> [c000000274087cd0] c0000000005a7470 sysfs_kf_write+0x70/0xa0
>> [c000000274087d10] c0000000005a5cac kernfs_fop_write+0x1ac/0x290
>> [c000000274087d60] c0000000004be45c __vfs_write+0x3c/0x70
>> [c000000274087d80] c0000000004c26e4 vfs_write+0xe4/0x200
>> [c000000274087dd0] c0000000004c2a6c ksys_write+0x7c/0x140
>> [c000000274087e20] c00000000000bbd0 system_call+0x5c/0x68
>>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
>> Cc: Logan Gunthorpe <logang@deltatee.com>
>> Cc: Ira Weiny <ira.weiny@intel.com>
>> Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> [ move all pfn-realted declarations into a single line ]
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  mm/memremap.c | 13 ++++++++-----
>>  1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/memremap.c b/mm/memremap.c
>> index 557e53c6fb46..026788b2ac69 100644
>> --- a/mm/memremap.c
>> +++ b/mm/memremap.c
>> @@ -123,7 +123,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
>>  void memunmap_pages(struct dev_pagemap *pgmap)
>>  {
>>  	struct resource *res = &pgmap->res;
>> -	unsigned long pfn;
>> +	unsigned long pfn, nr_pages, start_pfn, end_pfn;
>>  	int nid;
>>  
>>  	dev_pagemap_kill(pgmap);
>> @@ -131,14 +131,17 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>>  		put_page(pfn_to_page(pfn));
>>  	dev_pagemap_cleanup(pgmap);
>>  
>> +	start_pfn = pfn_first(pgmap);
>> +	end_pfn = pfn_end(pgmap);
>> +	nr_pages = end_pfn - start_pfn;
>> +
>>  	/* pages are dead and unused, undo the arch mapping */
>> -	nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
>> +	nid = page_to_nid(pfn_to_page(start_pfn));
>>  
>>  	mem_hotplug_begin();
>>  	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
>> -		pfn = PHYS_PFN(res->start);
>> -		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
>> -				 PHYS_PFN(resource_size(res)), NULL);
>> +		__remove_pages(page_zone(pfn_to_page(start_pfn)), start_pfn,
>> +			       nr_pages, NULL);
>>  	} else {
>>  		arch_remove_memory(nid, res->start, resource_size(res),
>>  				pgmap_altmap(pgmap));
>>
> 
> Aneesh, I was wondering why the use of "res->start" is correct (and we
> shouldn't also witch to start_pfn/nr_pages here. It would be good if Dan
> could review.
> 

To be more precise, I wonder if it should actually be

__remove_pages(page_zone(pfn_to_page(start_pfn)), res->start,
               resource_size(res))

IOW, keep calling __remove_pages() with the same parameters but read
nid/zone from the offset one.

Hope some memunmap_pages() expert can clarify.

-- 

Thanks,

David / dhildenb

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span()
  2019-10-01 14:40 ` [PATCH v5 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span() David Hildenbrand
@ 2019-10-01 23:47   ` kbuild test robot
  2019-10-02  0:06   ` kbuild test robot
  1 sibling, 0 replies; 20+ messages in thread
From: kbuild test robot @ 2019-10-01 23:47 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-s390, Michal Hocko, linux-ia64, Pavel Tatashin, linux-sh,
	Aneesh Kumar K . V, David Hildenbrand, linux-kernel, linux-mm,
	kbuild-all, Andrew Morton, linuxppc-dev, Dan Williams,
	linux-arm-kernel, Oscar Salvador

[-- Attachment #1: Type: text/plain, Size: 1962 bytes --]

Hi David,

I love your patch! Yet something to improve:

[auto build test ERROR on mmotm/master]

url:    https://github.com/0day-ci/linux/commits/David-Hildenbrand/mm-memory_hotplug-Shrink-zones-before-removing-memory/20191002-054310
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: i386-randconfig-g004-201939 (attached as .config)
compiler: gcc-7 (Debian 7.4.0-13) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   mm/memory_hotplug.c: In function '__remove_zone':
>> mm/memory_hotplug.c:471:24: error: 'ZONE_DEVICE' undeclared (first use in this function); did you mean 'ZONE_MOVABLE'?
     if (zone_idx(zone) == ZONE_DEVICE)
                           ^~~~~~~~~~~
                           ZONE_MOVABLE
   mm/memory_hotplug.c:471:24: note: each undeclared identifier is reported only once for each function it appears in

vim +471 mm/memory_hotplug.c

   459	
   460	static void __remove_zone(struct zone *zone, unsigned long start_pfn,
   461			unsigned long nr_pages)
   462	{
   463		struct pglist_data *pgdat = zone->zone_pgdat;
   464		unsigned long flags;
   465	
   466		/*
   467		 * Zone shrinking code cannot properly deal with ZONE_DEVICE. So
   468		 * we will not try to shrink the zones - which is okay as
   469		 * set_zone_contiguous() cannot deal with ZONE_DEVICE either way.
   470		 */
 > 471		if (zone_idx(zone) == ZONE_DEVICE)
   472			return;
   473	
   474		pgdat_resize_lock(zone->zone_pgdat, &flags);
   475		shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
   476		update_pgdat_span(pgdat);
   477		pgdat_resize_unlock(zone->zone_pgdat, &flags);
   478	}
   479	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29129 bytes --]

[-- Attachment #3: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span()
  2019-10-01 14:40 ` [PATCH v5 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span() David Hildenbrand
  2019-10-01 23:47   ` kbuild test robot
@ 2019-10-02  0:06   ` kbuild test robot
  2019-10-02  7:05     ` David Hildenbrand
  1 sibling, 1 reply; 20+ messages in thread
From: kbuild test robot @ 2019-10-02  0:06 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-s390, Michal Hocko, linux-ia64, Pavel Tatashin, linux-sh,
	Aneesh Kumar K . V, David Hildenbrand, linux-kernel, linux-mm,
	kbuild-all, Andrew Morton, linuxppc-dev, Dan Williams,
	linux-arm-kernel, Oscar Salvador

[-- Attachment #1: Type: text/plain, Size: 3001 bytes --]

Hi David,

I love your patch! Perhaps something to improve:

[auto build test WARNING on mmotm/master]

url:    https://github.com/0day-ci/linux/commits/David-Hildenbrand/mm-memory_hotplug-Shrink-zones-before-removing-memory/20191002-054310
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: x86_64-randconfig-b002-201939 (attached as .config)
compiler: gcc-7 (Debian 7.4.0-13) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from include/asm-generic/bug.h:5:0,
                    from arch/x86/include/asm/bug.h:83,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/mm.h:9,
                    from mm/memory_hotplug.c:9:
   mm/memory_hotplug.c: In function '__remove_zone':
   mm/memory_hotplug.c:471:24: error: 'ZONE_DEVICE' undeclared (first use in this function); did you mean 'ZONE_MOVABLE'?
     if (zone_idx(zone) == ZONE_DEVICE)
                           ^
   include/linux/compiler.h:58:52: note: in definition of macro '__trace_if_var'
    #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                       ^~~~
>> mm/memory_hotplug.c:471:2: note: in expansion of macro 'if'
     if (zone_idx(zone) == ZONE_DEVICE)
     ^~
   mm/memory_hotplug.c:471:24: note: each undeclared identifier is reported only once for each function it appears in
     if (zone_idx(zone) == ZONE_DEVICE)
                           ^
   include/linux/compiler.h:58:52: note: in definition of macro '__trace_if_var'
    #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                                       ^~~~
>> mm/memory_hotplug.c:471:2: note: in expansion of macro 'if'
     if (zone_idx(zone) == ZONE_DEVICE)
     ^~

vim +/if +471 mm/memory_hotplug.c

   459	
   460	static void __remove_zone(struct zone *zone, unsigned long start_pfn,
   461			unsigned long nr_pages)
   462	{
   463		struct pglist_data *pgdat = zone->zone_pgdat;
   464		unsigned long flags;
   465	
   466		/*
   467		 * Zone shrinking code cannot properly deal with ZONE_DEVICE. So
   468		 * we will not try to shrink the zones - which is okay as
   469		 * set_zone_contiguous() cannot deal with ZONE_DEVICE either way.
   470		 */
 > 471		if (zone_idx(zone) == ZONE_DEVICE)
   472			return;
   473	
   474		pgdat_resize_lock(zone->zone_pgdat, &flags);
   475		shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
   476		update_pgdat_span(pgdat);
   477		pgdat_resize_unlock(zone->zone_pgdat, &flags);
   478	}
   479	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 30379 bytes --]

[-- Attachment #3: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span()
  2019-10-02  0:06   ` kbuild test robot
@ 2019-10-02  7:05     ` David Hildenbrand
  0 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2019-10-02  7:05 UTC (permalink / raw)
  To: kbuild test robot
  Cc: linux-s390, Michal Hocko, linux-ia64, Pavel Tatashin, linux-sh,
	Aneesh Kumar K . V, linux-kernel, linux-mm, kbuild-all,
	Andrew Morton, linuxppc-dev, Dan Williams, linux-arm-kernel,
	Oscar Salvador

On 02.10.19 02:06, kbuild test robot wrote:
> Hi David,
> 
> I love your patch! Perhaps something to improve:
> 
> [auto build test WARNING on mmotm/master]
> 
> url:    https://github.com/0day-ci/linux/commits/David-Hildenbrand/mm-memory_hotplug-Shrink-zones-before-removing-memory/20191002-054310
> base:   git://git.cmpxchg.org/linux-mmotm.git master
> config: x86_64-randconfig-b002-201939 (attached as .config)
> compiler: gcc-7 (Debian 7.4.0-13) 7.4.0
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=x86_64 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <lkp@intel.com>
> 
> All warnings (new ones prefixed by >>):
> 
>    In file included from include/asm-generic/bug.h:5:0,
>                     from arch/x86/include/asm/bug.h:83,
>                     from include/linux/bug.h:5,
>                     from include/linux/mmdebug.h:5,
>                     from include/linux/mm.h:9,
>                     from mm/memory_hotplug.c:9:
>    mm/memory_hotplug.c: In function '__remove_zone':
>    mm/memory_hotplug.c:471:24: error: 'ZONE_DEVICE' undeclared (first use in this function); did you mean 'ZONE_MOVABLE'?
>      if (zone_idx(zone) == ZONE_DEVICE)
>                            ^
>    include/linux/compiler.h:58:52: note: in definition of macro '__trace_if_var'
>     #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
>                                                        ^~~~
>>> mm/memory_hotplug.c:471:2: note: in expansion of macro 'if'
>      if (zone_idx(zone) == ZONE_DEVICE)
>      ^~
>    mm/memory_hotplug.c:471:24: note: each undeclared identifier is reported only once for each function it appears in
>      if (zone_idx(zone) == ZONE_DEVICE)
>                            ^
>    include/linux/compiler.h:58:52: note: in definition of macro '__trace_if_var'
>     #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
>                                                        ^~~~
>>> mm/memory_hotplug.c:471:2: note: in expansion of macro 'if'
>      if (zone_idx(zone) == ZONE_DEVICE)
>      ^~
> 
> vim +/if +471 mm/memory_hotplug.c
> 
>    459	
>    460	static void __remove_zone(struct zone *zone, unsigned long start_pfn,
>    461			unsigned long nr_pages)
>    462	{
>    463		struct pglist_data *pgdat = zone->zone_pgdat;
>    464		unsigned long flags;
>    465	
>    466		/*
>    467		 * Zone shrinking code cannot properly deal with ZONE_DEVICE. So
>    468		 * we will not try to shrink the zones - which is okay as
>    469		 * set_zone_contiguous() cannot deal with ZONE_DEVICE either way.
>    470		 */
>  > 471		if (zone_idx(zone) == ZONE_DEVICE)
>    472			return;
>    473	
>    474		pgdat_resize_lock(zone->zone_pgdat, &flags);
>    475		shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
>    476		update_pgdat_span(pgdat);
>    477		pgdat_resize_unlock(zone->zone_pgdat, &flags);
>    478	}
>    479	
> 
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
> 

That should be easy to fix with some ifdef-ery :)

-- 

Thanks,

David / dhildenb

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 01/10] mm/memunmap: Use the correct start and end pfn when removing pages from zone
  2019-10-01 15:03     ` David Hildenbrand
@ 2019-10-03 16:48       ` Aneesh Kumar K.V
  2019-10-04  9:00         ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: Aneesh Kumar K.V @ 2019-10-03 16:48 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel, Dan Williams
  Cc: linux-s390, linux-ia64, Ira Weiny, linux-sh, Jason Gunthorpe,
	Logan Gunthorpe, Pankaj Gupta, linux-mm, Andrew Morton,
	linuxppc-dev, linux-arm-kernel

On 10/1/19 8:33 PM, David Hildenbrand wrote:
> On 01.10.19 16:57, David Hildenbrand wrote:
>> On 01.10.19 16:40, David Hildenbrand wrote:
>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
>>>
>>> With altmap, all the resource pfns are not initialized. While initializing
>>> pfn, altmap reserve space is skipped. Hence when removing pfn from zone
>>> skip pfns that were never initialized.
>>>
>>> Update memunmap_pages to calculate start and end pfn based on altmap
>>> values. This fixes a kernel crash that is observed when destroying
>>> a namespace.
>>>
>>> [   81.356173] kernel BUG at include/linux/mm.h:1107!
>>> cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
>>>      pc: c0000000004b9728: memunmap_pages+0x238/0x340
>>>      lr: c0000000004b9724: memunmap_pages+0x234/0x340
>>> ...
>>>      pid   = 3669, comm = ndctl
>>> kernel BUG at include/linux/mm.h:1107!
>>> [c000000274087ba0] c0000000009e3500 devm_action_release+0x30/0x50
>>> [c000000274087bc0] c0000000009e4758 release_nodes+0x268/0x2d0
>>> [c000000274087c30] c0000000009dd144 device_release_driver_internal+0x174/0x240
>>> [c000000274087c70] c0000000009d9dfc unbind_store+0x13c/0x190
>>> [c000000274087cb0] c0000000009d8a24 drv_attr_store+0x44/0x60
>>> [c000000274087cd0] c0000000005a7470 sysfs_kf_write+0x70/0xa0
>>> [c000000274087d10] c0000000005a5cac kernfs_fop_write+0x1ac/0x290
>>> [c000000274087d60] c0000000004be45c __vfs_write+0x3c/0x70
>>> [c000000274087d80] c0000000004c26e4 vfs_write+0xe4/0x200
>>> [c000000274087dd0] c0000000004c2a6c ksys_write+0x7c/0x140
>>> [c000000274087e20] c00000000000bbd0 system_call+0x5c/0x68
>>>
>>> Cc: Dan Williams <dan.j.williams@intel.com>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
>>> Cc: Logan Gunthorpe <logang@deltatee.com>
>>> Cc: Ira Weiny <ira.weiny@intel.com>
>>> Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>> [ move all pfn-realted declarations into a single line ]
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>   mm/memremap.c | 13 ++++++++-----
>>>   1 file changed, 8 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>> index 557e53c6fb46..026788b2ac69 100644
>>> --- a/mm/memremap.c
>>> +++ b/mm/memremap.c
>>> @@ -123,7 +123,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
>>>   void memunmap_pages(struct dev_pagemap *pgmap)
>>>   {
>>>   	struct resource *res = &pgmap->res;
>>> -	unsigned long pfn;
>>> +	unsigned long pfn, nr_pages, start_pfn, end_pfn;
>>>   	int nid;
>>>   
>>>   	dev_pagemap_kill(pgmap);
>>> @@ -131,14 +131,17 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>>>   		put_page(pfn_to_page(pfn));
>>>   	dev_pagemap_cleanup(pgmap);
>>>   
>>> +	start_pfn = pfn_first(pgmap);
>>> +	end_pfn = pfn_end(pgmap);
>>> +	nr_pages = end_pfn - start_pfn;
>>> +
>>>   	/* pages are dead and unused, undo the arch mapping */
>>> -	nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
>>> +	nid = page_to_nid(pfn_to_page(start_pfn));
>>>   
>>>   	mem_hotplug_begin();
>>>   	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
>>> -		pfn = PHYS_PFN(res->start);
>>> -		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
>>> -				 PHYS_PFN(resource_size(res)), NULL);
>>> +		__remove_pages(page_zone(pfn_to_page(start_pfn)), start_pfn,
>>> +			       nr_pages, NULL);
>>>   	} else {
>>>   		arch_remove_memory(nid, res->start, resource_size(res),
>>>   				pgmap_altmap(pgmap));
>>>
>>
>> Aneesh, I was wondering why the use of "res->start" is correct (and we
>> shouldn't also witch to start_pfn/nr_pages here. It would be good if Dan
>> could review.
>>
> 
> To be more precise, I wonder if it should actually be
> 
> __remove_pages(page_zone(pfn_to_page(start_pfn)), res->start,
>                 resource_size(res))
> 

yes, that would be make it much clear.

But for MEMORY_DEVICE_PRIVATE start_pfn and pfn should be same?


> IOW, keep calling __remove_pages() with the same parameters but read
> nid/zone from the offset one.
> 
> Hope some memunmap_pages() expert can clarify.
> 

-aneesh

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 01/10] mm/memunmap: Use the correct start and end pfn when removing pages from zone
  2019-10-03 16:48       ` Aneesh Kumar K.V
@ 2019-10-04  9:00         ` David Hildenbrand
  2019-10-04  9:03           ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2019-10-04  9:00 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-kernel, Dan Williams
  Cc: linux-s390, linux-ia64, Ira Weiny, linux-sh, Jason Gunthorpe,
	Logan Gunthorpe, Pankaj Gupta, linux-mm, Andrew Morton,
	linuxppc-dev, linux-arm-kernel

On 03.10.19 18:48, Aneesh Kumar K.V wrote:
> On 10/1/19 8:33 PM, David Hildenbrand wrote:
>> On 01.10.19 16:57, David Hildenbrand wrote:
>>> On 01.10.19 16:40, David Hildenbrand wrote:
>>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
>>>>
>>>> With altmap, all the resource pfns are not initialized. While initializing
>>>> pfn, altmap reserve space is skipped. Hence when removing pfn from zone
>>>> skip pfns that were never initialized.
>>>>
>>>> Update memunmap_pages to calculate start and end pfn based on altmap
>>>> values. This fixes a kernel crash that is observed when destroying
>>>> a namespace.
>>>>
>>>> [   81.356173] kernel BUG at include/linux/mm.h:1107!
>>>> cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
>>>>      pc: c0000000004b9728: memunmap_pages+0x238/0x340
>>>>      lr: c0000000004b9724: memunmap_pages+0x234/0x340
>>>> ...
>>>>      pid   = 3669, comm = ndctl
>>>> kernel BUG at include/linux/mm.h:1107!
>>>> [c000000274087ba0] c0000000009e3500 devm_action_release+0x30/0x50
>>>> [c000000274087bc0] c0000000009e4758 release_nodes+0x268/0x2d0
>>>> [c000000274087c30] c0000000009dd144 device_release_driver_internal+0x174/0x240
>>>> [c000000274087c70] c0000000009d9dfc unbind_store+0x13c/0x190
>>>> [c000000274087cb0] c0000000009d8a24 drv_attr_store+0x44/0x60
>>>> [c000000274087cd0] c0000000005a7470 sysfs_kf_write+0x70/0xa0
>>>> [c000000274087d10] c0000000005a5cac kernfs_fop_write+0x1ac/0x290
>>>> [c000000274087d60] c0000000004be45c __vfs_write+0x3c/0x70
>>>> [c000000274087d80] c0000000004c26e4 vfs_write+0xe4/0x200
>>>> [c000000274087dd0] c0000000004c2a6c ksys_write+0x7c/0x140
>>>> [c000000274087e20] c00000000000bbd0 system_call+0x5c/0x68
>>>>
>>>> Cc: Dan Williams <dan.j.williams@intel.com>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
>>>> Cc: Logan Gunthorpe <logang@deltatee.com>
>>>> Cc: Ira Weiny <ira.weiny@intel.com>
>>>> Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>> [ move all pfn-realted declarations into a single line ]
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>   mm/memremap.c | 13 ++++++++-----
>>>>   1 file changed, 8 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>>> index 557e53c6fb46..026788b2ac69 100644
>>>> --- a/mm/memremap.c
>>>> +++ b/mm/memremap.c
>>>> @@ -123,7 +123,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
>>>>   void memunmap_pages(struct dev_pagemap *pgmap)
>>>>   {
>>>>   	struct resource *res = &pgmap->res;
>>>> -	unsigned long pfn;
>>>> +	unsigned long pfn, nr_pages, start_pfn, end_pfn;
>>>>   	int nid;
>>>>   
>>>>   	dev_pagemap_kill(pgmap);
>>>> @@ -131,14 +131,17 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>>>>   		put_page(pfn_to_page(pfn));
>>>>   	dev_pagemap_cleanup(pgmap);
>>>>   
>>>> +	start_pfn = pfn_first(pgmap);
>>>> +	end_pfn = pfn_end(pgmap);
>>>> +	nr_pages = end_pfn - start_pfn;
>>>> +
>>>>   	/* pages are dead and unused, undo the arch mapping */
>>>> -	nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
>>>> +	nid = page_to_nid(pfn_to_page(start_pfn));
>>>>   
>>>>   	mem_hotplug_begin();
>>>>   	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
>>>> -		pfn = PHYS_PFN(res->start);
>>>> -		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
>>>> -				 PHYS_PFN(resource_size(res)), NULL);
>>>> +		__remove_pages(page_zone(pfn_to_page(start_pfn)), start_pfn,
>>>> +			       nr_pages, NULL);
>>>>   	} else {
>>>>   		arch_remove_memory(nid, res->start, resource_size(res),
>>>>   				pgmap_altmap(pgmap));
>>>>
>>>
>>> Aneesh, I was wondering why the use of "res->start" is correct (and we
>>> shouldn't also witch to start_pfn/nr_pages here. It would be good if Dan
>>> could review.
>>>
>>
>> To be more precise, I wonder if it should actually be
>>
>> __remove_pages(page_zone(pfn_to_page(start_pfn)), res->start,
>>                 resource_size(res))
>>
> 
> yes, that would be make it much clear.
> 
> But for MEMORY_DEVICE_PRIVATE start_pfn and pfn should be same?

Okay, let's recap. We should call add_pages()/__remove_pages()
and arch_add_memory()/arch_remove_memory() with the exact same ranges.

So with PHYS_PFN(res->start) and PHYS_PFN(resource_size(res)

Now, only a subset of the pages gets actually initialized,
meaning the NID and the ZONE we read could be stale.
That, we have to fix.

What about something like this (am I missing something?):

From d77b5c50f86570819a437517a897cc40ed29eefb Mon Sep 17 00:00:00 2001
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Date: Fri, 27 Sep 2019 16:02:24 +0530
Subject: [PATCH] mm/memunmap: Don't access uninitialized memmap in
 memunmap_pages()

With an altmap, the memmap falling into the reserved altmap space are
not initialized and, therefore, contain a garbage NID and a garbage
zone. Make sure to read the NID/zone from a memmap that was initialzed.

This fixes a kernel crash that is observed when destroying a namespace:

[   81.356173] kernel BUG at include/linux/mm.h:1107!
cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
    pc: c0000000004b9728: memunmap_pages+0x238/0x340
    lr: c0000000004b9724: memunmap_pages+0x234/0x340
...
    pid   = 3669, comm = ndctl
kernel BUG at include/linux/mm.h:1107!
[c000000274087ba0] c0000000009e3500 devm_action_release+0x30/0x50
[c000000274087bc0] c0000000009e4758 release_nodes+0x268/0x2d0
[c000000274087c30] c0000000009dd144 device_release_driver_internal+0x174/0x240
[c000000274087c70] c0000000009d9dfc unbind_store+0x13c/0x190
[c000000274087cb0] c0000000009d8a24 drv_attr_store+0x44/0x60
[c000000274087cd0] c0000000005a7470 sysfs_kf_write+0x70/0xa0
[c000000274087d10] c0000000005a5cac kernfs_fop_write+0x1ac/0x290
[c000000274087d60] c0000000004be45c __vfs_write+0x3c/0x70
[c000000274087d80] c0000000004c26e4 vfs_write+0xe4/0x200
[c000000274087dd0] c0000000004c2a6c ksys_write+0x7c/0x140
[c000000274087e20] c00000000000bbd0 system_call+0x5c/0x68

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[ minimze code changes, rephrase description ]
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memremap.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/mm/memremap.c b/mm/memremap.c
index 557e53c6fb46..8b11c0da345c 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -123,6 +123,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
 void memunmap_pages(struct dev_pagemap *pgmap)
 {
 	struct resource *res = &pgmap->res;
+	struct page *first_page;
 	unsigned long pfn;
 	int nid;
 
@@ -131,14 +132,16 @@ void memunmap_pages(struct dev_pagemap *pgmap)
 		put_page(pfn_to_page(pfn));
 	dev_pagemap_cleanup(pgmap);
 
+	/* make sure to access a memmap that was actually initialized */
+	first_page = pfn_to_page(pfn_first(pgmap));
+
 	/* pages are dead and unused, undo the arch mapping */
-	nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
+	nid = page_to_nid(first_page);
 
 	mem_hotplug_begin();
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
-		pfn = PHYS_PFN(res->start);
-		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
-				 PHYS_PFN(resource_size(res)), NULL);
+		__remove_pages(page_zone(first_page), res->start,
+			       resource_size(res), NULL);
 	} else {
 		arch_remove_memory(nid, res->start, resource_size(res),
 				pgmap_altmap(pgmap));
-- 
2.21.0




-- 

Thanks,

David / dhildenb

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 01/10] mm/memunmap: Use the correct start and end pfn when removing pages from zone
  2019-10-04  9:00         ` David Hildenbrand
@ 2019-10-04  9:03           ` David Hildenbrand
  2019-10-04  9:33             ` David Hildenbrand
  2019-10-05  6:13             ` Aneesh Kumar K.V
  0 siblings, 2 replies; 20+ messages in thread
From: David Hildenbrand @ 2019-10-04  9:03 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-kernel, Dan Williams
  Cc: linux-s390, linux-ia64, Ira Weiny, linux-sh, Jason Gunthorpe,
	Logan Gunthorpe, Pankaj Gupta, linux-mm, Andrew Morton,
	linuxppc-dev, linux-arm-kernel

On 04.10.19 11:00, David Hildenbrand wrote:
> On 03.10.19 18:48, Aneesh Kumar K.V wrote:
>> On 10/1/19 8:33 PM, David Hildenbrand wrote:
>>> On 01.10.19 16:57, David Hildenbrand wrote:
>>>> On 01.10.19 16:40, David Hildenbrand wrote:
>>>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
>>>>>
>>>>> With altmap, all the resource pfns are not initialized. While initializing
>>>>> pfn, altmap reserve space is skipped. Hence when removing pfn from zone
>>>>> skip pfns that were never initialized.
>>>>>
>>>>> Update memunmap_pages to calculate start and end pfn based on altmap
>>>>> values. This fixes a kernel crash that is observed when destroying
>>>>> a namespace.
>>>>>
>>>>> [   81.356173] kernel BUG at include/linux/mm.h:1107!
>>>>> cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
>>>>>      pc: c0000000004b9728: memunmap_pages+0x238/0x340
>>>>>      lr: c0000000004b9724: memunmap_pages+0x234/0x340
>>>>> ...
>>>>>      pid   = 3669, comm = ndctl
>>>>> kernel BUG at include/linux/mm.h:1107!
>>>>> [c000000274087ba0] c0000000009e3500 devm_action_release+0x30/0x50
>>>>> [c000000274087bc0] c0000000009e4758 release_nodes+0x268/0x2d0
>>>>> [c000000274087c30] c0000000009dd144 device_release_driver_internal+0x174/0x240
>>>>> [c000000274087c70] c0000000009d9dfc unbind_store+0x13c/0x190
>>>>> [c000000274087cb0] c0000000009d8a24 drv_attr_store+0x44/0x60
>>>>> [c000000274087cd0] c0000000005a7470 sysfs_kf_write+0x70/0xa0
>>>>> [c000000274087d10] c0000000005a5cac kernfs_fop_write+0x1ac/0x290
>>>>> [c000000274087d60] c0000000004be45c __vfs_write+0x3c/0x70
>>>>> [c000000274087d80] c0000000004c26e4 vfs_write+0xe4/0x200
>>>>> [c000000274087dd0] c0000000004c2a6c ksys_write+0x7c/0x140
>>>>> [c000000274087e20] c00000000000bbd0 system_call+0x5c/0x68
>>>>>
>>>>> Cc: Dan Williams <dan.j.williams@intel.com>
>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
>>>>> Cc: Logan Gunthorpe <logang@deltatee.com>
>>>>> Cc: Ira Weiny <ira.weiny@intel.com>
>>>>> Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>>> [ move all pfn-realted declarations into a single line ]
>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>> ---
>>>>>   mm/memremap.c | 13 ++++++++-----
>>>>>   1 file changed, 8 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>>>> index 557e53c6fb46..026788b2ac69 100644
>>>>> --- a/mm/memremap.c
>>>>> +++ b/mm/memremap.c
>>>>> @@ -123,7 +123,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
>>>>>   void memunmap_pages(struct dev_pagemap *pgmap)
>>>>>   {
>>>>>   	struct resource *res = &pgmap->res;
>>>>> -	unsigned long pfn;
>>>>> +	unsigned long pfn, nr_pages, start_pfn, end_pfn;
>>>>>   	int nid;
>>>>>   
>>>>>   	dev_pagemap_kill(pgmap);
>>>>> @@ -131,14 +131,17 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>>>>>   		put_page(pfn_to_page(pfn));
>>>>>   	dev_pagemap_cleanup(pgmap);
>>>>>   
>>>>> +	start_pfn = pfn_first(pgmap);
>>>>> +	end_pfn = pfn_end(pgmap);
>>>>> +	nr_pages = end_pfn - start_pfn;
>>>>> +
>>>>>   	/* pages are dead and unused, undo the arch mapping */
>>>>> -	nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
>>>>> +	nid = page_to_nid(pfn_to_page(start_pfn));
>>>>>   
>>>>>   	mem_hotplug_begin();
>>>>>   	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
>>>>> -		pfn = PHYS_PFN(res->start);
>>>>> -		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
>>>>> -				 PHYS_PFN(resource_size(res)), NULL);
>>>>> +		__remove_pages(page_zone(pfn_to_page(start_pfn)), start_pfn,
>>>>> +			       nr_pages, NULL);
>>>>>   	} else {
>>>>>   		arch_remove_memory(nid, res->start, resource_size(res),
>>>>>   				pgmap_altmap(pgmap));
>>>>>
>>>>
>>>> Aneesh, I was wondering why the use of "res->start" is correct (and we
>>>> shouldn't also witch to start_pfn/nr_pages here. It would be good if Dan
>>>> could review.
>>>>
>>>
>>> To be more precise, I wonder if it should actually be
>>>
>>> __remove_pages(page_zone(pfn_to_page(start_pfn)), res->start,
>>>                 resource_size(res))
>>>
>>
>> yes, that would be make it much clear.
>>
>> But for MEMORY_DEVICE_PRIVATE start_pfn and pfn should be same?
> 
> Okay, let's recap. We should call add_pages()/__remove_pages()
> and arch_add_memory()/arch_remove_memory() with the exact same ranges.
> 
> So with PHYS_PFN(res->start) and PHYS_PFN(resource_size(res)
> 
> Now, only a subset of the pages gets actually initialized,
> meaning the NID and the ZONE we read could be stale.
> That, we have to fix.
> 
> What about something like this (am I missing something?):
> 
> From d77b5c50f86570819a437517a897cc40ed29eefb Mon Sep 17 00:00:00 2001
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Date: Fri, 27 Sep 2019 16:02:24 +0530
> Subject: [PATCH] mm/memunmap: Don't access uninitialized memmap in
>  memunmap_pages()
> 
> With an altmap, the memmap falling into the reserved altmap space are
> not initialized and, therefore, contain a garbage NID and a garbage
> zone. Make sure to read the NID/zone from a memmap that was initialzed.
> 
> This fixes a kernel crash that is observed when destroying a namespace:
> 
> [   81.356173] kernel BUG at include/linux/mm.h:1107!
> cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
>     pc: c0000000004b9728: memunmap_pages+0x238/0x340
>     lr: c0000000004b9724: memunmap_pages+0x234/0x340
> ...
>     pid   = 3669, comm = ndctl
> kernel BUG at include/linux/mm.h:1107!
> [c000000274087ba0] c0000000009e3500 devm_action_release+0x30/0x50
> [c000000274087bc0] c0000000009e4758 release_nodes+0x268/0x2d0
> [c000000274087c30] c0000000009dd144 device_release_driver_internal+0x174/0x240
> [c000000274087c70] c0000000009d9dfc unbind_store+0x13c/0x190
> [c000000274087cb0] c0000000009d8a24 drv_attr_store+0x44/0x60
> [c000000274087cd0] c0000000005a7470 sysfs_kf_write+0x70/0xa0
> [c000000274087d10] c0000000005a5cac kernfs_fop_write+0x1ac/0x290
> [c000000274087d60] c0000000004be45c __vfs_write+0x3c/0x70
> [c000000274087d80] c0000000004c26e4 vfs_write+0xe4/0x200
> [c000000274087dd0] c0000000004c2a6c ksys_write+0x7c/0x140
> [c000000274087e20] c00000000000bbd0 system_call+0x5c/0x68
> 
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> [ minimze code changes, rephrase description ]
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/memremap.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 557e53c6fb46..8b11c0da345c 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -123,6 +123,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
>  void memunmap_pages(struct dev_pagemap *pgmap)
>  {
>  	struct resource *res = &pgmap->res;
> +	struct page *first_page;
>  	unsigned long pfn;
>  	int nid;
>  
> @@ -131,14 +132,16 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>  		put_page(pfn_to_page(pfn));
>  	dev_pagemap_cleanup(pgmap);
>  
> +	/* make sure to access a memmap that was actually initialized */
> +	first_page = pfn_to_page(pfn_first(pgmap));
> +
>  	/* pages are dead and unused, undo the arch mapping */
> -	nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
> +	nid = page_to_nid(first_page);
>  
>  	mem_hotplug_begin();
>  	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
> -		pfn = PHYS_PFN(res->start);
> -		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
> -				 PHYS_PFN(resource_size(res)), NULL);
> +		__remove_pages(page_zone(first_page), res->start,
> +			       resource_size(res), NULL);

Keeping the PHYS_PFN() calls of course ...

>  	} else {
>  		arch_remove_memory(nid, res->start, resource_size(res),
>  				pgmap_altmap(pgmap));
> 


-- 

Thanks,

David / dhildenb

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 01/10] mm/memunmap: Use the correct start and end pfn when removing pages from zone
  2019-10-04  9:03           ` David Hildenbrand
@ 2019-10-04  9:33             ` David Hildenbrand
  2019-10-05  6:13             ` Aneesh Kumar K.V
  1 sibling, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2019-10-04  9:33 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-kernel, Dan Williams
  Cc: linux-s390, linux-ia64, Ira Weiny, linux-sh, Jason Gunthorpe,
	Logan Gunthorpe, Pankaj Gupta, linux-mm, Andrew Morton,
	linuxppc-dev, linux-arm-kernel

On 04.10.19 11:03, David Hildenbrand wrote:
> On 04.10.19 11:00, David Hildenbrand wrote:
>> On 03.10.19 18:48, Aneesh Kumar K.V wrote:
>>> On 10/1/19 8:33 PM, David Hildenbrand wrote:
>>>> On 01.10.19 16:57, David Hildenbrand wrote:
>>>>> On 01.10.19 16:40, David Hildenbrand wrote:
>>>>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
>>>>>>
>>>>>> With altmap, all the resource pfns are not initialized. While initializing
>>>>>> pfn, altmap reserve space is skipped. Hence when removing pfn from zone
>>>>>> skip pfns that were never initialized.
>>>>>>
>>>>>> Update memunmap_pages to calculate start and end pfn based on altmap
>>>>>> values. This fixes a kernel crash that is observed when destroying
>>>>>> a namespace.
>>>>>>
>>>>>> [   81.356173] kernel BUG at include/linux/mm.h:1107!
>>>>>> cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
>>>>>>      pc: c0000000004b9728: memunmap_pages+0x238/0x340
>>>>>>      lr: c0000000004b9724: memunmap_pages+0x234/0x340
>>>>>> ...
>>>>>>      pid   = 3669, comm = ndctl
>>>>>> kernel BUG at include/linux/mm.h:1107!
>>>>>> [c000000274087ba0] c0000000009e3500 devm_action_release+0x30/0x50
>>>>>> [c000000274087bc0] c0000000009e4758 release_nodes+0x268/0x2d0
>>>>>> [c000000274087c30] c0000000009dd144 device_release_driver_internal+0x174/0x240
>>>>>> [c000000274087c70] c0000000009d9dfc unbind_store+0x13c/0x190
>>>>>> [c000000274087cb0] c0000000009d8a24 drv_attr_store+0x44/0x60
>>>>>> [c000000274087cd0] c0000000005a7470 sysfs_kf_write+0x70/0xa0
>>>>>> [c000000274087d10] c0000000005a5cac kernfs_fop_write+0x1ac/0x290
>>>>>> [c000000274087d60] c0000000004be45c __vfs_write+0x3c/0x70
>>>>>> [c000000274087d80] c0000000004c26e4 vfs_write+0xe4/0x200
>>>>>> [c000000274087dd0] c0000000004c2a6c ksys_write+0x7c/0x140
>>>>>> [c000000274087e20] c00000000000bbd0 system_call+0x5c/0x68
>>>>>>
>>>>>> Cc: Dan Williams <dan.j.williams@intel.com>
>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
>>>>>> Cc: Logan Gunthorpe <logang@deltatee.com>
>>>>>> Cc: Ira Weiny <ira.weiny@intel.com>
>>>>>> Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
>>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>>>> [ move all pfn-realted declarations into a single line ]
>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>>> ---
>>>>>>   mm/memremap.c | 13 ++++++++-----
>>>>>>   1 file changed, 8 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>>>>> index 557e53c6fb46..026788b2ac69 100644
>>>>>> --- a/mm/memremap.c
>>>>>> +++ b/mm/memremap.c
>>>>>> @@ -123,7 +123,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
>>>>>>   void memunmap_pages(struct dev_pagemap *pgmap)
>>>>>>   {
>>>>>>   	struct resource *res = &pgmap->res;
>>>>>> -	unsigned long pfn;
>>>>>> +	unsigned long pfn, nr_pages, start_pfn, end_pfn;
>>>>>>   	int nid;
>>>>>>   
>>>>>>   	dev_pagemap_kill(pgmap);
>>>>>> @@ -131,14 +131,17 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>>>>>>   		put_page(pfn_to_page(pfn));
>>>>>>   	dev_pagemap_cleanup(pgmap);
>>>>>>   
>>>>>> +	start_pfn = pfn_first(pgmap);
>>>>>> +	end_pfn = pfn_end(pgmap);
>>>>>> +	nr_pages = end_pfn - start_pfn;
>>>>>> +
>>>>>>   	/* pages are dead and unused, undo the arch mapping */
>>>>>> -	nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
>>>>>> +	nid = page_to_nid(pfn_to_page(start_pfn));
>>>>>>   
>>>>>>   	mem_hotplug_begin();
>>>>>>   	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
>>>>>> -		pfn = PHYS_PFN(res->start);
>>>>>> -		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
>>>>>> -				 PHYS_PFN(resource_size(res)), NULL);
>>>>>> +		__remove_pages(page_zone(pfn_to_page(start_pfn)), start_pfn,
>>>>>> +			       nr_pages, NULL);
>>>>>>   	} else {
>>>>>>   		arch_remove_memory(nid, res->start, resource_size(res),
>>>>>>   				pgmap_altmap(pgmap));
>>>>>>
>>>>>
>>>>> Aneesh, I was wondering why the use of "res->start" is correct (and we
>>>>> shouldn't also witch to start_pfn/nr_pages here. It would be good if Dan
>>>>> could review.
>>>>>
>>>>
>>>> To be more precise, I wonder if it should actually be
>>>>
>>>> __remove_pages(page_zone(pfn_to_page(start_pfn)), res->start,
>>>>                 resource_size(res))
>>>>
>>>
>>> yes, that would be make it much clear.
>>>
>>> But for MEMORY_DEVICE_PRIVATE start_pfn and pfn should be same?
>>
>> Okay, let's recap. We should call add_pages()/__remove_pages()
>> and arch_add_memory()/arch_remove_memory() with the exact same ranges.
>>
>> So with PHYS_PFN(res->start) and PHYS_PFN(resource_size(res)
>>
>> Now, only a subset of the pages gets actually initialized,
>> meaning the NID and the ZONE we read could be stale.
>> That, we have to fix.
>>
>> What about something like this (am I missing something?):
>>
>> From d77b5c50f86570819a437517a897cc40ed29eefb Mon Sep 17 00:00:00 2001
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
>> Date: Fri, 27 Sep 2019 16:02:24 +0530
>> Subject: [PATCH] mm/memunmap: Don't access uninitialized memmap in
>>  memunmap_pages()
>>
>> With an altmap, the memmap falling into the reserved altmap space are
>> not initialized and, therefore, contain a garbage NID and a garbage
>> zone. Make sure to read the NID/zone from a memmap that was initialzed.
>>
>> This fixes a kernel crash that is observed when destroying a namespace:
>>
>> [   81.356173] kernel BUG at include/linux/mm.h:1107!
>> cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
>>     pc: c0000000004b9728: memunmap_pages+0x238/0x340
>>     lr: c0000000004b9724: memunmap_pages+0x234/0x340
>> ...
>>     pid   = 3669, comm = ndctl
>> kernel BUG at include/linux/mm.h:1107!
>> [c000000274087ba0] c0000000009e3500 devm_action_release+0x30/0x50
>> [c000000274087bc0] c0000000009e4758 release_nodes+0x268/0x2d0
>> [c000000274087c30] c0000000009dd144 device_release_driver_internal+0x174/0x240
>> [c000000274087c70] c0000000009d9dfc unbind_store+0x13c/0x190
>> [c000000274087cb0] c0000000009d8a24 drv_attr_store+0x44/0x60
>> [c000000274087cd0] c0000000005a7470 sysfs_kf_write+0x70/0xa0
>> [c000000274087d10] c0000000005a5cac kernfs_fop_write+0x1ac/0x290
>> [c000000274087d60] c0000000004be45c __vfs_write+0x3c/0x70
>> [c000000274087d80] c0000000004c26e4 vfs_write+0xe4/0x200
>> [c000000274087dd0] c0000000004c2a6c ksys_write+0x7c/0x140
>> [c000000274087e20] c00000000000bbd0 system_call+0x5c/0x68
>>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
>> Cc: Logan Gunthorpe <logang@deltatee.com>
>> Cc: Ira Weiny <ira.weiny@intel.com>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> [ minimze code changes, rephrase description ]
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  mm/memremap.c | 11 +++++++----
>>  1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/memremap.c b/mm/memremap.c
>> index 557e53c6fb46..8b11c0da345c 100644
>> --- a/mm/memremap.c
>> +++ b/mm/memremap.c
>> @@ -123,6 +123,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
>>  void memunmap_pages(struct dev_pagemap *pgmap)
>>  {
>>  	struct resource *res = &pgmap->res;
>> +	struct page *first_page;
>>  	unsigned long pfn;
>>  	int nid;
>>  
>> @@ -131,14 +132,16 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>>  		put_page(pfn_to_page(pfn));
>>  	dev_pagemap_cleanup(pgmap);
>>  
>> +	/* make sure to access a memmap that was actually initialized */
>> +	first_page = pfn_to_page(pfn_first(pgmap));
>> +
>>  	/* pages are dead and unused, undo the arch mapping */
>> -	nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
>> +	nid = page_to_nid(first_page);
>>  
>>  	mem_hotplug_begin();
>>  	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
>> -		pfn = PHYS_PFN(res->start);
>> -		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
>> -				 PHYS_PFN(resource_size(res)), NULL);
>> +		__remove_pages(page_zone(first_page), res->start,
>> +			       resource_size(res), NULL);
> 
> Keeping the PHYS_PFN() calls of course ...
> 
>>  	} else {
>>  		arch_remove_memory(nid, res->start, resource_size(res),
>>  				pgmap_altmap(pgmap));
>>
> 
> 

The current state (with the modified patch) can be found at:
	https://github.com/davidhildenbrand/linux/tree/zone_offline

@Aneesh, It would be great if you could test the namespace removal
thingy and tell me if we are still missing something :)

-- 

Thanks,

David / dhildenb

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 01/10] mm/memunmap: Use the correct start and end pfn when removing pages from zone
  2019-10-04  9:03           ` David Hildenbrand
  2019-10-04  9:33             ` David Hildenbrand
@ 2019-10-05  6:13             ` Aneesh Kumar K.V
  2019-10-06  8:13               ` David Hildenbrand
  1 sibling, 1 reply; 20+ messages in thread
From: Aneesh Kumar K.V @ 2019-10-05  6:13 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel, Dan Williams
  Cc: linux-s390, linux-ia64, Ira Weiny, linux-sh, Jason Gunthorpe,
	Logan Gunthorpe, Pankaj Gupta, linux-mm, Andrew Morton,
	linuxppc-dev, linux-arm-kernel

On 10/4/19 2:33 PM, David Hildenbrand wrote:
> On 04.10.19 11:00, David Hildenbrand wrote:
>> On 03.10.19 18:48, Aneesh Kumar K.V wrote:
>>> On 10/1/19 8:33 PM, David Hildenbrand wrote:
>>>> On 01.10.19 16:57, David Hildenbrand wrote:
>>>>> On 01.10.19 16:40, David Hildenbrand wrote:
>>>>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
>>>>>>
>>>>>> With altmap, all the resource pfns are not initialized. While initializing
>>>>>> pfn, altmap reserve space is skipped. Hence when removing pfn from zone
>>>>>> skip pfns that were never initialized.
>>>>>>
>>>>>> Update memunmap_pages to calculate start and end pfn based on altmap
>>>>>> values. This fixes a kernel crash that is observed when destroying
>>>>>> a namespace.
>>>>>>
>>>>>> [   81.356173] kernel BUG at include/linux/mm.h:1107!
>>>>>> cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
>>>>>>       pc: c0000000004b9728: memunmap_pages+0x238/0x340
>>>>>>       lr: c0000000004b9724: memunmap_pages+0x234/0x340
>>>>>> ...
>>>>>>       pid   = 3669, comm = ndctl
>>>>>> kernel BUG at include/linux/mm.h:1107!
>>>>>> [c000000274087ba0] c0000000009e3500 devm_action_release+0x30/0x50
>>>>>> [c000000274087bc0] c0000000009e4758 release_nodes+0x268/0x2d0
>>>>>> [c000000274087c30] c0000000009dd144 device_release_driver_internal+0x174/0x240
>>>>>> [c000000274087c70] c0000000009d9dfc unbind_store+0x13c/0x190
>>>>>> [c000000274087cb0] c0000000009d8a24 drv_attr_store+0x44/0x60
>>>>>> [c000000274087cd0] c0000000005a7470 sysfs_kf_write+0x70/0xa0
>>>>>> [c000000274087d10] c0000000005a5cac kernfs_fop_write+0x1ac/0x290
>>>>>> [c000000274087d60] c0000000004be45c __vfs_write+0x3c/0x70
>>>>>> [c000000274087d80] c0000000004c26e4 vfs_write+0xe4/0x200
>>>>>> [c000000274087dd0] c0000000004c2a6c ksys_write+0x7c/0x140
>>>>>> [c000000274087e20] c00000000000bbd0 system_call+0x5c/0x68
>>>>>>
>>>>>> Cc: Dan Williams <dan.j.williams@intel.com>
>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
>>>>>> Cc: Logan Gunthorpe <logang@deltatee.com>
>>>>>> Cc: Ira Weiny <ira.weiny@intel.com>
>>>>>> Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
>>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>>>> [ move all pfn-realted declarations into a single line ]
>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>>> ---
>>>>>>    mm/memremap.c | 13 ++++++++-----
>>>>>>    1 file changed, 8 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>>>>> index 557e53c6fb46..026788b2ac69 100644
>>>>>> --- a/mm/memremap.c
>>>>>> +++ b/mm/memremap.c
>>>>>> @@ -123,7 +123,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
>>>>>>    void memunmap_pages(struct dev_pagemap *pgmap)
>>>>>>    {
>>>>>>    	struct resource *res = &pgmap->res;
>>>>>> -	unsigned long pfn;
>>>>>> +	unsigned long pfn, nr_pages, start_pfn, end_pfn;
>>>>>>    	int nid;
>>>>>>    
>>>>>>    	dev_pagemap_kill(pgmap);
>>>>>> @@ -131,14 +131,17 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>>>>>>    		put_page(pfn_to_page(pfn));
>>>>>>    	dev_pagemap_cleanup(pgmap);
>>>>>>    
>>>>>> +	start_pfn = pfn_first(pgmap);
>>>>>> +	end_pfn = pfn_end(pgmap);
>>>>>> +	nr_pages = end_pfn - start_pfn;
>>>>>> +
>>>>>>    	/* pages are dead and unused, undo the arch mapping */
>>>>>> -	nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
>>>>>> +	nid = page_to_nid(pfn_to_page(start_pfn));
>>>>>>    
>>>>>>    	mem_hotplug_begin();
>>>>>>    	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
>>>>>> -		pfn = PHYS_PFN(res->start);
>>>>>> -		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
>>>>>> -				 PHYS_PFN(resource_size(res)), NULL);
>>>>>> +		__remove_pages(page_zone(pfn_to_page(start_pfn)), start_pfn,
>>>>>> +			       nr_pages, NULL);
>>>>>>    	} else {
>>>>>>    		arch_remove_memory(nid, res->start, resource_size(res),
>>>>>>    				pgmap_altmap(pgmap));
>>>>>>
>>>>>
>>>>> Aneesh, I was wondering why the use of "res->start" is correct (and we
>>>>> shouldn't also witch to start_pfn/nr_pages here. It would be good if Dan
>>>>> could review.
>>>>>
>>>>
>>>> To be more precise, I wonder if it should actually be
>>>>
>>>> __remove_pages(page_zone(pfn_to_page(start_pfn)), res->start,
>>>>                  resource_size(res))
>>>>
>>>
>>> yes, that would be make it much clear.
>>>
>>> But for MEMORY_DEVICE_PRIVATE start_pfn and pfn should be same?
>>
>> Okay, let's recap. We should call add_pages()/__remove_pages()
>> and arch_add_memory()/arch_remove_memory() with the exact same ranges.
>>
>> So with PHYS_PFN(res->start) and PHYS_PFN(resource_size(res)
>>
>> Now, only a subset of the pages gets actually initialized,
>> meaning the NID and the ZONE we read could be stale.
>> That, we have to fix.
>>
>> What about something like this (am I missing something?):
>>
>>  From d77b5c50f86570819a437517a897cc40ed29eefb Mon Sep 17 00:00:00 2001
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
>> Date: Fri, 27 Sep 2019 16:02:24 +0530
>> Subject: [PATCH] mm/memunmap: Don't access uninitialized memmap in
>>   memunmap_pages()
>>
>> With an altmap, the memmap falling into the reserved altmap space are
>> not initialized and, therefore, contain a garbage NID and a garbage
>> zone. Make sure to read the NID/zone from a memmap that was initialzed.
>>
>> This fixes a kernel crash that is observed when destroying a namespace:
>>
>> [   81.356173] kernel BUG at include/linux/mm.h:1107!
>> cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
>>      pc: c0000000004b9728: memunmap_pages+0x238/0x340
>>      lr: c0000000004b9724: memunmap_pages+0x234/0x340
>> ...
>>      pid   = 3669, comm = ndctl
>> kernel BUG at include/linux/mm.h:1107!
>> [c000000274087ba0] c0000000009e3500 devm_action_release+0x30/0x50
>> [c000000274087bc0] c0000000009e4758 release_nodes+0x268/0x2d0
>> [c000000274087c30] c0000000009dd144 device_release_driver_internal+0x174/0x240
>> [c000000274087c70] c0000000009d9dfc unbind_store+0x13c/0x190
>> [c000000274087cb0] c0000000009d8a24 drv_attr_store+0x44/0x60
>> [c000000274087cd0] c0000000005a7470 sysfs_kf_write+0x70/0xa0
>> [c000000274087d10] c0000000005a5cac kernfs_fop_write+0x1ac/0x290
>> [c000000274087d60] c0000000004be45c __vfs_write+0x3c/0x70
>> [c000000274087d80] c0000000004c26e4 vfs_write+0xe4/0x200
>> [c000000274087dd0] c0000000004c2a6c ksys_write+0x7c/0x140
>> [c000000274087e20] c00000000000bbd0 system_call+0x5c/0x68
>>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
>> Cc: Logan Gunthorpe <logang@deltatee.com>
>> Cc: Ira Weiny <ira.weiny@intel.com>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> [ minimze code changes, rephrase description ]
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   mm/memremap.c | 11 +++++++----
>>   1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/memremap.c b/mm/memremap.c
>> index 557e53c6fb46..8b11c0da345c 100644
>> --- a/mm/memremap.c
>> +++ b/mm/memremap.c
>> @@ -123,6 +123,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
>>   void memunmap_pages(struct dev_pagemap *pgmap)
>>   {
>>   	struct resource *res = &pgmap->res;
>> +	struct page *first_page;
>>   	unsigned long pfn;
>>   	int nid;
>>   
>> @@ -131,14 +132,16 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>>   		put_page(pfn_to_page(pfn));
>>   	dev_pagemap_cleanup(pgmap);
>>   
>> +	/* make sure to access a memmap that was actually initialized */
>> +	first_page = pfn_to_page(pfn_first(pgmap));
>> +
>>   	/* pages are dead and unused, undo the arch mapping */
>> -	nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
>> +	nid = page_to_nid(first_page);
>>   
>>   	mem_hotplug_begin();
>>   	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
>> -		pfn = PHYS_PFN(res->start);
>> -		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
>> -				 PHYS_PFN(resource_size(res)), NULL);
>> +		__remove_pages(page_zone(first_page), res->start,
>> +			       resource_size(res), NULL);
> 
> Keeping the PHYS_PFN() calls of course ...
> 


That is not different from what I posted right?  For MEMORY_DEVICE_PRIVATE


start_pfn  = PHYS_PFN(rest->start)


>>   	} else {
>>   		arch_remove_memory(nid, res->start, resource_size(res),
>>   				pgmap_altmap(pgmap));
>>
> 
> 

-aneesh


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 01/10] mm/memunmap: Use the correct start and end pfn when removing pages from zone
  2019-10-05  6:13             ` Aneesh Kumar K.V
@ 2019-10-06  8:13               ` David Hildenbrand
  0 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2019-10-06  8:13 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-kernel, Dan Williams
  Cc: linux-s390, linux-ia64, Ira Weiny, linux-sh, Jason Gunthorpe,
	Logan Gunthorpe, Pankaj Gupta, linux-mm, Andrew Morton,
	linuxppc-dev, linux-arm-kernel

On 05.10.19 08:13, Aneesh Kumar K.V wrote:
> On 10/4/19 2:33 PM, David Hildenbrand wrote:
>> On 04.10.19 11:00, David Hildenbrand wrote:
>>> On 03.10.19 18:48, Aneesh Kumar K.V wrote:
>>>> On 10/1/19 8:33 PM, David Hildenbrand wrote:
>>>>> On 01.10.19 16:57, David Hildenbrand wrote:
>>>>>> On 01.10.19 16:40, David Hildenbrand wrote:
>>>>>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
>>>>>>>
>>>>>>> With altmap, all the resource pfns are not initialized. While initializing
>>>>>>> pfn, altmap reserve space is skipped. Hence when removing pfn from zone
>>>>>>> skip pfns that were never initialized.
>>>>>>>
>>>>>>> Update memunmap_pages to calculate start and end pfn based on altmap
>>>>>>> values. This fixes a kernel crash that is observed when destroying
>>>>>>> a namespace.
>>>>>>>
>>>>>>> [   81.356173] kernel BUG at include/linux/mm.h:1107!
>>>>>>> cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
>>>>>>>       pc: c0000000004b9728: memunmap_pages+0x238/0x340
>>>>>>>       lr: c0000000004b9724: memunmap_pages+0x234/0x340
>>>>>>> ...
>>>>>>>       pid   = 3669, comm = ndctl
>>>>>>> kernel BUG at include/linux/mm.h:1107!
>>>>>>> [c000000274087ba0] c0000000009e3500 devm_action_release+0x30/0x50
>>>>>>> [c000000274087bc0] c0000000009e4758 release_nodes+0x268/0x2d0
>>>>>>> [c000000274087c30] c0000000009dd144 device_release_driver_internal+0x174/0x240
>>>>>>> [c000000274087c70] c0000000009d9dfc unbind_store+0x13c/0x190
>>>>>>> [c000000274087cb0] c0000000009d8a24 drv_attr_store+0x44/0x60
>>>>>>> [c000000274087cd0] c0000000005a7470 sysfs_kf_write+0x70/0xa0
>>>>>>> [c000000274087d10] c0000000005a5cac kernfs_fop_write+0x1ac/0x290
>>>>>>> [c000000274087d60] c0000000004be45c __vfs_write+0x3c/0x70
>>>>>>> [c000000274087d80] c0000000004c26e4 vfs_write+0xe4/0x200
>>>>>>> [c000000274087dd0] c0000000004c2a6c ksys_write+0x7c/0x140
>>>>>>> [c000000274087e20] c00000000000bbd0 system_call+0x5c/0x68
>>>>>>>
>>>>>>> Cc: Dan Williams <dan.j.williams@intel.com>
>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
>>>>>>> Cc: Logan Gunthorpe <logang@deltatee.com>
>>>>>>> Cc: Ira Weiny <ira.weiny@intel.com>
>>>>>>> Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
>>>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>>>>> [ move all pfn-realted declarations into a single line ]
>>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>>>> ---
>>>>>>>    mm/memremap.c | 13 ++++++++-----
>>>>>>>    1 file changed, 8 insertions(+), 5 deletions(-)
>>>>>>>
>>>>>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>>>>>> index 557e53c6fb46..026788b2ac69 100644
>>>>>>> --- a/mm/memremap.c
>>>>>>> +++ b/mm/memremap.c
>>>>>>> @@ -123,7 +123,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
>>>>>>>    void memunmap_pages(struct dev_pagemap *pgmap)
>>>>>>>    {
>>>>>>>    	struct resource *res = &pgmap->res;
>>>>>>> -	unsigned long pfn;
>>>>>>> +	unsigned long pfn, nr_pages, start_pfn, end_pfn;
>>>>>>>    	int nid;
>>>>>>>    
>>>>>>>    	dev_pagemap_kill(pgmap);
>>>>>>> @@ -131,14 +131,17 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>>>>>>>    		put_page(pfn_to_page(pfn));
>>>>>>>    	dev_pagemap_cleanup(pgmap);
>>>>>>>    
>>>>>>> +	start_pfn = pfn_first(pgmap);
>>>>>>> +	end_pfn = pfn_end(pgmap);
>>>>>>> +	nr_pages = end_pfn - start_pfn;
>>>>>>> +
>>>>>>>    	/* pages are dead and unused, undo the arch mapping */
>>>>>>> -	nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
>>>>>>> +	nid = page_to_nid(pfn_to_page(start_pfn));
>>>>>>>    
>>>>>>>    	mem_hotplug_begin();
>>>>>>>    	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
>>>>>>> -		pfn = PHYS_PFN(res->start);
>>>>>>> -		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
>>>>>>> -				 PHYS_PFN(resource_size(res)), NULL);
>>>>>>> +		__remove_pages(page_zone(pfn_to_page(start_pfn)), start_pfn,
>>>>>>> +			       nr_pages, NULL);
>>>>>>>    	} else {
>>>>>>>    		arch_remove_memory(nid, res->start, resource_size(res),
>>>>>>>    				pgmap_altmap(pgmap));
>>>>>>>
>>>>>>
>>>>>> Aneesh, I was wondering why the use of "res->start" is correct (and we
>>>>>> shouldn't also witch to start_pfn/nr_pages here. It would be good if Dan
>>>>>> could review.
>>>>>>
>>>>>
>>>>> To be more precise, I wonder if it should actually be
>>>>>
>>>>> __remove_pages(page_zone(pfn_to_page(start_pfn)), res->start,
>>>>>                  resource_size(res))
>>>>>
>>>>
>>>> yes, that would be make it much clear.
>>>>
>>>> But for MEMORY_DEVICE_PRIVATE start_pfn and pfn should be same?
>>>
>>> Okay, let's recap. We should call add_pages()/__remove_pages()
>>> and arch_add_memory()/arch_remove_memory() with the exact same ranges.
>>>
>>> So with PHYS_PFN(res->start) and PHYS_PFN(resource_size(res)
>>>
>>> Now, only a subset of the pages gets actually initialized,
>>> meaning the NID and the ZONE we read could be stale.
>>> That, we have to fix.
>>>
>>> What about something like this (am I missing something?):
>>>
>>>  From d77b5c50f86570819a437517a897cc40ed29eefb Mon Sep 17 00:00:00 2001
>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
>>> Date: Fri, 27 Sep 2019 16:02:24 +0530
>>> Subject: [PATCH] mm/memunmap: Don't access uninitialized memmap in
>>>   memunmap_pages()
>>>
>>> With an altmap, the memmap falling into the reserved altmap space are
>>> not initialized and, therefore, contain a garbage NID and a garbage
>>> zone. Make sure to read the NID/zone from a memmap that was initialzed.
>>>
>>> This fixes a kernel crash that is observed when destroying a namespace:
>>>
>>> [   81.356173] kernel BUG at include/linux/mm.h:1107!
>>> cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
>>>      pc: c0000000004b9728: memunmap_pages+0x238/0x340
>>>      lr: c0000000004b9724: memunmap_pages+0x234/0x340
>>> ...
>>>      pid   = 3669, comm = ndctl
>>> kernel BUG at include/linux/mm.h:1107!
>>> [c000000274087ba0] c0000000009e3500 devm_action_release+0x30/0x50
>>> [c000000274087bc0] c0000000009e4758 release_nodes+0x268/0x2d0
>>> [c000000274087c30] c0000000009dd144 device_release_driver_internal+0x174/0x240
>>> [c000000274087c70] c0000000009d9dfc unbind_store+0x13c/0x190
>>> [c000000274087cb0] c0000000009d8a24 drv_attr_store+0x44/0x60
>>> [c000000274087cd0] c0000000005a7470 sysfs_kf_write+0x70/0xa0
>>> [c000000274087d10] c0000000005a5cac kernfs_fop_write+0x1ac/0x290
>>> [c000000274087d60] c0000000004be45c __vfs_write+0x3c/0x70
>>> [c000000274087d80] c0000000004c26e4 vfs_write+0xe4/0x200
>>> [c000000274087dd0] c0000000004c2a6c ksys_write+0x7c/0x140
>>> [c000000274087e20] c00000000000bbd0 system_call+0x5c/0x68
>>>
>>> Cc: Dan Williams <dan.j.williams@intel.com>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
>>> Cc: Logan Gunthorpe <logang@deltatee.com>
>>> Cc: Ira Weiny <ira.weiny@intel.com>
>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>> [ minimze code changes, rephrase description ]
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>   mm/memremap.c | 11 +++++++----
>>>   1 file changed, 7 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>> index 557e53c6fb46..8b11c0da345c 100644
>>> --- a/mm/memremap.c
>>> +++ b/mm/memremap.c
>>> @@ -123,6 +123,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
>>>   void memunmap_pages(struct dev_pagemap *pgmap)
>>>   {
>>>   	struct resource *res = &pgmap->res;
>>> +	struct page *first_page;
>>>   	unsigned long pfn;
>>>   	int nid;
>>>   
>>> @@ -131,14 +132,16 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>>>   		put_page(pfn_to_page(pfn));
>>>   	dev_pagemap_cleanup(pgmap);
>>>   
>>> +	/* make sure to access a memmap that was actually initialized */
>>> +	first_page = pfn_to_page(pfn_first(pgmap));
>>> +
>>>   	/* pages are dead and unused, undo the arch mapping */
>>> -	nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
>>> +	nid = page_to_nid(first_page);
>>>   
>>>   	mem_hotplug_begin();
>>>   	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
>>> -		pfn = PHYS_PFN(res->start);
>>> -		__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
>>> -				 PHYS_PFN(resource_size(res)), NULL);
>>> +		__remove_pages(page_zone(first_page), res->start,
>>> +			       resource_size(res), NULL);
>>
>> Keeping the PHYS_PFN() calls of course ...
>>
> 
> 
> That is not different from what I posted right?  For MEMORY_DEVICE_PRIVATE

I'll go with this modified patch for now, which does not change the ranges
passed to  __remove_pages(), because that looked like a unrelated change to me.
__remove_ages() has to be called with the same range as add_pages().


commit f66c2fa09293ae0f99afc4363146941579152409 (HEAD)
Author: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Date:   Fri Sep 27 16:02:24 2019 +0530

    mm/memunmap: Don't access uninitialized memmap in memunmap_pages()
    
    With an altmap, the memmap falling into the reserved altmap space are
    not initialized and, therefore, contain a garbage NID and a garbage
    zone. Make sure to read the NID/zone from a memmap that was initialzed.
    
    This fixes a kernel crash that is observed when destroying a namespace:
    
    [   81.356173] kernel BUG at include/linux/mm.h:1107!
    cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
        pc: c0000000004b9728: memunmap_pages+0x238/0x340
        lr: c0000000004b9724: memunmap_pages+0x234/0x340
    ...
        pid   = 3669, comm = ndctl
    kernel BUG at include/linux/mm.h:1107!
    [c000000274087ba0] c0000000009e3500 devm_action_release+0x30/0x50
    [c000000274087bc0] c0000000009e4758 release_nodes+0x268/0x2d0
    [c000000274087c30] c0000000009dd144 device_release_driver_internal+0x174/0x240
    [c000000274087c70] c0000000009d9dfc unbind_store+0x13c/0x190
    [c000000274087cb0] c0000000009d8a24 drv_attr_store+0x44/0x60
    [c000000274087cd0] c0000000005a7470 sysfs_kf_write+0x70/0xa0
    [c000000274087d10] c0000000005a5cac kernfs_fop_write+0x1ac/0x290
    [c000000274087d60] c0000000004be45c __vfs_write+0x3c/0x70
    [c000000274087d80] c0000000004c26e4 vfs_write+0xe4/0x200
    [c000000274087dd0] c0000000004c2a6c ksys_write+0x7c/0x140
    [c000000274087e20] c00000000000bbd0 system_call+0x5c/0x68
    
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Cc: Logan Gunthorpe <logang@deltatee.com>
    Cc: Ira Weiny <ira.weiny@intel.com>
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    [ minimze code changes, rephrase description ]
    Signed-off-by: David Hildenbrand <david@redhat.com>

diff --git a/mm/memremap.c b/mm/memremap.c
index 557e53c6fb46..8c2fb44c3b4d 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -123,6 +123,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap)
 void memunmap_pages(struct dev_pagemap *pgmap)
 {
        struct resource *res = &pgmap->res;
+       struct page *first_page;
        unsigned long pfn;
        int nid;
 
@@ -131,14 +132,16 @@ void memunmap_pages(struct dev_pagemap *pgmap)
                put_page(pfn_to_page(pfn));
        dev_pagemap_cleanup(pgmap);
 
+       /* make sure to access a memmap that was actually initialized */
+       first_page = pfn_to_page(pfn_first(pgmap));
+
        /* pages are dead and unused, undo the arch mapping */
-       nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start)));
+       nid = page_to_nid(first_page);
 
        mem_hotplug_begin();
        if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
-               pfn = PHYS_PFN(res->start);
-               __remove_pages(page_zone(pfn_to_page(pfn)), pfn,
-                                PHYS_PFN(resource_size(res)), NULL);
+               __remove_pages(page_zone(first_page), PHYS_PFN(res->start),
+                              PHYS_PFN(resource_size(res)), NULL);
        } else {
                arch_remove_memory(nid, res->start, resource_size(res),
                                pgmap_altmap(pgmap));



As Andrew wants to give this some testing, I'll send out a new version
shortly, then we can discuss there.


-- 

Thanks,

David / dhildenb

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2019-10-06  8:13 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20191001144011.3801-1-david@redhat.com>
2019-10-01 14:40 ` [PATCH v5 01/10] mm/memunmap: Use the correct start and end pfn when removing pages from zone David Hildenbrand
2019-10-01 14:57   ` David Hildenbrand
2019-10-01 15:03     ` David Hildenbrand
2019-10-03 16:48       ` Aneesh Kumar K.V
2019-10-04  9:00         ` David Hildenbrand
2019-10-04  9:03           ` David Hildenbrand
2019-10-04  9:33             ` David Hildenbrand
2019-10-05  6:13             ` Aneesh Kumar K.V
2019-10-06  8:13               ` David Hildenbrand
2019-10-01 14:40 ` [PATCH v5 02/10] mm/memmap_init: Update variable name in memmap_init_zone David Hildenbrand
2019-10-01 14:40 ` [PATCH v5 03/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_pgdat_span() David Hildenbrand
2019-10-01 14:40 ` [PATCH v5 04/10] mm/memory_hotplug: Don't access uninitialized memmaps in shrink_zone_span() David Hildenbrand
2019-10-01 23:47   ` kbuild test robot
2019-10-02  0:06   ` kbuild test robot
2019-10-02  7:05     ` David Hildenbrand
2019-10-01 14:40 ` [PATCH v5 06/10] mm/memory_hotplug: Poison memmap in remove_pfn_range_from_zone() David Hildenbrand
2019-10-01 14:40 ` [PATCH v5 07/10] mm/memory_hotplug: We always have a zone in find_(smallest|biggest)_section_pfn David Hildenbrand
2019-10-01 14:40 ` [PATCH v5 08/10] mm/memory_hotplug: Don't check for "all holes" in shrink_zone_span() David Hildenbrand
2019-10-01 14:40 ` [PATCH v5 09/10] mm/memory_hotplug: Drop local variables " David Hildenbrand
2019-10-01 14:40 ` [PATCH v5 10/10] mm/memory_hotplug: Cleanup __remove_pages() David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).