linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Oscar Salvador <osalvador@techadventures.net>
To: akpm@linux-foundation.org
Cc: mhocko@suse.com, dan.j.williams@intel.com,
	yasu.isimatu@gmail.com, rppt@linux.vnet.ibm.com,
	malat@debian.org, linux-kernel@vger.kernel.org,
	pavel.tatashin@microsoft.com, jglisse@redhat.com,
	Jonathan.Cameron@huawei.com, rafael@kernel.org, david@redhat.com,
	dave.jiang@intel.com, linux-mm@kvack.org,
	alexander.h.duyck@linux.intel.com,
	Oscar Salvador <osalvador@suse.de>
Subject: [PATCH 2/5] mm/memory_hotplug: Create add/del_device_memory functions
Date: Mon, 15 Oct 2018 17:30:31 +0200	[thread overview]
Message-ID: <20181015153034.32203-3-osalvador@techadventures.net> (raw)
In-Reply-To: <20181015153034.32203-1-osalvador@techadventures.net>

From: Oscar Salvador <osalvador@suse.de>

HMM/devm have a particular handling of memory-hotplug.
They do not go through the common path, and so, they do not
call either offline_pages() or online_pages().

The operations they perform are the following ones:

1) Create the linear mapping in case the memory is not private
2) Initialize the pages and add the sections
3) Move the pages to ZONE_DEVICE

Due to this particular handling of hot-add/remove memory from HMM/devm,
I think it would be nice to provide a helper function in order to
make this cleaner, and not populate other regions with code
that should belong to memory-hotplug.

The helpers are named:

del_device_memory
add_device_memory

The idea is that add_device_memory will be in charge of:

a) call either arch_add_memory() or add_pages(), depending on whether
   we want a linear mapping
b) online the memory sections that correspond to the pfn range
c) call move_pfn_range_to_zone() being zone ZONE_DEVICE to
   expand zone/pgdat spanned pages and initialize its pages

del_device_memory, on the other hand, will be in charge of:

a) offline the memory sections that correspond to the pfn range
b) call shrink_zone_pgdat_pages(), which shrinks node/zone spanned pages.
c) call either arch_remove_memory() or __remove_pages(), depending on
   whether we need to tear down the linear mapping or not

The reason behind step b) from add_device_memory() and step a)
from del_device_memory is that now find_smallest/biggest_section_pfn
will have to check for online sections, and not for valid sections as
they used to do, because we call offline_mem_sections() in
offline_pages().

In order to split up better the patches and ease the review,
this patch will only make a) case work for add_device_memory(),
and case c) for del_device_memory.

The other cases will be added in the next patch.

These two functions have to be called from devm/HMM code:

dd_device_memory:
        - devm_memremap_pages()
        - hmm_devmem_pages_create()

del_device_memory:
        - hmm_devmem_release
        - devm_memremap_pages_release

One thing I do not know is whether we can move kasan calls out of the
hotplug lock or not.
If we can, we could move the hotplug lock within add/del_device_memory().

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/memory_hotplug.h | 11 +++++++++++
 kernel/memremap.c              | 11 ++++-------
 mm/hmm.c                       | 33 +++++++++++++++++----------------
 mm/memory_hotplug.c            | 41 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 73 insertions(+), 23 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 786cdfc9a974..cf014d5edbb2 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -111,8 +111,19 @@ extern int arch_remove_memory(int nid, u64 start, u64 size,
 		struct vmem_altmap *altmap);
 extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
 	unsigned long nr_pages, struct vmem_altmap *altmap);
+
+#ifdef CONFIG_ZONE_DEVICE
+extern int del_device_memory(int nid, unsigned long start, unsigned long size,
+				struct vmem_altmap *altmap, bool private_mem);
+#endif
+
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
+#ifdef CONFIG_ZONE_DEVICE
+extern int add_device_memory(int nid, unsigned long start, unsigned long size,
+				struct vmem_altmap *altmap, bool private_mem);
+#endif
+
 /* reasonably generic interface to expand the physical pages */
 extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
 		struct vmem_altmap *altmap, bool want_memblock);
diff --git a/kernel/memremap.c b/kernel/memremap.c
index c95df6ed2d4a..b86bba8713b9 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -86,6 +86,8 @@ static void devm_memremap_pages_release(void *data)
 	struct device *dev = pgmap->dev;
 	struct resource *res = &pgmap->res;
 	resource_size_t align_start, align_size;
+	struct vmem_altmap *altmap = pgmap->altmap_valid ?
+					&pgmap->altmap : NULL;
 	unsigned long pfn;
 	int nid;
 
@@ -104,8 +106,7 @@ static void devm_memremap_pages_release(void *data)
 	nid = dev_to_node(dev);
 
 	mem_hotplug_begin();
-	arch_remove_memory(nid, align_start, align_size, pgmap->altmap_valid ?
-			&pgmap->altmap : NULL);
+	del_device_memory(nid, align_start, align_size, altmap, true);
 	kasan_remove_zero_shadow(__va(align_start), align_size);
 	mem_hotplug_done();
 
@@ -204,11 +205,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 		goto err_kasan;
 	}
 
-	error = arch_add_memory(nid, align_start, align_size, altmap, false);
-	if (!error)
-		move_pfn_range_to_zone(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
-					align_start >> PAGE_SHIFT,
-					align_size >> PAGE_SHIFT, altmap);
+	error = add_device_memory(nid, align_start, align_size, altmap, true);
 	mem_hotplug_done();
 	if (error)
 		goto err_add_memory;
diff --git a/mm/hmm.c b/mm/hmm.c
index 42d79bcc8aab..d3e52ae71bd9 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -996,6 +996,7 @@ static void hmm_devmem_release(struct device *dev, void *data)
 	struct zone *zone;
 	struct page *page;
 	int nid;
+	bool mapping;
 
 	if (percpu_ref_tryget_live(&devmem->ref)) {
 		dev_WARN(dev, "%s: page mapping is still live!\n", __func__);
@@ -1010,12 +1011,15 @@ static void hmm_devmem_release(struct device *dev, void *data)
 	zone = page_zone(page);
 	nid = zone->zone_pgdat->node_id;
 
-	mem_hotplug_begin();
 	if (resource->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY)
-		__remove_pages(zone, start_pfn, npages, NULL);
+		mapping = false;
 	else
-		arch_remove_memory(nid, start_pfn << PAGE_SHIFT,
-				   npages << PAGE_SHIFT, NULL);
+		mapping = true;
+
+	mem_hotplug_begin();
+	del_device_memory(nid, start_pfn << PAGE_SHIFT, npages << PAGE_SHIFT,
+								NULL,
+								mapping);
 	mem_hotplug_done();
 
 	hmm_devmem_radix_release(resource);
@@ -1026,6 +1030,7 @@ static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
 	resource_size_t key, align_start, align_size, align_end;
 	struct device *device = devmem->device;
 	int ret, nid, is_ram;
+	bool mapping;
 
 	align_start = devmem->resource->start & ~(PA_SECTION_SIZE - 1);
 	align_size = ALIGN(devmem->resource->start +
@@ -1084,7 +1089,6 @@ static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
 	if (nid < 0)
 		nid = numa_mem_id();
 
-	mem_hotplug_begin();
 	/*
 	 * For device private memory we call add_pages() as we only need to
 	 * allocate and initialize struct page for the device memory. More-
@@ -1096,20 +1100,17 @@ static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
 	 * want the linear mapping and thus use arch_add_memory().
 	 */
 	if (devmem->pagemap.type == MEMORY_DEVICE_PUBLIC)
-		ret = arch_add_memory(nid, align_start, align_size, NULL,
-				false);
+		mapping = true;
 	else
-		ret = add_pages(nid, align_start >> PAGE_SHIFT,
-				align_size >> PAGE_SHIFT, NULL, false);
-	if (ret) {
-		mem_hotplug_done();
-		goto error_add_memory;
-	}
-	move_pfn_range_to_zone(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
-				align_start >> PAGE_SHIFT,
-				align_size >> PAGE_SHIFT, NULL);
+		mapping = false;
+
+	mem_hotplug_begin();
+	ret = add_device_memory(nid, align_start, align_size, NULL, mapping);
 	mem_hotplug_done();
 
+	if (ret)
+		goto error_add_memory;
+
 	/*
 	 * Initialization of the pages has been deferred until now in order
 	 * to allow us to do the work while not holding the hotplug lock.
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 33d448314b3f..5874aceb81ac 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1889,4 +1889,45 @@ void remove_memory(int nid, u64 start, u64 size)
 	unlock_device_hotplug();
 }
 EXPORT_SYMBOL_GPL(remove_memory);
+
+#ifdef CONFIG_ZONE_DEVICE
+int del_device_memory(int nid, unsigned long start, unsigned long size,
+				struct vmem_altmap *altmap, bool mapping)
+{
+	int ret;
+	unsigned long start_pfn = PHYS_PFN(start);
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	struct zone *zone = page_zone(pfn_to_page(pfn));
+
+	if (mapping)
+		ret = arch_remove_memory(nid, start, size, altmap);
+	else
+		ret = __remove_pages(zone, start_pfn, nr_pages, altmap);
+
+	return ret;
+}
+#endif
 #endif /* CONFIG_MEMORY_HOTREMOVE */
+
+#ifdef CONFIG_ZONE_DEVICE
+int add_device_memory(int nid, unsigned long start, unsigned long size,
+				struct vmem_altmap *altmap, bool mapping)
+{
+	int ret;
+	unsigned long start_pfn = PHYS_PFN(start);
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+
+	if (mapping)
+		ret = arch_add_memory(nid, start, size, altmap, false);
+	else
+		ret = add_pages(nid, start_pfn, nr_pages, altmap, false);
+
+	if (!ret) {
+		struct zone *zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE];
+
+		move_pfn_range_to_zone(zone, start_pfn, nr_pages, altmap);
+	}
+
+	return ret;
+}
+#endif
-- 
2.13.6


  parent reply	other threads:[~2018-10-15 15:31 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-15 15:30 [PATCH 0/5] Do not touch pages/zones during hot-remove path Oscar Salvador
2018-10-15 15:30 ` [PATCH 1/5] mm/memory_hotplug: Add nid parameter to arch_remove_memory Oscar Salvador
2018-11-12 18:52   ` Pavel Tatashin
2018-10-15 15:30 ` Oscar Salvador [this message]
2018-10-16  8:55   ` [PATCH 2/5] mm/memory_hotplug: Create add/del_device_memory functions David Hildenbrand
2018-10-17  9:33     ` Oscar Salvador
2018-10-17  9:45       ` David Hildenbrand
2018-10-18  6:57         ` Oscar Salvador
2018-11-12 19:14   ` Pavel Tatashin
2018-11-12 19:21   ` Dan Williams
2018-11-12 21:28     ` Pavel Tatashin
2018-11-16 13:06       ` osalvador
2018-11-16 13:03     ` osalvador
2018-10-15 15:30 ` [PATCH 3/5] mm/memory_hotplug: Check for IORESOURCE_SYSRAM in release_mem_region_adjustable Oscar Salvador
2018-10-16  9:54   ` David Hildenbrand
2018-11-12 19:27   ` Pavel Tatashin
2018-10-15 15:30 ` [PATCH 4/5] mm/memory_hotplug: Move zone/pages handling to offline stage Oscar Salvador
2018-10-15 15:30 ` [PATCH 5/5] mm/memory-hotplug: Rework unregister_mem_sect_under_nodes Oscar Salvador
2018-10-18 14:24   ` Jonathan Cameron
2018-10-18 15:02     ` Oscar Salvador
2018-10-18 15:38       ` Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181015153034.32203-3-osalvador@techadventures.net \
    --to=osalvador@techadventures.net \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=david@redhat.com \
    --cc=jglisse@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=malat@debian.org \
    --cc=mhocko@suse.com \
    --cc=osalvador@suse.de \
    --cc=pavel.tatashin@microsoft.com \
    --cc=rafael@kernel.org \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=yasu.isimatu@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).