linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/3] mm/memory_hotplug: Factor out memory block device handling
@ 2019-04-08 10:12 David Hildenbrand
  2019-04-08 10:12 ` [PATCH RFC 1/3] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: David Hildenbrand @ 2019-04-08 10:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Greg Kroah-Hartman, Rafael J . Wysocki,
	Ingo Molnar, Andrew Banman, mike.travis, Jonathan Cameron,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Wei Yang, Qian Cai,
	Arun KS, Mathieu Malaterre, linux-mm, dan.j.williams,
	David Hildenbrand

We only want memory block devices for memory to be onlined/offlined
(add/remove from the buddy). This is required so user space can
online/offline memory and kdump gets notified about newly onlined memory.

Only such memory has the requirement of having to span whole memory blocks.
Let's factor out creation/removal of memory block devices.

This not only allows to clean up arch_add_memory() to get rid of
want_memblock, but also reduces locking overhead and eventually allows
us to handle errors while adding memory in a nicer fashion.

Only did a quick sanity test with DIMM plug/unplug. This should be
sufficient to discuss the general approach. Patches are against
next/master.

David Hildenbrand (3):
  mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
  mm/memory_hotplug: Create memory block devices after arch_add_memory()
  mm/memory_hotplug: Remove memory block devices before
    arch_remove_memory()

 drivers/base/memory.c  | 108 +++++++++++++++++++++--------------------
 drivers/base/node.c    |   7 ++-
 include/linux/memory.h |   4 +-
 include/linux/node.h   |   6 +--
 mm/memory_hotplug.c    |  38 +++++++--------
 5 files changed, 81 insertions(+), 82 deletions(-)

-- 
2.17.2


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH RFC 1/3] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
  2019-04-08 10:12 [PATCH RFC 0/3] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
@ 2019-04-08 10:12 ` David Hildenbrand
  2019-04-08 10:12 ` [PATCH RFC 2/3] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
  2019-04-08 10:12 ` [PATCH RFC 3/3] mm/memory_hotplug: Remove memory block devices before arch_remove_memory() David Hildenbrand
  2 siblings, 0 replies; 7+ messages in thread
From: David Hildenbrand @ 2019-04-08 10:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Greg Kroah-Hartman, Rafael J . Wysocki,
	Ingo Molnar, Andrew Banman, mike.travis, Jonathan Cameron,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Wei Yang, Qian Cai,
	Arun KS, Mathieu Malaterre, linux-mm, dan.j.williams,
	David Hildenbrand

By converting start and size to page granularity, we actually ignore
unaligned parts within a page instead of properly bailing out with an
error.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f206b8b66af1..680dcc67f9d5 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1070,16 +1070,11 @@ int try_online_node(int nid)
 
 static int check_hotplug_memory_range(u64 start, u64 size)
 {
-	unsigned long block_sz = memory_block_size_bytes();
-	u64 block_nr_pages = block_sz >> PAGE_SHIFT;
-	u64 nr_pages = size >> PAGE_SHIFT;
-	u64 start_pfn = PFN_DOWN(start);
-
 	/* memory range must be block size aligned */
-	if (!nr_pages || !IS_ALIGNED(start_pfn, block_nr_pages) ||
-	    !IS_ALIGNED(nr_pages, block_nr_pages)) {
+	if (!size || !IS_ALIGNED(start, memory_block_size_bytes()) ||
+	    !IS_ALIGNED(size, memory_block_size_bytes())) {
 		pr_err("Block size [%#lx] unaligned hotplug range: start %#llx, size %#llx",
-		       block_sz, start, size);
+		       memory_block_size_bytes(), start, size);
 		return -EINVAL;
 	}
 
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH RFC 2/3] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-04-08 10:12 [PATCH RFC 0/3] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
  2019-04-08 10:12 ` [PATCH RFC 1/3] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
@ 2019-04-08 10:12 ` David Hildenbrand
  2019-04-09  7:33   ` David Hildenbrand
  2019-04-08 10:12 ` [PATCH RFC 3/3] mm/memory_hotplug: Remove memory block devices before arch_remove_memory() David Hildenbrand
  2 siblings, 1 reply; 7+ messages in thread
From: David Hildenbrand @ 2019-04-08 10:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Greg Kroah-Hartman, Rafael J . Wysocki,
	Ingo Molnar, Andrew Banman, mike.travis, Jonathan Cameron,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Wei Yang, Qian Cai,
	Arun KS, Mathieu Malaterre, linux-mm, dan.j.williams,
	David Hildenbrand

Only memory added via add_memory() and friends will need memory
block devices - only memory to be used via the buddy and to be onlined/
offlined by user space in memory block granularity.

Move creation of memory block devices out of arch_add_memory(). Create all
devices after arch_add_memory() succeeded. We can later drop the
want_memblock parameter, because it is now effectively stale.

Only after memory block devices have been added, memory can be onlined
by user space. This implies, that memory is not visible to user space at
all before arch_add_memory() succeeded.

Issue 1: __add_pages() does not remove pages in case something went
wrong. If this is the case, we would now no longer create memory block
devices for such "partially added memory". So the memory would not be
usable/onlinable. Bad? Or related to issue 2 (e.g. fix __add_pages()
to remove any parts that were added in case of an error). Functions that
fail and don't clean up are not that nice.

Issue 2: In case we can't add memory block devices, and we don't have
HOTREMOVE, we can't remove the pages via arch_remove_pages. Maybe we should
try to get rid of CONFIG_MEMORY_HOTREMOVE, so we can handle all failures
in a nice way? Or at least allow arch_remove_pages() and friends, so a
subset of CONFIG_MEMORY_HOTREMOVE.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c  | 67 +++++++++++++++++++++++++-----------------
 include/linux/memory.h |  2 +-
 mm/memory_hotplug.c    | 17 +++++++----
 3 files changed, 53 insertions(+), 33 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index d9ebb89816f7..847b33061e2e 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -701,44 +701,57 @@ static int add_memory_block(int base_section_nr)
 	return 0;
 }
 
-/*
- * need an interface for the VM to add new memory regions,
- * but without onlining it.
- */
-int hotplug_memory_register(int nid, struct mem_section *section)
+static void unregister_memory(struct memory_block *memory)
 {
-	int ret = 0;
+	BUG_ON(memory->dev.bus != &memory_subsys);
+
+	/* drop the ref. we got via find_memory_block() */
+	put_device(&memory->dev);
+	device_unregister(&memory->dev);
+}
+
+int hotplug_memory_register(unsigned long start, unsigned long size)
+{
+	unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
+	unsigned long start_pfn = PFN_DOWN(start);
+	unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
+	unsigned long pfn;
 	struct memory_block *mem;
+	int ret = 0;
 
-	mutex_lock(&mem_sysfs_mutex);
+	BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
+	BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));
 
-	mem = find_memory_block(section);
-	if (mem) {
-		mem->section_count++;
-		put_device(&mem->dev);
-	} else {
-		ret = init_memory_block(&mem, section, MEM_OFFLINE);
+	mutex_lock(&mem_sysfs_mutex);
+	for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
+		mem = find_memory_block(__pfn_to_section(pfn));
+		if (mem) {
+			WARN_ON_ONCE(false);
+			put_device(&mem->dev);
+			continue;
+		}
+		ret = init_memory_block(&mem, __pfn_to_section(pfn),
+					MEM_OFFLINE);
 		if (ret)
-			goto out;
-		mem->section_count++;
+			break;
+		mem->section_count = memory_block_size_bytes() /
+				     MIN_MEMORY_BLOCK_SIZE;
+	}
+	if (ret) {
+		end_pfn = pfn;
+		for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
+			mem = find_memory_block(__pfn_to_section(pfn));
+			if (!mem)
+				continue;
+			mem->section_count = 0;
+			unregister_memory(mem);
+		}
 	}
-
-out:
 	mutex_unlock(&mem_sysfs_mutex);
 	return ret;
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-static void
-unregister_memory(struct memory_block *memory)
-{
-	BUG_ON(memory->dev.bus != &memory_subsys);
-
-	/* drop the ref. we got in remove_memory_section() */
-	put_device(&memory->dev);
-	device_unregister(&memory->dev);
-}
-
 static int remove_memory_section(struct mem_section *section)
 {
 	struct memory_block *mem;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index a6ddefc60517..e275dc775834 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -111,7 +111,7 @@ extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
 extern int register_memory_isolate_notifier(struct notifier_block *nb);
 extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
-int hotplug_memory_register(int nid, struct mem_section *section);
+int hotplug_memory_register(unsigned long start, unsigned long size);
 #ifdef CONFIG_MEMORY_HOTREMOVE
 extern int unregister_memory_section(struct mem_section *);
 #endif
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 680dcc67f9d5..13ee0a26e034 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -260,11 +260,7 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
 	ret = sparse_add_one_section(nid, phys_start_pfn, altmap);
 	if (ret < 0)
 		return ret;
-
-	if (!want_memblock)
-		return 0;
-
-	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
+	return 0;
 }
 
 /*
@@ -1125,6 +1121,17 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	if (ret < 0)
 		goto error;
 
+	/* create memory block devices after memory was added */
+	ret = hotplug_memory_register(start, size);
+#ifdef CONFIG_MEMORY_HOTREMOVE
+	if (ret) {
+		arch_remove_memory(nid, start, size, NULL);
+		goto error;
+	}
+#else
+	WARN_ON(ret);
+#endif
+
 	if (new_node) {
 		/* If sysfs file of new node can't be created, cpu on the node
 		 * can't be hot-added. There is no rollback way now.
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH RFC 3/3] mm/memory_hotplug: Remove memory block devices before arch_remove_memory()
  2019-04-08 10:12 [PATCH RFC 0/3] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
  2019-04-08 10:12 ` [PATCH RFC 1/3] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
  2019-04-08 10:12 ` [PATCH RFC 2/3] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
@ 2019-04-08 10:12 ` David Hildenbrand
  2019-04-09  9:18   ` Oscar Salvador
  2 siblings, 1 reply; 7+ messages in thread
From: David Hildenbrand @ 2019-04-08 10:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Greg Kroah-Hartman, Rafael J . Wysocki,
	Ingo Molnar, Andrew Banman, mike.travis, Jonathan Cameron,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Wei Yang, Qian Cai,
	Arun KS, Mathieu Malaterre, linux-mm, dan.j.williams,
	David Hildenbrand

Let's factor out removing of memory block devices, which is only
necessary for memory added via add_memory() and friends that created
memory block devices. Remove the devices before calling
arch_remove_memory().

TODO: We should try to get rid of the errors that could be reported by
unregister_memory_block_under_nodes(). Ignoring failures is not that
nice.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c  | 41 +++++++++++++++--------------------------
 drivers/base/node.c    |  7 +++----
 include/linux/memory.h |  2 +-
 include/linux/node.h   |  6 ++----
 mm/memory_hotplug.c    | 10 ++++------
 5 files changed, 25 insertions(+), 41 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 847b33061e2e..fd8940c37129 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -752,40 +752,29 @@ int hotplug_memory_register(unsigned long start, unsigned long size)
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-static int remove_memory_section(struct mem_section *section)
+void hotplug_memory_unregister(unsigned long start, unsigned long size)
 {
+	unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
+	unsigned long start_pfn = PFN_DOWN(start);
+	unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
 	struct memory_block *mem;
+	unsigned long pfn;
 
-	mutex_lock(&mem_sysfs_mutex);
-
-	/*
-	 * Some users of the memory hotplug do not want/need memblock to
-	 * track all sections. Skip over those.
-	 */
-	mem = find_memory_block(section);
-	if (!mem)
-		goto out_unlock;
-
-	unregister_mem_sect_under_nodes(mem, __section_nr(section));
+	BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
+	BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));
 
-	mem->section_count--;
-	if (mem->section_count == 0)
+	mutex_lock(&mem_sysfs_mutex);
+	for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
+		mem = find_memory_block(__pfn_to_section(pfn));
+		if (!mem)
+			continue;
+		mem->section_count = 0;
+		unregister_memory_block_under_nodes(mem);
 		unregister_memory(mem);
-	else
-		put_device(&mem->dev);
-
-out_unlock:
+	}
 	mutex_unlock(&mem_sysfs_mutex);
-	return 0;
 }
 
-int unregister_memory_section(struct mem_section *section)
-{
-	if (!present_section(section))
-		return -EINVAL;
-
-	return remove_memory_section(section);
-}
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 /* return true if the memory block is offlined, otherwise, return false */
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 8598fcbd2a17..f9997770ac15 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -802,8 +802,7 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
 }
 
 /* unregister memory section under all nodes that it spans */
-int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
-				    unsigned long phys_index)
+int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 {
 	NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
 	unsigned long pfn, sect_start_pfn, sect_end_pfn;
@@ -816,8 +815,8 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
 		return -ENOMEM;
 	nodes_clear(*unlinked_nodes);
 
-	sect_start_pfn = section_nr_to_pfn(phys_index);
-	sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
+	sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
+	sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
 		int nid;
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index e275dc775834..414e43ab0881 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -113,7 +113,7 @@ extern int register_memory_isolate_notifier(struct notifier_block *nb);
 extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
 int hotplug_memory_register(unsigned long start, unsigned long size);
 #ifdef CONFIG_MEMORY_HOTREMOVE
-extern int unregister_memory_section(struct mem_section *);
+void hotplug_memory_unregister(unsigned long start, unsigned long size);
 #endif
 extern int memory_dev_init(void);
 extern int memory_notify(unsigned long val, void *v);
diff --git a/include/linux/node.h b/include/linux/node.h
index 1a557c589ecb..02a29e71b175 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -139,8 +139,7 @@ extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int register_mem_sect_under_node(struct memory_block *mem_blk,
 						void *arg);
-extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
-					   unsigned long phys_index);
+extern int unregister_memory_block_under_nodes(struct memory_block *mem_blk);
 
 extern int register_memory_node_under_compute_node(unsigned int mem_nid,
 						   unsigned int cpu_nid,
@@ -176,8 +175,7 @@ static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
 {
 	return 0;
 }
-static inline int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
-						  unsigned long phys_index)
+static inline int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 {
 	return 0;
 }
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 13ee0a26e034..041b93c5eede 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -518,14 +518,9 @@ static int __remove_section(struct zone *zone, struct mem_section *ms,
 {
 	unsigned long start_pfn;
 	int scn_nr;
-	int ret = -EINVAL;
 
 	if (!valid_section(ms))
-		return ret;
-
-	ret = unregister_memory_section(ms);
-	if (ret)
-		return ret;
+		return -EINVAL;
 
 	scn_nr = __section_nr(ms);
 	start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
@@ -1875,6 +1870,9 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
 	memblock_free(start, size);
 	memblock_remove(start, size);
 
+	/* remove memory block devices before removing memory */
+	hotplug_memory_unregister(start, size);
+
 	arch_remove_memory(nid, start, size, NULL);
 
 	try_offline_node(nid);
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH RFC 2/3] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-04-08 10:12 ` [PATCH RFC 2/3] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
@ 2019-04-09  7:33   ` David Hildenbrand
  0 siblings, 0 replies; 7+ messages in thread
From: David Hildenbrand @ 2019-04-09  7:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Greg Kroah-Hartman, Rafael J . Wysocki,
	Ingo Molnar, Andrew Banman, mike.travis, Jonathan Cameron,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Wei Yang, Qian Cai,
	Arun KS, Mathieu Malaterre, linux-mm, dan.j.williams

On 08.04.19 12:12, David Hildenbrand wrote:
> Only memory added via add_memory() and friends will need memory
> block devices - only memory to be used via the buddy and to be onlined/
> offlined by user space in memory block granularity.
> 
> Move creation of memory block devices out of arch_add_memory(). Create all
> devices after arch_add_memory() succeeded. We can later drop the
> want_memblock parameter, because it is now effectively stale.
> 
> Only after memory block devices have been added, memory can be onlined
> by user space. This implies, that memory is not visible to user space at
> all before arch_add_memory() succeeded.
> 
> Issue 1: __add_pages() does not remove pages in case something went
> wrong. If this is the case, we would now no longer create memory block
> devices for such "partially added memory". So the memory would not be
> usable/onlinable. Bad? Or related to issue 2 (e.g. fix __add_pages()
> to remove any parts that were added in case of an error). Functions that
> fail and don't clean up are not that nice.
> 
> Issue 2: In case we can't add memory block devices, and we don't have
> HOTREMOVE, we can't remove the pages via arch_remove_pages. Maybe we should
> try to get rid of CONFIG_MEMORY_HOTREMOVE, so we can handle all failures
> in a nice way? Or at least allow arch_remove_pages() and friends, so a
> subset of CONFIG_MEMORY_HOTREMOVE.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/base/memory.c  | 67 +++++++++++++++++++++++++-----------------
>  include/linux/memory.h |  2 +-
>  mm/memory_hotplug.c    | 17 +++++++----
>  3 files changed, 53 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index d9ebb89816f7..847b33061e2e 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -701,44 +701,57 @@ static int add_memory_block(int base_section_nr)
>  	return 0;
>  }
>  
> -/*
> - * need an interface for the VM to add new memory regions,
> - * but without onlining it.
> - */
> -int hotplug_memory_register(int nid, struct mem_section *section)
> +static void unregister_memory(struct memory_block *memory)
>  {
> -	int ret = 0;
> +	BUG_ON(memory->dev.bus != &memory_subsys);
> +
> +	/* drop the ref. we got via find_memory_block() */
> +	put_device(&memory->dev);
> +	device_unregister(&memory->dev);
> +}
> +
> +int hotplug_memory_register(unsigned long start, unsigned long size)
> +{
> +	unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
> +	unsigned long start_pfn = PFN_DOWN(start);
> +	unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
> +	unsigned long pfn;
>  	struct memory_block *mem;
> +	int ret = 0;
>  
> -	mutex_lock(&mem_sysfs_mutex);
> +	BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
> +	BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));
>  
> -	mem = find_memory_block(section);
> -	if (mem) {
> -		mem->section_count++;
> -		put_device(&mem->dev);
> -	} else {
> -		ret = init_memory_block(&mem, section, MEM_OFFLINE);
> +	mutex_lock(&mem_sysfs_mutex);
> +	for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
> +		mem = find_memory_block(__pfn_to_section(pfn));
> +		if (mem) {
> +			WARN_ON_ONCE(false);
> +			put_device(&mem->dev);
> +			continue;
> +		}
> +		ret = init_memory_block(&mem, __pfn_to_section(pfn),
> +					MEM_OFFLINE);
>  		if (ret)
> -			goto out;
> -		mem->section_count++;
> +			break;
> +		mem->section_count = memory_block_size_bytes() /
> +				     MIN_MEMORY_BLOCK_SIZE;
> +	}
> +	if (ret) {
> +		end_pfn = pfn;
> +		for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
> +			mem = find_memory_block(__pfn_to_section(pfn));
> +			if (!mem)
> +				continue;
> +			mem->section_count = 0;
> +			unregister_memory(mem);
> +		}
>  	}
> -
> -out:
>  	mutex_unlock(&mem_sysfs_mutex);
>  	return ret;
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTREMOVE
> -static void
> -unregister_memory(struct memory_block *memory)
> -{
> -	BUG_ON(memory->dev.bus != &memory_subsys);
> -
> -	/* drop the ref. we got in remove_memory_section() */
> -	put_device(&memory->dev);
> -	device_unregister(&memory->dev);
> -}
> -
>  static int remove_memory_section(struct mem_section *section)
>  {
>  	struct memory_block *mem;
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index a6ddefc60517..e275dc775834 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -111,7 +111,7 @@ extern int register_memory_notifier(struct notifier_block *nb);
>  extern void unregister_memory_notifier(struct notifier_block *nb);
>  extern int register_memory_isolate_notifier(struct notifier_block *nb);
>  extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
> -int hotplug_memory_register(int nid, struct mem_section *section);
> +int hotplug_memory_register(unsigned long start, unsigned long size);
>  #ifdef CONFIG_MEMORY_HOTREMOVE
>  extern int unregister_memory_section(struct mem_section *);
>  #endif
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 680dcc67f9d5..13ee0a26e034 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -260,11 +260,7 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
>  	ret = sparse_add_one_section(nid, phys_start_pfn, altmap);
>  	if (ret < 0)
>  		return ret;
> -
> -	if (!want_memblock)
> -		return 0;
> -
> -	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
> +	return 0;
>  }
>  
>  /*
> @@ -1125,6 +1121,17 @@ int __ref add_memory_resource(int nid, struct resource *res)
>  	if (ret < 0)
>  		goto error;
>  
> +	/* create memory block devices after memory was added */
> +	ret = hotplug_memory_register(start, size);
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +	if (ret) {
> +		arch_remove_memory(nid, start, size, NULL);
> +		goto error;
> +	}
> +#else
> +	WARN_ON(ret);
> +#endif
> +
>  	if (new_node) {
>  		/* If sysfs file of new node can't be created, cpu on the node
>  		 * can't be hot-added. There is no rollback way now.
> 

FWIW, I think we should first try to make sure arch_remove_memory()
cannot fail / will not ignore errors if possible. There are still some
things in there that need more re-factoring first.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH RFC 3/3] mm/memory_hotplug: Remove memory block devices before arch_remove_memory()
  2019-04-08 10:12 ` [PATCH RFC 3/3] mm/memory_hotplug: Remove memory block devices before arch_remove_memory() David Hildenbrand
@ 2019-04-09  9:18   ` Oscar Salvador
  2019-04-09  9:25     ` David Hildenbrand
  0 siblings, 1 reply; 7+ messages in thread
From: Oscar Salvador @ 2019-04-09  9:18 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, Andrew Morton, Greg Kroah-Hartman,
	Rafael J . Wysocki, Ingo Molnar, Andrew Banman, mike.travis,
	Jonathan Cameron, Michal Hocko, Pavel Tatashin, Wei Yang,
	Qian Cai, Arun KS, Mathieu Malaterre, linux-mm, dan.j.williams

On Mon, Apr 08, 2019 at 12:12:26PM +0200, David Hildenbrand wrote:
> Let's factor out removing of memory block devices, which is only
> necessary for memory added via add_memory() and friends that created
> memory block devices. Remove the devices before calling
> arch_remove_memory().
> 
> TODO: We should try to get rid of the errors that could be reported by
> unregister_memory_block_under_nodes(). Ignoring failures is not that
> nice.

Hi David,

I am sorry but I will not have to look into this until next week as I am
up to my ears with work plus I am in the middle of a move.

I remember I was once trying to simplify unregister_mem_sect_under_nodes (your
new unregister_memory_block_under_nodes), and I checked whether we could get
rid of the NODEMASK_ALLOC there, something like:

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 8598fcbd2a17..f4294a2928dd 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -805,16 +805,10 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
 int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
                                    unsigned long phys_index)
 {
-       NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
+       nodemask_t unlinked_nodes;
        unsigned long pfn, sect_start_pfn, sect_end_pfn;
 
-       if (!mem_blk) {
-               NODEMASK_FREE(unlinked_nodes);
-               return -EFAULT;
-       }
-       if (!unlinked_nodes)
-               return -ENOMEM;
-       nodes_clear(*unlinked_nodes);
+       nodes_clear(unlinked_nodes);
 
        sect_start_pfn = section_nr_to_pfn(phys_index);
        sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
@@ -826,14 +820,13 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
                        continue;
                if (!node_online(nid))
                        continue;
-               if (node_test_and_set(nid, *unlinked_nodes))
+               if (node_test_and_set(nid, unlinked_nodes))
                        continue;
                sysfs_remove_link(&node_devices[nid]->dev.kobj,
                         kobject_name(&mem_blk->dev.kobj));
                sysfs_remove_link(&mem_blk->dev.kobj,
                         kobject_name(&node_devices[nid]->dev.kobj));
        }
-       NODEMASK_FREE(unlinked_nodes);
        return 0;
 }


nodemask_t is 128bytes when CONFIG_NODES_SHIFT is 10 , which is the maximum value.
We just need to check whether we can overflow the stack or not.

AFAICS, it is not really a shore stack but it might not be that deep either.

> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/base/memory.c  | 41 +++++++++++++++--------------------------
>  drivers/base/node.c    |  7 +++----
>  include/linux/memory.h |  2 +-
>  include/linux/node.h   |  6 ++----
>  mm/memory_hotplug.c    | 10 ++++------
>  5 files changed, 25 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 847b33061e2e..fd8940c37129 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -752,40 +752,29 @@ int hotplug_memory_register(unsigned long start, unsigned long size)
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTREMOVE
> -static int remove_memory_section(struct mem_section *section)
> +void hotplug_memory_unregister(unsigned long start, unsigned long size)
>  {
> +	unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
> +	unsigned long start_pfn = PFN_DOWN(start);
> +	unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
>  	struct memory_block *mem;
> +	unsigned long pfn;
>  
> -	mutex_lock(&mem_sysfs_mutex);
> -
> -	/*
> -	 * Some users of the memory hotplug do not want/need memblock to
> -	 * track all sections. Skip over those.
> -	 */
> -	mem = find_memory_block(section);
> -	if (!mem)
> -		goto out_unlock;
> -
> -	unregister_mem_sect_under_nodes(mem, __section_nr(section));
> +	BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
> +	BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));
>  
> -	mem->section_count--;
> -	if (mem->section_count == 0)
> +	mutex_lock(&mem_sysfs_mutex);
> +	for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
> +		mem = find_memory_block(__pfn_to_section(pfn));
> +		if (!mem)
> +			continue;
> +		mem->section_count = 0;
> +		unregister_memory_block_under_nodes(mem);
>  		unregister_memory(mem);
> -	else
> -		put_device(&mem->dev);
> -
> -out_unlock:
> +	}
>  	mutex_unlock(&mem_sysfs_mutex);
> -	return 0;
>  }
>  
> -int unregister_memory_section(struct mem_section *section)
> -{
> -	if (!present_section(section))
> -		return -EINVAL;
> -
> -	return remove_memory_section(section);
> -}
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>  
>  /* return true if the memory block is offlined, otherwise, return false */
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 8598fcbd2a17..f9997770ac15 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -802,8 +802,7 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
>  }
>  
>  /* unregister memory section under all nodes that it spans */
> -int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> -				    unsigned long phys_index)
> +int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>  {
>  	NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
>  	unsigned long pfn, sect_start_pfn, sect_end_pfn;
> @@ -816,8 +815,8 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
>  		return -ENOMEM;
>  	nodes_clear(*unlinked_nodes);
>  
> -	sect_start_pfn = section_nr_to_pfn(phys_index);
> -	sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
> +	sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
> +	sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
>  	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
>  		int nid;
>  
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index e275dc775834..414e43ab0881 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -113,7 +113,7 @@ extern int register_memory_isolate_notifier(struct notifier_block *nb);
>  extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
>  int hotplug_memory_register(unsigned long start, unsigned long size);
>  #ifdef CONFIG_MEMORY_HOTREMOVE
> -extern int unregister_memory_section(struct mem_section *);
> +void hotplug_memory_unregister(unsigned long start, unsigned long size);
>  #endif
>  extern int memory_dev_init(void);
>  extern int memory_notify(unsigned long val, void *v);
> diff --git a/include/linux/node.h b/include/linux/node.h
> index 1a557c589ecb..02a29e71b175 100644
> --- a/include/linux/node.h
> +++ b/include/linux/node.h
> @@ -139,8 +139,7 @@ extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
>  extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
>  extern int register_mem_sect_under_node(struct memory_block *mem_blk,
>  						void *arg);
> -extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> -					   unsigned long phys_index);
> +extern int unregister_memory_block_under_nodes(struct memory_block *mem_blk);
>  
>  extern int register_memory_node_under_compute_node(unsigned int mem_nid,
>  						   unsigned int cpu_nid,
> @@ -176,8 +175,7 @@ static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
>  {
>  	return 0;
>  }
> -static inline int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> -						  unsigned long phys_index)
> +static inline int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>  {
>  	return 0;
>  }
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 13ee0a26e034..041b93c5eede 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -518,14 +518,9 @@ static int __remove_section(struct zone *zone, struct mem_section *ms,
>  {
>  	unsigned long start_pfn;
>  	int scn_nr;
> -	int ret = -EINVAL;
>  
>  	if (!valid_section(ms))
> -		return ret;
> -
> -	ret = unregister_memory_section(ms);
> -	if (ret)
> -		return ret;
> +		return -EINVAL;
>  
>  	scn_nr = __section_nr(ms);
>  	start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
> @@ -1875,6 +1870,9 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
>  	memblock_free(start, size);
>  	memblock_remove(start, size);
>  
> +	/* remove memory block devices before removing memory */
> +	hotplug_memory_unregister(start, size);
> +
>  	arch_remove_memory(nid, start, size, NULL);
>  
>  	try_offline_node(nid);
> -- 
> 2.17.2
> 

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH RFC 3/3] mm/memory_hotplug: Remove memory block devices before arch_remove_memory()
  2019-04-09  9:18   ` Oscar Salvador
@ 2019-04-09  9:25     ` David Hildenbrand
  0 siblings, 0 replies; 7+ messages in thread
From: David Hildenbrand @ 2019-04-09  9:25 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: linux-kernel, Andrew Morton, Greg Kroah-Hartman,
	Rafael J . Wysocki, Ingo Molnar, Andrew Banman, mike.travis,
	Jonathan Cameron, Michal Hocko, Pavel Tatashin, Wei Yang,
	Qian Cai, Arun KS, Mathieu Malaterre, linux-mm, dan.j.williams

On 09.04.19 11:18, Oscar Salvador wrote:
> On Mon, Apr 08, 2019 at 12:12:26PM +0200, David Hildenbrand wrote:
>> Let's factor out removing of memory block devices, which is only
>> necessary for memory added via add_memory() and friends that created
>> memory block devices. Remove the devices before calling
>> arch_remove_memory().
>>
>> TODO: We should try to get rid of the errors that could be reported by
>> unregister_memory_block_under_nodes(). Ignoring failures is not that
>> nice.
> 
> Hi David,
> 
> I am sorry but I will not have to look into this until next week as I am
> up to my ears with work plus I am in the middle of a move.

No worries, I have plenty of other stuff to do as well and this is only
an RFC that will require other refactorings and maybe discussions first
- one of these, I will send out shortly so we can discuss.

Happy moving :)

> 
> I remember I was once trying to simplify unregister_mem_sect_under_nodes (your
> new unregister_memory_block_under_nodes), and I checked whether we could get
> rid of the NODEMASK_ALLOC there, something like:

Yeah, something like that makes perfect sense. Thanks!

> 
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 8598fcbd2a17..f4294a2928dd 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -805,16 +805,10 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
>  int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
>                                     unsigned long phys_index)
>  {
> -       NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
> +       nodemask_t unlinked_nodes;
>         unsigned long pfn, sect_start_pfn, sect_end_pfn;
>  
> -       if (!mem_blk) {
> -               NODEMASK_FREE(unlinked_nodes);
> -               return -EFAULT;
> -       }
> -       if (!unlinked_nodes)
> -               return -ENOMEM;
> -       nodes_clear(*unlinked_nodes);
> +       nodes_clear(unlinked_nodes);
>  
>         sect_start_pfn = section_nr_to_pfn(phys_index);
>         sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
> @@ -826,14 +820,13 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
>                         continue;
>                 if (!node_online(nid))
>                         continue;
> -               if (node_test_and_set(nid, *unlinked_nodes))
> +               if (node_test_and_set(nid, unlinked_nodes))
>                         continue;
>                 sysfs_remove_link(&node_devices[nid]->dev.kobj,
>                          kobject_name(&mem_blk->dev.kobj));
>                 sysfs_remove_link(&mem_blk->dev.kobj,
>                          kobject_name(&node_devices[nid]->dev.kobj));
>         }
> -       NODEMASK_FREE(unlinked_nodes);
>         return 0;
>  }
> 
> 
> nodemask_t is 128bytes when CONFIG_NODES_SHIFT is 10 , which is the maximum value.
> We just need to check whether we can overflow the stack or not.
> 
> AFAICS, it is not really a shore stack but it might not be that deep either.


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-04-09  9:25 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-08 10:12 [PATCH RFC 0/3] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
2019-04-08 10:12 ` [PATCH RFC 1/3] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
2019-04-08 10:12 ` [PATCH RFC 2/3] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
2019-04-09  7:33   ` David Hildenbrand
2019-04-08 10:12 ` [PATCH RFC 3/3] mm/memory_hotplug: Remove memory block devices before arch_remove_memory() David Hildenbrand
2019-04-09  9:18   ` Oscar Salvador
2019-04-09  9:25     ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).