linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/4] mm/memory_hotplug: Better error handling when removing memory
@ 2019-04-09 10:01 David Hildenbrand
  2019-04-09 10:01 ` [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory() David Hildenbrand
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: David Hildenbrand @ 2019-04-09 10:01 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, David Hildenbrand, Andrew Banman, Andrew Morton,
	Andy Lutomirski, Arun KS, Benjamin Herrenschmidt,
	Borislav Petkov, Christophe Leroy, Dave Hansen, Fenghua Yu,
	Geert Uytterhoeven, Greg Kroah-Hartman, Heiko Carstens,
	H. Peter Anvin, Ingo Molnar, Ingo Molnar, Joonsoo Kim,
	Kirill A. Shutemov, Martin Schwidefsky, Masahiro Yamada,
	Mathieu Malaterre, Michael Ellerman, Michal Hocko, Mike Rapoport,
	mike.travis, Nicholas Piggin, Oscar Salvador, Oscar Salvador,
	Paul Mackerras, Pavel Tatashin, Peter Zijlstra, Qian Cai,
	Rafael J. Wysocki, Rich Felker, Rob Herring, Stefan Agner,
	Thomas Gleixner, Tony Luck, Vasily Gorbik, Wei Yang,
	Yoshinori Sato

Error handling when removing memory is somewhat messed up right now. Some
errors result in warnings, others are completely ignored. Memory unplug
code can essentially not deal with errors properly as of now.
remove_memory() will never fail.

We have basically two choices:
1. Allow arch_remov_memory() and friends to fail, propagating errors via
   remove_memory(). Might be problematic (e.g. DIMMs consisting of multiple
   pieces added/removed separately).
2. Don't allow the functions to fail, handling errors in a nicer way.

It seems like most errors that can theoretically happen are really corner
cases and mostly theoretical (e.g. "section not valid"). However e.g.
aborting removal of sections while all callers simply continue in case of
errors is not nice.

If we can gurantee that removal of memory always works (and WARN/skip in
case of theoretical errors so we can figure out what is going on), we can
go ahead and implement better error handling when adding memory.

E.g. via add_memory():

arch_add_memory()
ret = do_stuff()
if (ret) {
	arch_remove_memory();
	goto error;
}

Handling here that arch_remove_memory() might fail is basically impossible.
So I suggest, let's avoid reporting errors while removing memory, warning
on theoretical errors instead and continuing instead of aborting.

Compile-tested on x86-64, powerpc, s390x. Tested on x86-64 with DIMMs.
Based on git://git.cmpxchg.org/linux-mmots.git

David Hildenbrand (4):
  mm/memory_hotplug: Release memory resource after arch_remove_memory()
  mm/memory_hotplug: Make unregister_memory_section() never fail
  mm/memory_hotplug: Make __remove_section() never fail
  mm/memory_hotplug: Make __remove_pages() and arch_remove_memory()
    never fail

 arch/ia64/mm/init.c            | 11 ++----
 arch/powerpc/mm/mem.c          | 11 +++---
 arch/s390/mm/init.c            |  5 +--
 arch/sh/mm/init.c              | 11 ++----
 arch/x86/mm/init_32.c          |  5 +--
 arch/x86/mm/init_64.c          | 10 ++----
 drivers/base/memory.c          | 16 +++------
 include/linux/memory.h         |  2 +-
 include/linux/memory_hotplug.h |  8 ++---
 mm/memory_hotplug.c            | 63 +++++++++++++++++-----------------
 10 files changed, 60 insertions(+), 82 deletions(-)

-- 
2.17.2


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory()
  2019-04-09 10:01 [PATCH v1 0/4] mm/memory_hotplug: Better error handling when removing memory David Hildenbrand
@ 2019-04-09 10:01 ` David Hildenbrand
  2019-04-09 22:41   ` Andrew Morton
                     ` (3 more replies)
  2019-04-09 10:01 ` [PATCH v1 2/4] mm/memory_hotplug: Make unregister_memory_section() never fail David Hildenbrand
                   ` (2 subsequent siblings)
  3 siblings, 4 replies; 19+ messages in thread
From: David Hildenbrand @ 2019-04-09 10:01 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, David Hildenbrand, Andrew Morton, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Wei Yang, Qian Cai, Arun KS,
	Mathieu Malaterre

__add_pages() doesn't add the memory resource, so __remove_pages()
shouldn't remove it. Let's factor it out. Especially as it is a special
case for memory used as system memory, added via add_memory() and
friends.

We now remove the resource after removing the sections instead of doing
it the other way around. I don't think this change is problematic.

add_memory()
	register memory resource
	arch_add_memory()

remove_memory
	arch_remove_memory()
	release memory resource

While at it, explain why we ignore errors and that it only happeny if
we remove memory in a different granularity as we added it.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 34 ++++++++++++++++++++--------------
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4970ff658055..696ed7ee5e28 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -562,20 +562,6 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 	if (is_dev_zone(zone)) {
 		if (altmap)
 			map_offset = vmem_altmap_offset(altmap);
-	} else {
-		resource_size_t start, size;
-
-		start = phys_start_pfn << PAGE_SHIFT;
-		size = nr_pages * PAGE_SIZE;
-
-		ret = release_mem_region_adjustable(&iomem_resource, start,
-					size);
-		if (ret) {
-			resource_size_t endres = start + size - 1;
-
-			pr_warn("Unable to release resource <%pa-%pa> (%d)\n",
-					&start, &endres, ret);
-		}
 	}
 
 	clear_zone_contiguous(zone);
@@ -1820,6 +1806,25 @@ void try_offline_node(int nid)
 }
 EXPORT_SYMBOL(try_offline_node);
 
+static void __release_memory_resource(u64 start, u64 size)
+{
+	int ret;
+
+	/*
+	 * When removing memory in the same granularity as it was added,
+	 * this function never fails. It might only fail if resources
+	 * have to be adjusted or split. We'll ignore the error, as
+	 * removing of memory cannot fail.
+	 */
+	ret = release_mem_region_adjustable(&iomem_resource, start, size);
+	if (ret) {
+		resource_size_t endres = start + size - 1;
+
+		pr_warn("Unable to release resource <%pa-%pa> (%d)\n",
+			&start, &endres, ret);
+	}
+}
+
 /**
  * remove_memory
  * @nid: the node ID
@@ -1854,6 +1859,7 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
 	memblock_remove(start, size);
 
 	arch_remove_memory(nid, start, size, NULL);
+	__release_memory_resource(start, size);
 
 	try_offline_node(nid);
 
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v1 2/4] mm/memory_hotplug: Make unregister_memory_section() never fail
  2019-04-09 10:01 [PATCH v1 0/4] mm/memory_hotplug: Better error handling when removing memory David Hildenbrand
  2019-04-09 10:01 ` [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory() David Hildenbrand
@ 2019-04-09 10:01 ` David Hildenbrand
  2019-04-17 12:45   ` Oscar Salvador
  2019-04-09 10:01 ` [PATCH v1 3/4] mm/memory_hotplug: Make __remove_section() " David Hildenbrand
  2019-04-09 10:01 ` [PATCH v1 4/4] mm/memory_hotplug: Make __remove_pages() and arch_remove_memory() " David Hildenbrand
  3 siblings, 1 reply; 19+ messages in thread
From: David Hildenbrand @ 2019-04-09 10:01 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, David Hildenbrand, Greg Kroah-Hartman,
	Rafael J. Wysocki, Ingo Molnar, Andrew Banman, mike.travis,
	Andrew Morton, Oscar Salvador, Michal Hocko, Pavel Tatashin,
	Qian Cai, Wei Yang, Arun KS, Mathieu Malaterre

Failing while removing memory is mostly ignored and cannot really be
handled. Let's treat errors in unregister_memory_section() in a nice
way, warning, but continuing.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c  | 16 +++++-----------
 include/linux/memory.h |  2 +-
 mm/memory_hotplug.c    |  4 +---
 3 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 0c9e22ffa47a..f180427e48f4 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -734,15 +734,18 @@ unregister_memory(struct memory_block *memory)
 {
 	BUG_ON(memory->dev.bus != &memory_subsys);
 
-	/* drop the ref. we got in remove_memory_section() */
+	/* drop the ref. we got via find_memory_block() */
 	put_device(&memory->dev);
 	device_unregister(&memory->dev);
 }
 
-static int remove_memory_section(struct mem_section *section)
+void unregister_memory_section(struct mem_section *section)
 {
 	struct memory_block *mem;
 
+	if (WARN_ON_ONCE(!present_section(section)))
+		return;
+
 	mutex_lock(&mem_sysfs_mutex);
 
 	/*
@@ -763,15 +766,6 @@ static int remove_memory_section(struct mem_section *section)
 
 out_unlock:
 	mutex_unlock(&mem_sysfs_mutex);
-	return 0;
-}
-
-int unregister_memory_section(struct mem_section *section)
-{
-	if (!present_section(section))
-		return -EINVAL;
-
-	return remove_memory_section(section);
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index a6ddefc60517..e1dc1bb2b787 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -113,7 +113,7 @@ extern int register_memory_isolate_notifier(struct notifier_block *nb);
 extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
 int hotplug_memory_register(int nid, struct mem_section *section);
 #ifdef CONFIG_MEMORY_HOTREMOVE
-extern int unregister_memory_section(struct mem_section *);
+extern void unregister_memory_section(struct mem_section *);
 #endif
 extern int memory_dev_init(void);
 extern int memory_notify(unsigned long val, void *v);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 696ed7ee5e28..b0cb05748f99 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -527,9 +527,7 @@ static int __remove_section(struct zone *zone, struct mem_section *ms,
 	if (!valid_section(ms))
 		return ret;
 
-	ret = unregister_memory_section(ms);
-	if (ret)
-		return ret;
+	unregister_memory_section(ms);
 
 	scn_nr = __section_nr(ms);
 	start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v1 3/4] mm/memory_hotplug: Make __remove_section() never fail
  2019-04-09 10:01 [PATCH v1 0/4] mm/memory_hotplug: Better error handling when removing memory David Hildenbrand
  2019-04-09 10:01 ` [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory() David Hildenbrand
  2019-04-09 10:01 ` [PATCH v1 2/4] mm/memory_hotplug: Make unregister_memory_section() never fail David Hildenbrand
@ 2019-04-09 10:01 ` David Hildenbrand
  2019-04-17 13:56   ` Oscar Salvador
  2019-04-09 10:01 ` [PATCH v1 4/4] mm/memory_hotplug: Make __remove_pages() and arch_remove_memory() " David Hildenbrand
  3 siblings, 1 reply; 19+ messages in thread
From: David Hildenbrand @ 2019-04-09 10:01 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, David Hildenbrand, Andrew Morton, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Qian Cai, Wei Yang, Arun KS,
	Mathieu Malaterre

Let's just warn in case a section is not valid instead of failing to
remove somewhere in the middle of the process, returning an error that
will be mostly ignored by callers.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b0cb05748f99..17a60281c36f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -517,15 +517,15 @@ static void __remove_zone(struct zone *zone, unsigned long start_pfn)
 	pgdat_resize_unlock(zone->zone_pgdat, &flags);
 }
 
-static int __remove_section(struct zone *zone, struct mem_section *ms,
-		unsigned long map_offset, struct vmem_altmap *altmap)
+static void __remove_section(struct zone *zone, struct mem_section *ms,
+			     unsigned long map_offset,
+			     struct vmem_altmap *altmap)
 {
 	unsigned long start_pfn;
 	int scn_nr;
-	int ret = -EINVAL;
 
-	if (!valid_section(ms))
-		return ret;
+	if (WARN_ON_ONCE(!valid_section(ms)))
+		return;
 
 	unregister_memory_section(ms);
 
@@ -534,7 +534,6 @@ static int __remove_section(struct zone *zone, struct mem_section *ms,
 	__remove_zone(zone, start_pfn);
 
 	sparse_remove_one_section(zone, ms, map_offset, altmap);
-	return 0;
 }
 
 /**
@@ -554,7 +553,7 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 {
 	unsigned long i;
 	unsigned long map_offset = 0;
-	int sections_to_remove, ret = 0;
+	int sections_to_remove;
 
 	/* In the ZONE_DEVICE case device driver owns the memory region */
 	if (is_dev_zone(zone)) {
@@ -575,16 +574,13 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 		unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
 
 		cond_resched();
-		ret = __remove_section(zone, __pfn_to_section(pfn), map_offset,
-				altmap);
+		__remove_section(zone, __pfn_to_section(pfn), map_offset,
+				 altmap);
 		map_offset = 0;
-		if (ret)
-			break;
 	}
 
 	set_zone_contiguous(zone);
-
-	return ret;
+	return 0;
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v1 4/4] mm/memory_hotplug: Make __remove_pages() and arch_remove_memory() never fail
  2019-04-09 10:01 [PATCH v1 0/4] mm/memory_hotplug: Better error handling when removing memory David Hildenbrand
                   ` (2 preceding siblings ...)
  2019-04-09 10:01 ` [PATCH v1 3/4] mm/memory_hotplug: Make __remove_section() " David Hildenbrand
@ 2019-04-09 10:01 ` David Hildenbrand
  3 siblings, 0 replies; 19+ messages in thread
From: David Hildenbrand @ 2019-04-09 10:01 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, David Hildenbrand, Tony Luck, Fenghua Yu,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Martin Schwidefsky, Heiko Carstens, Yoshinori Sato, Rich Felker,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, x86, Andrew Morton,
	Michal Hocko, Mike Rapoport, Oscar Salvador, Kirill A. Shutemov,
	Christophe Leroy, Stefan Agner, Nicholas Piggin, Pavel Tatashin,
	Vasily Gorbik, Arun KS, Geert Uytterhoeven, Masahiro Yamada,
	Rob Herring, Joonsoo Kim, Wei Yang, Qian Cai, Mathieu Malaterre,
	linux-ia64, linuxppc-dev, linux-sh, linux-s390

All callers of arch_remove_memory() ignore errors. And we should really
try to remove any errors from the memory removal path.
No more errors are reported from __remove_pages(). BUG() in s390x code
in case arch_remove_memory() is triggered. We may implement that properly
later. WARN in case powerpc code failed to remove the section mapping,
which is better than ignoring the error completely right now.

Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Stefan Agner <stefan@agner.ch>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: linux-ia64@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-sh@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/ia64/mm/init.c            | 11 +++--------
 arch/powerpc/mm/mem.c          | 11 ++++-------
 arch/s390/mm/init.c            |  5 +++--
 arch/sh/mm/init.c              | 11 +++--------
 arch/x86/mm/init_32.c          |  5 +++--
 arch/x86/mm/init_64.c          | 10 +++-------
 include/linux/memory_hotplug.h |  8 ++++----
 mm/memory_hotplug.c            |  5 ++---
 8 files changed, 25 insertions(+), 41 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 379eb1f9adc9..d28e29103bdb 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -682,20 +682,15 @@ int arch_add_memory(int nid, u64 start, u64 size,
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-int arch_remove_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(int nid, u64 start, u64 size,
+			struct vmem_altmap *altmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	struct zone *zone;
-	int ret;
 
 	zone = page_zone(pfn_to_page(start_pfn));
-	ret = __remove_pages(zone, start_pfn, nr_pages, altmap);
-	if (ret)
-		pr_warn("%s: Problem encountered in __remove_pages() as"
-			" ret=%d\n", __func__,  ret);
-
-	return ret;
+	__remove_pages(zone, start_pfn, nr_pages, altmap);
 }
 #endif
 #endif
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 76deaa8525db..cc9425fb9056 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -131,8 +131,8 @@ int __meminit arch_add_memory(int nid, u64 start, u64 size,
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-int __meminit arch_remove_memory(int nid, u64 start, u64 size,
-					struct vmem_altmap *altmap)
+void __meminit arch_remove_memory(int nid, u64 start, u64 size,
+				  struct vmem_altmap *altmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -147,14 +147,13 @@ int __meminit arch_remove_memory(int nid, u64 start, u64 size,
 	if (altmap)
 		page += vmem_altmap_offset(altmap);
 
-	ret = __remove_pages(page_zone(page), start_pfn, nr_pages, altmap);
-	if (ret)
-		return ret;
+	__remove_pages(page_zone(page), start_pfn, nr_pages, altmap);
 
 	/* Remove htab bolted mappings for this section of memory */
 	start = (unsigned long)__va(start);
 	flush_inval_dcache_range(start, start + size);
 	ret = remove_section_mapping(start, start + size);
+	WARN_ON_ONCE(ret);
 
 	/* Ensure all vmalloc mappings are flushed in case they also
 	 * hit that section of memory
@@ -162,8 +161,6 @@ int __meminit arch_remove_memory(int nid, u64 start, u64 size,
 	vm_unmap_aliases();
 
 	resize_hpt_for_hotplug(memblock_phys_mem_size());
-
-	return ret;
 }
 #endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index f5db961ad792..31b1071315d7 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -234,14 +234,15 @@ int arch_add_memory(int nid, u64 start, u64 size,
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-int arch_remove_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(int nid, u64 start, u64 size,
+			struct vmem_altmap *altmap)
 {
 	/*
 	 * There is no hardware or firmware interface which could trigger a
 	 * hot memory remove on s390. So there is nothing that needs to be
 	 * implemented.
 	 */
-	return -EBUSY;
+	BUG();
 }
 #endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 168d3a6b9358..5aeb4d7099a1 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -429,20 +429,15 @@ EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
 #endif
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-int arch_remove_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(int nid, u64 start, u64 size,
+			struct vmem_altmap *altmap)
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	struct zone *zone;
-	int ret;
 
 	zone = page_zone(pfn_to_page(start_pfn));
-	ret = __remove_pages(zone, start_pfn, nr_pages, altmap);
-	if (unlikely(ret))
-		pr_warn("%s: Failed, __remove_pages() == %d\n", __func__,
-			ret);
-
-	return ret;
+	__remove_pages(zone, start_pfn, nr_pages, altmap);
 }
 #endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 755dbed85531..075e568098f2 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -860,14 +860,15 @@ int arch_add_memory(int nid, u64 start, u64 size,
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-int arch_remove_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(int nid, u64 start, u64 size,
+			struct vmem_altmap *altmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	struct zone *zone;
 
 	zone = page_zone(pfn_to_page(start_pfn));
-	return __remove_pages(zone, start_pfn, nr_pages, altmap);
+	__remove_pages(zone, start_pfn, nr_pages, altmap);
 }
 #endif
 #endif
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index db42c11b48fb..20d14254b686 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1141,24 +1141,20 @@ kernel_physical_mapping_remove(unsigned long start, unsigned long end)
 	remove_pagetable(start, end, true, NULL);
 }
 
-int __ref arch_remove_memory(int nid, u64 start, u64 size,
-				struct vmem_altmap *altmap)
+void __ref arch_remove_memory(int nid, u64 start, u64 size,
+			      struct vmem_altmap *altmap)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	struct page *page = pfn_to_page(start_pfn);
 	struct zone *zone;
-	int ret;
 
 	/* With altmap the first mapped page is offset from @start */
 	if (altmap)
 		page += vmem_altmap_offset(altmap);
 	zone = page_zone(page);
-	ret = __remove_pages(zone, start_pfn, nr_pages, altmap);
-	WARN_ON_ONCE(ret);
+	__remove_pages(zone, start_pfn, nr_pages, altmap);
 	kernel_physical_mapping_remove(start, start + size);
-
-	return ret;
 }
 #endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index d0a145ffa4fe..9d0efac902ec 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -112,10 +112,10 @@ static inline bool movable_node_is_enabled(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-extern int arch_remove_memory(int nid, u64 start, u64 size,
-				struct vmem_altmap *altmap);
-extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
-	unsigned long nr_pages, struct vmem_altmap *altmap);
+extern void arch_remove_memory(int nid, u64 start, u64 size,
+			       struct vmem_altmap *altmap);
+extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
+			   unsigned long nr_pages, struct vmem_altmap *altmap);
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 /*
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 17a60281c36f..52fef4a81e4c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -548,8 +548,8 @@ static void __remove_section(struct zone *zone, struct mem_section *ms,
  * sure that pages are marked reserved and zones are adjust properly by
  * calling offline_pages().
  */
-int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
-		 unsigned long nr_pages, struct vmem_altmap *altmap)
+void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
+		    unsigned long nr_pages, struct vmem_altmap *altmap)
 {
 	unsigned long i;
 	unsigned long map_offset = 0;
@@ -580,7 +580,6 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 	}
 
 	set_zone_contiguous(zone);
-	return 0;
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory()
  2019-04-09 10:01 ` [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory() David Hildenbrand
@ 2019-04-09 22:41   ` Andrew Morton
  2019-04-10  8:07     ` David Hildenbrand
  2019-04-17 11:52   ` Oscar Salvador
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2019-04-09 22:41 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, Oscar Salvador, Michal Hocko,
	Pavel Tatashin, Wei Yang, Qian Cai, Arun KS, Mathieu Malaterre

On Tue,  9 Apr 2019 12:01:45 +0200 David Hildenbrand <david@redhat.com> wrote:

> __add_pages() doesn't add the memory resource, so __remove_pages()
> shouldn't remove it. Let's factor it out. Especially as it is a special
> case for memory used as system memory, added via add_memory() and
> friends.
> 
> We now remove the resource after removing the sections instead of doing
> it the other way around. I don't think this change is problematic.
> 
> add_memory()
> 	register memory resource
> 	arch_add_memory()
> 
> remove_memory
> 	arch_remove_memory()
> 	release memory resource
> 
> While at it, explain why we ignore errors and that it only happeny if
> we remove memory in a different granularity as we added it.

Seems sane.

> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1820,6 +1806,25 @@ void try_offline_node(int nid)
>  }
>  EXPORT_SYMBOL(try_offline_node);
>  
> +static void __release_memory_resource(u64 start, u64 size)
> +{
> +	int ret;
> +
> +	/*
> +	 * When removing memory in the same granularity as it was added,
> +	 * this function never fails. It might only fail if resources
> +	 * have to be adjusted or split. We'll ignore the error, as
> +	 * removing of memory cannot fail.
> +	 */
> +	ret = release_mem_region_adjustable(&iomem_resource, start, size);
> +	if (ret) {
> +		resource_size_t endres = start + size - 1;
> +
> +		pr_warn("Unable to release resource <%pa-%pa> (%d)\n",
> +			&start, &endres, ret);
> +	}
> +}

The types seem confused here.  Should `start' and `size' be
resource_size_t?  Or maybe phys_addr_t.

release_mem_region_adjustable() takes resource_size_t's.

Is %pa the way to print a resource_size_t?  I guess it happens to work
because resource_size_t happens to map onto phys_addr_t, which isn't
ideal.

Wanna have a headscratch over that?

>  /**
>   * remove_memory
>   * @nid: the node ID
> @@ -1854,6 +1859,7 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
>  	memblock_remove(start, size);
>  
>  	arch_remove_memory(nid, start, size, NULL);
> +	__release_memory_resource(start, size);
>  
>  	try_offline_node(nid);


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory()
  2019-04-09 22:41   ` Andrew Morton
@ 2019-04-10  8:07     ` David Hildenbrand
  2019-04-17  3:37       ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: David Hildenbrand @ 2019-04-10  8:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Oscar Salvador, Michal Hocko,
	Pavel Tatashin, Wei Yang, Qian Cai, Arun KS, Mathieu Malaterre

On 10.04.19 00:41, Andrew Morton wrote:
> On Tue,  9 Apr 2019 12:01:45 +0200 David Hildenbrand <david@redhat.com> wrote:
> 
>> __add_pages() doesn't add the memory resource, so __remove_pages()
>> shouldn't remove it. Let's factor it out. Especially as it is a special
>> case for memory used as system memory, added via add_memory() and
>> friends.
>>
>> We now remove the resource after removing the sections instead of doing
>> it the other way around. I don't think this change is problematic.
>>
>> add_memory()
>> 	register memory resource
>> 	arch_add_memory()
>>
>> remove_memory
>> 	arch_remove_memory()
>> 	release memory resource
>>
>> While at it, explain why we ignore errors and that it only happeny if
>> we remove memory in a different granularity as we added it.
> 
> Seems sane.
> 
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1820,6 +1806,25 @@ void try_offline_node(int nid)
>>  }
>>  EXPORT_SYMBOL(try_offline_node);
>>  
>> +static void __release_memory_resource(u64 start, u64 size)
>> +{
>> +	int ret;
>> +
>> +	/*
>> +	 * When removing memory in the same granularity as it was added,
>> +	 * this function never fails. It might only fail if resources
>> +	 * have to be adjusted or split. We'll ignore the error, as
>> +	 * removing of memory cannot fail.
>> +	 */
>> +	ret = release_mem_region_adjustable(&iomem_resource, start, size);
>> +	if (ret) {
>> +		resource_size_t endres = start + size - 1;
>> +
>> +		pr_warn("Unable to release resource <%pa-%pa> (%d)\n",
>> +			&start, &endres, ret);
>> +	}
>> +}
> 
> The types seem confused here.  Should `start' and `size' be
> resource_size_t?  Or maybe phys_addr_t.

Hmm, right now it has the same prototype as register_memory_resource. I
guess using resource_size_t is the right thing to do.

> 
> release_mem_region_adjustable() takes resource_size_t's.
> 
> Is %pa the way to print a resource_size_t?  I guess it happens to work
> because resource_size_t happens to map onto phys_addr_t, which isn't
> ideal.

Documentation/core-api/printk-formats.rst

"
	%pa[p]	0x01234567 or 0x0123456789abcdef

For printing a phys_addr_t type (and its derivatives, such as
resource_size_t) ...
"


Care to fixup both u64 to resource_size_t? Or should I send a patch?
Whatever you prefer.

Thanks!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory()
  2019-04-10  8:07     ` David Hildenbrand
@ 2019-04-17  3:37       ` Andrew Morton
  2019-04-17  7:44         ` David Hildenbrand
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2019-04-17  3:37 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, Oscar Salvador, Michal Hocko,
	Pavel Tatashin, Wei Yang, Qian Cai, Arun KS, Mathieu Malaterre

On Wed, 10 Apr 2019 10:07:24 +0200 David Hildenbrand <david@redhat.com> wrote:

> Care to fixup both u64 to resource_size_t? Or should I send a patch?
> Whatever you prefer.

Please send along a fixup.

This patch series has no evidence of having been reviewed :(.  Can you
suggest who could help us out here?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory()
  2019-04-17  3:37       ` Andrew Morton
@ 2019-04-17  7:44         ` David Hildenbrand
  0 siblings, 0 replies; 19+ messages in thread
From: David Hildenbrand @ 2019-04-17  7:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Oscar Salvador, Michal Hocko,
	Pavel Tatashin, Wei Yang, Qian Cai, Arun KS, Mathieu Malaterre

On 17.04.19 05:37, Andrew Morton wrote:
> On Wed, 10 Apr 2019 10:07:24 +0200 David Hildenbrand <david@redhat.com> wrote:
> 
>> Care to fixup both u64 to resource_size_t? Or should I send a patch?
>> Whatever you prefer.
> 
> Please send along a fixup.

Will do!

> 
> This patch series has no evidence of having been reviewed :(.  Can you
> suggest who could help us out here?

Usually I would say Oscar and Michal, but they seem to be fairly busy. I
guess only the first patch is a real change, the other three patches are
mostly function prototype changes, handling errors in a nicer way.

Thanks!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory()
  2019-04-09 10:01 ` [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory() David Hildenbrand
  2019-04-09 22:41   ` Andrew Morton
@ 2019-04-17 11:52   ` Oscar Salvador
  2019-04-17 12:02   ` [PATCH] mm/memory_hotplug: Fixup "Release memory resource after arch_remove_memory()" David Hildenbrand
  2019-04-17 13:12   ` [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory() Michal Hocko
  3 siblings, 0 replies; 19+ messages in thread
From: Oscar Salvador @ 2019-04-17 11:52 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, Andrew Morton, Michal Hocko, Pavel Tatashin,
	Wei Yang, Qian Cai, Arun KS, Mathieu Malaterre

On Tue, 2019-04-09 at 12:01 +0200, David Hildenbrand wrote:
> __add_pages() doesn't add the memory resource, so __remove_pages()
> shouldn't remove it. Let's factor it out. Especially as it is a
> special
> case for memory used as system memory, added via add_memory() and
> friends.

I would call the special case the other way, aka: zone_device hooking
into hotplug path.

> 
> We now remove the resource after removing the sections instead of
> doing
> it the other way around. I don't think this change is problematic.
> 
> add_memory()
> 	register memory resource
> 	arch_add_memory()
> 
> remove_memory
> 	arch_remove_memory()
> 	release memory resource
> 
> While at it, explain why we ignore errors and that it only happeny if
> we remove memory in a different granularity as we added it.

In the future we may want to allow drivers to hook directly into
arch_add_memory()/arch_remove_memory(), and this will lead to different
granularity in hot_add/hot_remove operations. 

At least that was one of the conclusions I drew from the last vmemmap-
patchset.
So, we will have to see how we can handle those kind of errors.

> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Besides what Andrew pointed out about the types of start,size, I do not
see anything wrong:

Reviewed-by: Oscar Salvador <osalvador@suse.de>

> ---
>  mm/memory_hotplug.c | 34 ++++++++++++++++++++--------------
>  1 file changed, 20 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 4970ff658055..696ed7ee5e28 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -562,20 +562,6 @@ int __remove_pages(struct zone *zone, unsigned
> long phys_start_pfn,
>  	if (is_dev_zone(zone)) {
>  		if (altmap)
>  			map_offset = vmem_altmap_offset(altmap);
> -	} else {
> -		resource_size_t start, size;
> -
> -		start = phys_start_pfn << PAGE_SHIFT;
> -		size = nr_pages * PAGE_SIZE;
> -
> -		ret = release_mem_region_adjustable(&iomem_resource,
> start,
> -					size);
> -		if (ret) {
> -			resource_size_t endres = start + size - 1;
> -
> -			pr_warn("Unable to release resource <%pa-
> %pa> (%d)\n",
> -					&start, &endres, ret);
> -		}
>  	}
>  
>  	clear_zone_contiguous(zone);
> @@ -1820,6 +1806,25 @@ void try_offline_node(int nid)
>  }
>  EXPORT_SYMBOL(try_offline_node);
>  
> +static void __release_memory_resource(u64 start, u64 size)
> +{
> +	int ret;
> +
> +	/*
> +	 * When removing memory in the same granularity as it was
> added,
> +	 * this function never fails. It might only fail if
> resources
> +	 * have to be adjusted or split. We'll ignore the error, as
> +	 * removing of memory cannot fail.
> +	 */
> +	ret = release_mem_region_adjustable(&iomem_resource, start,
> size);
> +	if (ret) {
> +		resource_size_t endres = start + size - 1;
> +
> +		pr_warn("Unable to release resource <%pa-%pa>
> (%d)\n",
> +			&start, &endres, ret);
> +	}
> +}
> +
>  /**
>   * remove_memory
>   * @nid: the node ID
> @@ -1854,6 +1859,7 @@ void __ref __remove_memory(int nid, u64 start,
> u64 size)
>  	memblock_remove(start, size);
>  
>  	arch_remove_memory(nid, start, size, NULL);
> +	__release_memory_resource(start, size);
>  
>  	try_offline_node(nid);
>  
-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH] mm/memory_hotplug: Fixup "Release memory resource after arch_remove_memory()"
  2019-04-09 10:01 ` [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory() David Hildenbrand
  2019-04-09 22:41   ` Andrew Morton
  2019-04-17 11:52   ` Oscar Salvador
@ 2019-04-17 12:02   ` David Hildenbrand
  2019-04-17 13:12   ` [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory() Michal Hocko
  3 siblings, 0 replies; 19+ messages in thread
From: David Hildenbrand @ 2019-04-17 12:02 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, David Hildenbrand, Andrew Morton, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Wei Yang, Qian Cai, Arun KS,
	Mathieu Malaterre

pr_warn with "%pa" expects a pointer to phys_addr_t or resource_size_t.
So let's propery use resource_size_t for the start address and size,
we are dealing with resources after all.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 093f6dc66f46..328878b6799d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1800,7 +1800,8 @@ void try_offline_node(int nid)
 }
 EXPORT_SYMBOL(try_offline_node);
 
-static void __release_memory_resource(u64 start, u64 size)
+static void __release_memory_resource(resource_size_t start,
+				      resource_size_t size)
 {
 	int ret;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 2/4] mm/memory_hotplug: Make unregister_memory_section() never fail
  2019-04-09 10:01 ` [PATCH v1 2/4] mm/memory_hotplug: Make unregister_memory_section() never fail David Hildenbrand
@ 2019-04-17 12:45   ` Oscar Salvador
  2019-04-17 13:10     ` David Hildenbrand
  0 siblings, 1 reply; 19+ messages in thread
From: Oscar Salvador @ 2019-04-17 12:45 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, Greg Kroah-Hartman, Rafael J. Wysocki, Ingo Molnar,
	Andrew Banman, mike.travis, Andrew Morton, Michal Hocko,
	Pavel Tatashin, Qian Cai, Wei Yang, Arun KS, Mathieu Malaterre

On Tue, 2019-04-09 at 12:01 +0200, David Hildenbrand wrote:
> Failing while removing memory is mostly ignored and cannot really be
> handled. Let's treat errors in unregister_memory_section() in a nice
> way, warning, but continuing.
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Andrew Banman <andrew.banman@hpe.com>
> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/base/memory.c  | 16 +++++-----------
>  include/linux/memory.h |  2 +-
>  mm/memory_hotplug.c    |  4 +---
>  3 files changed, 7 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 0c9e22ffa47a..f180427e48f4 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -734,15 +734,18 @@ unregister_memory(struct memory_block *memory)
>  {
>  	BUG_ON(memory->dev.bus != &memory_subsys);
>  
> -	/* drop the ref. we got in remove_memory_section() */
> +	/* drop the ref. we got via find_memory_block() */
>  	put_device(&memory->dev);
>  	device_unregister(&memory->dev);
>  }
>  
> -static int remove_memory_section(struct mem_section *section)
> +void unregister_memory_section(struct mem_section *section)
>  {
>  	struct memory_block *mem;
>  
> +	if (WARN_ON_ONCE(!present_section(section)))
> +		return;
> +
>  	mutex_lock(&mem_sysfs_mutex);
>  
>  	/*
> @@ -763,15 +766,6 @@ static int remove_memory_section(struct
> mem_section *section)
>  
>  out_unlock:
>  	mutex_unlock(&mem_sysfs_mutex);
> -	return 0;
> -}
> -
> -int unregister_memory_section(struct mem_section *section)
> -{
> -	if (!present_section(section))
> -		return -EINVAL;
> -
> -	return remove_memory_section(section);
>  }
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>  
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index a6ddefc60517..e1dc1bb2b787 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -113,7 +113,7 @@ extern int
> register_memory_isolate_notifier(struct notifier_block *nb);
>  extern void unregister_memory_isolate_notifier(struct notifier_block
> *nb);
>  int hotplug_memory_register(int nid, struct mem_section *section);
>  #ifdef CONFIG_MEMORY_HOTREMOVE
> -extern int unregister_memory_section(struct mem_section *);
> +extern void unregister_memory_section(struct mem_section *);
>  #endif
>  extern int memory_dev_init(void);
>  extern int memory_notify(unsigned long val, void *v);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 696ed7ee5e28..b0cb05748f99 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -527,9 +527,7 @@ static int __remove_section(struct zone *zone,
> struct mem_section *ms,
>  	if (!valid_section(ms))
>  		return ret;
>  
> -	ret = unregister_memory_section(ms);
> -	if (ret)
> -		return ret;
> +	unregister_memory_section(ms);

So, technically unregister_memory_section() can __only__ fail in case
the section is not present, returning -EINVAL.

Now, I was checking how the pair valid/present sections work.
Unless I am mistaken, we only mark sections as memmap those sections
that are present.

This can come from two paths:

- Early boot:
  * memblocks_present
     memory_present           - mark sections as present
    sparse_init               - iterates only over present sections
     sparse_init_nid
      sparse_init_one_section - mark section as valid

- Hotplug path:
  * sparse_add_one_section
     section_mark_present     - mark section as present
     sparse_init_one_section  - mark section as valid
   

During early boot, sparse_init iterates __only__ over present sections,
so only those are marked valid as well, and during hotplug, the section
is both marked present and valid.

All in all, I think that we are safe if we remove the present_section
check in your new unregister_memory_section(), as a valid_section
cannot be non-present, and we do already catch those early in
__remove_section().

Then, the only thing missing to be completely error-less in that
codepath is to make unregister_mem_sect_under_nodes() void return-type.
Not that it matters a lot as we are already ignoring its return code,
but I find that quite disturbing and wrong.

So, would you like to take this patch in your patchset in case you re-
submit?

From: Oscar Salvador <osalvador@suse.de>
Date: Wed, 17 Apr 2019 14:41:32 +0200
Subject: [PATCH] mm,memory_hotplug: Refactor
unregister_mem_sect_under_nodes

Currently, the return code of unregister_mem_sect_under_nodes gets
ignored.
It can only fail in case we are not able to allocate a nodemask_t,
but such allocation is too small, so it is not really clear
we can actually fail.

The maximum a nodemask_t can be is 128 bytes, so we can make the whole
thing more simple if we simply allocate a nodemask_t within the stack.

With this change, we can convert unregister_mem_sect_under_nodes to
be void-return type.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 drivers/base/node.c  | 16 ++++------------
 include/linux/node.h |  5 ++---
 2 files changed, 6 insertions(+), 15 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 8598fcbd2a17..fcd0f442e73d 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -802,19 +802,13 @@ int register_mem_sect_under_node(struct
memory_block *mem_blk, void *arg)
 }
 
 /* unregister memory section under all nodes that it spans */
-int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
+void unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
 				    unsigned long phys_index)
 {
-	NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
+	nodemask_t unlinked_nodes;
 	unsigned long pfn, sect_start_pfn, sect_end_pfn;
 
-	if (!mem_blk) {
-		NODEMASK_FREE(unlinked_nodes);
-		return -EFAULT;
-	}
-	if (!unlinked_nodes)
-		return -ENOMEM;
-	nodes_clear(*unlinked_nodes);
+	nodes_clear(unlinked_nodes);
 
 	sect_start_pfn = section_nr_to_pfn(phys_index);
 	sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
@@ -826,15 +820,13 @@ int unregister_mem_sect_under_nodes(struct
memory_block *mem_blk,
 			continue;
 		if (!node_online(nid))
 			continue;
-		if (node_test_and_set(nid, *unlinked_nodes))
+		if (node_test_and_set(nid, unlinked_nodes))
 			continue;
 		sysfs_remove_link(&node_devices[nid]->dev.kobj,
 			 kobject_name(&mem_blk->dev.kobj));
 		sysfs_remove_link(&mem_blk->dev.kobj,
 			 kobject_name(&node_devices[nid]->dev.kobj));
 	}
-	NODEMASK_FREE(unlinked_nodes);
-	return 0;
 }
 
 int link_mem_sections(int nid, unsigned long start_pfn, unsigned long
end_pfn)
diff --git a/include/linux/node.h b/include/linux/node.h
index 1a557c589ecb..094ec9922bf5 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -139,7 +139,7 @@ extern int register_cpu_under_node(unsigned int
cpu, unsigned int nid);
 extern int unregister_cpu_under_node(unsigned int cpu, unsigned int
nid);
 extern int register_mem_sect_under_node(struct memory_block *mem_blk,
 						void *arg);
-extern int unregister_mem_sect_under_nodes(struct memory_block
*mem_blk,
+extern void unregister_mem_sect_under_nodes(struct memory_block
*mem_blk,
 					   unsigned long phys_index);
 
 extern int register_memory_node_under_compute_node(unsigned int
mem_nid,
@@ -176,10 +176,9 @@ static inline int
register_mem_sect_under_node(struct memory_block *mem_blk,
 {
 	return 0;
 }
-static inline int unregister_mem_sect_under_nodes(struct memory_block
*mem_blk,
+static inline void unregister_mem_sect_under_nodes(struct memory_block
*mem_blk,
 						  unsigned long
phys_index)
 {
-	return 0;
 }
 
 static inline void
register_hugetlbfs_with_node(node_registration_func_t reg,
-- 
2.12.3


-- 
Oscar Salvador
SUSE L3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 2/4] mm/memory_hotplug: Make unregister_memory_section() never fail
  2019-04-17 12:45   ` Oscar Salvador
@ 2019-04-17 13:10     ` David Hildenbrand
  0 siblings, 0 replies; 19+ messages in thread
From: David Hildenbrand @ 2019-04-17 13:10 UTC (permalink / raw)
  To: Oscar Salvador, linux-mm
  Cc: linux-kernel, Greg Kroah-Hartman, Rafael J. Wysocki, Ingo Molnar,
	Andrew Banman, mike.travis, Andrew Morton, Michal Hocko,
	Pavel Tatashin, Qian Cai, Wei Yang, Arun KS, Mathieu Malaterre

On 17.04.19 14:45, Oscar Salvador wrote:
> On Tue, 2019-04-09 at 12:01 +0200, David Hildenbrand wrote:
>> Failing while removing memory is mostly ignored and cannot really be
>> handled. Let's treat errors in unregister_memory_section() in a nice
>> way, warning, but continuing.
>>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Andrew Banman <andrew.banman@hpe.com>
>> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>> Cc: Qian Cai <cai@lca.pw>
>> Cc: Wei Yang <richard.weiyang@gmail.com>
>> Cc: Arun KS <arunks@codeaurora.org>
>> Cc: Mathieu Malaterre <malat@debian.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  drivers/base/memory.c  | 16 +++++-----------
>>  include/linux/memory.h |  2 +-
>>  mm/memory_hotplug.c    |  4 +---
>>  3 files changed, 7 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index 0c9e22ffa47a..f180427e48f4 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -734,15 +734,18 @@ unregister_memory(struct memory_block *memory)
>>  {
>>  	BUG_ON(memory->dev.bus != &memory_subsys);
>>  
>> -	/* drop the ref. we got in remove_memory_section() */
>> +	/* drop the ref. we got via find_memory_block() */
>>  	put_device(&memory->dev);
>>  	device_unregister(&memory->dev);
>>  }
>>  
>> -static int remove_memory_section(struct mem_section *section)
>> +void unregister_memory_section(struct mem_section *section)
>>  {
>>  	struct memory_block *mem;
>>  
>> +	if (WARN_ON_ONCE(!present_section(section)))
>> +		return;
>> +
>>  	mutex_lock(&mem_sysfs_mutex);
>>  
>>  	/*
>> @@ -763,15 +766,6 @@ static int remove_memory_section(struct
>> mem_section *section)
>>  
>>  out_unlock:
>>  	mutex_unlock(&mem_sysfs_mutex);
>> -	return 0;
>> -}
>> -
>> -int unregister_memory_section(struct mem_section *section)
>> -{
>> -	if (!present_section(section))
>> -		return -EINVAL;
>> -
>> -	return remove_memory_section(section);
>>  }
>>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>>  
>> diff --git a/include/linux/memory.h b/include/linux/memory.h
>> index a6ddefc60517..e1dc1bb2b787 100644
>> --- a/include/linux/memory.h
>> +++ b/include/linux/memory.h
>> @@ -113,7 +113,7 @@ extern int
>> register_memory_isolate_notifier(struct notifier_block *nb);
>>  extern void unregister_memory_isolate_notifier(struct notifier_block
>> *nb);
>>  int hotplug_memory_register(int nid, struct mem_section *section);
>>  #ifdef CONFIG_MEMORY_HOTREMOVE
>> -extern int unregister_memory_section(struct mem_section *);
>> +extern void unregister_memory_section(struct mem_section *);
>>  #endif
>>  extern int memory_dev_init(void);
>>  extern int memory_notify(unsigned long val, void *v);
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 696ed7ee5e28..b0cb05748f99 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -527,9 +527,7 @@ static int __remove_section(struct zone *zone,
>> struct mem_section *ms,
>>  	if (!valid_section(ms))
>>  		return ret;
>>  
>> -	ret = unregister_memory_section(ms);
>> -	if (ret)
>> -		return ret;
>> +	unregister_memory_section(ms);
> 
> So, technically unregister_memory_section() can __only__ fail in case
> the section is not present, returning -EINVAL.
> 
> Now, I was checking how the pair valid/present sections work.
> Unless I am mistaken, we only mark sections as memmap those sections
> that are present.
> 
> This can come from two paths:
> 
> - Early boot:
>   * memblocks_present
>      memory_present           - mark sections as present
>     sparse_init               - iterates only over present sections
>      sparse_init_nid
>       sparse_init_one_section - mark section as valid
> 
> - Hotplug path:
>   * sparse_add_one_section
>      section_mark_present     - mark section as present
>      sparse_init_one_section  - mark section as valid
>    
> 
> During early boot, sparse_init iterates __only__ over present sections,
> so only those are marked valid as well, and during hotplug, the section
> is both marked present and valid.
> 
> All in all, I think that we are safe if we remove the present_section
> check in your new unregister_memory_section(), as a valid_section
> cannot be non-present, and we do already catch those early in
> __remove_section().

Yes, dropping the check completely would be the next step.

> 
> Then, the only thing missing to be completely error-less in that
> codepath is to make unregister_mem_sect_under_nodes() void return-type.
> Not that it matters a lot as we are already ignoring its return code,
> but I find that quite disturbing and wrong.
> 
> So, would you like to take this patch in your patchset in case you re-
> submit?

I already had a look at that approach but would like to defer it, until
I eventually factor out creation/deletion of memory devices (see my
other RFC, still some work has to be done).

Reasons:

1. We only support unplug of system ram memory belonging to exactly one
node. No need to iterate over all nodes. mem->node should later tell us
exactly what needs to be removed. No need to iterate at all.

2. unregister_mem_sect_under_nodes() should be replaced by
"unregister_mem_block_under_nodes()". We only support unplug of whole
memory blocks. Ripping out random sections from a memory block is not
going to work and is not supported.

IOW, this is some leftover when people turned memory block devices to
span multiple sections. Once we clean that up, this problem goes away.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory()
  2019-04-09 10:01 ` [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory() David Hildenbrand
                     ` (2 preceding siblings ...)
  2019-04-17 12:02   ` [PATCH] mm/memory_hotplug: Fixup "Release memory resource after arch_remove_memory()" David Hildenbrand
@ 2019-04-17 13:12   ` Michal Hocko
  2019-04-17 13:24     ` David Hildenbrand
  3 siblings, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2019-04-17 13:12 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, Andrew Morton, Oscar Salvador,
	Pavel Tatashin, Wei Yang, Qian Cai, Arun KS, Mathieu Malaterre

On Tue 09-04-19 12:01:45, David Hildenbrand wrote:
> __add_pages() doesn't add the memory resource, so __remove_pages()
> shouldn't remove it. Let's factor it out. Especially as it is a special
> case for memory used as system memory, added via add_memory() and
> friends.
> 
> We now remove the resource after removing the sections instead of doing
> it the other way around. I don't think this change is problematic.
> 
> add_memory()
> 	register memory resource
> 	arch_add_memory()
> 
> remove_memory
> 	arch_remove_memory()
> 	release memory resource
> 
> While at it, explain why we ignore errors and that it only happeny if
> we remove memory in a different granularity as we added it.

OK, I agree that the symmetry is good in general and it certainly makes
sense here as well. But does it make sense to pick up this particular
part without larger considerations of add vs. remove apis? I have a
strong feeling this wouldn't be the only thing to care about. In other
words does this help future changes or it is more likely to cause more
code conflicts with other features being developed right now?

> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/memory_hotplug.c | 34 ++++++++++++++++++++--------------
>  1 file changed, 20 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 4970ff658055..696ed7ee5e28 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -562,20 +562,6 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
>  	if (is_dev_zone(zone)) {
>  		if (altmap)
>  			map_offset = vmem_altmap_offset(altmap);
> -	} else {
> -		resource_size_t start, size;
> -
> -		start = phys_start_pfn << PAGE_SHIFT;
> -		size = nr_pages * PAGE_SIZE;
> -
> -		ret = release_mem_region_adjustable(&iomem_resource, start,
> -					size);
> -		if (ret) {
> -			resource_size_t endres = start + size - 1;
> -
> -			pr_warn("Unable to release resource <%pa-%pa> (%d)\n",
> -					&start, &endres, ret);
> -		}
>  	}
>  
>  	clear_zone_contiguous(zone);
> @@ -1820,6 +1806,25 @@ void try_offline_node(int nid)
>  }
>  EXPORT_SYMBOL(try_offline_node);
>  
> +static void __release_memory_resource(u64 start, u64 size)
> +{
> +	int ret;
> +
> +	/*
> +	 * When removing memory in the same granularity as it was added,
> +	 * this function never fails. It might only fail if resources
> +	 * have to be adjusted or split. We'll ignore the error, as
> +	 * removing of memory cannot fail.
> +	 */
> +	ret = release_mem_region_adjustable(&iomem_resource, start, size);
> +	if (ret) {
> +		resource_size_t endres = start + size - 1;
> +
> +		pr_warn("Unable to release resource <%pa-%pa> (%d)\n",
> +			&start, &endres, ret);
> +	}
> +}
> +
>  /**
>   * remove_memory
>   * @nid: the node ID
> @@ -1854,6 +1859,7 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
>  	memblock_remove(start, size);
>  
>  	arch_remove_memory(nid, start, size, NULL);
> +	__release_memory_resource(start, size);
>  
>  	try_offline_node(nid);
>  
> -- 
> 2.17.2
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory()
  2019-04-17 13:12   ` [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory() Michal Hocko
@ 2019-04-17 13:24     ` David Hildenbrand
  2019-04-17 13:31       ` Michal Hocko
  0 siblings, 1 reply; 19+ messages in thread
From: David Hildenbrand @ 2019-04-17 13:24 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, Andrew Morton, Oscar Salvador,
	Pavel Tatashin, Wei Yang, Qian Cai, Arun KS, Mathieu Malaterre

On 17.04.19 15:12, Michal Hocko wrote:
> On Tue 09-04-19 12:01:45, David Hildenbrand wrote:
>> __add_pages() doesn't add the memory resource, so __remove_pages()
>> shouldn't remove it. Let's factor it out. Especially as it is a special
>> case for memory used as system memory, added via add_memory() and
>> friends.
>>
>> We now remove the resource after removing the sections instead of doing
>> it the other way around. I don't think this change is problematic.
>>
>> add_memory()
>> 	register memory resource
>> 	arch_add_memory()
>>
>> remove_memory
>> 	arch_remove_memory()
>> 	release memory resource
>>
>> While at it, explain why we ignore errors and that it only happeny if
>> we remove memory in a different granularity as we added it.
> 
> OK, I agree that the symmetry is good in general and it certainly makes
> sense here as well. But does it make sense to pick up this particular
> part without larger considerations of add vs. remove apis? I have a
> strong feeling this wouldn't be the only thing to care about. In other
> words does this help future changes or it is more likely to cause more
> code conflicts with other features being developed right now?

I am planning to

1. factor out memory block device handling, so features like sub-section
add/remove are easier to add internally. Move it to the user that wants
it. Clean up all the mess we have right now due to supporting memory
block devices that span several sections.

2. Make sure that any arch_add_pages() and friends clean up properly if
they fail instead of indicating failure but leaving some partially added
memory lying around.

3. Clean up node handling regarding to memory hotplug/unplug. Especially
don't allow to offline/remove memory spanning several nodes etc.

IOW, in order to properly clean up memory block device handling and
prepare for more changes people are interested in (e.g. sub-section add
of device memory), this is the right thing to do. The other parts are
bigger changes.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory()
  2019-04-17 13:24     ` David Hildenbrand
@ 2019-04-17 13:31       ` Michal Hocko
  2019-04-17 13:48         ` David Hildenbrand
  0 siblings, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2019-04-17 13:31 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, Andrew Morton, Oscar Salvador,
	Pavel Tatashin, Wei Yang, Qian Cai, Arun KS, Mathieu Malaterre

On Wed 17-04-19 15:24:47, David Hildenbrand wrote:
> On 17.04.19 15:12, Michal Hocko wrote:
> > On Tue 09-04-19 12:01:45, David Hildenbrand wrote:
> >> __add_pages() doesn't add the memory resource, so __remove_pages()
> >> shouldn't remove it. Let's factor it out. Especially as it is a special
> >> case for memory used as system memory, added via add_memory() and
> >> friends.
> >>
> >> We now remove the resource after removing the sections instead of doing
> >> it the other way around. I don't think this change is problematic.
> >>
> >> add_memory()
> >> 	register memory resource
> >> 	arch_add_memory()
> >>
> >> remove_memory
> >> 	arch_remove_memory()
> >> 	release memory resource
> >>
> >> While at it, explain why we ignore errors and that it only happeny if
> >> we remove memory in a different granularity as we added it.
> > 
> > OK, I agree that the symmetry is good in general and it certainly makes
> > sense here as well. But does it make sense to pick up this particular
> > part without larger considerations of add vs. remove apis? I have a
> > strong feeling this wouldn't be the only thing to care about. In other
> > words does this help future changes or it is more likely to cause more
> > code conflicts with other features being developed right now?
> 
> I am planning to
> 
> 1. factor out memory block device handling, so features like sub-section
> add/remove are easier to add internally. Move it to the user that wants
> it. Clean up all the mess we have right now due to supporting memory
> block devices that span several sections.
> 
> 2. Make sure that any arch_add_pages() and friends clean up properly if
> they fail instead of indicating failure but leaving some partially added
> memory lying around.
> 
> 3. Clean up node handling regarding to memory hotplug/unplug. Especially
> don't allow to offline/remove memory spanning several nodes etc.

Yes, this all sounds sane to me.

> IOW, in order to properly clean up memory block device handling and
> prepare for more changes people are interested in (e.g. sub-section add
> of device memory), this is the right thing to do. The other parts are
> bigger changes.

This would be really valuable to have in the cover. Beause there is so
much to clean up in this mess but making random small cleanups without a
larger plan tends to step on others toes more than being useful.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory()
  2019-04-17 13:31       ` Michal Hocko
@ 2019-04-17 13:48         ` David Hildenbrand
  0 siblings, 0 replies; 19+ messages in thread
From: David Hildenbrand @ 2019-04-17 13:48 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, Andrew Morton, Oscar Salvador,
	Pavel Tatashin, Wei Yang, Qian Cai, Arun KS, Mathieu Malaterre

On 17.04.19 15:31, Michal Hocko wrote:
> On Wed 17-04-19 15:24:47, David Hildenbrand wrote:
>> On 17.04.19 15:12, Michal Hocko wrote:
>>> On Tue 09-04-19 12:01:45, David Hildenbrand wrote:
>>>> __add_pages() doesn't add the memory resource, so __remove_pages()
>>>> shouldn't remove it. Let's factor it out. Especially as it is a special
>>>> case for memory used as system memory, added via add_memory() and
>>>> friends.
>>>>
>>>> We now remove the resource after removing the sections instead of doing
>>>> it the other way around. I don't think this change is problematic.
>>>>
>>>> add_memory()
>>>> 	register memory resource
>>>> 	arch_add_memory()
>>>>
>>>> remove_memory
>>>> 	arch_remove_memory()
>>>> 	release memory resource
>>>>
>>>> While at it, explain why we ignore errors and that it only happeny if
>>>> we remove memory in a different granularity as we added it.
>>>
>>> OK, I agree that the symmetry is good in general and it certainly makes
>>> sense here as well. But does it make sense to pick up this particular
>>> part without larger considerations of add vs. remove apis? I have a
>>> strong feeling this wouldn't be the only thing to care about. In other
>>> words does this help future changes or it is more likely to cause more
>>> code conflicts with other features being developed right now?
>>
>> I am planning to
>>
>> 1. factor out memory block device handling, so features like sub-section
>> add/remove are easier to add internally. Move it to the user that wants
>> it. Clean up all the mess we have right now due to supporting memory
>> block devices that span several sections.
>>
>> 2. Make sure that any arch_add_pages() and friends clean up properly if
>> they fail instead of indicating failure but leaving some partially added
>> memory lying around.
>>
>> 3. Clean up node handling regarding to memory hotplug/unplug. Especially
>> don't allow to offline/remove memory spanning several nodes etc.
> 
> Yes, this all sounds sane to me.
> 
>> IOW, in order to properly clean up memory block device handling and
>> prepare for more changes people are interested in (e.g. sub-section add
>> of device memory), this is the right thing to do. The other parts are
>> bigger changes.
> 
> This would be really valuable to have in the cover. Beause there is so
> much to clean up in this mess but making random small cleanups without a
> larger plan tends to step on others toes more than being useful.

I agree, let's discuss the bigger plan I have in mind

1. arch_add_memory()/arch_remove_memory() don't deal with memory block
devices. add_memory()/remove_memory()/online_pages()/offline_pages() do.

2. add_memory()/remove_memory()/online_pages()/offline_pages()
- Only work on memory block device alignment/granularity
- Only work on single nodes.
- Only work on single zones.

3. mem->nid correctly indicates if
- Memory block devices belongs to single node / no node / multiple nodes
- Fast and reliable way to detect
remove_memory()/online_pages()/offline_pages() being called with
multiple nodes.

4. arch_remove_memory() and friends never fail. Removing of memory
always succeeds. This allows better error handling when adding of memory
fails. We will move some parts from CONFIG_MEMORY_HOTREMOVE to
CONFIG_MEMORY_HOTPLUG, so we can use them to clean up if adding of
memory fails.

5. Remove all accesses to struct_page from memory removal path. Pages
might never have been initialized, they should not be touched.

All other features I see on the horizon (vmemmap on added memory,
sub-section hot-add) would mainly be centered around
arch_add_memory()/arch_remove_memory(), which would not have to deal
with any special cases around memory block device memory.

Do you agree, do you have any other points/things in mind you consider
important?

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 3/4] mm/memory_hotplug: Make __remove_section() never fail
  2019-04-09 10:01 ` [PATCH v1 3/4] mm/memory_hotplug: Make __remove_section() " David Hildenbrand
@ 2019-04-17 13:56   ` Oscar Salvador
  2019-04-24  6:54     ` David Hildenbrand
  0 siblings, 1 reply; 19+ messages in thread
From: Oscar Salvador @ 2019-04-17 13:56 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, Andrew Morton, Michal Hocko, Pavel Tatashin,
	Qian Cai, Wei Yang, Arun KS, Mathieu Malaterre

On Tue, 2019-04-09 at 12:01 +0200, David Hildenbrand wrote:
> Let's just warn in case a section is not valid instead of failing to
> remove somewhere in the middle of the process, returning an error
> that
> will be mostly ignored by callers.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Just a nit:

I think this could be combined with patch#2.
The only reason to fail in here is 1) !valid_section 2)
!present_section.
As I stated in patch#2, one cannot be without the other, so makes sense
to rip present_section check from unregister_mem_section() as well.
Then, you could combine both changelogs explaining the whole thing, and
why we do not need the present_section check either.

But the change looks good to me:

Reviewed-by: Oscar Salvador <osalvador@suse.de>

> ---
>  mm/memory_hotplug.c | 22 +++++++++-------------
>  1 file changed, 9 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index b0cb05748f99..17a60281c36f 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -517,15 +517,15 @@ static void __remove_zone(struct zone *zone,
> unsigned long start_pfn)
>  	pgdat_resize_unlock(zone->zone_pgdat, &flags);
>  }
>  
> -static int __remove_section(struct zone *zone, struct mem_section
> *ms,
> -		unsigned long map_offset, struct vmem_altmap
> *altmap)
> +static void __remove_section(struct zone *zone, struct mem_section
> *ms,
> +			     unsigned long map_offset,
> +			     struct vmem_altmap *altmap)
>  {
>  	unsigned long start_pfn;
>  	int scn_nr;
> -	int ret = -EINVAL;
>  
> -	if (!valid_section(ms))
> -		return ret;
> +	if (WARN_ON_ONCE(!valid_section(ms)))
> +		return;
>  
>  	unregister_memory_section(ms);
>  
> @@ -534,7 +534,6 @@ static int __remove_section(struct zone *zone,
> struct mem_section *ms,
>  	__remove_zone(zone, start_pfn);
>  
>  	sparse_remove_one_section(zone, ms, map_offset, altmap);
> -	return 0;
>  }
>  
>  /**
> @@ -554,7 +553,7 @@ int __remove_pages(struct zone *zone, unsigned
> long phys_start_pfn,
>  {
>  	unsigned long i;
>  	unsigned long map_offset = 0;
> -	int sections_to_remove, ret = 0;
> +	int sections_to_remove;
>  
>  	/* In the ZONE_DEVICE case device driver owns the memory
> region */
>  	if (is_dev_zone(zone)) {
> @@ -575,16 +574,13 @@ int __remove_pages(struct zone *zone, unsigned
> long phys_start_pfn,
>  		unsigned long pfn = phys_start_pfn +
> i*PAGES_PER_SECTION;
>  
>  		cond_resched();
> -		ret = __remove_section(zone, __pfn_to_section(pfn),
> map_offset,
> -				altmap);
> +		__remove_section(zone, __pfn_to_section(pfn),
> map_offset,
> +				 altmap);
>  		map_offset = 0;
> -		if (ret)
> -			break;
>  	}
>  
>  	set_zone_contiguous(zone);
> -
> -	return ret;
> +	return 0;
>  }
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>  
-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 3/4] mm/memory_hotplug: Make __remove_section() never fail
  2019-04-17 13:56   ` Oscar Salvador
@ 2019-04-24  6:54     ` David Hildenbrand
  0 siblings, 0 replies; 19+ messages in thread
From: David Hildenbrand @ 2019-04-24  6:54 UTC (permalink / raw)
  To: Oscar Salvador, linux-mm
  Cc: linux-kernel, Andrew Morton, Michal Hocko, Pavel Tatashin,
	Qian Cai, Wei Yang, Arun KS, Mathieu Malaterre

On 17.04.19 15:56, Oscar Salvador wrote:
> On Tue, 2019-04-09 at 12:01 +0200, David Hildenbrand wrote:
>> Let's just warn in case a section is not valid instead of failing to
>> remove somewhere in the middle of the process, returning an error
>> that
>> will be mostly ignored by callers.
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>> Cc: Qian Cai <cai@lca.pw>
>> Cc: Wei Yang <richard.weiyang@gmail.com>
>> Cc: Arun KS <arunks@codeaurora.org>
>> Cc: Mathieu Malaterre <malat@debian.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> Just a nit:
> 
> I think this could be combined with patch#2.
> The only reason to fail in here is 1) !valid_section 2)
> !present_section.
> As I stated in patch#2, one cannot be without the other, so makes sense
> to rip present_section check from unregister_mem_section() as well.
> Then, you could combine both changelogs explaining the whole thing, and
> why we do not need the present_section check either.
> 

If I have to resend the whole thing, I might do that. Otherwise we can
drop the present_section() based on your explanation later.

Thanks!

> But the change looks good to me:
> 
> Reviewed-by: Oscar Salvador <osalvador@suse.de>


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2019-04-24  6:54 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-09 10:01 [PATCH v1 0/4] mm/memory_hotplug: Better error handling when removing memory David Hildenbrand
2019-04-09 10:01 ` [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory() David Hildenbrand
2019-04-09 22:41   ` Andrew Morton
2019-04-10  8:07     ` David Hildenbrand
2019-04-17  3:37       ` Andrew Morton
2019-04-17  7:44         ` David Hildenbrand
2019-04-17 11:52   ` Oscar Salvador
2019-04-17 12:02   ` [PATCH] mm/memory_hotplug: Fixup "Release memory resource after arch_remove_memory()" David Hildenbrand
2019-04-17 13:12   ` [PATCH v1 1/4] mm/memory_hotplug: Release memory resource after arch_remove_memory() Michal Hocko
2019-04-17 13:24     ` David Hildenbrand
2019-04-17 13:31       ` Michal Hocko
2019-04-17 13:48         ` David Hildenbrand
2019-04-09 10:01 ` [PATCH v1 2/4] mm/memory_hotplug: Make unregister_memory_section() never fail David Hildenbrand
2019-04-17 12:45   ` Oscar Salvador
2019-04-17 13:10     ` David Hildenbrand
2019-04-09 10:01 ` [PATCH v1 3/4] mm/memory_hotplug: Make __remove_section() " David Hildenbrand
2019-04-17 13:56   ` Oscar Salvador
2019-04-24  6:54     ` David Hildenbrand
2019-04-09 10:01 ` [PATCH v1 4/4] mm/memory_hotplug: Make __remove_pages() and arch_remove_memory() " David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).