linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/8] mm/memory_hotplug: Factor out memory block device handling
@ 2019-05-07 18:37 David Hildenbrand
  2019-05-07 18:37 ` [PATCH v2 1/8] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
                   ` (8 more replies)
  0 siblings, 9 replies; 45+ messages in thread
From: David Hildenbrand @ 2019-05-07 18:37 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	akpm, Dan Williams, David Hildenbrand, Alex Deucher,
	Andrew Banman, Andy Lutomirski, Arun KS, Baoquan He,
	Benjamin Herrenschmidt, Borislav Petkov, Christophe Leroy,
	Chris Wilson, Dave Hansen, David S. Miller, Fenghua Yu,
	Greg Kroah-Hartman, Heiko Carstens, H. Peter Anvin, Ingo Molnar,
	Ingo Molnar, Jonathan Cameron, Joonsoo Kim, Kirill A. Shutemov,
	Logan Gunthorpe, Mark Brown, Martin Schwidefsky, Masahiro Yamada,
	Mathieu Malaterre, Michael Ellerman, Michal Hocko, Mike Rapoport,
	Mike Rapoport, mike.travis, Nicholas Piggin, Oscar Salvador,
	Oscar Salvador, Paul Mackerras, Pavel Tatashin, Pavel Tatashin,
	Peter Zijlstra, Qian Cai, Rafael J. Wysocki, Rich Felker,
	Rob Herring, Thomas Gleixner, Tony Luck, Vasily Gorbik, Wei Yang,
	Wei Yang, Yoshinori Sato

We only want memory block devices for memory to be onlined/offlined
(add/remove from the buddy). This is required so user space can
online/offline memory and kdump gets notified about newly onlined memory.

Only such memory has the requirement of having to span whole memory blocks.
Let's factor out creation/removal of memory block devices. This helps
to further cleanup arch_add_memory/arch_remove_memory() and to make
implementation of new features easier. E.g. supplying a driver for
memory block devices becomes way easier (so user space is able to
distinguish different types of added memory to properly online it).

Patch 1 makes sure the memory block size granularity is always respected.
Patch 2 implements arch_remove_memory() on s390x. Patch 3 prepares
arch_remove_memory() to be also called without CONFIG_MEMORY_HOTREMOVE.
Patch 4,5 and 6 factor out creation/removal of memory block devices.
Patch 7 gets rid of some unlikely errors that could have happened, not
removing links between memory block devices and nodes, previously brought
up by Oscar.

Did a quick sanity test with DIMM plug/unplug, making sure all devices
and sysfs links properly get added/removed. Compile tested on s390x and
x86-64.

Based on git://git.cmpxchg.org/linux-mmots.git

Next refactoring on my list will be making sure that remove_memory()
will never deal with zones / access "struct pages". Any kind of zone
handling will have to be done when offlining system memory / before
removing device memory. I am thinking about remove_pfn_range_from_zone()",
du undo everything "move_pfn_range_to_zone()" did.

v1 -> v2:
- s390x/mm: Implement arch_remove_memory()
-- remove mapping after "__remove_pages"


David Hildenbrand (8):
  mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
  s390x/mm: Implement arch_remove_memory()
  mm/memory_hotplug: arch_remove_memory() and __remove_pages() with
    CONFIG_MEMORY_HOTPLUG
  mm/memory_hotplug: Create memory block devices after arch_add_memory()
  mm/memory_hotplug: Drop MHP_MEMBLOCK_API
  mm/memory_hotplug: Remove memory block devices before
    arch_remove_memory()
  mm/memory_hotplug: Make unregister_memory_block_under_nodes() never
    fail
  mm/memory_hotplug: Remove "zone" parameter from
    sparse_remove_one_section

 arch/ia64/mm/init.c            |   2 -
 arch/powerpc/mm/mem.c          |   2 -
 arch/s390/mm/init.c            |  15 +++--
 arch/sh/mm/init.c              |   2 -
 arch/x86/mm/init_32.c          |   2 -
 arch/x86/mm/init_64.c          |   2 -
 drivers/base/memory.c          | 109 +++++++++++++++++++--------------
 drivers/base/node.c            |  27 +++-----
 include/linux/memory.h         |   6 +-
 include/linux/memory_hotplug.h |  12 +---
 include/linux/node.h           |   7 +--
 mm/memory_hotplug.c            |  44 ++++++-------
 mm/sparse.c                    |  10 +--
 13 files changed, 104 insertions(+), 136 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2 1/8] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
  2019-05-07 18:37 [PATCH v2 0/8] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
@ 2019-05-07 18:37 ` David Hildenbrand
  2019-05-07 20:38   ` Dan Williams
  2019-05-09 12:23   ` Wei Yang
  2019-05-07 18:37 ` [PATCH v2 2/8] s390x/mm: Implement arch_remove_memory() David Hildenbrand
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 45+ messages in thread
From: David Hildenbrand @ 2019-05-07 18:37 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	akpm, Dan Williams, David Hildenbrand, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Qian Cai, Wei Yang, Arun KS,
	Mathieu Malaterre

By converting start and size to page granularity, we actually ignore
unaligned parts within a page instead of properly bailing out with an
error.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 328878b6799d..202febe88b58 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1050,16 +1050,11 @@ int try_online_node(int nid)
 
 static int check_hotplug_memory_range(u64 start, u64 size)
 {
-	unsigned long block_sz = memory_block_size_bytes();
-	u64 block_nr_pages = block_sz >> PAGE_SHIFT;
-	u64 nr_pages = size >> PAGE_SHIFT;
-	u64 start_pfn = PFN_DOWN(start);
-
 	/* memory range must be block size aligned */
-	if (!nr_pages || !IS_ALIGNED(start_pfn, block_nr_pages) ||
-	    !IS_ALIGNED(nr_pages, block_nr_pages)) {
+	if (!size || !IS_ALIGNED(start, memory_block_size_bytes()) ||
+	    !IS_ALIGNED(size, memory_block_size_bytes())) {
 		pr_err("Block size [%#lx] unaligned hotplug range: start %#llx, size %#llx",
-		       block_sz, start, size);
+		       memory_block_size_bytes(), start, size);
 		return -EINVAL;
 	}
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 2/8] s390x/mm: Implement arch_remove_memory()
  2019-05-07 18:37 [PATCH v2 0/8] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
  2019-05-07 18:37 ` [PATCH v2 1/8] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
@ 2019-05-07 18:37 ` David Hildenbrand
  2019-05-07 20:46   ` Dan Williams
  2019-05-07 18:37 ` [PATCH v2 3/8] mm/memory_hotplug: arch_remove_memory() and __remove_pages() with CONFIG_MEMORY_HOTPLUG David Hildenbrand
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 45+ messages in thread
From: David Hildenbrand @ 2019-05-07 18:37 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	akpm, Dan Williams, David Hildenbrand, Martin Schwidefsky,
	Heiko Carstens, Michal Hocko, Mike Rapoport, Vasily Gorbik,
	Oscar Salvador

Will come in handy when wanting to handle errors after
arch_add_memory().

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/mm/init.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 31b1071315d7..1e0cbae69f12 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -237,12 +237,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
 void arch_remove_memory(int nid, u64 start, u64 size,
 			struct vmem_altmap *altmap)
 {
-	/*
-	 * There is no hardware or firmware interface which could trigger a
-	 * hot memory remove on s390. So there is nothing that needs to be
-	 * implemented.
-	 */
-	BUG();
+	unsigned long start_pfn = start >> PAGE_SHIFT;
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	struct zone *zone;
+
+	zone = page_zone(pfn_to_page(start_pfn));
+	__remove_pages(zone, start_pfn, nr_pages, altmap);
+	vmem_remove_mapping(start, size);
 }
 #endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 3/8] mm/memory_hotplug: arch_remove_memory() and __remove_pages() with CONFIG_MEMORY_HOTPLUG
  2019-05-07 18:37 [PATCH v2 0/8] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
  2019-05-07 18:37 ` [PATCH v2 1/8] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
  2019-05-07 18:37 ` [PATCH v2 2/8] s390x/mm: Implement arch_remove_memory() David Hildenbrand
@ 2019-05-07 18:37 ` David Hildenbrand
  2019-05-07 21:02   ` Dan Williams
  2019-05-07 18:38 ` [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 45+ messages in thread
From: David Hildenbrand @ 2019-05-07 18:37 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	akpm, Dan Williams, David Hildenbrand, Tony Luck, Fenghua Yu,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Martin Schwidefsky, Heiko Carstens, Yoshinori Sato, Rich Felker,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Greg Kroah-Hartman,
	Rafael J. Wysocki, Michal Hocko, Mike Rapoport, Oscar Salvador,
	Kirill A. Shutemov, Alex Deucher, David S. Miller, Mark Brown,
	Chris Wilson, Christophe Leroy, Nicholas Piggin, Vasily Gorbik,
	Rob Herring, Masahiro Yamada, mike.travis, Andrew Banman,
	Pavel Tatashin, Wei Yang, Arun KS, Qian Cai, Mathieu Malaterre,
	Baoquan He, Logan Gunthorpe

Let's prepare for better error handling while adding memory by allowing
to use arch_remove_memory() and __remove_pages() even if
CONFIG_MEMORY_HOTREMOVE is not set. CONFIG_MEMORY_HOTREMOVE effectively
covers
- Offlining of system ram (memory block devices) - offline_pages()
- Unplug of system ram - remove_memory()
- Unplug/remap of device memory - devm_memremap()

This allows e.g. for handling like

arch_add_memory()
rc = do_something();
if (rc) {
	arch_remove_memory();
}

Whereby do_something() will for example be memory block device creation
after it has been factored out.

Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mark Brown <broonie@kernel.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/ia64/mm/init.c            | 2 --
 arch/powerpc/mm/mem.c          | 2 --
 arch/s390/mm/init.c            | 2 --
 arch/sh/mm/init.c              | 2 --
 arch/x86/mm/init_32.c          | 2 --
 arch/x86/mm/init_64.c          | 2 --
 drivers/base/memory.c          | 2 --
 include/linux/memory.h         | 2 --
 include/linux/memory_hotplug.h | 2 --
 mm/memory_hotplug.c            | 2 --
 mm/sparse.c                    | 6 ------
 11 files changed, 26 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index d28e29103bdb..aae75fd7b810 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -681,7 +681,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	return ret;
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 void arch_remove_memory(int nid, u64 start, u64 size,
 			struct vmem_altmap *altmap)
 {
@@ -693,4 +692,3 @@ void arch_remove_memory(int nid, u64 start, u64 size,
 	__remove_pages(zone, start_pfn, nr_pages, altmap);
 }
 #endif
-#endif
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index a2b78e72452f..ddc69b59575c 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -131,7 +131,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
 	return __add_pages(nid, start_pfn, nr_pages, restrictions);
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 void __ref arch_remove_memory(int nid, u64 start, u64 size,
 			     struct vmem_altmap *altmap)
 {
@@ -164,7 +163,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
 	resize_hpt_for_hotplug(memblock_phys_mem_size());
 }
 #endif
-#endif /* CONFIG_MEMORY_HOTPLUG */
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES
 void __init mem_topology_setup(void)
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 1e0cbae69f12..eafa3c750efc 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -233,7 +233,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	return rc;
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 void arch_remove_memory(int nid, u64 start, u64 size,
 			struct vmem_altmap *altmap)
 {
@@ -245,5 +244,4 @@ void arch_remove_memory(int nid, u64 start, u64 size,
 	__remove_pages(zone, start_pfn, nr_pages, altmap);
 	vmem_remove_mapping(start, size);
 }
-#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 5aeb4d7099a1..59c5fe511f25 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -428,7 +428,6 @@ int memory_add_physaddr_to_nid(u64 addr)
 EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
 #endif
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 void arch_remove_memory(int nid, u64 start, u64 size,
 			struct vmem_altmap *altmap)
 {
@@ -439,5 +438,4 @@ void arch_remove_memory(int nid, u64 start, u64 size,
 	zone = page_zone(pfn_to_page(start_pfn));
 	__remove_pages(zone, start_pfn, nr_pages, altmap);
 }
-#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 075e568098f2..8d4bf2d97d50 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -859,7 +859,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	return __add_pages(nid, start_pfn, nr_pages, restrictions);
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 void arch_remove_memory(int nid, u64 start, u64 size,
 			struct vmem_altmap *altmap)
 {
@@ -871,7 +870,6 @@ void arch_remove_memory(int nid, u64 start, u64 size,
 	__remove_pages(zone, start_pfn, nr_pages, altmap);
 }
 #endif
-#endif
 
 int kernel_set_to_readonly __read_mostly;
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 20d14254b686..f1b55ddea23f 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1131,7 +1131,6 @@ void __ref vmemmap_free(unsigned long start, unsigned long end,
 	remove_pagetable(start, end, false, altmap);
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 static void __meminit
 kernel_physical_mapping_remove(unsigned long start, unsigned long end)
 {
@@ -1156,7 +1155,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
 	__remove_pages(zone, start_pfn, nr_pages, altmap);
 	kernel_physical_mapping_remove(start, start + size);
 }
-#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 static struct kcore_list kcore_vsyscall;
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index f180427e48f4..6e0cb4fda179 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -728,7 +728,6 @@ int hotplug_memory_register(int nid, struct mem_section *section)
 	return ret;
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 static void
 unregister_memory(struct memory_block *memory)
 {
@@ -767,7 +766,6 @@ void unregister_memory_section(struct mem_section *section)
 out_unlock:
 	mutex_unlock(&mem_sysfs_mutex);
 }
-#endif /* CONFIG_MEMORY_HOTREMOVE */
 
 /* return true if the memory block is offlined, otherwise, return false */
 bool is_memblock_offlined(struct memory_block *mem)
diff --git a/include/linux/memory.h b/include/linux/memory.h
index e1dc1bb2b787..474c7c60c8f2 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -112,9 +112,7 @@ extern void unregister_memory_notifier(struct notifier_block *nb);
 extern int register_memory_isolate_notifier(struct notifier_block *nb);
 extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
 int hotplug_memory_register(int nid, struct mem_section *section);
-#ifdef CONFIG_MEMORY_HOTREMOVE
 extern void unregister_memory_section(struct mem_section *);
-#endif
 extern int memory_dev_init(void);
 extern int memory_notify(unsigned long val, void *v);
 extern int memory_isolate_notify(unsigned long val, void *v);
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index ae892eef8b82..2d4de313926d 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -123,12 +123,10 @@ static inline bool movable_node_is_enabled(void)
 	return movable_node_enabled;
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 extern void arch_remove_memory(int nid, u64 start, u64 size,
 			       struct vmem_altmap *altmap);
 extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
 			   unsigned long nr_pages, struct vmem_altmap *altmap);
-#endif /* CONFIG_MEMORY_HOTREMOVE */
 
 /*
  * Do we want sysfs memblock files created. This will allow userspace to online
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 202febe88b58..7b5439839d67 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -317,7 +317,6 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 	return err;
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 /* find the smallest valid pfn in the range [start_pfn, end_pfn) */
 static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
 				     unsigned long start_pfn,
@@ -581,7 +580,6 @@ void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 
 	set_zone_contiguous(zone);
 }
-#endif /* CONFIG_MEMORY_HOTREMOVE */
 
 int set_online_page_callback(online_page_callback_t callback)
 {
diff --git a/mm/sparse.c b/mm/sparse.c
index fd13166949b5..d1d5e05f5b8d 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -604,7 +604,6 @@ static void __kfree_section_memmap(struct page *memmap,
 
 	vmemmap_free(start, end, altmap);
 }
-#ifdef CONFIG_MEMORY_HOTREMOVE
 static void free_map_bootmem(struct page *memmap)
 {
 	unsigned long start = (unsigned long)memmap;
@@ -612,7 +611,6 @@ static void free_map_bootmem(struct page *memmap)
 
 	vmemmap_free(start, end, NULL);
 }
-#endif /* CONFIG_MEMORY_HOTREMOVE */
 #else
 static struct page *__kmalloc_section_memmap(void)
 {
@@ -651,7 +649,6 @@ static void __kfree_section_memmap(struct page *memmap,
 			   get_order(sizeof(struct page) * PAGES_PER_SECTION));
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 static void free_map_bootmem(struct page *memmap)
 {
 	unsigned long maps_section_nr, removing_section_nr, i;
@@ -681,7 +678,6 @@ static void free_map_bootmem(struct page *memmap)
 			put_page_bootmem(page);
 	}
 }
-#endif /* CONFIG_MEMORY_HOTREMOVE */
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
 /**
@@ -746,7 +742,6 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn,
 	return ret;
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 #ifdef CONFIG_MEMORY_FAILURE
 static void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
 {
@@ -823,5 +818,4 @@ void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
 			PAGES_PER_SECTION - map_offset);
 	free_section_usemap(memmap, usemap, altmap);
 }
-#endif /* CONFIG_MEMORY_HOTREMOVE */
 #endif /* CONFIG_MEMORY_HOTPLUG */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-07 18:37 [PATCH v2 0/8] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
                   ` (2 preceding siblings ...)
  2019-05-07 18:37 ` [PATCH v2 3/8] mm/memory_hotplug: arch_remove_memory() and __remove_pages() with CONFIG_MEMORY_HOTPLUG David Hildenbrand
@ 2019-05-07 18:38 ` David Hildenbrand
  2019-05-07 21:17   ` Dan Williams
                     ` (4 more replies)
  2019-05-07 18:38 ` [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API David Hildenbrand
                   ` (4 subsequent siblings)
  8 siblings, 5 replies; 45+ messages in thread
From: David Hildenbrand @ 2019-05-07 18:38 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	akpm, Dan Williams, David Hildenbrand, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Wei Yang,
	Arun KS, Mathieu Malaterre

Only memory to be added to the buddy and to be onlined/offlined by
user space using memory block devices needs (and should have!) memory
block devices.

Factor out creation of memory block devices Create all devices after
arch_add_memory() succeeded. We can later drop the want_memblock parameter,
because it is now effectively stale.

Only after memory block devices have been added, memory can be onlined
by user space. This implies, that memory is not visible to user space at
all before arch_add_memory() succeeded.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
 include/linux/memory.h |  2 +-
 mm/memory_hotplug.c    | 15 ++++-----
 3 files changed, 53 insertions(+), 34 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 6e0cb4fda179..862c202a18ca 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
 	return 0;
 }
 
+static void unregister_memory(struct memory_block *memory)
+{
+	BUG_ON(memory->dev.bus != &memory_subsys);
+
+	/* drop the ref. we got via find_memory_block() */
+	put_device(&memory->dev);
+	device_unregister(&memory->dev);
+}
+
 /*
- * need an interface for the VM to add new memory regions,
- * but without onlining it.
+ * Create memory block devices for the given memory area. Start and size
+ * have to be aligned to memory block granularity. Memory block devices
+ * will be initialized as offline.
  */
-int hotplug_memory_register(int nid, struct mem_section *section)
+int hotplug_memory_register(unsigned long start, unsigned long size)
 {
-	int ret = 0;
+	unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
+	unsigned long start_pfn = PFN_DOWN(start);
+	unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
+	unsigned long pfn;
 	struct memory_block *mem;
+	int ret = 0;
 
-	mutex_lock(&mem_sysfs_mutex);
+	BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
+	BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));
 
-	mem = find_memory_block(section);
-	if (mem) {
-		mem->section_count++;
-		put_device(&mem->dev);
-	} else {
-		ret = init_memory_block(&mem, section, MEM_OFFLINE);
+	mutex_lock(&mem_sysfs_mutex);
+	for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
+		mem = find_memory_block(__pfn_to_section(pfn));
+		if (mem) {
+			WARN_ON_ONCE(false);
+			put_device(&mem->dev);
+			continue;
+		}
+		ret = init_memory_block(&mem, __pfn_to_section(pfn),
+					MEM_OFFLINE);
 		if (ret)
-			goto out;
-		mem->section_count++;
+			break;
+		mem->section_count = memory_block_size_bytes() /
+				     MIN_MEMORY_BLOCK_SIZE;
+	}
+	if (ret) {
+		end_pfn = pfn;
+		for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
+			mem = find_memory_block(__pfn_to_section(pfn));
+			if (!mem)
+				continue;
+			mem->section_count = 0;
+			unregister_memory(mem);
+		}
 	}
-
-out:
 	mutex_unlock(&mem_sysfs_mutex);
 	return ret;
 }
 
-static void
-unregister_memory(struct memory_block *memory)
-{
-	BUG_ON(memory->dev.bus != &memory_subsys);
-
-	/* drop the ref. we got via find_memory_block() */
-	put_device(&memory->dev);
-	device_unregister(&memory->dev);
-}
-
-void unregister_memory_section(struct mem_section *section)
+static int remove_memory_section(struct mem_section *section)
 {
 	struct memory_block *mem;
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 474c7c60c8f2..95505fbb5f85 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -111,7 +111,7 @@ extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
 extern int register_memory_isolate_notifier(struct notifier_block *nb);
 extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
-int hotplug_memory_register(int nid, struct mem_section *section);
+int hotplug_memory_register(unsigned long start, unsigned long size);
 extern void unregister_memory_section(struct mem_section *);
 extern int memory_dev_init(void);
 extern int memory_notify(unsigned long val, void *v);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7b5439839d67..e1637c8a0723 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -258,13 +258,7 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
 		return -EEXIST;
 
 	ret = sparse_add_one_section(nid, phys_start_pfn, altmap);
-	if (ret < 0)
-		return ret;
-
-	if (!want_memblock)
-		return 0;
-
-	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
+	return ret < 0 ? ret : 0;
 }
 
 /*
@@ -1106,6 +1100,13 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	if (ret < 0)
 		goto error;
 
+	/* create memory block devices after memory was added */
+	ret = hotplug_memory_register(start, size);
+	if (ret) {
+		arch_remove_memory(nid, start, size, NULL);
+		goto error;
+	}
+
 	if (new_node) {
 		/* If sysfs file of new node can't be created, cpu on the node
 		 * can't be hot-added. There is no rollback way now.
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
  2019-05-07 18:37 [PATCH v2 0/8] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
                   ` (3 preceding siblings ...)
  2019-05-07 18:38 ` [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
@ 2019-05-07 18:38 ` David Hildenbrand
  2019-05-07 21:19   ` Dan Williams
  2019-05-07 18:38 ` [PATCH v2 6/8] mm/memory_hotplug: Remove memory block devices before arch_remove_memory() David Hildenbrand
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 45+ messages in thread
From: David Hildenbrand @ 2019-05-07 18:38 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	akpm, Dan Williams, David Hildenbrand, Michal Hocko,
	Oscar Salvador, Pavel Tatashin, Wei Yang, Joonsoo Kim, Qian Cai,
	Arun KS, Mathieu Malaterre

No longer needed, the callers of arch_add_memory() can handle this
manually.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/memory_hotplug.h | 8 --------
 mm/memory_hotplug.c            | 9 +++------
 2 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 2d4de313926d..2f1f87e13baa 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -128,14 +128,6 @@ extern void arch_remove_memory(int nid, u64 start, u64 size,
 extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
 			   unsigned long nr_pages, struct vmem_altmap *altmap);
 
-/*
- * Do we want sysfs memblock files created. This will allow userspace to online
- * and offline memory explicitly. Lack of this bit means that the caller has to
- * call move_pfn_range_to_zone to finish the initialization.
- */
-
-#define MHP_MEMBLOCK_API               (1<<0)
-
 /* reasonably generic interface to expand the physical pages */
 extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
 		       struct mhp_restrictions *restrictions);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e1637c8a0723..107f72952347 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -250,7 +250,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
 #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
 
 static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
-		struct vmem_altmap *altmap, bool want_memblock)
+				   struct vmem_altmap *altmap)
 {
 	int ret;
 
@@ -293,8 +293,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 	}
 
 	for (i = start_sec; i <= end_sec; i++) {
-		err = __add_section(nid, section_nr_to_pfn(i), altmap,
-				restrictions->flags & MHP_MEMBLOCK_API);
+		err = __add_section(nid, section_nr_to_pfn(i), altmap);
 
 		/*
 		 * EEXIST is finally dealt with by ioresource collision
@@ -1066,9 +1065,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
  */
 int __ref add_memory_resource(int nid, struct resource *res)
 {
-	struct mhp_restrictions restrictions = {
-		.flags = MHP_MEMBLOCK_API,
-	};
+	struct mhp_restrictions restrictions = {};
 	u64 start, size;
 	bool new_node = false;
 	int ret;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 6/8] mm/memory_hotplug: Remove memory block devices before arch_remove_memory()
  2019-05-07 18:37 [PATCH v2 0/8] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
                   ` (4 preceding siblings ...)
  2019-05-07 18:38 ` [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API David Hildenbrand
@ 2019-05-07 18:38 ` David Hildenbrand
  2019-05-07 21:27   ` Dan Williams
  2019-05-07 18:38 ` [PATCH v2 7/8] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail David Hildenbrand
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 45+ messages in thread
From: David Hildenbrand @ 2019-05-07 18:38 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	akpm, Dan Williams, David Hildenbrand, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Andrew Banman, Ingo Molnar,
	Alex Deucher, David S. Miller, Mark Brown, Chris Wilson,
	Oscar Salvador, Jonathan Cameron, Michal Hocko, Pavel Tatashin,
	Arun KS, Mathieu Malaterre

Let's factor out removing of memory block devices, which is only
necessary for memory added via add_memory() and friends that created
memory block devices. Remove the devices before calling
arch_remove_memory().

This finishes factoring out memory block device handling from
arch_add_memory() and arch_remove_memory().

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mark Brown <broonie@kernel.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c  | 39 +++++++++++++++++++--------------------
 drivers/base/node.c    | 11 ++++++-----
 include/linux/memory.h |  2 +-
 include/linux/node.h   |  6 ++----
 mm/memory_hotplug.c    |  5 +++--
 5 files changed, 31 insertions(+), 32 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 862c202a18ca..47ff49058d1f 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -756,32 +756,31 @@ int hotplug_memory_register(unsigned long start, unsigned long size)
 	return ret;
 }
 
-static int remove_memory_section(struct mem_section *section)
+/*
+ * Remove memory block devices for the given memory area. Start and size
+ * have to be aligned to memory block granularity. Memory block devices
+ * have to be offline.
+ */
+void hotplug_memory_unregister(unsigned long start, unsigned long size)
 {
+	unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
+	unsigned long start_pfn = PFN_DOWN(start);
+	unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
 	struct memory_block *mem;
+	unsigned long pfn;
 
-	if (WARN_ON_ONCE(!present_section(section)))
-		return;
+	BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
+	BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));
 
 	mutex_lock(&mem_sysfs_mutex);
-
-	/*
-	 * Some users of the memory hotplug do not want/need memblock to
-	 * track all sections. Skip over those.
-	 */
-	mem = find_memory_block(section);
-	if (!mem)
-		goto out_unlock;
-
-	unregister_mem_sect_under_nodes(mem, __section_nr(section));
-
-	mem->section_count--;
-	if (mem->section_count == 0)
+	for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
+		mem = find_memory_block(__pfn_to_section(pfn));
+		if (!mem)
+			continue;
+		mem->section_count = 0;
+		unregister_memory_block_under_nodes(mem);
 		unregister_memory(mem);
-	else
-		put_device(&mem->dev);
-
-out_unlock:
+	}
 	mutex_unlock(&mem_sysfs_mutex);
 }
 
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 8598fcbd2a17..04fdfa99b8bc 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -801,9 +801,10 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
 	return 0;
 }
 
-/* unregister memory section under all nodes that it spans */
-int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
-				    unsigned long phys_index)
+/*
+ * Unregister memory block device under all nodes that it spans.
+ */
+int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 {
 	NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
 	unsigned long pfn, sect_start_pfn, sect_end_pfn;
@@ -816,8 +817,8 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
 		return -ENOMEM;
 	nodes_clear(*unlinked_nodes);
 
-	sect_start_pfn = section_nr_to_pfn(phys_index);
-	sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
+	sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
+	sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
 		int nid;
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 95505fbb5f85..aa236c2a0466 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -112,7 +112,7 @@ extern void unregister_memory_notifier(struct notifier_block *nb);
 extern int register_memory_isolate_notifier(struct notifier_block *nb);
 extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
 int hotplug_memory_register(unsigned long start, unsigned long size);
-extern void unregister_memory_section(struct mem_section *);
+void hotplug_memory_unregister(unsigned long start, unsigned long size);
 extern int memory_dev_init(void);
 extern int memory_notify(unsigned long val, void *v);
 extern int memory_isolate_notify(unsigned long val, void *v);
diff --git a/include/linux/node.h b/include/linux/node.h
index 1a557c589ecb..02a29e71b175 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -139,8 +139,7 @@ extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int register_mem_sect_under_node(struct memory_block *mem_blk,
 						void *arg);
-extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
-					   unsigned long phys_index);
+extern int unregister_memory_block_under_nodes(struct memory_block *mem_blk);
 
 extern int register_memory_node_under_compute_node(unsigned int mem_nid,
 						   unsigned int cpu_nid,
@@ -176,8 +175,7 @@ static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
 {
 	return 0;
 }
-static inline int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
-						  unsigned long phys_index)
+static inline int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 {
 	return 0;
 }
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 107f72952347..527fe4f9c620 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -519,8 +519,6 @@ static void __remove_section(struct zone *zone, struct mem_section *ms,
 	if (WARN_ON_ONCE(!valid_section(ms)))
 		return;
 
-	unregister_memory_section(ms);
-
 	scn_nr = __section_nr(ms);
 	start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
 	__remove_zone(zone, start_pfn);
@@ -1844,6 +1842,9 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
 	memblock_free(start, size);
 	memblock_remove(start, size);
 
+	/* remove memory block devices before removing memory */
+	hotplug_memory_unregister(start, size);
+
 	arch_remove_memory(nid, start, size, NULL);
 	__release_memory_resource(start, size);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 7/8] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail
  2019-05-07 18:37 [PATCH v2 0/8] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
                   ` (5 preceding siblings ...)
  2019-05-07 18:38 ` [PATCH v2 6/8] mm/memory_hotplug: Remove memory block devices before arch_remove_memory() David Hildenbrand
@ 2019-05-07 18:38 ` David Hildenbrand
  2019-05-08  0:15   ` Dan Williams
  2019-05-07 18:38 ` [PATCH v2 8/8] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section David Hildenbrand
  2019-05-07 19:04 ` [PATCH v2 0/8] mm/memory_hotplug: Factor out memory block device handling Dan Williams
  8 siblings, 1 reply; 45+ messages in thread
From: David Hildenbrand @ 2019-05-07 18:38 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	akpm, Dan Williams, David Hildenbrand, Greg Kroah-Hartman,
	Rafael J. Wysocki, Alex Deucher, David S. Miller, Mark Brown,
	Chris Wilson, Oscar Salvador, Jonathan Cameron

We really don't want anything during memory hotunplug to fail.
We always pass a valid memory block device, that check can go. Avoid
allocating memory and eventually failing. As we are always called under
lock, we can use a static piece of memory. This avoids having to put
the structure onto the stack, having to guess about the stack size
of callers.

Patch inspired by a patch from Oscar Salvador.

In the future, there might be no need to iterate over nodes at all.
mem->nid should tell us exactly what to remove. Memory block devices
with mixed nodes (added during boot) should properly fenced off and never
removed.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mark Brown <broonie@kernel.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/node.c  | 18 +++++-------------
 include/linux/node.h |  5 ++---
 2 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 04fdfa99b8bc..9be88fd05147 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -803,20 +803,14 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
 
 /*
  * Unregister memory block device under all nodes that it spans.
+ * Has to be called with mem_sysfs_mutex held (due to unlinked_nodes).
  */
-int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
+void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 {
-	NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
 	unsigned long pfn, sect_start_pfn, sect_end_pfn;
+	static nodemask_t unlinked_nodes;
 
-	if (!mem_blk) {
-		NODEMASK_FREE(unlinked_nodes);
-		return -EFAULT;
-	}
-	if (!unlinked_nodes)
-		return -ENOMEM;
-	nodes_clear(*unlinked_nodes);
-
+	nodes_clear(unlinked_nodes);
 	sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
 	sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
@@ -827,15 +821,13 @@ int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 			continue;
 		if (!node_online(nid))
 			continue;
-		if (node_test_and_set(nid, *unlinked_nodes))
+		if (node_test_and_set(nid, unlinked_nodes))
 			continue;
 		sysfs_remove_link(&node_devices[nid]->dev.kobj,
 			 kobject_name(&mem_blk->dev.kobj));
 		sysfs_remove_link(&mem_blk->dev.kobj,
 			 kobject_name(&node_devices[nid]->dev.kobj));
 	}
-	NODEMASK_FREE(unlinked_nodes);
-	return 0;
 }
 
 int link_mem_sections(int nid, unsigned long start_pfn, unsigned long end_pfn)
diff --git a/include/linux/node.h b/include/linux/node.h
index 02a29e71b175..548c226966a2 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -139,7 +139,7 @@ extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int register_mem_sect_under_node(struct memory_block *mem_blk,
 						void *arg);
-extern int unregister_memory_block_under_nodes(struct memory_block *mem_blk);
+extern void unregister_memory_block_under_nodes(struct memory_block *mem_blk);
 
 extern int register_memory_node_under_compute_node(unsigned int mem_nid,
 						   unsigned int cpu_nid,
@@ -175,9 +175,8 @@ static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
 {
 	return 0;
 }
-static inline int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
+static inline void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 {
-	return 0;
 }
 
 static inline void register_hugetlbfs_with_node(node_registration_func_t reg,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 8/8] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section
  2019-05-07 18:37 [PATCH v2 0/8] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
                   ` (6 preceding siblings ...)
  2019-05-07 18:38 ` [PATCH v2 7/8] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail David Hildenbrand
@ 2019-05-07 18:38 ` David Hildenbrand
  2019-05-08  0:30   ` Dan Williams
  2019-05-07 19:04 ` [PATCH v2 0/8] mm/memory_hotplug: Factor out memory block device handling Dan Williams
  8 siblings, 1 reply; 45+ messages in thread
From: David Hildenbrand @ 2019-05-07 18:38 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	akpm, Dan Williams, David Hildenbrand

Unused, and memory unplug path should never care about zones. This is
the job of memory offlining. ZONE_DEVICE might require special care -
the caller of arch_remove_memory() should handle this.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/memory_hotplug.h | 2 +-
 mm/memory_hotplug.c            | 2 +-
 mm/sparse.c                    | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 2f1f87e13baa..1a4257c5f74c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -346,7 +346,7 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 extern bool is_memblock_offlined(struct memory_block *mem);
 extern int sparse_add_one_section(int nid, unsigned long start_pfn,
 				  struct vmem_altmap *altmap);
-extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
+extern void sparse_remove_one_section(struct mem_section *ms,
 		unsigned long map_offset, struct vmem_altmap *altmap);
 extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
 					  unsigned long pnum);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 527fe4f9c620..e0340c8f6df4 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -523,7 +523,7 @@ static void __remove_section(struct zone *zone, struct mem_section *ms,
 	start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
 	__remove_zone(zone, start_pfn);
 
-	sparse_remove_one_section(zone, ms, map_offset, altmap);
+	sparse_remove_one_section(ms, map_offset, altmap);
 }
 
 /**
diff --git a/mm/sparse.c b/mm/sparse.c
index d1d5e05f5b8d..1552c855d62a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -800,8 +800,8 @@ static void free_section_usemap(struct page *memmap, unsigned long *usemap,
 		free_map_bootmem(memmap);
 }
 
-void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
-		unsigned long map_offset, struct vmem_altmap *altmap)
+void sparse_remove_one_section(struct mem_section *ms, unsigned long map_offset,
+			       struct vmem_altmap *altmap)
 {
 	struct page *memmap = NULL;
 	unsigned long *usemap = NULL;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 0/8] mm/memory_hotplug: Factor out memory block device handling
  2019-05-07 18:37 [PATCH v2 0/8] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
                   ` (7 preceding siblings ...)
  2019-05-07 18:38 ` [PATCH v2 8/8] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section David Hildenbrand
@ 2019-05-07 19:04 ` Dan Williams
  2019-05-07 19:21   ` David Hildenbrand
  8 siblings, 1 reply; 45+ messages in thread
From: Dan Williams @ 2019-05-07 19:04 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Alex Deucher, Andrew Banman,
	Andy Lutomirski, Arun KS, Baoquan He, Benjamin Herrenschmidt,
	Borislav Petkov, Christophe Leroy, Chris Wilson, Dave Hansen,
	David S. Miller, Fenghua Yu, Greg Kroah-Hartman, Heiko Carstens,
	H. Peter Anvin, Ingo Molnar, Ingo Molnar, Jonathan Cameron,
	Joonsoo Kim, Kirill A. Shutemov, Logan Gunthorpe, Mark Brown,
	Martin Schwidefsky, Masahiro Yamada, Mathieu Malaterre,
	Michael Ellerman, Michal Hocko, Mike Rapoport, Mike Rapoport,
	mike.travis, Nicholas Piggin, Oscar Salvador, Oscar Salvador,
	Paul Mackerras, Pavel Tatashin, Pavel Tatashin, Peter Zijlstra,
	Qian Cai, Rafael J. Wysocki, Rich Felker, Rob Herring,
	Thomas Gleixner, Tony Luck, Vasily Gorbik, Wei Yang, Wei Yang,
	Yoshinori Sato

On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>
> We only want memory block devices for memory to be onlined/offlined
> (add/remove from the buddy). This is required so user space can
> online/offline memory and kdump gets notified about newly onlined memory.
>
> Only such memory has the requirement of having to span whole memory blocks.
> Let's factor out creation/removal of memory block devices. This helps
> to further cleanup arch_add_memory/arch_remove_memory() and to make
> implementation of new features easier. E.g. supplying a driver for
> memory block devices becomes way easier (so user space is able to
> distinguish different types of added memory to properly online it).
>
> Patch 1 makes sure the memory block size granularity is always respected.
> Patch 2 implements arch_remove_memory() on s390x. Patch 3 prepares
> arch_remove_memory() to be also called without CONFIG_MEMORY_HOTREMOVE.
> Patch 4,5 and 6 factor out creation/removal of memory block devices.
> Patch 7 gets rid of some unlikely errors that could have happened, not
> removing links between memory block devices and nodes, previously brought
> up by Oscar.
>
> Did a quick sanity test with DIMM plug/unplug, making sure all devices
> and sysfs links properly get added/removed. Compile tested on s390x and
> x86-64.
>
> Based on git://git.cmpxchg.org/linux-mmots.git
>
> Next refactoring on my list will be making sure that remove_memory()
> will never deal with zones / access "struct pages". Any kind of zone
> handling will have to be done when offlining system memory / before
> removing device memory. I am thinking about remove_pfn_range_from_zone()",
> du undo everything "move_pfn_range_to_zone()" did.
>
> v1 -> v2:
> - s390x/mm: Implement arch_remove_memory()
> -- remove mapping after "__remove_pages"
>
>
> David Hildenbrand (8):
>   mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
>   s390x/mm: Implement arch_remove_memory()
>   mm/memory_hotplug: arch_remove_memory() and __remove_pages() with
>     CONFIG_MEMORY_HOTPLUG
>   mm/memory_hotplug: Create memory block devices after arch_add_memory()
>   mm/memory_hotplug: Drop MHP_MEMBLOCK_API

So at a minimum we need a bit of patch staging guidance because this
obviously collides with the subsection bits that are built on top of
the existence of MHP_MEMBLOCK_API. What trigger do you envision as a
replacement that arch_add_memory() use to determine that subsection
operations should be disallowed?


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 0/8] mm/memory_hotplug: Factor out memory block device handling
  2019-05-07 19:04 ` [PATCH v2 0/8] mm/memory_hotplug: Factor out memory block device handling Dan Williams
@ 2019-05-07 19:21   ` David Hildenbrand
  2019-05-07 19:37     ` David Hildenbrand
  0 siblings, 1 reply; 45+ messages in thread
From: David Hildenbrand @ 2019-05-07 19:21 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Alex Deucher, Andrew Banman,
	Andy Lutomirski, Arun KS, Baoquan He, Benjamin Herrenschmidt,
	Borislav Petkov, Christophe Leroy, Chris Wilson, Dave Hansen,
	David S. Miller, Fenghua Yu, Greg Kroah-Hartman, Heiko Carstens,
	H. Peter Anvin, Ingo Molnar, Ingo Molnar, Jonathan Cameron,
	Joonsoo Kim, Kirill A. Shutemov, Logan Gunthorpe, Mark Brown,
	Martin Schwidefsky, Masahiro Yamada, Mathieu Malaterre,
	Michael Ellerman, Michal Hocko, Mike Rapoport, Mike Rapoport,
	mike.travis, Nicholas Piggin, Oscar Salvador, Oscar Salvador,
	Paul Mackerras, Pavel Tatashin, Pavel Tatashin, Peter Zijlstra,
	Qian Cai, Rafael J. Wysocki, Rich Felker, Rob Herring,
	Thomas Gleixner, Tony Luck, Vasily Gorbik, Wei Yang, Wei Yang,
	Yoshinori Sato

On 07.05.19 21:04, Dan Williams wrote:
> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> We only want memory block devices for memory to be onlined/offlined
>> (add/remove from the buddy). This is required so user space can
>> online/offline memory and kdump gets notified about newly onlined memory.
>>
>> Only such memory has the requirement of having to span whole memory blocks.
>> Let's factor out creation/removal of memory block devices. This helps
>> to further cleanup arch_add_memory/arch_remove_memory() and to make
>> implementation of new features easier. E.g. supplying a driver for
>> memory block devices becomes way easier (so user space is able to
>> distinguish different types of added memory to properly online it).
>>
>> Patch 1 makes sure the memory block size granularity is always respected.
>> Patch 2 implements arch_remove_memory() on s390x. Patch 3 prepares
>> arch_remove_memory() to be also called without CONFIG_MEMORY_HOTREMOVE.
>> Patch 4,5 and 6 factor out creation/removal of memory block devices.
>> Patch 7 gets rid of some unlikely errors that could have happened, not
>> removing links between memory block devices and nodes, previously brought
>> up by Oscar.
>>
>> Did a quick sanity test with DIMM plug/unplug, making sure all devices
>> and sysfs links properly get added/removed. Compile tested on s390x and
>> x86-64.
>>
>> Based on git://git.cmpxchg.org/linux-mmots.git
>>
>> Next refactoring on my list will be making sure that remove_memory()
>> will never deal with zones / access "struct pages". Any kind of zone
>> handling will have to be done when offlining system memory / before
>> removing device memory. I am thinking about remove_pfn_range_from_zone()",
>> du undo everything "move_pfn_range_to_zone()" did.
>>
>> v1 -> v2:
>> - s390x/mm: Implement arch_remove_memory()
>> -- remove mapping after "__remove_pages"
>>
>>
>> David Hildenbrand (8):
>>   mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
>>   s390x/mm: Implement arch_remove_memory()
>>   mm/memory_hotplug: arch_remove_memory() and __remove_pages() with
>>     CONFIG_MEMORY_HOTPLUG
>>   mm/memory_hotplug: Create memory block devices after arch_add_memory()
>>   mm/memory_hotplug: Drop MHP_MEMBLOCK_API
> 
> So at a minimum we need a bit of patch staging guidance because this
> obviously collides with the subsection bits that are built on top of
> the existence of MHP_MEMBLOCK_API. What trigger do you envision as a
> replacement that arch_add_memory() use to determine that subsection
> operations should be disallowed?
> 

Looks like we now have time to sort it out :)


Looking at your series

[PATCH v8 08/12] mm/sparsemem: Prepare for sub-section ranges

is the "single" effectively place using MHP_MEMBLOCK_API, namely
"subsection_check()". Used when adding/removing memory.


+static int subsection_check(unsigned long pfn, unsigned long nr_pages,
+		unsigned long flags, const char *reason)
+{
+	/*
+	 * Only allow partial section hotplug for !memblock ranges,
+	 * since register_new_memory() requires section alignment, and
+	 * CONFIG_SPARSEMEM_VMEMMAP=n requires sections to be fully
+	 * populated.
+	 */
+	if ((!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP)
+				|| (flags & MHP_MEMBLOCK_API))
+			&& ((pfn & ~PAGE_SECTION_MASK)
+				|| (nr_pages & ~PAGE_SECTION_MASK))) {
+		WARN(1, "Sub-section hot-%s incompatible with %s\n", reason,
+				(flags & MHP_MEMBLOCK_API)
+				? "memblock api" : "!CONFIG_SPARSEMEM_VMEMMAP");
+		return -EINVAL;
+	}
+	return 0;
 }


(flags & MHP_MEMBLOCK_API)) && ((pfn & ~PAGE_SECTION_MASK) || (nr_pages
& ~PAGE_SECTION_MASK)))

sounds like something the caller (add_memory()) always has to take care
of. No need to check. The one imposing this restriction is the only caller.

In my opinion, that check/function can go completely.

Am I missing something / missing another user?

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 0/8] mm/memory_hotplug: Factor out memory block device handling
  2019-05-07 19:21   ` David Hildenbrand
@ 2019-05-07 19:37     ` David Hildenbrand
  2019-05-07 20:36       ` Dan Williams
  0 siblings, 1 reply; 45+ messages in thread
From: David Hildenbrand @ 2019-05-07 19:37 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Alex Deucher, Andrew Banman,
	Andy Lutomirski, Arun KS, Baoquan He, Benjamin Herrenschmidt,
	Borislav Petkov, Christophe Leroy, Chris Wilson, Dave Hansen,
	David S. Miller, Fenghua Yu, Greg Kroah-Hartman, Heiko Carstens,
	H. Peter Anvin, Ingo Molnar, Ingo Molnar, Jonathan Cameron,
	Joonsoo Kim, Kirill A. Shutemov, Logan Gunthorpe, Mark Brown,
	Martin Schwidefsky, Masahiro Yamada, Mathieu Malaterre,
	Michael Ellerman, Michal Hocko, Mike Rapoport, Mike Rapoport,
	mike.travis, Nicholas Piggin, Oscar Salvador, Oscar Salvador,
	Paul Mackerras, Pavel Tatashin, Pavel Tatashin, Peter Zijlstra,
	Qian Cai, Rafael J. Wysocki, Rich Felker, Rob Herring,
	Thomas Gleixner, Tony Luck, Vasily Gorbik, Wei Yang, Wei Yang,
	Yoshinori Sato

On 07.05.19 21:21, David Hildenbrand wrote:
> On 07.05.19 21:04, Dan Williams wrote:
>> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>>>
>>> We only want memory block devices for memory to be onlined/offlined
>>> (add/remove from the buddy). This is required so user space can
>>> online/offline memory and kdump gets notified about newly onlined memory.
>>>
>>> Only such memory has the requirement of having to span whole memory blocks.
>>> Let's factor out creation/removal of memory block devices. This helps
>>> to further cleanup arch_add_memory/arch_remove_memory() and to make
>>> implementation of new features easier. E.g. supplying a driver for
>>> memory block devices becomes way easier (so user space is able to
>>> distinguish different types of added memory to properly online it).
>>>
>>> Patch 1 makes sure the memory block size granularity is always respected.
>>> Patch 2 implements arch_remove_memory() on s390x. Patch 3 prepares
>>> arch_remove_memory() to be also called without CONFIG_MEMORY_HOTREMOVE.
>>> Patch 4,5 and 6 factor out creation/removal of memory block devices.
>>> Patch 7 gets rid of some unlikely errors that could have happened, not
>>> removing links between memory block devices and nodes, previously brought
>>> up by Oscar.
>>>
>>> Did a quick sanity test with DIMM plug/unplug, making sure all devices
>>> and sysfs links properly get added/removed. Compile tested on s390x and
>>> x86-64.
>>>
>>> Based on git://git.cmpxchg.org/linux-mmots.git
>>>
>>> Next refactoring on my list will be making sure that remove_memory()
>>> will never deal with zones / access "struct pages". Any kind of zone
>>> handling will have to be done when offlining system memory / before
>>> removing device memory. I am thinking about remove_pfn_range_from_zone()",
>>> du undo everything "move_pfn_range_to_zone()" did.
>>>
>>> v1 -> v2:
>>> - s390x/mm: Implement arch_remove_memory()
>>> -- remove mapping after "__remove_pages"
>>>
>>>
>>> David Hildenbrand (8):
>>>   mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
>>>   s390x/mm: Implement arch_remove_memory()
>>>   mm/memory_hotplug: arch_remove_memory() and __remove_pages() with
>>>     CONFIG_MEMORY_HOTPLUG
>>>   mm/memory_hotplug: Create memory block devices after arch_add_memory()
>>>   mm/memory_hotplug: Drop MHP_MEMBLOCK_API
>>
>> So at a minimum we need a bit of patch staging guidance because this
>> obviously collides with the subsection bits that are built on top of
>> the existence of MHP_MEMBLOCK_API. What trigger do you envision as a
>> replacement that arch_add_memory() use to determine that subsection
>> operations should be disallowed?
>>
> 
> Looks like we now have time to sort it out :)
> 
> 
> Looking at your series
> 
> [PATCH v8 08/12] mm/sparsemem: Prepare for sub-section ranges
> 
> is the "single" effectively place using MHP_MEMBLOCK_API, namely
> "subsection_check()". Used when adding/removing memory.
> 
> 
> +static int subsection_check(unsigned long pfn, unsigned long nr_pages,
> +		unsigned long flags, const char *reason)
> +{
> +	/*
> +	 * Only allow partial section hotplug for !memblock ranges,
> +	 * since register_new_memory() requires section alignment, and
> +	 * CONFIG_SPARSEMEM_VMEMMAP=n requires sections to be fully
> +	 * populated.
> +	 */
> +	if ((!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP)
> +				|| (flags & MHP_MEMBLOCK_API))
> +			&& ((pfn & ~PAGE_SECTION_MASK)
> +				|| (nr_pages & ~PAGE_SECTION_MASK))) {
> +		WARN(1, "Sub-section hot-%s incompatible with %s\n", reason,
> +				(flags & MHP_MEMBLOCK_API)
> +				? "memblock api" : "!CONFIG_SPARSEMEM_VMEMMAP");
> +		return -EINVAL;
> +	}
> +	return 0;
>  }
> 
> 
> (flags & MHP_MEMBLOCK_API)) && ((pfn & ~PAGE_SECTION_MASK) || (nr_pages
> & ~PAGE_SECTION_MASK)))
> 
> sounds like something the caller (add_memory()) always has to take care
> of. No need to check. The one imposing this restriction is the only caller.
> 
> In my opinion, that check/function can go completely.
> 
> Am I missing something / missing another user?
> 

In other word, this series moves the restriction out of
arch_add_memory() and therefore you don't need subsection_check() at all
anymore. At least if I am not missing something :)

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 0/8] mm/memory_hotplug: Factor out memory block device handling
  2019-05-07 19:37     ` David Hildenbrand
@ 2019-05-07 20:36       ` Dan Williams
  0 siblings, 0 replies; 45+ messages in thread
From: Dan Williams @ 2019-05-07 20:36 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Alex Deucher, Andrew Banman,
	Andy Lutomirski, Arun KS, Baoquan He, Benjamin Herrenschmidt,
	Borislav Petkov, Christophe Leroy, Chris Wilson, Dave Hansen,
	David S. Miller, Fenghua Yu, Greg Kroah-Hartman, Heiko Carstens,
	H. Peter Anvin, Ingo Molnar, Ingo Molnar, Jonathan Cameron,
	Joonsoo Kim, Kirill A. Shutemov, Logan Gunthorpe, Mark Brown,
	Martin Schwidefsky, Masahiro Yamada, Mathieu Malaterre,
	Michael Ellerman, Michal Hocko, Mike Rapoport, Mike Rapoport,
	mike.travis, Nicholas Piggin, Oscar Salvador, Oscar Salvador,
	Paul Mackerras, Pavel Tatashin, Pavel Tatashin, Peter Zijlstra,
	Qian Cai, Rafael J. Wysocki, Rich Felker, Rob Herring,
	Thomas Gleixner, Tony Luck, Vasily Gorbik, Wei Yang, Wei Yang,
	Yoshinori Sato

On Tue, May 7, 2019 at 12:38 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 07.05.19 21:21, David Hildenbrand wrote:
> > On 07.05.19 21:04, Dan Williams wrote:
> >> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
> >>>
> >>> We only want memory block devices for memory to be onlined/offlined
> >>> (add/remove from the buddy). This is required so user space can
> >>> online/offline memory and kdump gets notified about newly onlined memory.
> >>>
> >>> Only such memory has the requirement of having to span whole memory blocks.
> >>> Let's factor out creation/removal of memory block devices. This helps
> >>> to further cleanup arch_add_memory/arch_remove_memory() and to make
> >>> implementation of new features easier. E.g. supplying a driver for
> >>> memory block devices becomes way easier (so user space is able to
> >>> distinguish different types of added memory to properly online it).
> >>>
> >>> Patch 1 makes sure the memory block size granularity is always respected.
> >>> Patch 2 implements arch_remove_memory() on s390x. Patch 3 prepares
> >>> arch_remove_memory() to be also called without CONFIG_MEMORY_HOTREMOVE.
> >>> Patch 4,5 and 6 factor out creation/removal of memory block devices.
> >>> Patch 7 gets rid of some unlikely errors that could have happened, not
> >>> removing links between memory block devices and nodes, previously brought
> >>> up by Oscar.
> >>>
> >>> Did a quick sanity test with DIMM plug/unplug, making sure all devices
> >>> and sysfs links properly get added/removed. Compile tested on s390x and
> >>> x86-64.
> >>>
> >>> Based on git://git.cmpxchg.org/linux-mmots.git
> >>>
> >>> Next refactoring on my list will be making sure that remove_memory()
> >>> will never deal with zones / access "struct pages". Any kind of zone
> >>> handling will have to be done when offlining system memory / before
> >>> removing device memory. I am thinking about remove_pfn_range_from_zone()",
> >>> du undo everything "move_pfn_range_to_zone()" did.
> >>>
> >>> v1 -> v2:
> >>> - s390x/mm: Implement arch_remove_memory()
> >>> -- remove mapping after "__remove_pages"
> >>>
> >>>
> >>> David Hildenbrand (8):
> >>>   mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
> >>>   s390x/mm: Implement arch_remove_memory()
> >>>   mm/memory_hotplug: arch_remove_memory() and __remove_pages() with
> >>>     CONFIG_MEMORY_HOTPLUG
> >>>   mm/memory_hotplug: Create memory block devices after arch_add_memory()
> >>>   mm/memory_hotplug: Drop MHP_MEMBLOCK_API
> >>
> >> So at a minimum we need a bit of patch staging guidance because this
> >> obviously collides with the subsection bits that are built on top of
> >> the existence of MHP_MEMBLOCK_API. What trigger do you envision as a
> >> replacement that arch_add_memory() use to determine that subsection
> >> operations should be disallowed?
> >>
> >
> > Looks like we now have time to sort it out :)
> >
> >
> > Looking at your series
> >
> > [PATCH v8 08/12] mm/sparsemem: Prepare for sub-section ranges
> >
> > is the "single" effectively place using MHP_MEMBLOCK_API, namely
> > "subsection_check()". Used when adding/removing memory.
> >
> >
> > +static int subsection_check(unsigned long pfn, unsigned long nr_pages,
> > +             unsigned long flags, const char *reason)
> > +{
> > +     /*
> > +      * Only allow partial section hotplug for !memblock ranges,
> > +      * since register_new_memory() requires section alignment, and
> > +      * CONFIG_SPARSEMEM_VMEMMAP=n requires sections to be fully
> > +      * populated.
> > +      */
> > +     if ((!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP)
> > +                             || (flags & MHP_MEMBLOCK_API))
> > +                     && ((pfn & ~PAGE_SECTION_MASK)
> > +                             || (nr_pages & ~PAGE_SECTION_MASK))) {
> > +             WARN(1, "Sub-section hot-%s incompatible with %s\n", reason,
> > +                             (flags & MHP_MEMBLOCK_API)
> > +                             ? "memblock api" : "!CONFIG_SPARSEMEM_VMEMMAP");
> > +             return -EINVAL;
> > +     }
> > +     return 0;
> >  }
> >
> >
> > (flags & MHP_MEMBLOCK_API)) && ((pfn & ~PAGE_SECTION_MASK) || (nr_pages
> > & ~PAGE_SECTION_MASK)))
> >
> > sounds like something the caller (add_memory()) always has to take care
> > of. No need to check. The one imposing this restriction is the only caller.
> >
> > In my opinion, that check/function can go completely.
> >
> > Am I missing something / missing another user?
> >
>
> In other word, this series moves the restriction out of
> arch_add_memory() and therefore you don't need subsection_check() at all
> anymore. At least if I am not missing something :)

Ah, ok. Only direct arch_add_memory() users need to be worried about
subsection hotplug and the add_memory_resource() + __remove_memory()
paths are already protected by check_hotplug_memory_range(). Ok, I can
get on board with the removal.

Let me go ahead and review this series so Andrew can get it pulled in
and I can rebase.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/8] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
  2019-05-07 18:37 ` [PATCH v2 1/8] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
@ 2019-05-07 20:38   ` Dan Williams
  2019-05-09 12:23   ` Wei Yang
  1 sibling, 0 replies; 45+ messages in thread
From: Dan Williams @ 2019-05-07 20:38 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Qian Cai, Wei Yang, Arun KS,
	Mathieu Malaterre

On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>
> By converting start and size to page granularity, we actually ignore
> unaligned parts within a page instead of properly bailing out with an
> error.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Dan Williams <dan.j.williams@intel.com>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 2/8] s390x/mm: Implement arch_remove_memory()
  2019-05-07 18:37 ` [PATCH v2 2/8] s390x/mm: Implement arch_remove_memory() David Hildenbrand
@ 2019-05-07 20:46   ` Dan Williams
  2019-05-07 20:47     ` David Hildenbrand
  0 siblings, 1 reply; 45+ messages in thread
From: Dan Williams @ 2019-05-07 20:46 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Martin Schwidefsky,
	Heiko Carstens, Michal Hocko, Mike Rapoport, Vasily Gorbik,
	Oscar Salvador

On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>
> Will come in handy when wanting to handle errors after
> arch_add_memory().
>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Oscar Salvador <osalvador@suse.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/s390/mm/init.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index 31b1071315d7..1e0cbae69f12 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -237,12 +237,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
>  void arch_remove_memory(int nid, u64 start, u64 size,
>                         struct vmem_altmap *altmap)
>  {
> -       /*
> -        * There is no hardware or firmware interface which could trigger a
> -        * hot memory remove on s390. So there is nothing that needs to be
> -        * implemented.
> -        */
> -       BUG();
> +       unsigned long start_pfn = start >> PAGE_SHIFT;
> +       unsigned long nr_pages = size >> PAGE_SHIFT;
> +       struct zone *zone;
> +
> +       zone = page_zone(pfn_to_page(start_pfn));

Does s390 actually support passing in an altmap? If 'yes', I think it
also needs the vmem_altmap_offset() fixup like x86-64:

        /* With altmap the first mapped page is offset from @start */
        if (altmap)
                page += vmem_altmap_offset(altmap);

...but I suspect it does not support altmap since
arch/s390/mm/vmem.c::vmemmap_populate() does not arrange for 'struct
page' capacity to be allocated out of an altmap defined page pool.

I think it would be enough to disallow any arch_add_memory() on s390
where @altmap is non-NULL. At least until s390 gains ZONE_DEVICE
support and can enable the pmem use case.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 2/8] s390x/mm: Implement arch_remove_memory()
  2019-05-07 20:46   ` Dan Williams
@ 2019-05-07 20:47     ` David Hildenbrand
  2019-05-07 20:57       ` Dan Williams
  0 siblings, 1 reply; 45+ messages in thread
From: David Hildenbrand @ 2019-05-07 20:47 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Martin Schwidefsky,
	Heiko Carstens, Michal Hocko, Mike Rapoport, Vasily Gorbik,
	Oscar Salvador

On 07.05.19 22:46, Dan Williams wrote:
> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> Will come in handy when wanting to handle errors after
>> arch_add_memory().
>>
>> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Vasily Gorbik <gor@linux.ibm.com>
>> Cc: Oscar Salvador <osalvador@suse.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  arch/s390/mm/init.c | 13 +++++++------
>>  1 file changed, 7 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
>> index 31b1071315d7..1e0cbae69f12 100644
>> --- a/arch/s390/mm/init.c
>> +++ b/arch/s390/mm/init.c
>> @@ -237,12 +237,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
>>  void arch_remove_memory(int nid, u64 start, u64 size,
>>                         struct vmem_altmap *altmap)
>>  {
>> -       /*
>> -        * There is no hardware or firmware interface which could trigger a
>> -        * hot memory remove on s390. So there is nothing that needs to be
>> -        * implemented.
>> -        */
>> -       BUG();
>> +       unsigned long start_pfn = start >> PAGE_SHIFT;
>> +       unsigned long nr_pages = size >> PAGE_SHIFT;
>> +       struct zone *zone;
>> +
>> +       zone = page_zone(pfn_to_page(start_pfn));
> 
> Does s390 actually support passing in an altmap? If 'yes', I think it
> also needs the vmem_altmap_offset() fixup like x86-64:
> 
>         /* With altmap the first mapped page is offset from @start */
>         if (altmap)
>                 page += vmem_altmap_offset(altmap);
> 
> ...but I suspect it does not support altmap since
> arch/s390/mm/vmem.c::vmemmap_populate() does not arrange for 'struct
> page' capacity to be allocated out of an altmap defined page pool.
> 
> I think it would be enough to disallow any arch_add_memory() on s390
> where @altmap is non-NULL. At least until s390 gains ZONE_DEVICE
> support and can enable the pmem use case.
> 

As far as I know, it doesn't yet, however I guess this could change once
virtio-pmem is supported?

Thanks!

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 2/8] s390x/mm: Implement arch_remove_memory()
  2019-05-07 20:47     ` David Hildenbrand
@ 2019-05-07 20:57       ` Dan Williams
  2019-05-07 21:13         ` David Hildenbrand
  0 siblings, 1 reply; 45+ messages in thread
From: Dan Williams @ 2019-05-07 20:57 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Martin Schwidefsky,
	Heiko Carstens, Michal Hocko, Mike Rapoport, Vasily Gorbik,
	Oscar Salvador

On Tue, May 7, 2019 at 1:47 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 07.05.19 22:46, Dan Williams wrote:
> > On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> Will come in handy when wanting to handle errors after
> >> arch_add_memory().
> >>
> >> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> >> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Cc: Michal Hocko <mhocko@suse.com>
> >> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> >> Cc: David Hildenbrand <david@redhat.com>
> >> Cc: Vasily Gorbik <gor@linux.ibm.com>
> >> Cc: Oscar Salvador <osalvador@suse.com>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >>  arch/s390/mm/init.c | 13 +++++++------
> >>  1 file changed, 7 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> >> index 31b1071315d7..1e0cbae69f12 100644
> >> --- a/arch/s390/mm/init.c
> >> +++ b/arch/s390/mm/init.c
> >> @@ -237,12 +237,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>  void arch_remove_memory(int nid, u64 start, u64 size,
> >>                         struct vmem_altmap *altmap)
> >>  {
> >> -       /*
> >> -        * There is no hardware or firmware interface which could trigger a
> >> -        * hot memory remove on s390. So there is nothing that needs to be
> >> -        * implemented.
> >> -        */
> >> -       BUG();
> >> +       unsigned long start_pfn = start >> PAGE_SHIFT;
> >> +       unsigned long nr_pages = size >> PAGE_SHIFT;
> >> +       struct zone *zone;
> >> +
> >> +       zone = page_zone(pfn_to_page(start_pfn));
> >
> > Does s390 actually support passing in an altmap? If 'yes', I think it
> > also needs the vmem_altmap_offset() fixup like x86-64:
> >
> >         /* With altmap the first mapped page is offset from @start */
> >         if (altmap)
> >                 page += vmem_altmap_offset(altmap);
> >
> > ...but I suspect it does not support altmap since
> > arch/s390/mm/vmem.c::vmemmap_populate() does not arrange for 'struct
> > page' capacity to be allocated out of an altmap defined page pool.
> >
> > I think it would be enough to disallow any arch_add_memory() on s390
> > where @altmap is non-NULL. At least until s390 gains ZONE_DEVICE
> > support and can enable the pmem use case.
> >
>
> As far as I know, it doesn't yet, however I guess this could change once
> virtio-pmem is supported?

I would expect and request virtio-pmem remain a non-starter on s390
until s390 gains ZONE_DEVICE support. As it stands virtio-pmem is just
another flavor of the general pmem driver and the pmem driver
currently only exports ZONE_DEVICE pfns tagged by the PTE_DEVMAP
pte-flag and PFN_DEV+PFN_MAP pfn_t-flags.

A hamstrung version of DAX (CONFIG_FS_DAX_LIMITED) is enabled for the
s390/dcssblk driver, but that requires that the driver indicate this
limited use case via the PTE_SPECIAL pte-flag and PFN_SPECIAL
pfn_t-flag. I otherwise do not want to see CONFIG_FS_DAX_LIMITED
spread outside of the s390/dcssblk use case.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/8] mm/memory_hotplug: arch_remove_memory() and __remove_pages() with CONFIG_MEMORY_HOTPLUG
  2019-05-07 18:37 ` [PATCH v2 3/8] mm/memory_hotplug: arch_remove_memory() and __remove_pages() with CONFIG_MEMORY_HOTPLUG David Hildenbrand
@ 2019-05-07 21:02   ` Dan Williams
  2019-05-07 21:06     ` David Hildenbrand
  2019-05-13  7:48     ` David Hildenbrand
  0 siblings, 2 replies; 45+ messages in thread
From: Dan Williams @ 2019-05-07 21:02 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Tony Luck, Fenghua Yu,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Martin Schwidefsky, Heiko Carstens, Yoshinori Sato, Rich Felker,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Greg Kroah-Hartman,
	Rafael J. Wysocki, Michal Hocko, Mike Rapoport, Oscar Salvador,
	Kirill A. Shutemov, Alex Deucher, David S. Miller, Mark Brown,
	Chris Wilson, Christophe Leroy, Nicholas Piggin, Vasily Gorbik,
	Rob Herring, Masahiro Yamada, mike.travis, Andrew Banman,
	Pavel Tatashin, Wei Yang, Arun KS, Qian Cai, Mathieu Malaterre,
	Baoquan He, Logan Gunthorpe

On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>
> Let's prepare for better error handling while adding memory by allowing
> to use arch_remove_memory() and __remove_pages() even if
> CONFIG_MEMORY_HOTREMOVE is not set. CONFIG_MEMORY_HOTREMOVE effectively
> covers
> - Offlining of system ram (memory block devices) - offline_pages()
> - Unplug of system ram - remove_memory()
> - Unplug/remap of device memory - devm_memremap()
>
> This allows e.g. for handling like
>
> arch_add_memory()
> rc = do_something();
> if (rc) {
>         arch_remove_memory();
> }
>
> Whereby do_something() will for example be memory block device creation
> after it has been factored out.

What's left after this? Can we just get rid of CONFIG_MEMORY_HOTREMOVE
option completely when CONFIG_MEMORY_HOTPLUG is enabled? It's not
clear to me why there was ever the option to compile out the remove
code when the add code is included.

> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
> Cc: Rich Felker <dalias@libc.org>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.com>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Rob Herring <robh@kernel.org>
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
> Cc: Andrew Banman <andrew.banman@hpe.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Wei Yang <richardw.yang@linux.intel.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Mathieu Malaterre <malat@debian.org>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/ia64/mm/init.c            | 2 --
>  arch/powerpc/mm/mem.c          | 2 --
>  arch/s390/mm/init.c            | 2 --
>  arch/sh/mm/init.c              | 2 --
>  arch/x86/mm/init_32.c          | 2 --
>  arch/x86/mm/init_64.c          | 2 --
>  drivers/base/memory.c          | 2 --
>  include/linux/memory.h         | 2 --
>  include/linux/memory_hotplug.h | 2 --
>  mm/memory_hotplug.c            | 2 --
>  mm/sparse.c                    | 6 ------
>  11 files changed, 26 deletions(-)
>
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index d28e29103bdb..aae75fd7b810 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -681,7 +681,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
>         return ret;
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  void arch_remove_memory(int nid, u64 start, u64 size,
>                         struct vmem_altmap *altmap)
>  {
> @@ -693,4 +692,3 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>         __remove_pages(zone, start_pfn, nr_pages, altmap);
>  }
>  #endif
> -#endif
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index a2b78e72452f..ddc69b59575c 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -131,7 +131,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
>         return __add_pages(nid, start_pfn, nr_pages, restrictions);
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  void __ref arch_remove_memory(int nid, u64 start, u64 size,
>                              struct vmem_altmap *altmap)
>  {
> @@ -164,7 +163,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
>         resize_hpt_for_hotplug(memblock_phys_mem_size());
>  }
>  #endif
> -#endif /* CONFIG_MEMORY_HOTPLUG */
>
>  #ifndef CONFIG_NEED_MULTIPLE_NODES
>  void __init mem_topology_setup(void)
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index 1e0cbae69f12..eafa3c750efc 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -233,7 +233,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
>         return rc;
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  void arch_remove_memory(int nid, u64 start, u64 size,
>                         struct vmem_altmap *altmap)
>  {
> @@ -245,5 +244,4 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>         __remove_pages(zone, start_pfn, nr_pages, altmap);
>         vmem_remove_mapping(start, size);
>  }
> -#endif
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
> index 5aeb4d7099a1..59c5fe511f25 100644
> --- a/arch/sh/mm/init.c
> +++ b/arch/sh/mm/init.c
> @@ -428,7 +428,6 @@ int memory_add_physaddr_to_nid(u64 addr)
>  EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
>  #endif
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  void arch_remove_memory(int nid, u64 start, u64 size,
>                         struct vmem_altmap *altmap)
>  {
> @@ -439,5 +438,4 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>         zone = page_zone(pfn_to_page(start_pfn));
>         __remove_pages(zone, start_pfn, nr_pages, altmap);
>  }
> -#endif
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index 075e568098f2..8d4bf2d97d50 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -859,7 +859,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
>         return __add_pages(nid, start_pfn, nr_pages, restrictions);
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  void arch_remove_memory(int nid, u64 start, u64 size,
>                         struct vmem_altmap *altmap)
>  {
> @@ -871,7 +870,6 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>         __remove_pages(zone, start_pfn, nr_pages, altmap);
>  }
>  #endif
> -#endif
>
>  int kernel_set_to_readonly __read_mostly;
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 20d14254b686..f1b55ddea23f 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1131,7 +1131,6 @@ void __ref vmemmap_free(unsigned long start, unsigned long end,
>         remove_pagetable(start, end, false, altmap);
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  static void __meminit
>  kernel_physical_mapping_remove(unsigned long start, unsigned long end)
>  {
> @@ -1156,7 +1155,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
>         __remove_pages(zone, start_pfn, nr_pages, altmap);
>         kernel_physical_mapping_remove(start, start + size);
>  }
> -#endif
>  #endif /* CONFIG_MEMORY_HOTPLUG */
>
>  static struct kcore_list kcore_vsyscall;
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index f180427e48f4..6e0cb4fda179 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -728,7 +728,6 @@ int hotplug_memory_register(int nid, struct mem_section *section)
>         return ret;
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  static void
>  unregister_memory(struct memory_block *memory)
>  {
> @@ -767,7 +766,6 @@ void unregister_memory_section(struct mem_section *section)
>  out_unlock:
>         mutex_unlock(&mem_sysfs_mutex);
>  }
> -#endif /* CONFIG_MEMORY_HOTREMOVE */
>
>  /* return true if the memory block is offlined, otherwise, return false */
>  bool is_memblock_offlined(struct memory_block *mem)
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index e1dc1bb2b787..474c7c60c8f2 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -112,9 +112,7 @@ extern void unregister_memory_notifier(struct notifier_block *nb);
>  extern int register_memory_isolate_notifier(struct notifier_block *nb);
>  extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
>  int hotplug_memory_register(int nid, struct mem_section *section);
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  extern void unregister_memory_section(struct mem_section *);
> -#endif
>  extern int memory_dev_init(void);
>  extern int memory_notify(unsigned long val, void *v);
>  extern int memory_isolate_notify(unsigned long val, void *v);
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index ae892eef8b82..2d4de313926d 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -123,12 +123,10 @@ static inline bool movable_node_is_enabled(void)
>         return movable_node_enabled;
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  extern void arch_remove_memory(int nid, u64 start, u64 size,
>                                struct vmem_altmap *altmap);
>  extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
>                            unsigned long nr_pages, struct vmem_altmap *altmap);
> -#endif /* CONFIG_MEMORY_HOTREMOVE */
>
>  /*
>   * Do we want sysfs memblock files created. This will allow userspace to online
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 202febe88b58..7b5439839d67 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -317,7 +317,6 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
>         return err;
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  /* find the smallest valid pfn in the range [start_pfn, end_pfn) */
>  static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
>                                      unsigned long start_pfn,
> @@ -581,7 +580,6 @@ void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
>
>         set_zone_contiguous(zone);
>  }
> -#endif /* CONFIG_MEMORY_HOTREMOVE */
>
>  int set_online_page_callback(online_page_callback_t callback)
>  {
> diff --git a/mm/sparse.c b/mm/sparse.c
> index fd13166949b5..d1d5e05f5b8d 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -604,7 +604,6 @@ static void __kfree_section_memmap(struct page *memmap,
>
>         vmemmap_free(start, end, altmap);
>  }
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  static void free_map_bootmem(struct page *memmap)
>  {
>         unsigned long start = (unsigned long)memmap;
> @@ -612,7 +611,6 @@ static void free_map_bootmem(struct page *memmap)
>
>         vmemmap_free(start, end, NULL);
>  }
> -#endif /* CONFIG_MEMORY_HOTREMOVE */
>  #else
>  static struct page *__kmalloc_section_memmap(void)
>  {
> @@ -651,7 +649,6 @@ static void __kfree_section_memmap(struct page *memmap,
>                            get_order(sizeof(struct page) * PAGES_PER_SECTION));
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  static void free_map_bootmem(struct page *memmap)
>  {
>         unsigned long maps_section_nr, removing_section_nr, i;
> @@ -681,7 +678,6 @@ static void free_map_bootmem(struct page *memmap)
>                         put_page_bootmem(page);
>         }
>  }
> -#endif /* CONFIG_MEMORY_HOTREMOVE */
>  #endif /* CONFIG_SPARSEMEM_VMEMMAP */
>
>  /**
> @@ -746,7 +742,6 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn,
>         return ret;
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  #ifdef CONFIG_MEMORY_FAILURE
>  static void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
>  {
> @@ -823,5 +818,4 @@ void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
>                         PAGES_PER_SECTION - map_offset);
>         free_section_usemap(memmap, usemap, altmap);
>  }
> -#endif /* CONFIG_MEMORY_HOTREMOVE */
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> --
> 2.20.1
>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/8] mm/memory_hotplug: arch_remove_memory() and __remove_pages() with CONFIG_MEMORY_HOTPLUG
  2019-05-07 21:02   ` Dan Williams
@ 2019-05-07 21:06     ` David Hildenbrand
  2019-05-13  7:48     ` David Hildenbrand
  1 sibling, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2019-05-07 21:06 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Tony Luck, Fenghua Yu,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Martin Schwidefsky, Heiko Carstens, Yoshinori Sato, Rich Felker,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Greg Kroah-Hartman,
	Rafael J. Wysocki, Michal Hocko, Mike Rapoport, Oscar Salvador,
	Kirill A. Shutemov, Alex Deucher, David S. Miller, Mark Brown,
	Chris Wilson, Christophe Leroy, Nicholas Piggin, Vasily Gorbik,
	Rob Herring, Masahiro Yamada, mike.travis, Andrew Banman,
	Pavel Tatashin, Wei Yang, Arun KS, Qian Cai, Mathieu Malaterre,
	Baoquan He, Logan Gunthorpe

On 07.05.19 23:02, Dan Williams wrote:
> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> Let's prepare for better error handling while adding memory by allowing
>> to use arch_remove_memory() and __remove_pages() even if
>> CONFIG_MEMORY_HOTREMOVE is not set. CONFIG_MEMORY_HOTREMOVE effectively
>> covers
>> - Offlining of system ram (memory block devices) - offline_pages()
>> - Unplug of system ram - remove_memory()
>> - Unplug/remap of device memory - devm_memremap()
>>
>> This allows e.g. for handling like
>>
>> arch_add_memory()
>> rc = do_something();
>> if (rc) {
>>         arch_remove_memory();
>> }
>>
>> Whereby do_something() will for example be memory block device creation
>> after it has been factored out.
> 
> What's left after this?

I tried to describe this above here:

- Offlining of system ram (memory block devices) - offline_pages()
- Unplug of system ram - remove_memory()
- Unplug/remap of device memory - devm_memremap()

So administrators cannot trigger offlining/removal of memory.

> Can we just get rid of CONFIG_MEMORY_HOTREMOVE
> option completely when CONFIG_MEMORY_HOTPLUG is enabled? It's not
> clear to me why there was ever the option to compile out the remove
> code when the add code is included.

I guess it was a configure option because initially, offlining/unplug
was extremely unstable. Now it's only slightly unstable :D

I would actually favor getting rid of CONFIG_MEMORY_HOTREMOVE
completely. After all, unplug always has to be triggered by an admin (HW
or software).

Opinions?

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 2/8] s390x/mm: Implement arch_remove_memory()
  2019-05-07 20:57       ` Dan Williams
@ 2019-05-07 21:13         ` David Hildenbrand
  0 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2019-05-07 21:13 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Martin Schwidefsky,
	Heiko Carstens, Michal Hocko, Mike Rapoport, Vasily Gorbik,
	Oscar Salvador

On 07.05.19 22:57, Dan Williams wrote:
> On Tue, May 7, 2019 at 1:47 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 07.05.19 22:46, Dan Williams wrote:
>>> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> Will come in handy when wanting to handle errors after
>>>> arch_add_memory().
>>>>
>>>> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>>>> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: Michal Hocko <mhocko@suse.com>
>>>> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
>>>> Cc: David Hildenbrand <david@redhat.com>
>>>> Cc: Vasily Gorbik <gor@linux.ibm.com>
>>>> Cc: Oscar Salvador <osalvador@suse.com>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>  arch/s390/mm/init.c | 13 +++++++------
>>>>  1 file changed, 7 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
>>>> index 31b1071315d7..1e0cbae69f12 100644
>>>> --- a/arch/s390/mm/init.c
>>>> +++ b/arch/s390/mm/init.c
>>>> @@ -237,12 +237,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
>>>>  void arch_remove_memory(int nid, u64 start, u64 size,
>>>>                         struct vmem_altmap *altmap)
>>>>  {
>>>> -       /*
>>>> -        * There is no hardware or firmware interface which could trigger a
>>>> -        * hot memory remove on s390. So there is nothing that needs to be
>>>> -        * implemented.
>>>> -        */
>>>> -       BUG();
>>>> +       unsigned long start_pfn = start >> PAGE_SHIFT;
>>>> +       unsigned long nr_pages = size >> PAGE_SHIFT;
>>>> +       struct zone *zone;
>>>> +
>>>> +       zone = page_zone(pfn_to_page(start_pfn));
>>>
>>> Does s390 actually support passing in an altmap? If 'yes', I think it
>>> also needs the vmem_altmap_offset() fixup like x86-64:
>>>
>>>         /* With altmap the first mapped page is offset from @start */
>>>         if (altmap)
>>>                 page += vmem_altmap_offset(altmap);
>>>
>>> ...but I suspect it does not support altmap since
>>> arch/s390/mm/vmem.c::vmemmap_populate() does not arrange for 'struct
>>> page' capacity to be allocated out of an altmap defined page pool.
>>>
>>> I think it would be enough to disallow any arch_add_memory() on s390
>>> where @altmap is non-NULL. At least until s390 gains ZONE_DEVICE
>>> support and can enable the pmem use case.
>>>
>>
>> As far as I know, it doesn't yet, however I guess this could change once
>> virtio-pmem is supported?
> 
> I would expect and request virtio-pmem remain a non-starter on s390
> until s390 gains ZONE_DEVICE support. As it stands virtio-pmem is just
> another flavor of the general pmem driver and the pmem driver
> currently only exports ZONE_DEVICE pfns tagged by the PTE_DEVMAP
> pte-flag and PFN_DEV+PFN_MAP pfn_t-flags.

Yes, I think ZONE_DEVICE will be the way to go. On real HW, there will
never be anything mapped into the physical address space besides system
ram. However with virtio-pmem in virtual environments, we have the
option to change that.

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-07 18:38 ` [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
@ 2019-05-07 21:17   ` Dan Williams
  2019-05-07 21:27     ` David Hildenbrand
  2019-05-08  8:35   ` David Hildenbrand
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 45+ messages in thread
From: Dan Williams @ 2019-05-07 21:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Wei Yang,
	Arun KS, Mathieu Malaterre

On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>
> Only memory to be added to the buddy and to be onlined/offlined by
> user space using memory block devices needs (and should have!) memory
> block devices.
>
> Factor out creation of memory block devices Create all devices after
> arch_add_memory() succeeded. We can later drop the want_memblock parameter,
> because it is now effectively stale.
>
> Only after memory block devices have been added, memory can be onlined
> by user space. This implies, that memory is not visible to user space at
> all before arch_add_memory() succeeded.

Nice!

>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Andrew Banman <andrew.banman@hpe.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
>  include/linux/memory.h |  2 +-
>  mm/memory_hotplug.c    | 15 ++++-----
>  3 files changed, 53 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 6e0cb4fda179..862c202a18ca 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
>         return 0;
>  }
>
> +static void unregister_memory(struct memory_block *memory)
> +{
> +       BUG_ON(memory->dev.bus != &memory_subsys);

Given this should never happen and only a future kernel developer
might trip over it, do we really need to kill that developer's
machine? I.e. s/BUG/WARN/? I guess an argument can be made to move
such a change that to a follow-on patch since you're just preserving
existing behavior, but I figure might as well address these as the
code is refactored.

> +
> +       /* drop the ref. we got via find_memory_block() */
> +       put_device(&memory->dev);
> +       device_unregister(&memory->dev);
> +}
> +
>  /*
> - * need an interface for the VM to add new memory regions,
> - * but without onlining it.
> + * Create memory block devices for the given memory area. Start and size
> + * have to be aligned to memory block granularity. Memory block devices
> + * will be initialized as offline.
>   */
> -int hotplug_memory_register(int nid, struct mem_section *section)
> +int hotplug_memory_register(unsigned long start, unsigned long size)
>  {
> -       int ret = 0;
> +       unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
> +       unsigned long start_pfn = PFN_DOWN(start);
> +       unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
> +       unsigned long pfn;
>         struct memory_block *mem;
> +       int ret = 0;
>
> -       mutex_lock(&mem_sysfs_mutex);
> +       BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
> +       BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));

Perhaps:

    if (WARN_ON(...))
        return -EINVAL;

>
> -       mem = find_memory_block(section);
> -       if (mem) {
> -               mem->section_count++;
> -               put_device(&mem->dev);
> -       } else {
> -               ret = init_memory_block(&mem, section, MEM_OFFLINE);
> +       mutex_lock(&mem_sysfs_mutex);
> +       for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
> +               mem = find_memory_block(__pfn_to_section(pfn));
> +               if (mem) {
> +                       WARN_ON_ONCE(false);

?? Isn't that a nop?

> +                       put_device(&mem->dev);
> +                       continue;
> +               }
> +               ret = init_memory_block(&mem, __pfn_to_section(pfn),
> +                                       MEM_OFFLINE);
>                 if (ret)
> -                       goto out;
> -               mem->section_count++;
> +                       break;
> +               mem->section_count = memory_block_size_bytes() /
> +                                    MIN_MEMORY_BLOCK_SIZE;
> +       }
> +       if (ret) {
> +               end_pfn = pfn;
> +               for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
> +                       mem = find_memory_block(__pfn_to_section(pfn));
> +                       if (!mem)
> +                               continue;
> +                       mem->section_count = 0;
> +                       unregister_memory(mem);
> +               }
>         }
> -
> -out:
>         mutex_unlock(&mem_sysfs_mutex);
>         return ret;
>  }
>
> -static void
> -unregister_memory(struct memory_block *memory)
> -{
> -       BUG_ON(memory->dev.bus != &memory_subsys);
> -
> -       /* drop the ref. we got via find_memory_block() */
> -       put_device(&memory->dev);
> -       device_unregister(&memory->dev);
> -}
> -
> -void unregister_memory_section(struct mem_section *section)
> +static int remove_memory_section(struct mem_section *section)
>  {
>         struct memory_block *mem;
>
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index 474c7c60c8f2..95505fbb5f85 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -111,7 +111,7 @@ extern int register_memory_notifier(struct notifier_block *nb);
>  extern void unregister_memory_notifier(struct notifier_block *nb);
>  extern int register_memory_isolate_notifier(struct notifier_block *nb);
>  extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
> -int hotplug_memory_register(int nid, struct mem_section *section);
> +int hotplug_memory_register(unsigned long start, unsigned long size);
>  extern void unregister_memory_section(struct mem_section *);
>  extern int memory_dev_init(void);
>  extern int memory_notify(unsigned long val, void *v);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 7b5439839d67..e1637c8a0723 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -258,13 +258,7 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
>                 return -EEXIST;
>
>         ret = sparse_add_one_section(nid, phys_start_pfn, altmap);
> -       if (ret < 0)
> -               return ret;
> -
> -       if (!want_memblock)
> -               return 0;
> -
> -       return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
> +       return ret < 0 ? ret : 0;
>  }
>
>  /*
> @@ -1106,6 +1100,13 @@ int __ref add_memory_resource(int nid, struct resource *res)
>         if (ret < 0)
>                 goto error;
>
> +       /* create memory block devices after memory was added */
> +       ret = hotplug_memory_register(start, size);
> +       if (ret) {
> +               arch_remove_memory(nid, start, size, NULL);
> +               goto error;
> +       }
> +
>         if (new_node) {
>                 /* If sysfs file of new node can't be created, cpu on the node
>                  * can't be hot-added. There is no rollback way now.
> --
> 2.20.1
>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
  2019-05-07 18:38 ` [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API David Hildenbrand
@ 2019-05-07 21:19   ` Dan Williams
  2019-05-07 21:24     ` David Hildenbrand
  0 siblings, 1 reply; 45+ messages in thread
From: Dan Williams @ 2019-05-07 21:19 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Michal Hocko,
	Oscar Salvador, Pavel Tatashin, Wei Yang, Joonsoo Kim, Qian Cai,
	Arun KS, Mathieu Malaterre

On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>
> No longer needed, the callers of arch_add_memory() can handle this
> manually.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  include/linux/memory_hotplug.h | 8 --------
>  mm/memory_hotplug.c            | 9 +++------
>  2 files changed, 3 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 2d4de313926d..2f1f87e13baa 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -128,14 +128,6 @@ extern void arch_remove_memory(int nid, u64 start, u64 size,
>  extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
>                            unsigned long nr_pages, struct vmem_altmap *altmap);
>
> -/*
> - * Do we want sysfs memblock files created. This will allow userspace to online
> - * and offline memory explicitly. Lack of this bit means that the caller has to
> - * call move_pfn_range_to_zone to finish the initialization.
> - */
> -
> -#define MHP_MEMBLOCK_API               (1<<0)
> -
>  /* reasonably generic interface to expand the physical pages */
>  extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>                        struct mhp_restrictions *restrictions);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index e1637c8a0723..107f72952347 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -250,7 +250,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
>  #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
>
>  static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
> -               struct vmem_altmap *altmap, bool want_memblock)
> +                                  struct vmem_altmap *altmap)
>  {
>         int ret;
>
> @@ -293,8 +293,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
>         }
>
>         for (i = start_sec; i <= end_sec; i++) {
> -               err = __add_section(nid, section_nr_to_pfn(i), altmap,
> -                               restrictions->flags & MHP_MEMBLOCK_API);
> +               err = __add_section(nid, section_nr_to_pfn(i), altmap);
>
>                 /*
>                  * EEXIST is finally dealt with by ioresource collision
> @@ -1066,9 +1065,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
>   */
>  int __ref add_memory_resource(int nid, struct resource *res)
>  {
> -       struct mhp_restrictions restrictions = {
> -               .flags = MHP_MEMBLOCK_API,
> -       };
> +       struct mhp_restrictions restrictions = {};

With mhp_restrictions.flags no longer needed, can we drop
mhp_restrictions and just go back to a plain @altmap argument?


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
  2019-05-07 21:19   ` Dan Williams
@ 2019-05-07 21:24     ` David Hildenbrand
  2019-05-07 21:25       ` Dan Williams
  0 siblings, 1 reply; 45+ messages in thread
From: David Hildenbrand @ 2019-05-07 21:24 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Michal Hocko,
	Oscar Salvador, Pavel Tatashin, Wei Yang, Joonsoo Kim, Qian Cai,
	Arun KS, Mathieu Malaterre

On 07.05.19 23:19, Dan Williams wrote:
> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> No longer needed, the callers of arch_add_memory() can handle this
>> manually.
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Oscar Salvador <osalvador@suse.com>
>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>> Cc: Wei Yang <richard.weiyang@gmail.com>
>> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> Cc: Qian Cai <cai@lca.pw>
>> Cc: Arun KS <arunks@codeaurora.org>
>> Cc: Mathieu Malaterre <malat@debian.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  include/linux/memory_hotplug.h | 8 --------
>>  mm/memory_hotplug.c            | 9 +++------
>>  2 files changed, 3 insertions(+), 14 deletions(-)
>>
>> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>> index 2d4de313926d..2f1f87e13baa 100644
>> --- a/include/linux/memory_hotplug.h
>> +++ b/include/linux/memory_hotplug.h
>> @@ -128,14 +128,6 @@ extern void arch_remove_memory(int nid, u64 start, u64 size,
>>  extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
>>                            unsigned long nr_pages, struct vmem_altmap *altmap);
>>
>> -/*
>> - * Do we want sysfs memblock files created. This will allow userspace to online
>> - * and offline memory explicitly. Lack of this bit means that the caller has to
>> - * call move_pfn_range_to_zone to finish the initialization.
>> - */
>> -
>> -#define MHP_MEMBLOCK_API               (1<<0)
>> -
>>  /* reasonably generic interface to expand the physical pages */
>>  extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>>                        struct mhp_restrictions *restrictions);
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index e1637c8a0723..107f72952347 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -250,7 +250,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
>>  #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
>>
>>  static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
>> -               struct vmem_altmap *altmap, bool want_memblock)
>> +                                  struct vmem_altmap *altmap)
>>  {
>>         int ret;
>>
>> @@ -293,8 +293,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
>>         }
>>
>>         for (i = start_sec; i <= end_sec; i++) {
>> -               err = __add_section(nid, section_nr_to_pfn(i), altmap,
>> -                               restrictions->flags & MHP_MEMBLOCK_API);
>> +               err = __add_section(nid, section_nr_to_pfn(i), altmap);
>>
>>                 /*
>>                  * EEXIST is finally dealt with by ioresource collision
>> @@ -1066,9 +1065,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
>>   */
>>  int __ref add_memory_resource(int nid, struct resource *res)
>>  {
>> -       struct mhp_restrictions restrictions = {
>> -               .flags = MHP_MEMBLOCK_API,
>> -       };
>> +       struct mhp_restrictions restrictions = {};
> 
> With mhp_restrictions.flags no longer needed, can we drop
> mhp_restrictions and just go back to a plain @altmap argument?
> 

Oscar wants to use it to configure from where to allocate vmemmaps. That
was the original driver behind it.

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
  2019-05-07 21:24     ` David Hildenbrand
@ 2019-05-07 21:25       ` Dan Williams
  2019-05-08  7:39         ` David Hildenbrand
  0 siblings, 1 reply; 45+ messages in thread
From: Dan Williams @ 2019-05-07 21:25 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Michal Hocko,
	Oscar Salvador, Pavel Tatashin, Wei Yang, Joonsoo Kim, Qian Cai,
	Arun KS, Mathieu Malaterre

On Tue, May 7, 2019 at 2:24 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 07.05.19 23:19, Dan Williams wrote:
> > On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> No longer needed, the callers of arch_add_memory() can handle this
> >> manually.
> >>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Cc: David Hildenbrand <david@redhat.com>
> >> Cc: Michal Hocko <mhocko@suse.com>
> >> Cc: Oscar Salvador <osalvador@suse.com>
> >> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> >> Cc: Wei Yang <richard.weiyang@gmail.com>
> >> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >> Cc: Qian Cai <cai@lca.pw>
> >> Cc: Arun KS <arunks@codeaurora.org>
> >> Cc: Mathieu Malaterre <malat@debian.org>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >>  include/linux/memory_hotplug.h | 8 --------
> >>  mm/memory_hotplug.c            | 9 +++------
> >>  2 files changed, 3 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> >> index 2d4de313926d..2f1f87e13baa 100644
> >> --- a/include/linux/memory_hotplug.h
> >> +++ b/include/linux/memory_hotplug.h
> >> @@ -128,14 +128,6 @@ extern void arch_remove_memory(int nid, u64 start, u64 size,
> >>  extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
> >>                            unsigned long nr_pages, struct vmem_altmap *altmap);
> >>
> >> -/*
> >> - * Do we want sysfs memblock files created. This will allow userspace to online
> >> - * and offline memory explicitly. Lack of this bit means that the caller has to
> >> - * call move_pfn_range_to_zone to finish the initialization.
> >> - */
> >> -
> >> -#define MHP_MEMBLOCK_API               (1<<0)
> >> -
> >>  /* reasonably generic interface to expand the physical pages */
> >>  extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
> >>                        struct mhp_restrictions *restrictions);
> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> >> index e1637c8a0723..107f72952347 100644
> >> --- a/mm/memory_hotplug.c
> >> +++ b/mm/memory_hotplug.c
> >> @@ -250,7 +250,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
> >>  #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
> >>
> >>  static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
> >> -               struct vmem_altmap *altmap, bool want_memblock)
> >> +                                  struct vmem_altmap *altmap)
> >>  {
> >>         int ret;
> >>
> >> @@ -293,8 +293,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
> >>         }
> >>
> >>         for (i = start_sec; i <= end_sec; i++) {
> >> -               err = __add_section(nid, section_nr_to_pfn(i), altmap,
> >> -                               restrictions->flags & MHP_MEMBLOCK_API);
> >> +               err = __add_section(nid, section_nr_to_pfn(i), altmap);
> >>
> >>                 /*
> >>                  * EEXIST is finally dealt with by ioresource collision
> >> @@ -1066,9 +1065,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
> >>   */
> >>  int __ref add_memory_resource(int nid, struct resource *res)
> >>  {
> >> -       struct mhp_restrictions restrictions = {
> >> -               .flags = MHP_MEMBLOCK_API,
> >> -       };
> >> +       struct mhp_restrictions restrictions = {};
> >
> > With mhp_restrictions.flags no longer needed, can we drop
> > mhp_restrictions and just go back to a plain @altmap argument?
> >
>
> Oscar wants to use it to configure from where to allocate vmemmaps. That
> was the original driver behind it.
>

Ah, ok, makes sense.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 6/8] mm/memory_hotplug: Remove memory block devices before arch_remove_memory()
  2019-05-07 18:38 ` [PATCH v2 6/8] mm/memory_hotplug: Remove memory block devices before arch_remove_memory() David Hildenbrand
@ 2019-05-07 21:27   ` Dan Williams
  0 siblings, 0 replies; 45+ messages in thread
From: Dan Williams @ 2019-05-07 21:27 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Andrew Banman, Ingo Molnar,
	Alex Deucher, David S. Miller, Mark Brown, Chris Wilson,
	Oscar Salvador, Jonathan Cameron, Michal Hocko, Pavel Tatashin,
	Arun KS, Mathieu Malaterre

On Tue, May 7, 2019 at 11:39 AM David Hildenbrand <david@redhat.com> wrote:
>
> Let's factor out removing of memory block devices, which is only
> necessary for memory added via add_memory() and friends that created
> memory block devices. Remove the devices before calling
> arch_remove_memory().
>
> This finishes factoring out memory block device handling from
> arch_add_memory() and arch_remove_memory().

Also nice! makes it easier in the future for the "device-memory" use
case to not avoid messing up the typical memory hotplug flow.

>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Andrew Banman <andrew.banman@hpe.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/base/memory.c  | 39 +++++++++++++++++++--------------------
>  drivers/base/node.c    | 11 ++++++-----
>  include/linux/memory.h |  2 +-
>  include/linux/node.h   |  6 ++----
>  mm/memory_hotplug.c    |  5 +++--
>  5 files changed, 31 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 862c202a18ca..47ff49058d1f 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -756,32 +756,31 @@ int hotplug_memory_register(unsigned long start, unsigned long size)
>         return ret;
>  }
>
> -static int remove_memory_section(struct mem_section *section)
> +/*
> + * Remove memory block devices for the given memory area. Start and size
> + * have to be aligned to memory block granularity. Memory block devices
> + * have to be offline.
> + */
> +void hotplug_memory_unregister(unsigned long start, unsigned long size)
>  {
> +       unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
> +       unsigned long start_pfn = PFN_DOWN(start);
> +       unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
>         struct memory_block *mem;
> +       unsigned long pfn;
>
> -       if (WARN_ON_ONCE(!present_section(section)))
> -               return;
> +       BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
> +       BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));

Similar BUG_ON vs comments WARN_ON comments as the previous patch.

>
>         mutex_lock(&mem_sysfs_mutex);
> -
> -       /*
> -        * Some users of the memory hotplug do not want/need memblock to
> -        * track all sections. Skip over those.
> -        */
> -       mem = find_memory_block(section);
> -       if (!mem)
> -               goto out_unlock;
> -
> -       unregister_mem_sect_under_nodes(mem, __section_nr(section));
> -
> -       mem->section_count--;
> -       if (mem->section_count == 0)
> +       for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
> +               mem = find_memory_block(__pfn_to_section(pfn));
> +               if (!mem)
> +                       continue;
> +               mem->section_count = 0;
> +               unregister_memory_block_under_nodes(mem);
>                 unregister_memory(mem);
> -       else
> -               put_device(&mem->dev);
> -
> -out_unlock:
> +       }
>         mutex_unlock(&mem_sysfs_mutex);
>  }
>
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 8598fcbd2a17..04fdfa99b8bc 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -801,9 +801,10 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
>         return 0;
>  }
>
> -/* unregister memory section under all nodes that it spans */
> -int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> -                                   unsigned long phys_index)
> +/*
> + * Unregister memory block device under all nodes that it spans.
> + */
> +int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>  {
>         NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
>         unsigned long pfn, sect_start_pfn, sect_end_pfn;
> @@ -816,8 +817,8 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
>                 return -ENOMEM;
>         nodes_clear(*unlinked_nodes);
>
> -       sect_start_pfn = section_nr_to_pfn(phys_index);
> -       sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
> +       sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
> +       sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
>         for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
>                 int nid;
>
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index 95505fbb5f85..aa236c2a0466 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -112,7 +112,7 @@ extern void unregister_memory_notifier(struct notifier_block *nb);
>  extern int register_memory_isolate_notifier(struct notifier_block *nb);
>  extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
>  int hotplug_memory_register(unsigned long start, unsigned long size);
> -extern void unregister_memory_section(struct mem_section *);
> +void hotplug_memory_unregister(unsigned long start, unsigned long size);
>  extern int memory_dev_init(void);
>  extern int memory_notify(unsigned long val, void *v);
>  extern int memory_isolate_notify(unsigned long val, void *v);
> diff --git a/include/linux/node.h b/include/linux/node.h
> index 1a557c589ecb..02a29e71b175 100644
> --- a/include/linux/node.h
> +++ b/include/linux/node.h
> @@ -139,8 +139,7 @@ extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
>  extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
>  extern int register_mem_sect_under_node(struct memory_block *mem_blk,
>                                                 void *arg);
> -extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> -                                          unsigned long phys_index);
> +extern int unregister_memory_block_under_nodes(struct memory_block *mem_blk);
>
>  extern int register_memory_node_under_compute_node(unsigned int mem_nid,
>                                                    unsigned int cpu_nid,
> @@ -176,8 +175,7 @@ static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
>  {
>         return 0;
>  }
> -static inline int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> -                                                 unsigned long phys_index)
> +static inline int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>  {
>         return 0;
>  }
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 107f72952347..527fe4f9c620 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -519,8 +519,6 @@ static void __remove_section(struct zone *zone, struct mem_section *ms,
>         if (WARN_ON_ONCE(!valid_section(ms)))
>                 return;
>
> -       unregister_memory_section(ms);
> -
>         scn_nr = __section_nr(ms);
>         start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
>         __remove_zone(zone, start_pfn);
> @@ -1844,6 +1842,9 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
>         memblock_free(start, size);
>         memblock_remove(start, size);
>
> +       /* remove memory block devices before removing memory */
> +       hotplug_memory_unregister(start, size);
> +
>         arch_remove_memory(nid, start, size, NULL);
>         __release_memory_resource(start, size);

Other than the BUG_ON concern you can add

Reviewed-by: Dan Williams <dan.j.williams@intel.com>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-07 21:17   ` Dan Williams
@ 2019-05-07 21:27     ` David Hildenbrand
  0 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2019-05-07 21:27 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Wei Yang,
	Arun KS, Mathieu Malaterre

>> +static void unregister_memory(struct memory_block *memory)
>> +{
>> +       BUG_ON(memory->dev.bus != &memory_subsys);
> 
> Given this should never happen and only a future kernel developer
> might trip over it, do we really need to kill that developer's
> machine? I.e. s/BUG/WARN/? I guess an argument can be made to move
> such a change that to a follow-on patch since you're just preserving
> existing behavior, but I figure might as well address these as the
> code is refactored.

I assume only

if (WARN ...)
	return;

makes sense then, right?

> 
>> +
>> +       /* drop the ref. we got via find_memory_block() */
>> +       put_device(&memory->dev);
>> +       device_unregister(&memory->dev);
>> +}
>> +
>>  /*
>> - * need an interface for the VM to add new memory regions,
>> - * but without onlining it.
>> + * Create memory block devices for the given memory area. Start and size
>> + * have to be aligned to memory block granularity. Memory block devices
>> + * will be initialized as offline.
>>   */
>> -int hotplug_memory_register(int nid, struct mem_section *section)
>> +int hotplug_memory_register(unsigned long start, unsigned long size)
>>  {
>> -       int ret = 0;
>> +       unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
>> +       unsigned long start_pfn = PFN_DOWN(start);
>> +       unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
>> +       unsigned long pfn;
>>         struct memory_block *mem;
>> +       int ret = 0;
>>
>> -       mutex_lock(&mem_sysfs_mutex);
>> +       BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
>> +       BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));
> 
> Perhaps:
> 
>     if (WARN_ON(...))
>         return -EINVAL;
> 

Yes, guess this souldn't hurt.

>>
>> -       mem = find_memory_block(section);
>> -       if (mem) {
>> -               mem->section_count++;
>> -               put_device(&mem->dev);
>> -       } else {
>> -               ret = init_memory_block(&mem, section, MEM_OFFLINE);
>> +       mutex_lock(&mem_sysfs_mutex);
>> +       for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
>> +               mem = find_memory_block(__pfn_to_section(pfn));
>> +               if (mem) {
>> +                       WARN_ON_ONCE(false);
> 
> ?? Isn't that a nop?

Yes, that makes no sense :)

Thanks!

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 7/8] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail
  2019-05-07 18:38 ` [PATCH v2 7/8] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail David Hildenbrand
@ 2019-05-08  0:15   ` Dan Williams
  2019-05-08  7:21     ` David Hildenbrand
  0 siblings, 1 reply; 45+ messages in thread
From: Dan Williams @ 2019-05-08  0:15 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Greg Kroah-Hartman,
	Rafael J. Wysocki, Alex Deucher, David S. Miller, Mark Brown,
	Chris Wilson, Oscar Salvador, Jonathan Cameron

On Tue, May 7, 2019 at 11:39 AM David Hildenbrand <david@redhat.com> wrote:
>
> We really don't want anything during memory hotunplug to fail.
> We always pass a valid memory block device, that check can go. Avoid
> allocating memory and eventually failing. As we are always called under
> lock, we can use a static piece of memory. This avoids having to put
> the structure onto the stack, having to guess about the stack size
> of callers.
>
> Patch inspired by a patch from Oscar Salvador.
>
> In the future, there might be no need to iterate over nodes at all.
> mem->nid should tell us exactly what to remove. Memory block devices
> with mixed nodes (added during boot) should properly fenced off and never
> removed.
>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/base/node.c  | 18 +++++-------------
>  include/linux/node.h |  5 ++---
>  2 files changed, 7 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 04fdfa99b8bc..9be88fd05147 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -803,20 +803,14 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
>
>  /*
>   * Unregister memory block device under all nodes that it spans.
> + * Has to be called with mem_sysfs_mutex held (due to unlinked_nodes).

Given this comment can bitrot relative to the implementation lets
instead add an explicit:

    lockdep_assert_held(&mem_sysfs_mutex);

With that you can add:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 8/8] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section
  2019-05-07 18:38 ` [PATCH v2 8/8] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section David Hildenbrand
@ 2019-05-08  0:30   ` Dan Williams
  0 siblings, 0 replies; 45+ messages in thread
From: Dan Williams @ 2019-05-08  0:30 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton

On Tue, May 7, 2019 at 11:39 AM David Hildenbrand <david@redhat.com> wrote:
>
> Unused, and memory unplug path should never care about zones. This is
> the job of memory offlining. ZONE_DEVICE might require special care -
> the caller of arch_remove_memory() should handle this.

The ZONE_DEVICE usage does not require special care so you can drop
that comment. The only place it's used in the subsection patches is to
lookup the node-id, but it turns out that the resulting node-id is
then never used.

With the ZONE_DEVICE mention dropped out of changelog you can add:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 7/8] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail
  2019-05-08  0:15   ` Dan Williams
@ 2019-05-08  7:21     ` David Hildenbrand
  2019-05-08 13:50       ` Dan Williams
  0 siblings, 1 reply; 45+ messages in thread
From: David Hildenbrand @ 2019-05-08  7:21 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Greg Kroah-Hartman,
	Rafael J. Wysocki, Alex Deucher, David S. Miller, Mark Brown,
	Chris Wilson, Oscar Salvador, Jonathan Cameron


>>  drivers/base/node.c  | 18 +++++-------------
>>  include/linux/node.h |  5 ++---
>>  2 files changed, 7 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>> index 04fdfa99b8bc..9be88fd05147 100644
>> --- a/drivers/base/node.c
>> +++ b/drivers/base/node.c
>> @@ -803,20 +803,14 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
>>
>>  /*
>>   * Unregister memory block device under all nodes that it spans.
>> + * Has to be called with mem_sysfs_mutex held (due to unlinked_nodes).
> 
> Given this comment can bitrot relative to the implementation lets
> instead add an explicit:
> 
>     lockdep_assert_held(&mem_sysfs_mutex);

That would require to make the mutex non-static. Is that what you
suggest, or any other alternative?

Thanks Dan!

> 
> With that you can add:
> 
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> 


-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
  2019-05-07 21:25       ` Dan Williams
@ 2019-05-08  7:39         ` David Hildenbrand
  2019-05-08 23:08           ` osalvador
  0 siblings, 1 reply; 45+ messages in thread
From: David Hildenbrand @ 2019-05-08  7:39 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Michal Hocko,
	Oscar Salvador, Pavel Tatashin, Wei Yang, Joonsoo Kim, Qian Cai,
	Arun KS, Mathieu Malaterre

On 07.05.19 23:25, Dan Williams wrote:
> On Tue, May 7, 2019 at 2:24 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 07.05.19 23:19, Dan Williams wrote:
>>> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> No longer needed, the callers of arch_add_memory() can handle this
>>>> manually.
>>>>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: David Hildenbrand <david@redhat.com>
>>>> Cc: Michal Hocko <mhocko@suse.com>
>>>> Cc: Oscar Salvador <osalvador@suse.com>
>>>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>>>> Cc: Wei Yang <richard.weiyang@gmail.com>
>>>> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>>>> Cc: Qian Cai <cai@lca.pw>
>>>> Cc: Arun KS <arunks@codeaurora.org>
>>>> Cc: Mathieu Malaterre <malat@debian.org>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>  include/linux/memory_hotplug.h | 8 --------
>>>>  mm/memory_hotplug.c            | 9 +++------
>>>>  2 files changed, 3 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>>>> index 2d4de313926d..2f1f87e13baa 100644
>>>> --- a/include/linux/memory_hotplug.h
>>>> +++ b/include/linux/memory_hotplug.h
>>>> @@ -128,14 +128,6 @@ extern void arch_remove_memory(int nid, u64 start, u64 size,
>>>>  extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
>>>>                            unsigned long nr_pages, struct vmem_altmap *altmap);
>>>>
>>>> -/*
>>>> - * Do we want sysfs memblock files created. This will allow userspace to online
>>>> - * and offline memory explicitly. Lack of this bit means that the caller has to
>>>> - * call move_pfn_range_to_zone to finish the initialization.
>>>> - */
>>>> -
>>>> -#define MHP_MEMBLOCK_API               (1<<0)
>>>> -
>>>>  /* reasonably generic interface to expand the physical pages */
>>>>  extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>>>>                        struct mhp_restrictions *restrictions);
>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>> index e1637c8a0723..107f72952347 100644
>>>> --- a/mm/memory_hotplug.c
>>>> +++ b/mm/memory_hotplug.c
>>>> @@ -250,7 +250,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
>>>>  #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
>>>>
>>>>  static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
>>>> -               struct vmem_altmap *altmap, bool want_memblock)
>>>> +                                  struct vmem_altmap *altmap)
>>>>  {
>>>>         int ret;
>>>>
>>>> @@ -293,8 +293,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
>>>>         }
>>>>
>>>>         for (i = start_sec; i <= end_sec; i++) {
>>>> -               err = __add_section(nid, section_nr_to_pfn(i), altmap,
>>>> -                               restrictions->flags & MHP_MEMBLOCK_API);
>>>> +               err = __add_section(nid, section_nr_to_pfn(i), altmap);
>>>>
>>>>                 /*
>>>>                  * EEXIST is finally dealt with by ioresource collision
>>>> @@ -1066,9 +1065,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
>>>>   */
>>>>  int __ref add_memory_resource(int nid, struct resource *res)
>>>>  {
>>>> -       struct mhp_restrictions restrictions = {
>>>> -               .flags = MHP_MEMBLOCK_API,
>>>> -       };
>>>> +       struct mhp_restrictions restrictions = {};
>>>
>>> With mhp_restrictions.flags no longer needed, can we drop
>>> mhp_restrictions and just go back to a plain @altmap argument?
>>>
>>
>> Oscar wants to use it to configure from where to allocate vmemmaps. That
>> was the original driver behind it.
>>
> 
> Ah, ok, makes sense.
> 

However I haven't really thought it through yet, smalles like that could
as well just be handled by the caller of
arch_add_memory()/arch_remove_memory() eventually, passing it via
something like the altmap parameter.

Let's leave it in place for now, we can talk about that once we have new
patches from Oscar.

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-07 18:38 ` [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
  2019-05-07 21:17   ` Dan Williams
@ 2019-05-08  8:35   ` David Hildenbrand
  2019-05-09 12:43   ` Wei Yang
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2019-05-08  8:35 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	akpm, Dan Williams, Greg Kroah-Hartman, Rafael J. Wysocki,
	mike.travis, Ingo Molnar, Andrew Banman, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Qian Cai, Wei Yang, Arun KS,
	Mathieu Malaterre

On 07.05.19 20:38, David Hildenbrand wrote:
> Only memory to be added to the buddy and to be onlined/offlined by
> user space using memory block devices needs (and should have!) memory
> block devices.
> 
> Factor out creation of memory block devices Create all devices after
> arch_add_memory() succeeded. We can later drop the want_memblock parameter,
> because it is now effectively stale.
> 
> Only after memory block devices have been added, memory can be onlined
> by user space. This implies, that memory is not visible to user space at
> all before arch_add_memory() succeeded.
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Andrew Banman <andrew.banman@hpe.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
>  include/linux/memory.h |  2 +-
>  mm/memory_hotplug.c    | 15 ++++-----
>  3 files changed, 53 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 6e0cb4fda179..862c202a18ca 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
>  	return 0;
>  }
>  
> +static void unregister_memory(struct memory_block *memory)
> +{
> +	BUG_ON(memory->dev.bus != &memory_subsys);
> +
> +	/* drop the ref. we got via find_memory_block() */
> +	put_device(&memory->dev);
> +	device_unregister(&memory->dev);
> +}
> +
>  /*
> - * need an interface for the VM to add new memory regions,
> - * but without onlining it.
> + * Create memory block devices for the given memory area. Start and size
> + * have to be aligned to memory block granularity. Memory block devices
> + * will be initialized as offline.
>   */
> -int hotplug_memory_register(int nid, struct mem_section *section)
> +int hotplug_memory_register(unsigned long start, unsigned long size)
>  {
> -	int ret = 0;
> +	unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
> +	unsigned long start_pfn = PFN_DOWN(start);
> +	unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
> +	unsigned long pfn;
>  	struct memory_block *mem;
> +	int ret = 0;
>  
> -	mutex_lock(&mem_sysfs_mutex);
> +	BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
> +	BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));
>  
> -	mem = find_memory_block(section);
> -	if (mem) {
> -		mem->section_count++;
> -		put_device(&mem->dev);
> -	} else {
> -		ret = init_memory_block(&mem, section, MEM_OFFLINE);
> +	mutex_lock(&mem_sysfs_mutex);
> +	for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
> +		mem = find_memory_block(__pfn_to_section(pfn));
> +		if (mem) {
> +			WARN_ON_ONCE(false);
> +			put_device(&mem->dev);
> +			continue;
> +		}
> +		ret = init_memory_block(&mem, __pfn_to_section(pfn),
> +					MEM_OFFLINE);
>  		if (ret)
> -			goto out;
> -		mem->section_count++;
> +			break;
> +		mem->section_count = memory_block_size_bytes() /
> +				     MIN_MEMORY_BLOCK_SIZE;
> +	}
> +	if (ret) {
> +		end_pfn = pfn;
> +		for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
> +			mem = find_memory_block(__pfn_to_section(pfn));
> +			if (!mem)
> +				continue;
> +			mem->section_count = 0;
> +			unregister_memory(mem);
> +		}
>  	}
> -
> -out:
>  	mutex_unlock(&mem_sysfs_mutex);
>  	return ret;
>  }
>  
> -static void
> -unregister_memory(struct memory_block *memory)
> -{
> -	BUG_ON(memory->dev.bus != &memory_subsys);
> -
> -	/* drop the ref. we got via find_memory_block() */
> -	put_device(&memory->dev);
> -	device_unregister(&memory->dev);
> -}
> -
> -void unregister_memory_section(struct mem_section *section)
> +static int remove_memory_section(struct mem_section *section)
>  {

The function change is misplaces in this patch will drop it so this
patch compiles without the other patches.


-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 7/8] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail
  2019-05-08  7:21     ` David Hildenbrand
@ 2019-05-08 13:50       ` Dan Williams
  0 siblings, 0 replies; 45+ messages in thread
From: Dan Williams @ 2019-05-08 13:50 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Greg Kroah-Hartman,
	Rafael J. Wysocki, Alex Deucher, David S. Miller, Mark Brown,
	Chris Wilson, Oscar Salvador, Jonathan Cameron

On Wed, May 8, 2019 at 12:22 AM David Hildenbrand <david@redhat.com> wrote:
>
>
> >>  drivers/base/node.c  | 18 +++++-------------
> >>  include/linux/node.h |  5 ++---
> >>  2 files changed, 7 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/drivers/base/node.c b/drivers/base/node.c
> >> index 04fdfa99b8bc..9be88fd05147 100644
> >> --- a/drivers/base/node.c
> >> +++ b/drivers/base/node.c
> >> @@ -803,20 +803,14 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
> >>
> >>  /*
> >>   * Unregister memory block device under all nodes that it spans.
> >> + * Has to be called with mem_sysfs_mutex held (due to unlinked_nodes).
> >
> > Given this comment can bitrot relative to the implementation lets
> > instead add an explicit:
> >
> >     lockdep_assert_held(&mem_sysfs_mutex);
>
> That would require to make the mutex non-static. Is that what you
> suggest, or any other alternative?

If the concern is other code paths taking the lock when they shouldn't
then you could make a public "lockdep_assert_mem_sysfs_held()" to do
the same, but I otherwise think the benefit of inline lock validation
is worth the price of adding a new non-static symbol.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
  2019-05-08  7:39         ` David Hildenbrand
@ 2019-05-08 23:08           ` osalvador
  2019-05-09  7:05             ` David Hildenbrand
  0 siblings, 1 reply; 45+ messages in thread
From: osalvador @ 2019-05-08 23:08 UTC (permalink / raw)
  To: David Hildenbrand, Dan Williams
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Michal Hocko,
	Pavel Tatashin, Wei Yang, Joonsoo Kim, Qian Cai, Arun KS,
	Mathieu Malaterre

On Wed, 2019-05-08 at 09:39 +0200, David Hildenbrand wrote:
> However I haven't really thought it through yet, smalles like that
> could
> as well just be handled by the caller of
> arch_add_memory()/arch_remove_memory() eventually, passing it via
> something like the altmap parameter.
> 
> Let's leave it in place for now, we can talk about that once we have
> new
> patches from Oscar.
Hi David,

I plan to send a new patchset once this is and Dan's are merged,
otherwise I will have a mayhem with the conflicts.

I also plan/want to review this patchset, but time is tight this week.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
  2019-05-08 23:08           ` osalvador
@ 2019-05-09  7:05             ` David Hildenbrand
  0 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2019-05-09  7:05 UTC (permalink / raw)
  To: osalvador, Dan Williams
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Michal Hocko,
	Pavel Tatashin, Wei Yang, Joonsoo Kim, Qian Cai, Arun KS,
	Mathieu Malaterre

On 09.05.19 01:08, osalvador wrote:
> On Wed, 2019-05-08 at 09:39 +0200, David Hildenbrand wrote:
>> However I haven't really thought it through yet, smalles like that
>> could
>> as well just be handled by the caller of
>> arch_add_memory()/arch_remove_memory() eventually, passing it via
>> something like the altmap parameter.
>>
>> Let's leave it in place for now, we can talk about that once we have
>> new
>> patches from Oscar.
> Hi David,
> 
> I plan to send a new patchset once this is and Dan's are merged,
> otherwise I will have a mayhem with the conflicts.
> 
> I also plan/want to review this patchset, but time is tight this week.
> 

Sure, take your time. I'll resend this patch set most probably tomorrow
or early next week. Cheers!

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/8] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
  2019-05-07 18:37 ` [PATCH v2 1/8] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
  2019-05-07 20:38   ` Dan Williams
@ 2019-05-09 12:23   ` Wei Yang
  1 sibling, 0 replies; 45+ messages in thread
From: Wei Yang @ 2019-05-09 12:23 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, akpm, Dan Williams, Oscar Salvador, Michal Hocko,
	Pavel Tatashin, Qian Cai, Wei Yang, Arun KS, Mathieu Malaterre

On Tue, May 07, 2019 at 08:37:57PM +0200, David Hildenbrand wrote:
>By converting start and size to page granularity, we actually ignore
>unaligned parts within a page instead of properly bailing out with an
>error.
>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Oscar Salvador <osalvador@suse.de>
>Cc: Michal Hocko <mhocko@suse.com>
>Cc: David Hildenbrand <david@redhat.com>
>Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>Cc: Qian Cai <cai@lca.pw>
>Cc: Wei Yang <richard.weiyang@gmail.com>
>Cc: Arun KS <arunks@codeaurora.org>
>Cc: Mathieu Malaterre <malat@debian.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>


-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-07 18:38 ` [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
  2019-05-07 21:17   ` Dan Williams
  2019-05-08  8:35   ` David Hildenbrand
@ 2019-05-09 12:43   ` Wei Yang
  2019-05-09 12:50     ` David Hildenbrand
  2019-05-09 13:55   ` Wei Yang
  2019-05-09 14:31   ` Wei Yang
  4 siblings, 1 reply; 45+ messages in thread
From: Wei Yang @ 2019-05-09 12:43 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, akpm, Dan Williams, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Wei Yang,
	Arun KS, Mathieu Malaterre

On Tue, May 07, 2019 at 08:38:00PM +0200, David Hildenbrand wrote:
>Only memory to be added to the buddy and to be onlined/offlined by
>user space using memory block devices needs (and should have!) memory
>block devices.
>
>Factor out creation of memory block devices Create all devices after
>arch_add_memory() succeeded. We can later drop the want_memblock parameter,
>because it is now effectively stale.
>
>Only after memory block devices have been added, memory can be onlined
>by user space. This implies, that memory is not visible to user space at
>all before arch_add_memory() succeeded.
>
>Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Cc: David Hildenbrand <david@redhat.com>
>Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Ingo Molnar <mingo@kernel.org>
>Cc: Andrew Banman <andrew.banman@hpe.com>
>Cc: Oscar Salvador <osalvador@suse.de>
>Cc: Michal Hocko <mhocko@suse.com>
>Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>Cc: Qian Cai <cai@lca.pw>
>Cc: Wei Yang <richard.weiyang@gmail.com>
>Cc: Arun KS <arunks@codeaurora.org>
>Cc: Mathieu Malaterre <malat@debian.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
> include/linux/memory.h |  2 +-
> mm/memory_hotplug.c    | 15 ++++-----
> 3 files changed, 53 insertions(+), 34 deletions(-)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index 6e0cb4fda179..862c202a18ca 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
> 	return 0;
> }
> 
>+static void unregister_memory(struct memory_block *memory)
>+{
>+	BUG_ON(memory->dev.bus != &memory_subsys);
>+
>+	/* drop the ref. we got via find_memory_block() */
>+	put_device(&memory->dev);
>+	device_unregister(&memory->dev);
>+}
>+
> /*
>- * need an interface for the VM to add new memory regions,
>- * but without onlining it.
>+ * Create memory block devices for the given memory area. Start and size
>+ * have to be aligned to memory block granularity. Memory block devices
>+ * will be initialized as offline.
>  */
>-int hotplug_memory_register(int nid, struct mem_section *section)
>+int hotplug_memory_register(unsigned long start, unsigned long size)
> {
>-	int ret = 0;
>+	unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
>+	unsigned long start_pfn = PFN_DOWN(start);
>+	unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
>+	unsigned long pfn;
> 	struct memory_block *mem;
>+	int ret = 0;
> 
>-	mutex_lock(&mem_sysfs_mutex);
>+	BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
>+	BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));

After this change, the call flow looks like this:

add_memory_resource
    check_hotplug_memory_range
    hotplug_memory_register

Since in check_hotplug_memory_range() has checked the boundary, do we need to
check here again?

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-09 12:43   ` Wei Yang
@ 2019-05-09 12:50     ` David Hildenbrand
  0 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2019-05-09 12:50 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, akpm, Dan Williams, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Arun KS,
	Mathieu Malaterre

On 09.05.19 14:43, Wei Yang wrote:
> On Tue, May 07, 2019 at 08:38:00PM +0200, David Hildenbrand wrote:
>> Only memory to be added to the buddy and to be onlined/offlined by
>> user space using memory block devices needs (and should have!) memory
>> block devices.
>>
>> Factor out creation of memory block devices Create all devices after
>> arch_add_memory() succeeded. We can later drop the want_memblock parameter,
>> because it is now effectively stale.
>>
>> Only after memory block devices have been added, memory can be onlined
>> by user space. This implies, that memory is not visible to user space at
>> all before arch_add_memory() succeeded.
>>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Andrew Banman <andrew.banman@hpe.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>> Cc: Qian Cai <cai@lca.pw>
>> Cc: Wei Yang <richard.weiyang@gmail.com>
>> Cc: Arun KS <arunks@codeaurora.org>
>> Cc: Mathieu Malaterre <malat@debian.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
>> include/linux/memory.h |  2 +-
>> mm/memory_hotplug.c    | 15 ++++-----
>> 3 files changed, 53 insertions(+), 34 deletions(-)
>>
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index 6e0cb4fda179..862c202a18ca 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
>> 	return 0;
>> }
>>
>> +static void unregister_memory(struct memory_block *memory)
>> +{
>> +	BUG_ON(memory->dev.bus != &memory_subsys);
>> +
>> +	/* drop the ref. we got via find_memory_block() */
>> +	put_device(&memory->dev);
>> +	device_unregister(&memory->dev);
>> +}
>> +
>> /*
>> - * need an interface for the VM to add new memory regions,
>> - * but without onlining it.
>> + * Create memory block devices for the given memory area. Start and size
>> + * have to be aligned to memory block granularity. Memory block devices
>> + * will be initialized as offline.
>>  */
>> -int hotplug_memory_register(int nid, struct mem_section *section)
>> +int hotplug_memory_register(unsigned long start, unsigned long size)
>> {
>> -	int ret = 0;
>> +	unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
>> +	unsigned long start_pfn = PFN_DOWN(start);
>> +	unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
>> +	unsigned long pfn;
>> 	struct memory_block *mem;
>> +	int ret = 0;
>>
>> -	mutex_lock(&mem_sysfs_mutex);
>> +	BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
>> +	BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));
> 
> After this change, the call flow looks like this:
> 
> add_memory_resource
>     check_hotplug_memory_range
>     hotplug_memory_register
> 
> Since in check_hotplug_memory_range() has checked the boundary, do we need to
> check here again?
> 

I prefer to check for such requirements explicitly in applicable places,
especially if they are placed in different files. Makes code easier to
get. WARN_ON_ONCE will indicate that this has to be assured by the caller.

Thanks!

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-07 18:38 ` [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
                     ` (2 preceding siblings ...)
  2019-05-09 12:43   ` Wei Yang
@ 2019-05-09 13:55   ` Wei Yang
  2019-05-09 14:05     ` David Hildenbrand
  2019-05-09 14:31   ` Wei Yang
  4 siblings, 1 reply; 45+ messages in thread
From: Wei Yang @ 2019-05-09 13:55 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, akpm, Dan Williams, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Wei Yang,
	Arun KS, Mathieu Malaterre

On Tue, May 07, 2019 at 08:38:00PM +0200, David Hildenbrand wrote:
>Only memory to be added to the buddy and to be onlined/offlined by
>user space using memory block devices needs (and should have!) memory
>block devices.
>
>Factor out creation of memory block devices Create all devices after
>arch_add_memory() succeeded. We can later drop the want_memblock parameter,
>because it is now effectively stale.
>
>Only after memory block devices have been added, memory can be onlined
>by user space. This implies, that memory is not visible to user space at
>all before arch_add_memory() succeeded.
>
>Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Cc: David Hildenbrand <david@redhat.com>
>Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Ingo Molnar <mingo@kernel.org>
>Cc: Andrew Banman <andrew.banman@hpe.com>
>Cc: Oscar Salvador <osalvador@suse.de>
>Cc: Michal Hocko <mhocko@suse.com>
>Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>Cc: Qian Cai <cai@lca.pw>
>Cc: Wei Yang <richard.weiyang@gmail.com>
>Cc: Arun KS <arunks@codeaurora.org>
>Cc: Mathieu Malaterre <malat@debian.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
> include/linux/memory.h |  2 +-
> mm/memory_hotplug.c    | 15 ++++-----
> 3 files changed, 53 insertions(+), 34 deletions(-)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index 6e0cb4fda179..862c202a18ca 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
> 	return 0;
> }
> 
>+static void unregister_memory(struct memory_block *memory)
>+{
>+	BUG_ON(memory->dev.bus != &memory_subsys);
>+
>+	/* drop the ref. we got via find_memory_block() */
>+	put_device(&memory->dev);
>+	device_unregister(&memory->dev);
>+}
>+
> /*
>- * need an interface for the VM to add new memory regions,
>- * but without onlining it.
>+ * Create memory block devices for the given memory area. Start and size
>+ * have to be aligned to memory block granularity. Memory block devices
>+ * will be initialized as offline.
>  */
>-int hotplug_memory_register(int nid, struct mem_section *section)
>+int hotplug_memory_register(unsigned long start, unsigned long size)
> {
>-	int ret = 0;
>+	unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
>+	unsigned long start_pfn = PFN_DOWN(start);
>+	unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
>+	unsigned long pfn;
> 	struct memory_block *mem;
>+	int ret = 0;
> 
>-	mutex_lock(&mem_sysfs_mutex);
>+	BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
>+	BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));
> 
>-	mem = find_memory_block(section);
>-	if (mem) {
>-		mem->section_count++;
>-		put_device(&mem->dev);
>-	} else {
>-		ret = init_memory_block(&mem, section, MEM_OFFLINE);
>+	mutex_lock(&mem_sysfs_mutex);
>+	for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
>+		mem = find_memory_block(__pfn_to_section(pfn));
>+		if (mem) {
>+			WARN_ON_ONCE(false);

One question here, the purpose of WARN_ON_ONCE(false) is? Would we trigger
this?

>+			put_device(&mem->dev);
>+			continue;
>+		}
>+		ret = init_memory_block(&mem, __pfn_to_section(pfn),
>+					MEM_OFFLINE);
> 		if (ret)
>-			goto out;
>-		mem->section_count++;
>+			break;
>+		mem->section_count = memory_block_size_bytes() /
>+				     MIN_MEMORY_BLOCK_SIZE;

Maybe we can leverage sections_per_block variable.

                mem->section_count = sections_per_block;

>+	}
>+	if (ret) {
>+		end_pfn = pfn;
>+		for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
>+			mem = find_memory_block(__pfn_to_section(pfn));
>+			if (!mem)
>+				continue;
>+			mem->section_count = 0;
>+			unregister_memory(mem);
>+		}
> 	}

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-09 13:55   ` Wei Yang
@ 2019-05-09 14:05     ` David Hildenbrand
  0 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2019-05-09 14:05 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, akpm, Dan Williams, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Arun KS,
	Mathieu Malaterre

On 09.05.19 15:55, Wei Yang wrote:
> On Tue, May 07, 2019 at 08:38:00PM +0200, David Hildenbrand wrote:
>> Only memory to be added to the buddy and to be onlined/offlined by
>> user space using memory block devices needs (and should have!) memory
>> block devices.
>>
>> Factor out creation of memory block devices Create all devices after
>> arch_add_memory() succeeded. We can later drop the want_memblock parameter,
>> because it is now effectively stale.
>>
>> Only after memory block devices have been added, memory can be onlined
>> by user space. This implies, that memory is not visible to user space at
>> all before arch_add_memory() succeeded.
>>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Andrew Banman <andrew.banman@hpe.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>> Cc: Qian Cai <cai@lca.pw>
>> Cc: Wei Yang <richard.weiyang@gmail.com>
>> Cc: Arun KS <arunks@codeaurora.org>
>> Cc: Mathieu Malaterre <malat@debian.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
>> include/linux/memory.h |  2 +-
>> mm/memory_hotplug.c    | 15 ++++-----
>> 3 files changed, 53 insertions(+), 34 deletions(-)
>>
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index 6e0cb4fda179..862c202a18ca 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
>> 	return 0;
>> }
>>
>> +static void unregister_memory(struct memory_block *memory)
>> +{
>> +	BUG_ON(memory->dev.bus != &memory_subsys);
>> +
>> +	/* drop the ref. we got via find_memory_block() */
>> +	put_device(&memory->dev);
>> +	device_unregister(&memory->dev);
>> +}
>> +
>> /*
>> - * need an interface for the VM to add new memory regions,
>> - * but without onlining it.
>> + * Create memory block devices for the given memory area. Start and size
>> + * have to be aligned to memory block granularity. Memory block devices
>> + * will be initialized as offline.
>>  */
>> -int hotplug_memory_register(int nid, struct mem_section *section)
>> +int hotplug_memory_register(unsigned long start, unsigned long size)
>> {
>> -	int ret = 0;
>> +	unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
>> +	unsigned long start_pfn = PFN_DOWN(start);
>> +	unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
>> +	unsigned long pfn;
>> 	struct memory_block *mem;
>> +	int ret = 0;
>>
>> -	mutex_lock(&mem_sysfs_mutex);
>> +	BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
>> +	BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));
>>
>> -	mem = find_memory_block(section);
>> -	if (mem) {
>> -		mem->section_count++;
>> -		put_device(&mem->dev);
>> -	} else {
>> -		ret = init_memory_block(&mem, section, MEM_OFFLINE);
>> +	mutex_lock(&mem_sysfs_mutex);
>> +	for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
>> +		mem = find_memory_block(__pfn_to_section(pfn));
>> +		if (mem) {
>> +			WARN_ON_ONCE(false);
> 
> One question here, the purpose of WARN_ON_ONCE(false) is? Would we trigger
> this?

Would happen if something goes terribly wrong. We might want to remove
this once we are sure this will not happen.

I replaced it in the meantime by a

if (WARN_ON_ONCE(mem)) {
	put_device(&mem->dev);
	ret = -EEXIST;
	break;
}

> 
>> +			put_device(&mem->dev);
>> +			continue;
>> +		}
>> +		ret = init_memory_block(&mem, __pfn_to_section(pfn),
>> +					MEM_OFFLINE);
>> 		if (ret)
>> -			goto out;
>> -		mem->section_count++;
>> +			break;
>> +		mem->section_count = memory_block_size_bytes() /
>> +				     MIN_MEMORY_BLOCK_SIZE;
> 
> Maybe we can leverage sections_per_block variable.

Most certainly if it does what I think it does :) thanks!


-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-07 18:38 ` [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
                     ` (3 preceding siblings ...)
  2019-05-09 13:55   ` Wei Yang
@ 2019-05-09 14:31   ` Wei Yang
  2019-05-09 14:58     ` David Hildenbrand
  4 siblings, 1 reply; 45+ messages in thread
From: Wei Yang @ 2019-05-09 14:31 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, akpm, Dan Williams, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Wei Yang,
	Arun KS, Mathieu Malaterre

On Tue, May 07, 2019 at 08:38:00PM +0200, David Hildenbrand wrote:
>Only memory to be added to the buddy and to be onlined/offlined by
>user space using memory block devices needs (and should have!) memory
>block devices.
>
>Factor out creation of memory block devices Create all devices after
>arch_add_memory() succeeded. We can later drop the want_memblock parameter,
>because it is now effectively stale.
>
>Only after memory block devices have been added, memory can be onlined
>by user space. This implies, that memory is not visible to user space at
>all before arch_add_memory() succeeded.
>
>Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Cc: David Hildenbrand <david@redhat.com>
>Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Ingo Molnar <mingo@kernel.org>
>Cc: Andrew Banman <andrew.banman@hpe.com>
>Cc: Oscar Salvador <osalvador@suse.de>
>Cc: Michal Hocko <mhocko@suse.com>
>Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>Cc: Qian Cai <cai@lca.pw>
>Cc: Wei Yang <richard.weiyang@gmail.com>
>Cc: Arun KS <arunks@codeaurora.org>
>Cc: Mathieu Malaterre <malat@debian.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
> include/linux/memory.h |  2 +-
> mm/memory_hotplug.c    | 15 ++++-----
> 3 files changed, 53 insertions(+), 34 deletions(-)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index 6e0cb4fda179..862c202a18ca 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
> 	return 0;
> }
> 
>+static void unregister_memory(struct memory_block *memory)
>+{
>+	BUG_ON(memory->dev.bus != &memory_subsys);
>+
>+	/* drop the ref. we got via find_memory_block() */
>+	put_device(&memory->dev);
>+	device_unregister(&memory->dev);
>+}
>+
> /*
>- * need an interface for the VM to add new memory regions,
>- * but without onlining it.
>+ * Create memory block devices for the given memory area. Start and size
>+ * have to be aligned to memory block granularity. Memory block devices
>+ * will be initialized as offline.
>  */
>-int hotplug_memory_register(int nid, struct mem_section *section)
>+int hotplug_memory_register(unsigned long start, unsigned long size)

One trivial suggestion about the function name.

For memory_block device, sometimes we use the full name

    find_memory_block
    init_memory_block
    add_memory_block

But sometimes we use *nick* name

    hotplug_memory_register
    register_memory
    unregister_memory

This is a little bit confusion.

Can we use one name convention here? 

[...]

> /*
>@@ -1106,6 +1100,13 @@ int __ref add_memory_resource(int nid, struct resource *res)
> 	if (ret < 0)
> 		goto error;
> 
>+	/* create memory block devices after memory was added */
>+	ret = hotplug_memory_register(start, size);
>+	if (ret) {
>+		arch_remove_memory(nid, start, size, NULL);

Functionally, it works I think.

But arch_remove_memory() would remove pages from zone. At this point, we just
allocate section/mmap for pages, the zones are empty and pages are not
connected to zone.

Function  zone = page_zone(page); always gets zone #0, since pages->flags is 0
at  this point. This is not exact.

Would we add some comment to mention this? Or we need to clean up
arch_remove_memory() to take out __remove_zone()?


>+		goto error;
>+	}
>+
> 	if (new_node) {
> 		/* If sysfs file of new node can't be created, cpu on the node
> 		 * can't be hot-added. There is no rollback way now.
>-- 
>2.20.1

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-09 14:31   ` Wei Yang
@ 2019-05-09 14:58     ` David Hildenbrand
  2019-05-09 21:50       ` Wei Yang
  0 siblings, 1 reply; 45+ messages in thread
From: David Hildenbrand @ 2019-05-09 14:58 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, akpm, Dan Williams, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Arun KS,
	Mathieu Malaterre

On 09.05.19 16:31, Wei Yang wrote:
> On Tue, May 07, 2019 at 08:38:00PM +0200, David Hildenbrand wrote:
>> Only memory to be added to the buddy and to be onlined/offlined by
>> user space using memory block devices needs (and should have!) memory
>> block devices.
>>
>> Factor out creation of memory block devices Create all devices after
>> arch_add_memory() succeeded. We can later drop the want_memblock parameter,
>> because it is now effectively stale.
>>
>> Only after memory block devices have been added, memory can be onlined
>> by user space. This implies, that memory is not visible to user space at
>> all before arch_add_memory() succeeded.
>>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Andrew Banman <andrew.banman@hpe.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>> Cc: Qian Cai <cai@lca.pw>
>> Cc: Wei Yang <richard.weiyang@gmail.com>
>> Cc: Arun KS <arunks@codeaurora.org>
>> Cc: Mathieu Malaterre <malat@debian.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
>> include/linux/memory.h |  2 +-
>> mm/memory_hotplug.c    | 15 ++++-----
>> 3 files changed, 53 insertions(+), 34 deletions(-)
>>
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index 6e0cb4fda179..862c202a18ca 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
>> 	return 0;
>> }
>>
>> +static void unregister_memory(struct memory_block *memory)
>> +{
>> +	BUG_ON(memory->dev.bus != &memory_subsys);
>> +
>> +	/* drop the ref. we got via find_memory_block() */
>> +	put_device(&memory->dev);
>> +	device_unregister(&memory->dev);
>> +}
>> +
>> /*
>> - * need an interface for the VM to add new memory regions,
>> - * but without onlining it.
>> + * Create memory block devices for the given memory area. Start and size
>> + * have to be aligned to memory block granularity. Memory block devices
>> + * will be initialized as offline.
>>  */
>> -int hotplug_memory_register(int nid, struct mem_section *section)
>> +int hotplug_memory_register(unsigned long start, unsigned long size)
> 
> One trivial suggestion about the function name.
> 
> For memory_block device, sometimes we use the full name
> 
>     find_memory_block
>     init_memory_block
>     add_memory_block
> 
> But sometimes we use *nick* name
> 
>     hotplug_memory_register
>     register_memory
>     unregister_memory
> 
> This is a little bit confusion.
> 
> Can we use one name convention here?

We can just go for

crate_memory_blocks() and free_memory_blocks(). Or do
you have better suggestions?

(I would actually even prefer "memory_block_devices", because memory
blocks have different meanins)

> 
> [...]
> 
>> /*
>> @@ -1106,6 +1100,13 @@ int __ref add_memory_resource(int nid, struct resource *res)
>> 	if (ret < 0)
>> 		goto error;
>>
>> +	/* create memory block devices after memory was added */
>> +	ret = hotplug_memory_register(start, size);
>> +	if (ret) {
>> +		arch_remove_memory(nid, start, size, NULL);
> 
> Functionally, it works I think.
> 
> But arch_remove_memory() would remove pages from zone. At this point, we just
> allocate section/mmap for pages, the zones are empty and pages are not
> connected to zone.
> 
> Function  zone = page_zone(page); always gets zone #0, since pages->flags is 0
> at  this point. This is not exact.
> 
> Would we add some comment to mention this? Or we need to clean up
> arch_remove_memory() to take out __remove_zone()?

That is precisely what is on my list next (see cover letter).This is
already broken when memory that was never onlined is removed again.
So I am planning to fix that independently.


-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-09 14:58     ` David Hildenbrand
@ 2019-05-09 21:50       ` Wei Yang
  2019-05-09 22:18         ` David Hildenbrand
  0 siblings, 1 reply; 45+ messages in thread
From: Wei Yang @ 2019-05-09 21:50 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, linux-mm, linux-kernel, linux-ia64, linuxppc-dev,
	linux-s390, linux-sh, akpm, Dan Williams, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Arun KS,
	Mathieu Malaterre

On Thu, May 09, 2019 at 04:58:56PM +0200, David Hildenbrand wrote:
>On 09.05.19 16:31, Wei Yang wrote:
>> On Tue, May 07, 2019 at 08:38:00PM +0200, David Hildenbrand wrote:
>>> Only memory to be added to the buddy and to be onlined/offlined by
>>> user space using memory block devices needs (and should have!) memory
>>> block devices.
>>>
>>> Factor out creation of memory block devices Create all devices after
>>> arch_add_memory() succeeded. We can later drop the want_memblock parameter,
>>> because it is now effectively stale.
>>>
>>> Only after memory block devices have been added, memory can be onlined
>>> by user space. This implies, that memory is not visible to user space at
>>> all before arch_add_memory() succeeded.
>>>
>>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>>> Cc: David Hildenbrand <david@redhat.com>
>>> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Ingo Molnar <mingo@kernel.org>
>>> Cc: Andrew Banman <andrew.banman@hpe.com>
>>> Cc: Oscar Salvador <osalvador@suse.de>
>>> Cc: Michal Hocko <mhocko@suse.com>
>>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>>> Cc: Qian Cai <cai@lca.pw>
>>> Cc: Wei Yang <richard.weiyang@gmail.com>
>>> Cc: Arun KS <arunks@codeaurora.org>
>>> Cc: Mathieu Malaterre <malat@debian.org>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>> drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
>>> include/linux/memory.h |  2 +-
>>> mm/memory_hotplug.c    | 15 ++++-----
>>> 3 files changed, 53 insertions(+), 34 deletions(-)
>>>
>>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>>> index 6e0cb4fda179..862c202a18ca 100644
>>> --- a/drivers/base/memory.c
>>> +++ b/drivers/base/memory.c
>>> @@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
>>> 	return 0;
>>> }
>>>
>>> +static void unregister_memory(struct memory_block *memory)
>>> +{
>>> +	BUG_ON(memory->dev.bus != &memory_subsys);
>>> +
>>> +	/* drop the ref. we got via find_memory_block() */
>>> +	put_device(&memory->dev);
>>> +	device_unregister(&memory->dev);
>>> +}
>>> +
>>> /*
>>> - * need an interface for the VM to add new memory regions,
>>> - * but without onlining it.
>>> + * Create memory block devices for the given memory area. Start and size
>>> + * have to be aligned to memory block granularity. Memory block devices
>>> + * will be initialized as offline.
>>>  */
>>> -int hotplug_memory_register(int nid, struct mem_section *section)
>>> +int hotplug_memory_register(unsigned long start, unsigned long size)
>> 
>> One trivial suggestion about the function name.
>> 
>> For memory_block device, sometimes we use the full name
>> 
>>     find_memory_block
>>     init_memory_block
>>     add_memory_block
>> 
>> But sometimes we use *nick* name
>> 
>>     hotplug_memory_register
>>     register_memory
>>     unregister_memory
>> 
>> This is a little bit confusion.
>> 
>> Can we use one name convention here?
>
>We can just go for
>
>crate_memory_blocks() and free_memory_blocks(). Or do
>you have better suggestions?

s/crate/create/

Looks good to me.

>
>(I would actually even prefer "memory_block_devices", because memory
>blocks have different meanins)
>

Agree with you, this comes to my mind sometime ago :-)

>> 
>> [...]
>> 
>>> /*
>>> @@ -1106,6 +1100,13 @@ int __ref add_memory_resource(int nid, struct resource *res)
>>> 	if (ret < 0)
>>> 		goto error;
>>>
>>> +	/* create memory block devices after memory was added */
>>> +	ret = hotplug_memory_register(start, size);
>>> +	if (ret) {
>>> +		arch_remove_memory(nid, start, size, NULL);
>> 
>> Functionally, it works I think.
>> 
>> But arch_remove_memory() would remove pages from zone. At this point, we just
>> allocate section/mmap for pages, the zones are empty and pages are not
>> connected to zone.
>> 
>> Function  zone = page_zone(page); always gets zone #0, since pages->flags is 0
>> at  this point. This is not exact.
>> 
>> Would we add some comment to mention this? Or we need to clean up
>> arch_remove_memory() to take out __remove_zone()?
>
>That is precisely what is on my list next (see cover letter).This is
>already broken when memory that was never onlined is removed again.
>So I am planning to fix that independently.
>

Sounds great :-)

Hope you would cc me in the following series.

>
>-- 
>
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-09 21:50       ` Wei Yang
@ 2019-05-09 22:18         ` David Hildenbrand
  0 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2019-05-09 22:18 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, akpm, Dan Williams, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Arun KS,
	Mathieu Malaterre

> Looks good to me.
> 
>>
>> (I would actually even prefer "memory_block_devices", because memory
>> blocks have different meanins)
>>
> 
> Agree with you, this comes to my mind sometime ago :-)

We have memblocks, memory_blocks  ... I guess memory_block_device is
unique :)

> 
>>>
>>> [...]
>>>
>>>> /*
>>>> @@ -1106,6 +1100,13 @@ int __ref add_memory_resource(int nid, struct resource *res)
>>>> 	if (ret < 0)
>>>> 		goto error;
>>>>
>>>> +	/* create memory block devices after memory was added */
>>>> +	ret = hotplug_memory_register(start, size);
>>>> +	if (ret) {
>>>> +		arch_remove_memory(nid, start, size, NULL);
>>>
>>> Functionally, it works I think.
>>>
>>> But arch_remove_memory() would remove pages from zone. At this point, we just
>>> allocate section/mmap for pages, the zones are empty and pages are not
>>> connected to zone.
>>>
>>> Function  zone = page_zone(page); always gets zone #0, since pages->flags is 0
>>> at  this point. This is not exact.
>>>
>>> Would we add some comment to mention this? Or we need to clean up
>>> arch_remove_memory() to take out __remove_zone()?
>>
>> That is precisely what is on my list next (see cover letter).This is
>> already broken when memory that was never onlined is removed again.
>> So I am planning to fix that independently.
>>
> 
> Sounds great :-)

Especially, I suspect a lot of bugs in the area of

1. Remove memory that has never been onlined
2. Remove memory that has been onlined/offlined a couple of times
3. Remove memory that has been onlined to different zones.

Will see when refactoring if my intuition is right :)

> 
> Hope you would cc me in the following series.


Sure! Thanks!


-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/8] mm/memory_hotplug: arch_remove_memory() and __remove_pages() with CONFIG_MEMORY_HOTPLUG
  2019-05-07 21:02   ` Dan Williams
  2019-05-07 21:06     ` David Hildenbrand
@ 2019-05-13  7:48     ` David Hildenbrand
  2019-05-13  8:20       ` David Hildenbrand
  1 sibling, 1 reply; 45+ messages in thread
From: David Hildenbrand @ 2019-05-13  7:48 UTC (permalink / raw)
  To: Dan Williams, Michal Hocko, Oscar Salvador
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Tony Luck, Fenghua Yu,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Martin Schwidefsky, Heiko Carstens, Yoshinori Sato, Rich Felker,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Greg Kroah-Hartman,
	Rafael J. Wysocki, Mike Rapoport, Kirill A. Shutemov,
	Alex Deucher, David S. Miller, Mark Brown, Chris Wilson,
	Christophe Leroy, Nicholas Piggin, Vasily Gorbik, Rob Herring,
	Masahiro Yamada, mike.travis, Andrew Banman, Pavel Tatashin,
	Wei Yang, Arun KS, Qian Cai, Mathieu Malaterre, Baoquan He,
	Logan Gunthorpe

On 07.05.19 23:02, Dan Williams wrote:
> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> Let's prepare for better error handling while adding memory by allowing
>> to use arch_remove_memory() and __remove_pages() even if
>> CONFIG_MEMORY_HOTREMOVE is not set. CONFIG_MEMORY_HOTREMOVE effectively
>> covers
>> - Offlining of system ram (memory block devices) - offline_pages()
>> - Unplug of system ram - remove_memory()
>> - Unplug/remap of device memory - devm_memremap()
>>
>> This allows e.g. for handling like
>>
>> arch_add_memory()
>> rc = do_something();
>> if (rc) {
>>         arch_remove_memory();
>> }
>>
>> Whereby do_something() will for example be memory block device creation
>> after it has been factored out.
> 
> What's left after this? Can we just get rid of CONFIG_MEMORY_HOTREMOVE
> option completely when CONFIG_MEMORY_HOTPLUG is enabled? It's not
> clear to me why there was ever the option to compile out the remove
> code when the add code is included.
> 

If there are no other comments, I will go ahead and rip out
CONFIG_MEMORY_HOTREMOVE completely, gluing the functionality to
CONFIG_MEMORY_HOTPLUG.

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/8] mm/memory_hotplug: arch_remove_memory() and __remove_pages() with CONFIG_MEMORY_HOTPLUG
  2019-05-13  7:48     ` David Hildenbrand
@ 2019-05-13  8:20       ` David Hildenbrand
  0 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2019-05-13  8:20 UTC (permalink / raw)
  To: Dan Williams, Michal Hocko, Oscar Salvador
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Tony Luck, Fenghua Yu,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Martin Schwidefsky, Heiko Carstens, Yoshinori Sato, Rich Felker,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Greg Kroah-Hartman,
	Rafael J. Wysocki, Mike Rapoport, Kirill A. Shutemov,
	Alex Deucher, David S. Miller, Mark Brown, Chris Wilson,
	Christophe Leroy, Nicholas Piggin, Vasily Gorbik, Rob Herring,
	Masahiro Yamada, mike.travis, Andrew Banman, Pavel Tatashin,
	Wei Yang, Arun KS, Qian Cai, Mathieu Malaterre, Baoquan He,
	Logan Gunthorpe

On 13.05.19 09:48, David Hildenbrand wrote:
> On 07.05.19 23:02, Dan Williams wrote:
>> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>>>
>>> Let's prepare for better error handling while adding memory by allowing
>>> to use arch_remove_memory() and __remove_pages() even if
>>> CONFIG_MEMORY_HOTREMOVE is not set. CONFIG_MEMORY_HOTREMOVE effectively
>>> covers
>>> - Offlining of system ram (memory block devices) - offline_pages()
>>> - Unplug of system ram - remove_memory()
>>> - Unplug/remap of device memory - devm_memremap()
>>>
>>> This allows e.g. for handling like
>>>
>>> arch_add_memory()
>>> rc = do_something();
>>> if (rc) {
>>>         arch_remove_memory();
>>> }
>>>
>>> Whereby do_something() will for example be memory block device creation
>>> after it has been factored out.
>>
>> What's left after this? Can we just get rid of CONFIG_MEMORY_HOTREMOVE
>> option completely when CONFIG_MEMORY_HOTPLUG is enabled? It's not
>> clear to me why there was ever the option to compile out the remove
>> code when the add code is included.
>>
> 
> If there are no other comments, I will go ahead and rip out
> CONFIG_MEMORY_HOTREMOVE completely, gluing the functionality to
> CONFIG_MEMORY_HOTPLUG.
> 

Hmmmm, however this will require CONFIG_MEMORY_HOTPLUG to require

- MEMORY_ISOLATION
- HAVE_BOOTMEM_INFO_NODE if (X86_64 || PPC64)

And depends on
- MIGRATION

Which would limit the configurations where memory hotplug would be
available. I guess going with this patch here is ok as a first step.

I just realized, that we'll need arch_remove_memory() for arm64 to make
this patch here work.

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2019-05-13  8:20 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-07 18:37 [PATCH v2 0/8] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
2019-05-07 18:37 ` [PATCH v2 1/8] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
2019-05-07 20:38   ` Dan Williams
2019-05-09 12:23   ` Wei Yang
2019-05-07 18:37 ` [PATCH v2 2/8] s390x/mm: Implement arch_remove_memory() David Hildenbrand
2019-05-07 20:46   ` Dan Williams
2019-05-07 20:47     ` David Hildenbrand
2019-05-07 20:57       ` Dan Williams
2019-05-07 21:13         ` David Hildenbrand
2019-05-07 18:37 ` [PATCH v2 3/8] mm/memory_hotplug: arch_remove_memory() and __remove_pages() with CONFIG_MEMORY_HOTPLUG David Hildenbrand
2019-05-07 21:02   ` Dan Williams
2019-05-07 21:06     ` David Hildenbrand
2019-05-13  7:48     ` David Hildenbrand
2019-05-13  8:20       ` David Hildenbrand
2019-05-07 18:38 ` [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
2019-05-07 21:17   ` Dan Williams
2019-05-07 21:27     ` David Hildenbrand
2019-05-08  8:35   ` David Hildenbrand
2019-05-09 12:43   ` Wei Yang
2019-05-09 12:50     ` David Hildenbrand
2019-05-09 13:55   ` Wei Yang
2019-05-09 14:05     ` David Hildenbrand
2019-05-09 14:31   ` Wei Yang
2019-05-09 14:58     ` David Hildenbrand
2019-05-09 21:50       ` Wei Yang
2019-05-09 22:18         ` David Hildenbrand
2019-05-07 18:38 ` [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API David Hildenbrand
2019-05-07 21:19   ` Dan Williams
2019-05-07 21:24     ` David Hildenbrand
2019-05-07 21:25       ` Dan Williams
2019-05-08  7:39         ` David Hildenbrand
2019-05-08 23:08           ` osalvador
2019-05-09  7:05             ` David Hildenbrand
2019-05-07 18:38 ` [PATCH v2 6/8] mm/memory_hotplug: Remove memory block devices before arch_remove_memory() David Hildenbrand
2019-05-07 21:27   ` Dan Williams
2019-05-07 18:38 ` [PATCH v2 7/8] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail David Hildenbrand
2019-05-08  0:15   ` Dan Williams
2019-05-08  7:21     ` David Hildenbrand
2019-05-08 13:50       ` Dan Williams
2019-05-07 18:38 ` [PATCH v2 8/8] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section David Hildenbrand
2019-05-08  0:30   ` Dan Williams
2019-05-07 19:04 ` [PATCH v2 0/8] mm/memory_hotplug: Factor out memory block device handling Dan Williams
2019-05-07 19:21   ` David Hildenbrand
2019-05-07 19:37     ` David Hildenbrand
2019-05-07 20:36       ` Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).