linux-mm.kvack.org archive mirror
* [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block device handling
@ 2019-05-27 11:11 David Hildenbrand
  2019-05-27 11:11 ` [PATCH v3 01/11] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
                   ` (11 more replies)
  0 siblings, 12 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-05-27 11:11 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	linux-arm-kernel, akpm, Dan Williams, Wei Yang, Igor Mammedov,
	David Hildenbrand, Alex Deucher, Andrew Banman, Andy Lutomirski,
	Anshuman Khandual, Ard Biesheuvel, Arun KS, Baoquan He,
	Benjamin Herrenschmidt, Borislav Petkov, Catalin Marinas,
	Chintan Pandya, Christophe Leroy, Chris Wilson, Dave Hansen,
	David S. Miller, Fenghua Yu, Greg Kroah-Hartman, Heiko Carstens,
	H. Peter Anvin, Ingo Molnar, Ingo Molnar, Jonathan Cameron,
	Joonsoo Kim, Jun Yao, Kirill A. Shutemov, Logan Gunthorpe,
	Mark Brown, Mark Rutland, Martin Schwidefsky, Masahiro Yamada,
	Mathieu Malaterre, Michael Ellerman, Michal Hocko, Mike Rapoport,
	Mike Rapoport, mike.travis, Nicholas Piggin, Oscar Salvador,
	Oscar Salvador, Paul Mackerras, Pavel Tatashin, Pavel Tatashin,
	Peter Zijlstra, Qian Cai, Rafael J. Wysocki, Rich Felker,
	Rob Herring, Robin Murphy, Thomas Gleixner, Tony Luck,
	Vasily Gorbik, Wei Yang, Will Deacon, Yoshinori Sato, Yu Zhao

We only want memory block devices for memory to be onlined/offlined
(add/remove from the buddy). This is required so user space can
online/offline memory and kdump gets notified about newly onlined memory.

Let's factor out creation/removal of memory block devices. This helps
to further clean up arch_add_memory()/arch_remove_memory() and to make
the implementation of new features easier - especially sub-section
memory hot add from Dan.
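
For illustration, a simplified sketch of the resulting add path (taken
from "mm/memory_hotplug: Create memory block devices after
arch_add_memory()", error handling trimmed):

	ret = arch_add_memory(nid, start, size, &restrictions);
	if (ret < 0)
		goto error;

	/* create memory block devices after memory was added */
	ret = create_memory_block_devices(start, size);
	if (ret) {
		arch_remove_memory(nid, start, size, NULL);
		goto error;
	}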

Anshuman Khandual is currently working on arch_remove_memory(). I added
a temporary solution via "arm64/mm: Add temporary arch_remove_memory()
implementation", that is sufficient as a first step in the context of
this series. (We already don't clean up page tables in case anything
goes wrong.)

Did a quick sanity test with DIMM plug/unplug, making sure all devices
and sysfs links properly get added/removed. Compile tested on s390x and
x86-64.

Based on next/master.

Next refactoring on my list will be making sure that remove_memory()
will never deal with zones / access "struct pages". Any kind of zone
handling will have to be done when offlining system memory / before
removing device memory. I am thinking about a
remove_pfn_range_from_zone() that undoes everything
"move_pfn_range_to_zone()" did.
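
Very rough, hypothetical sketch of that direction (name and details are
not final, only meant to illustrate undoing move_pfn_range_to_zone()):

	/* hypothetical, not part of this series */
	void remove_pfn_range_from_zone(struct zone *zone,
					unsigned long start_pfn,
					unsigned long nr_pages)
	{
		const unsigned long end_pfn = start_pfn + nr_pages;

		clear_zone_contiguous(zone);
		/* shrink the zone/node spans that
		 * move_pfn_range_to_zone() grew
		 */
		shrink_zone_span(zone, start_pfn, end_pfn);
		shrink_pgdat_span(zone->zone_pgdat, start_pfn, end_pfn);
		set_zone_contiguous(zone);
	}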

v2 -> v3:
- Add "s390x/mm: Fail when an altmap is used for arch_add_memory()"
- Add "arm64/mm: Add temporary arch_remove_memory() implementation"
- Add "drivers/base/memory: Pass a block_id to init_memory_block()"
- Various changes to "mm/memory_hotplug: Create memory block devices
  after arch_add_memory()" and "mm/memory_hotplug: Remove memory block
  devices before arch_remove_memory()" due to switching from sections
  to block_ids.

v1 -> v2:
- s390x/mm: Implement arch_remove_memory()
-- remove mapping after "__remove_pages"

David Hildenbrand (11):
  mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
  s390x/mm: Fail when an altmap is used for arch_add_memory()
  s390x/mm: Implement arch_remove_memory()
  arm64/mm: Add temporary arch_remove_memory() implementation
  drivers/base/memory: Pass a block_id to init_memory_block()
  mm/memory_hotplug: Allow arch_remove_pages() without
    CONFIG_MEMORY_HOTREMOVE
  mm/memory_hotplug: Create memory block devices after arch_add_memory()
  mm/memory_hotplug: Drop MHP_MEMBLOCK_API
  mm/memory_hotplug: Remove memory block devices before
    arch_remove_memory()
  mm/memory_hotplug: Make unregister_memory_block_under_nodes() never
    fail
  mm/memory_hotplug: Remove "zone" parameter from
    sparse_remove_one_section

 arch/arm64/mm/mmu.c            |  17 +++++
 arch/ia64/mm/init.c            |   2 -
 arch/powerpc/mm/mem.c          |   2 -
 arch/s390/mm/init.c            |  18 +++--
 arch/sh/mm/init.c              |   2 -
 arch/x86/mm/init_32.c          |   2 -
 arch/x86/mm/init_64.c          |   2 -
 drivers/base/memory.c          | 134 +++++++++++++++++++--------------
 drivers/base/node.c            |  27 +++----
 include/linux/memory.h         |   6 +-
 include/linux/memory_hotplug.h |  12 +--
 include/linux/node.h           |   7 +-
 mm/memory_hotplug.c            |  44 +++++------
 mm/sparse.c                    |  10 +--
 14 files changed, 140 insertions(+), 145 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v3 01/11] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
  2019-05-27 11:11 [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
@ 2019-05-27 11:11 ` David Hildenbrand
  2019-05-30 17:53   ` Pavel Tatashin
                     ` (2 more replies)
  2019-05-27 11:11 ` [PATCH v3 02/11] s390x/mm: Fail when an altmap is used for arch_add_memory() David Hildenbrand
                   ` (10 subsequent siblings)
  11 siblings, 3 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-05-27 11:11 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	linux-arm-kernel, akpm, Dan Williams, Wei Yang, Igor Mammedov,
	David Hildenbrand, Oscar Salvador, Michal Hocko, Pavel Tatashin,
	Qian Cai, Arun KS, Mathieu Malaterre, Wei Yang

By converting start and size to page granularity, we actually ignore
unaligned parts within a page instead of properly bailing out with an
error.
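
Example (assuming 4 KiB pages and 128 MiB memory blocks, values chosen
purely for illustration):

	start = 0x8000200;	/* 128 MiB + 512 bytes */
	size  = 0x8000000;	/* 128 MiB */

	/* old check: PFN_DOWN(start) == 0x8000 silently drops the
	 * sub-page offset; 0x8000 is aligned to block_nr_pages (0x8000),
	 * so the bogus range is accepted
	 */
	/* new check: !IS_ALIGNED(0x8000200, 0x8000000) -> -EINVAL */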

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e096c987d261..762887b2358b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1051,16 +1051,11 @@ int try_online_node(int nid)
 
 static int check_hotplug_memory_range(u64 start, u64 size)
 {
-	unsigned long block_sz = memory_block_size_bytes();
-	u64 block_nr_pages = block_sz >> PAGE_SHIFT;
-	u64 nr_pages = size >> PAGE_SHIFT;
-	u64 start_pfn = PFN_DOWN(start);
-
 	/* memory range must be block size aligned */
-	if (!nr_pages || !IS_ALIGNED(start_pfn, block_nr_pages) ||
-	    !IS_ALIGNED(nr_pages, block_nr_pages)) {
+	if (!size || !IS_ALIGNED(start, memory_block_size_bytes()) ||
+	    !IS_ALIGNED(size, memory_block_size_bytes())) {
 		pr_err("Block size [%#lx] unaligned hotplug range: start %#llx, size %#llx",
-		       block_sz, start, size);
+		       memory_block_size_bytes(), start, size);
 		return -EINVAL;
 	}
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 02/11] s390x/mm: Fail when an altmap is used for arch_add_memory()
  2019-05-27 11:11 [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
  2019-05-27 11:11 ` [PATCH v3 01/11] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
@ 2019-05-27 11:11 ` David Hildenbrand
  2019-06-10 17:07   ` Oscar Salvador
  2019-07-01  7:43   ` Michal Hocko
  2019-05-27 11:11 ` [PATCH v3 03/11] s390x/mm: Implement arch_remove_memory() David Hildenbrand
                   ` (9 subsequent siblings)
  11 siblings, 2 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-05-27 11:11 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	linux-arm-kernel, akpm, Dan Williams, Wei Yang, Igor Mammedov,
	David Hildenbrand, Martin Schwidefsky, Heiko Carstens,
	Michal Hocko, Mike Rapoport, Vasily Gorbik, Oscar Salvador

ZONE_DEVICE is not yet supported on s390x. Fail if an altmap is passed,
so we don't forget to update arch_add_memory()/arch_remove_memory() when
unlocking support.
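
For reference, the kind of caller this now rejects on s390x would look
roughly like this (a ZONE_DEVICE user backing the memmap by device
memory; sketch only, simplified from the mhp_restrictions usage in
arch_add_memory()):

	struct mhp_restrictions restrictions = {
		/* vmemmap to be allocated from device memory */
		.altmap = altmap,
	};

	/* now fails with -EINVAL instead of silently ignoring the altmap */
	rc = arch_add_memory(nid, start, size, &restrictions);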

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/mm/init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 14d1eae9fe43..d552e330fbcc 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -226,6 +226,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	unsigned long size_pages = PFN_DOWN(size);
 	int rc;
 
+	if (WARN_ON_ONCE(restrictions->altmap))
+		return -EINVAL;
+
 	rc = vmem_add_mapping(start, size);
 	if (rc)
 		return rc;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 03/11] s390x/mm: Implement arch_remove_memory()
  2019-05-27 11:11 [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
  2019-05-27 11:11 ` [PATCH v3 01/11] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
  2019-05-27 11:11 ` [PATCH v3 02/11] s390x/mm: Fail when an altmap is used for arch_add_memory() David Hildenbrand
@ 2019-05-27 11:11 ` David Hildenbrand
  2019-07-01  7:45   ` Michal Hocko
  2019-05-27 11:11 ` [PATCH v3 04/11] arm64/mm: Add temporary arch_remove_memory() implementation David Hildenbrand
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 68+ messages in thread
From: David Hildenbrand @ 2019-05-27 11:11 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	linux-arm-kernel, akpm, Dan Williams, Wei Yang, Igor Mammedov,
	David Hildenbrand, Martin Schwidefsky, Heiko Carstens,
	Michal Hocko, Mike Rapoport, Vasily Gorbik, Oscar Salvador

This will come in handy when we want to handle errors after
arch_add_memory().

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/mm/init.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index d552e330fbcc..14955e0a9fcf 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -243,12 +243,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
 void arch_remove_memory(int nid, u64 start, u64 size,
 			struct vmem_altmap *altmap)
 {
-	/*
-	 * There is no hardware or firmware interface which could trigger a
-	 * hot memory remove on s390. So there is nothing that needs to be
-	 * implemented.
-	 */
-	BUG();
+	unsigned long start_pfn = start >> PAGE_SHIFT;
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	struct zone *zone;
+
+	zone = page_zone(pfn_to_page(start_pfn));
+	__remove_pages(zone, start_pfn, nr_pages, altmap);
+	vmem_remove_mapping(start, size);
 }
 #endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 04/11] arm64/mm: Add temporary arch_remove_memory() implementation
  2019-05-27 11:11 [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
                   ` (2 preceding siblings ...)
  2019-05-27 11:11 ` [PATCH v3 03/11] s390x/mm: Implement arch_remove_memory() David Hildenbrand
@ 2019-05-27 11:11 ` David Hildenbrand
  2019-06-03 21:41   ` Wei Yang
  2019-07-01 12:48   ` Michal Hocko
  2019-05-27 11:11 ` [PATCH v3 05/11] drivers/base/memory: Pass a block_id to init_memory_block() David Hildenbrand
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-05-27 11:11 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	linux-arm-kernel, akpm, Dan Williams, Wei Yang, Igor Mammedov,
	David Hildenbrand, Catalin Marinas, Will Deacon, Mark Rutland,
	Ard Biesheuvel, Chintan Pandya, Mike Rapoport, Jun Yao, Yu Zhao,
	Robin Murphy, Anshuman Khandual

A proper arch_remove_memory() implementation is on its way, one that
will also cleanly remove page tables in arch_add_memory() in case
something goes wrong.

As we want to use arch_remove_memory() in case something goes wrong
during memory hotplug after arch_add_memory() finished, let's add
a temporary hack that is sufficient until we get a proper
implementation that cleans up page table entries.

We will remove CONFIG_MEMORY_HOTREMOVE around this code in follow up
patches.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Chintan Pandya <cpandya@codeaurora.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Jun Yao <yaojun8558363@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/arm64/mm/mmu.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index a1bfc4413982..e569a543c384 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1084,4 +1084,23 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
 			   restrictions);
 }
+#ifdef CONFIG_MEMORY_HOTREMOVE
+void arch_remove_memory(int nid, u64 start, u64 size,
+			struct vmem_altmap *altmap)
+{
+	unsigned long start_pfn = start >> PAGE_SHIFT;
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	struct zone *zone;
+
+	/*
+	 * FIXME: Cleanup page tables (also in arch_add_memory() in case
+	 * adding fails). Until then, this function should only be used
+	 * during memory hotplug (adding memory), not for memory
+	 * unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be
+	 * unlocked yet.
+	 */
+	zone = page_zone(pfn_to_page(start_pfn));
+	__remove_pages(zone, start_pfn, nr_pages, altmap);
+}
+#endif
 #endif
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 05/11] drivers/base/memory: Pass a block_id to init_memory_block()
  2019-05-27 11:11 [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
                   ` (3 preceding siblings ...)
  2019-05-27 11:11 ` [PATCH v3 04/11] arm64/mm: Add temporary arch_remove_memory() implementation David Hildenbrand
@ 2019-05-27 11:11 ` David Hildenbrand
  2019-06-03 21:49   ` Wei Yang
  2019-07-01  7:56   ` Michal Hocko
  2019-05-27 11:11 ` [PATCH v3 06/11] mm/memory_hotplug: Allow arch_remove_pages() without CONFIG_MEMORY_HOTREMOVE David Hildenbrand
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-05-27 11:11 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	linux-arm-kernel, akpm, Dan Williams, Wei Yang, Igor Mammedov,
	David Hildenbrand, Greg Kroah-Hartman, Rafael J. Wysocki

We'll rework hotplug_memory_register() shortly, so it no longer
consumes a section.
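
For context, the block_id <-> section mapping used here (numbers are
only an example; sections_per_block depends on the architecture):

	/* e.g., with sections_per_block == 8:
	 *   section_nr 17 -> block_id = base_memory_block_id(17) = 2
	 *   block_id 2    -> start_section_nr = 2 * 8 = 16
	 *                    end_section_nr   = 16 + 8 - 1 = 23
	 */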

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index f180427e48f4..f914fa6fe350 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -651,21 +651,18 @@ int register_memory(struct memory_block *memory)
 	return ret;
 }
 
-static int init_memory_block(struct memory_block **memory,
-			     struct mem_section *section, unsigned long state)
+static int init_memory_block(struct memory_block **memory, int block_id,
+			     unsigned long state)
 {
 	struct memory_block *mem;
 	unsigned long start_pfn;
-	int scn_nr;
 	int ret = 0;
 
 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
 	if (!mem)
 		return -ENOMEM;
 
-	scn_nr = __section_nr(section);
-	mem->start_section_nr =
-			base_memory_block_id(scn_nr) * sections_per_block;
+	mem->start_section_nr = block_id * sections_per_block;
 	mem->end_section_nr = mem->start_section_nr + sections_per_block - 1;
 	mem->state = state;
 	start_pfn = section_nr_to_pfn(mem->start_section_nr);
@@ -694,7 +691,8 @@ static int add_memory_block(int base_section_nr)
 
 	if (section_count == 0)
 		return 0;
-	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE);
+	ret = init_memory_block(&mem, base_memory_block_id(base_section_nr),
+				MEM_ONLINE);
 	if (ret)
 		return ret;
 	mem->section_count = section_count;
@@ -707,6 +705,7 @@ static int add_memory_block(int base_section_nr)
  */
 int hotplug_memory_register(int nid, struct mem_section *section)
 {
+	int block_id = base_memory_block_id(__section_nr(section));
 	int ret = 0;
 	struct memory_block *mem;
 
@@ -717,7 +716,7 @@ int hotplug_memory_register(int nid, struct mem_section *section)
 		mem->section_count++;
 		put_device(&mem->dev);
 	} else {
-		ret = init_memory_block(&mem, section, MEM_OFFLINE);
+		ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
 		if (ret)
 			goto out;
 		mem->section_count++;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 06/11] mm/memory_hotplug: Allow arch_remove_pages() without CONFIG_MEMORY_HOTREMOVE
  2019-05-27 11:11 [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
                   ` (4 preceding siblings ...)
  2019-05-27 11:11 ` [PATCH v3 05/11] drivers/base/memory: Pass a block_id to init_memory_block() David Hildenbrand
@ 2019-05-27 11:11 ` David Hildenbrand
  2019-05-30 17:56   ` Pavel Tatashin
                     ` (2 more replies)
  2019-05-27 11:11 ` [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
                   ` (5 subsequent siblings)
  11 siblings, 3 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-05-27 11:11 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	linux-arm-kernel, akpm, Dan Williams, Wei Yang, Igor Mammedov,
	David Hildenbrand, Tony Luck, Fenghua Yu, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Greg Kroah-Hartman,
	Rafael J. Wysocki, Michal Hocko, Mike Rapoport, Oscar Salvador,
	Kirill A. Shutemov, Alex Deucher, David S. Miller, Mark Brown,
	Chris Wilson, Christophe Leroy, Nicholas Piggin, Vasily Gorbik,
	Rob Herring, Masahiro Yamada, mike.travis, Andrew Banman,
	Pavel Tatashin, Wei Yang, Arun KS, Qian Cai, Mathieu Malaterre,
	Baoquan He, Logan Gunthorpe, Anshuman Khandual

We want to improve error handling while adding memory by allowing
arch_remove_memory() and __remove_pages() to be used even if
CONFIG_MEMORY_HOTREMOVE is not set, to e.g. implement something like:

	arch_add_memory()
	rc = do_something();
	if (rc) {
		arch_remove_memory();
	}

We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as doing so would
require sorting out quite some dependencies for memory offlining.

Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mark Brown <broonie@kernel.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/arm64/mm/mmu.c            | 2 --
 arch/ia64/mm/init.c            | 2 --
 arch/powerpc/mm/mem.c          | 2 --
 arch/s390/mm/init.c            | 2 --
 arch/sh/mm/init.c              | 2 --
 arch/x86/mm/init_32.c          | 2 --
 arch/x86/mm/init_64.c          | 2 --
 drivers/base/memory.c          | 2 --
 include/linux/memory.h         | 2 --
 include/linux/memory_hotplug.h | 2 --
 mm/memory_hotplug.c            | 2 --
 mm/sparse.c                    | 6 ------
 12 files changed, 28 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index e569a543c384..9ccd7539f2d4 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1084,7 +1084,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
 			   restrictions);
 }
-#ifdef CONFIG_MEMORY_HOTREMOVE
 void arch_remove_memory(int nid, u64 start, u64 size,
 			struct vmem_altmap *altmap)
 {
@@ -1103,4 +1102,3 @@ void arch_remove_memory(int nid, u64 start, u64 size,
 	__remove_pages(zone, start_pfn, nr_pages, altmap);
 }
 #endif
-#endif
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index d28e29103bdb..aae75fd7b810 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -681,7 +681,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	return ret;
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 void arch_remove_memory(int nid, u64 start, u64 size,
 			struct vmem_altmap *altmap)
 {
@@ -693,4 +692,3 @@ void arch_remove_memory(int nid, u64 start, u64 size,
 	__remove_pages(zone, start_pfn, nr_pages, altmap);
 }
 #endif
-#endif
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index e885fe2aafcc..e4bc2dc3f593 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -130,7 +130,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
 	return __add_pages(nid, start_pfn, nr_pages, restrictions);
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 void __ref arch_remove_memory(int nid, u64 start, u64 size,
 			     struct vmem_altmap *altmap)
 {
@@ -164,7 +163,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
 		pr_warn("Hash collision while resizing HPT\n");
 }
 #endif
-#endif /* CONFIG_MEMORY_HOTPLUG */
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES
 void __init mem_topology_setup(void)
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 14955e0a9fcf..ffb81fe95c77 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -239,7 +239,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	return rc;
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 void arch_remove_memory(int nid, u64 start, u64 size,
 			struct vmem_altmap *altmap)
 {
@@ -251,5 +250,4 @@ void arch_remove_memory(int nid, u64 start, u64 size,
 	__remove_pages(zone, start_pfn, nr_pages, altmap);
 	vmem_remove_mapping(start, size);
 }
-#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 13c6a6bb5fd9..dfdbaa50946e 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -429,7 +429,6 @@ int memory_add_physaddr_to_nid(u64 addr)
 EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
 #endif
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 void arch_remove_memory(int nid, u64 start, u64 size,
 			struct vmem_altmap *altmap)
 {
@@ -440,5 +439,4 @@ void arch_remove_memory(int nid, u64 start, u64 size,
 	zone = page_zone(pfn_to_page(start_pfn));
 	__remove_pages(zone, start_pfn, nr_pages, altmap);
 }
-#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index f265a4316179..4068abb9427f 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -860,7 +860,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	return __add_pages(nid, start_pfn, nr_pages, restrictions);
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 void arch_remove_memory(int nid, u64 start, u64 size,
 			struct vmem_altmap *altmap)
 {
@@ -872,7 +871,6 @@ void arch_remove_memory(int nid, u64 start, u64 size,
 	__remove_pages(zone, start_pfn, nr_pages, altmap);
 }
 #endif
-#endif
 
 int kernel_set_to_readonly __read_mostly;
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 693aaf28d5fe..8335ac6e1112 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1196,7 +1196,6 @@ void __ref vmemmap_free(unsigned long start, unsigned long end,
 	remove_pagetable(start, end, false, altmap);
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 static void __meminit
 kernel_physical_mapping_remove(unsigned long start, unsigned long end)
 {
@@ -1221,7 +1220,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
 	__remove_pages(zone, start_pfn, nr_pages, altmap);
 	kernel_physical_mapping_remove(start, start + size);
 }
-#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 static struct kcore_list kcore_vsyscall;
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index f914fa6fe350..ac17c95a5f28 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -727,7 +727,6 @@ int hotplug_memory_register(int nid, struct mem_section *section)
 	return ret;
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 static void
 unregister_memory(struct memory_block *memory)
 {
@@ -766,7 +765,6 @@ void unregister_memory_section(struct mem_section *section)
 out_unlock:
 	mutex_unlock(&mem_sysfs_mutex);
 }
-#endif /* CONFIG_MEMORY_HOTREMOVE */
 
 /* return true if the memory block is offlined, otherwise, return false */
 bool is_memblock_offlined(struct memory_block *mem)
diff --git a/include/linux/memory.h b/include/linux/memory.h
index e1dc1bb2b787..474c7c60c8f2 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -112,9 +112,7 @@ extern void unregister_memory_notifier(struct notifier_block *nb);
 extern int register_memory_isolate_notifier(struct notifier_block *nb);
 extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
 int hotplug_memory_register(int nid, struct mem_section *section);
-#ifdef CONFIG_MEMORY_HOTREMOVE
 extern void unregister_memory_section(struct mem_section *);
-#endif
 extern int memory_dev_init(void);
 extern int memory_notify(unsigned long val, void *v);
 extern int memory_isolate_notify(unsigned long val, void *v);
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index ae892eef8b82..2d4de313926d 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -123,12 +123,10 @@ static inline bool movable_node_is_enabled(void)
 	return movable_node_enabled;
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 extern void arch_remove_memory(int nid, u64 start, u64 size,
 			       struct vmem_altmap *altmap);
 extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
 			   unsigned long nr_pages, struct vmem_altmap *altmap);
-#endif /* CONFIG_MEMORY_HOTREMOVE */
 
 /*
  * Do we want sysfs memblock files created. This will allow userspace to online
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 762887b2358b..4b9d2974f86c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -318,7 +318,6 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 	return err;
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 /* find the smallest valid pfn in the range [start_pfn, end_pfn) */
 static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
 				     unsigned long start_pfn,
@@ -582,7 +581,6 @@ void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 
 	set_zone_contiguous(zone);
 }
-#endif /* CONFIG_MEMORY_HOTREMOVE */
 
 int set_online_page_callback(online_page_callback_t callback)
 {
diff --git a/mm/sparse.c b/mm/sparse.c
index fd13166949b5..d1d5e05f5b8d 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -604,7 +604,6 @@ static void __kfree_section_memmap(struct page *memmap,
 
 	vmemmap_free(start, end, altmap);
 }
-#ifdef CONFIG_MEMORY_HOTREMOVE
 static void free_map_bootmem(struct page *memmap)
 {
 	unsigned long start = (unsigned long)memmap;
@@ -612,7 +611,6 @@ static void free_map_bootmem(struct page *memmap)
 
 	vmemmap_free(start, end, NULL);
 }
-#endif /* CONFIG_MEMORY_HOTREMOVE */
 #else
 static struct page *__kmalloc_section_memmap(void)
 {
@@ -651,7 +649,6 @@ static void __kfree_section_memmap(struct page *memmap,
 			   get_order(sizeof(struct page) * PAGES_PER_SECTION));
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 static void free_map_bootmem(struct page *memmap)
 {
 	unsigned long maps_section_nr, removing_section_nr, i;
@@ -681,7 +678,6 @@ static void free_map_bootmem(struct page *memmap)
 			put_page_bootmem(page);
 	}
 }
-#endif /* CONFIG_MEMORY_HOTREMOVE */
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
 /**
@@ -746,7 +742,6 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn,
 	return ret;
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 #ifdef CONFIG_MEMORY_FAILURE
 static void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
 {
@@ -823,5 +818,4 @@ void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
 			PAGES_PER_SECTION - map_offset);
 	free_section_usemap(memmap, usemap, altmap);
 }
-#endif /* CONFIG_MEMORY_HOTREMOVE */
 #endif /* CONFIG_MEMORY_HOTPLUG */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-27 11:11 [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
                   ` (5 preceding siblings ...)
  2019-05-27 11:11 ` [PATCH v3 06/11] mm/memory_hotplug: Allow arch_remove_pages() without CONFIG_MEMORY_HOTREMOVE David Hildenbrand
@ 2019-05-27 11:11 ` David Hildenbrand
  2019-05-30 21:07   ` Pavel Tatashin
                     ` (2 more replies)
  2019-05-27 11:11 ` [PATCH v3 08/11] mm/memory_hotplug: Drop MHP_MEMBLOCK_API David Hildenbrand
                   ` (4 subsequent siblings)
  11 siblings, 3 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-05-27 11:11 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	linux-arm-kernel, akpm, Dan Williams, Wei Yang, Igor Mammedov,
	David Hildenbrand, Greg Kroah-Hartman, Rafael J. Wysocki,
	mike.travis, Ingo Molnar, Andrew Banman, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Qian Cai, Arun KS,
	Mathieu Malaterre

Only memory to be added to the buddy and to be onlined/offlined by
user space using /sys/devices/system/memory/... needs (and should have!)
memory block devices.

Factor out creation of memory block devices. Create all devices after
arch_add_memory() succeeded. We can later drop the want_memblock parameter,
because it is now effectively stale.

Only after memory block devices have been added, memory can be onlined
by user space. This implies that memory is not visible to user space at
all before arch_add_memory() succeeded.

While at it
- use WARN_ON_ONCE instead of BUG_ON in moved unregister_memory()
- introduce find_memory_block_by_id() to search via block id
- Use find_memory_block_by_id() in init_memory_block() to catch
  duplicates

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c  | 82 +++++++++++++++++++++++++++---------------
 include/linux/memory.h |  2 +-
 mm/memory_hotplug.c    | 15 ++++----
 3 files changed, 63 insertions(+), 36 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index ac17c95a5f28..5a0370f0c506 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -39,6 +39,11 @@ static inline int base_memory_block_id(int section_nr)
 	return section_nr / sections_per_block;
 }
 
+static inline int pfn_to_block_id(unsigned long pfn)
+{
+	return base_memory_block_id(pfn_to_section_nr(pfn));
+}
+
 static int memory_subsys_online(struct device *dev);
 static int memory_subsys_offline(struct device *dev);
 
@@ -582,10 +587,9 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn)
  * A reference for the returned object is held and the reference for the
  * hinted object is released.
  */
-struct memory_block *find_memory_block_hinted(struct mem_section *section,
-					      struct memory_block *hint)
+static struct memory_block *find_memory_block_by_id(int block_id,
+						    struct memory_block *hint)
 {
-	int block_id = base_memory_block_id(__section_nr(section));
 	struct device *hintdev = hint ? &hint->dev : NULL;
 	struct device *dev;
 
@@ -597,6 +601,14 @@ struct memory_block *find_memory_block_hinted(struct mem_section *section,
 	return to_memory_block(dev);
 }
 
+struct memory_block *find_memory_block_hinted(struct mem_section *section,
+					      struct memory_block *hint)
+{
+	int block_id = base_memory_block_id(__section_nr(section));
+
+	return find_memory_block_by_id(block_id, hint);
+}
+
 /*
  * For now, we have a linear search to go find the appropriate
  * memory_block corresponding to a particular phys_index. If
@@ -658,6 +670,11 @@ static int init_memory_block(struct memory_block **memory, int block_id,
 	unsigned long start_pfn;
 	int ret = 0;
 
+	mem = find_memory_block_by_id(block_id, NULL);
+	if (mem) {
+		put_device(&mem->dev);
+		return -EEXIST;
+	}
 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
 	if (!mem)
 		return -ENOMEM;
@@ -699,44 +716,53 @@ static int add_memory_block(int base_section_nr)
 	return 0;
 }
 
+static void unregister_memory(struct memory_block *memory)
+{
+	if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
+		return;
+
+	/* drop the ref. we got via find_memory_block() */
+	put_device(&memory->dev);
+	device_unregister(&memory->dev);
+}
+
 /*
- * need an interface for the VM to add new memory regions,
- * but without onlining it.
+ * Create memory block devices for the given memory area. Start and size
+ * have to be aligned to memory block granularity. Memory block devices
+ * will be initialized as offline.
  */
-int hotplug_memory_register(int nid, struct mem_section *section)
+int create_memory_block_devices(unsigned long start, unsigned long size)
 {
-	int block_id = base_memory_block_id(__section_nr(section));
-	int ret = 0;
+	const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
+	int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
 	struct memory_block *mem;
+	unsigned long block_id;
+	int ret = 0;
 
-	mutex_lock(&mem_sysfs_mutex);
+	if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
+			 !IS_ALIGNED(size, memory_block_size_bytes())))
+		return -EINVAL;
 
-	mem = find_memory_block(section);
-	if (mem) {
-		mem->section_count++;
-		put_device(&mem->dev);
-	} else {
+	mutex_lock(&mem_sysfs_mutex);
+	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
 		ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
 		if (ret)
-			goto out;
-		mem->section_count++;
+			break;
+		mem->section_count = sections_per_block;
+	}
+	if (ret) {
+		end_block_id = block_id;
+		for (block_id = start_block_id; block_id != end_block_id;
+		     block_id++) {
+			mem = find_memory_block_by_id(block_id, NULL);
+			mem->section_count = 0;
+			unregister_memory(mem);
+		}
 	}
-
-out:
 	mutex_unlock(&mem_sysfs_mutex);
 	return ret;
 }
 
-static void
-unregister_memory(struct memory_block *memory)
-{
-	BUG_ON(memory->dev.bus != &memory_subsys);
-
-	/* drop the ref. we got via find_memory_block() */
-	put_device(&memory->dev);
-	device_unregister(&memory->dev);
-}
-
 void unregister_memory_section(struct mem_section *section)
 {
 	struct memory_block *mem;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 474c7c60c8f2..db3e8567f900 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -111,7 +111,7 @@ extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
 extern int register_memory_isolate_notifier(struct notifier_block *nb);
 extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
-int hotplug_memory_register(int nid, struct mem_section *section);
+int create_memory_block_devices(unsigned long start, unsigned long size);
 extern void unregister_memory_section(struct mem_section *);
 extern int memory_dev_init(void);
 extern int memory_notify(unsigned long val, void *v);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4b9d2974f86c..b1fde90bbf19 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -259,13 +259,7 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
 		return -EEXIST;
 
 	ret = sparse_add_one_section(nid, phys_start_pfn, altmap);
-	if (ret < 0)
-		return ret;
-
-	if (!want_memblock)
-		return 0;
-
-	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
+	return ret < 0 ? ret : 0;
 }
 
 /*
@@ -1107,6 +1101,13 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	if (ret < 0)
 		goto error;
 
+	/* create memory block devices after memory was added */
+	ret = create_memory_block_devices(start, size);
+	if (ret) {
+		arch_remove_memory(nid, start, size, NULL);
+		goto error;
+	}
+
 	if (new_node) {
 		/* If sysfs file of new node can't be created, cpu on the node
 		 * can't be hot-added. There is no rollback way now.
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 08/11] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
  2019-05-27 11:11 [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
                   ` (6 preceding siblings ...)
  2019-05-27 11:11 ` [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
@ 2019-05-27 11:11 ` David Hildenbrand
  2019-06-04 21:47   ` Wei Yang
  2019-07-01  8:15   ` Michal Hocko
  2019-05-27 11:11 ` [PATCH v3 09/11] mm/memory_hotplug: Remove memory block devices before arch_remove_memory() David Hildenbrand
                   ` (3 subsequent siblings)
  11 siblings, 2 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-05-27 11:11 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	linux-arm-kernel, akpm, Dan Williams, Wei Yang, Igor Mammedov,
	David Hildenbrand, Michal Hocko, Oscar Salvador, Pavel Tatashin,
	Joonsoo Kim, Qian Cai, Arun KS, Mathieu Malaterre

No longer needed; the callers of arch_add_memory() can handle this
manually.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/memory_hotplug.h | 8 --------
 mm/memory_hotplug.c            | 9 +++------
 2 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 2d4de313926d..2f1f87e13baa 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -128,14 +128,6 @@ extern void arch_remove_memory(int nid, u64 start, u64 size,
 extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
 			   unsigned long nr_pages, struct vmem_altmap *altmap);
 
-/*
- * Do we want sysfs memblock files created. This will allow userspace to online
- * and offline memory explicitly. Lack of this bit means that the caller has to
- * call move_pfn_range_to_zone to finish the initialization.
- */
-
-#define MHP_MEMBLOCK_API               (1<<0)
-
 /* reasonably generic interface to expand the physical pages */
 extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
 		       struct mhp_restrictions *restrictions);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b1fde90bbf19..9a92549ef23b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -251,7 +251,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
 #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
 
 static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
-		struct vmem_altmap *altmap, bool want_memblock)
+				   struct vmem_altmap *altmap)
 {
 	int ret;
 
@@ -294,8 +294,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 	}
 
 	for (i = start_sec; i <= end_sec; i++) {
-		err = __add_section(nid, section_nr_to_pfn(i), altmap,
-				restrictions->flags & MHP_MEMBLOCK_API);
+		err = __add_section(nid, section_nr_to_pfn(i), altmap);
 
 		/*
 		 * EEXIST is finally dealt with by ioresource collision
@@ -1067,9 +1066,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
  */
 int __ref add_memory_resource(int nid, struct resource *res)
 {
-	struct mhp_restrictions restrictions = {
-		.flags = MHP_MEMBLOCK_API,
-	};
+	struct mhp_restrictions restrictions = {};
 	u64 start, size;
 	bool new_node = false;
 	int ret;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 09/11] mm/memory_hotplug: Remove memory block devices before arch_remove_memory()
  2019-05-27 11:11 [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
                   ` (7 preceding siblings ...)
  2019-05-27 11:11 ` [PATCH v3 08/11] mm/memory_hotplug: Drop MHP_MEMBLOCK_API David Hildenbrand
@ 2019-05-27 11:11 ` David Hildenbrand
  2019-06-04 22:07   ` Wei Yang
  2019-07-01  8:41   ` Michal Hocko
  2019-05-27 11:11 ` [PATCH v3 10/11] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail David Hildenbrand
                   ` (2 subsequent siblings)
  11 siblings, 2 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-05-27 11:11 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	linux-arm-kernel, akpm, Dan Williams, Wei Yang, Igor Mammedov,
	David Hildenbrand, Greg Kroah-Hartman, Rafael J. Wysocki,
	mike.travis, Andrew Banman, Ingo Molnar, Alex Deucher,
	David S. Miller, Mark Brown, Chris Wilson, Oscar Salvador,
	Jonathan Cameron, Michal Hocko, Pavel Tatashin, Arun KS,
	Mathieu Malaterre

Let's factor out removing of memory block devices, which is only
necessary for memory added via add_memory() and friends that created
memory block devices. Remove the devices before calling
arch_remove_memory().

This finishes factoring out memory block device handling from
arch_add_memory() and arch_remove_memory().
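
The resulting teardown order in __remove_memory() (simplified excerpt
from the diff below):

	memblock_free(start, size);
	memblock_remove(start, size);

	/* remove memory block devices before removing memory */
	remove_memory_block_devices(start, size);

	arch_remove_memory(nid, start, size, NULL);
	__release_memory_resource(start, size);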

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mark Brown <broonie@kernel.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c  | 37 ++++++++++++++++++-------------------
 drivers/base/node.c    | 11 ++++++-----
 include/linux/memory.h |  2 +-
 include/linux/node.h   |  6 ++----
 mm/memory_hotplug.c    |  5 +++--
 5 files changed, 30 insertions(+), 31 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 5a0370f0c506..f28efb0bf5c7 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -763,32 +763,31 @@ int create_memory_block_devices(unsigned long start, unsigned long size)
 	return ret;
 }
 
-void unregister_memory_section(struct mem_section *section)
+/*
+ * Remove memory block devices for the given memory area. Start and size
+ * have to be aligned to memory block granularity. Memory block devices
+ * have to be offline.
+ */
+void remove_memory_block_devices(unsigned long start, unsigned long size)
 {
+	const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
+	const int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
 	struct memory_block *mem;
+	int block_id;
 
-	if (WARN_ON_ONCE(!present_section(section)))
+	if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
+			 !IS_ALIGNED(size, memory_block_size_bytes())))
 		return;
 
 	mutex_lock(&mem_sysfs_mutex);
-
-	/*
-	 * Some users of the memory hotplug do not want/need memblock to
-	 * track all sections. Skip over those.
-	 */
-	mem = find_memory_block(section);
-	if (!mem)
-		goto out_unlock;
-
-	unregister_mem_sect_under_nodes(mem, __section_nr(section));
-
-	mem->section_count--;
-	if (mem->section_count == 0)
+	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
+		mem = find_memory_block_by_id(block_id, NULL);
+		if (WARN_ON_ONCE(!mem))
+			continue;
+		mem->section_count = 0;
+		unregister_memory_block_under_nodes(mem);
 		unregister_memory(mem);
-	else
-		put_device(&mem->dev);
-
-out_unlock:
+	}
 	mutex_unlock(&mem_sysfs_mutex);
 }
 
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 8598fcbd2a17..04fdfa99b8bc 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -801,9 +801,10 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
 	return 0;
 }
 
-/* unregister memory section under all nodes that it spans */
-int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
-				    unsigned long phys_index)
+/*
+ * Unregister memory block device under all nodes that it spans.
+ */
+int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 {
 	NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
 	unsigned long pfn, sect_start_pfn, sect_end_pfn;
@@ -816,8 +817,8 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
 		return -ENOMEM;
 	nodes_clear(*unlinked_nodes);
 
-	sect_start_pfn = section_nr_to_pfn(phys_index);
-	sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
+	sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
+	sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
 		int nid;
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index db3e8567f900..f26a5417ec5d 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -112,7 +112,7 @@ extern void unregister_memory_notifier(struct notifier_block *nb);
 extern int register_memory_isolate_notifier(struct notifier_block *nb);
 extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
 int create_memory_block_devices(unsigned long start, unsigned long size);
-extern void unregister_memory_section(struct mem_section *);
+void remove_memory_block_devices(unsigned long start, unsigned long size);
 extern int memory_dev_init(void);
 extern int memory_notify(unsigned long val, void *v);
 extern int memory_isolate_notify(unsigned long val, void *v);
diff --git a/include/linux/node.h b/include/linux/node.h
index 1a557c589ecb..02a29e71b175 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -139,8 +139,7 @@ extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int register_mem_sect_under_node(struct memory_block *mem_blk,
 						void *arg);
-extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
-					   unsigned long phys_index);
+extern int unregister_memory_block_under_nodes(struct memory_block *mem_blk);
 
 extern int register_memory_node_under_compute_node(unsigned int mem_nid,
 						   unsigned int cpu_nid,
@@ -176,8 +175,7 @@ static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
 {
 	return 0;
 }
-static inline int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
-						  unsigned long phys_index)
+static inline int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 {
 	return 0;
 }
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9a92549ef23b..82136c5b4c5f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -520,8 +520,6 @@ static void __remove_section(struct zone *zone, struct mem_section *ms,
 	if (WARN_ON_ONCE(!valid_section(ms)))
 		return;
 
-	unregister_memory_section(ms);
-
 	scn_nr = __section_nr(ms);
 	start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
 	__remove_zone(zone, start_pfn);
@@ -1845,6 +1843,9 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
 	memblock_free(start, size);
 	memblock_remove(start, size);
 
+	/* remove memory block devices before removing memory */
+	remove_memory_block_devices(start, size);
+
 	arch_remove_memory(nid, start, size, NULL);
 	__release_memory_resource(start, size);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 10/11] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail
  2019-05-27 11:11 [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
                   ` (8 preceding siblings ...)
  2019-05-27 11:11 ` [PATCH v3 09/11] mm/memory_hotplug: Remove memory block devices before arch_remove_memory() David Hildenbrand
@ 2019-05-27 11:11 ` David Hildenbrand
  2019-06-05 21:21   ` Wei Yang
                     ` (2 more replies)
  2019-05-27 11:11 ` [PATCH v3 11/11] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section David Hildenbrand
  2019-06-03 21:21 ` [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block device handling Wei Yang
  11 siblings, 3 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-05-27 11:11 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	linux-arm-kernel, akpm, Dan Williams, Wei Yang, Igor Mammedov,
	David Hildenbrand, Greg Kroah-Hartman, Rafael J. Wysocki,
	Alex Deucher, David S. Miller, Mark Brown, Chris Wilson,
	Oscar Salvador, Jonathan Cameron

We really don't want anything during memory hotunplug to fail.
We always pass a valid memory block device, so that check can go. Avoid
allocating memory and eventually failing: as we are always called under
lock, we can use a static piece of memory. This avoids having to put
the structure onto the stack and having to guess about the stack size
of callers.

Patch inspired by a patch from Oscar Salvador.

In the future, there might be no need to iterate over nodes at all.
mem->nid should tell us exactly what to remove. Memory block devices
with mixed nodes (added during boot) should be properly fenced off and
never removed.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mark Brown <broonie@kernel.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/node.c  | 18 +++++-------------
 include/linux/node.h |  5 ++---
 2 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 04fdfa99b8bc..9be88fd05147 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -803,20 +803,14 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
 
 /*
  * Unregister memory block device under all nodes that it spans.
+ * Has to be called with mem_sysfs_mutex held (due to unlinked_nodes).
  */
-int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
+void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 {
-	NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
 	unsigned long pfn, sect_start_pfn, sect_end_pfn;
+	static nodemask_t unlinked_nodes;
 
-	if (!mem_blk) {
-		NODEMASK_FREE(unlinked_nodes);
-		return -EFAULT;
-	}
-	if (!unlinked_nodes)
-		return -ENOMEM;
-	nodes_clear(*unlinked_nodes);
-
+	nodes_clear(unlinked_nodes);
 	sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
 	sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
@@ -827,15 +821,13 @@ int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 			continue;
 		if (!node_online(nid))
 			continue;
-		if (node_test_and_set(nid, *unlinked_nodes))
+		if (node_test_and_set(nid, unlinked_nodes))
 			continue;
 		sysfs_remove_link(&node_devices[nid]->dev.kobj,
 			 kobject_name(&mem_blk->dev.kobj));
 		sysfs_remove_link(&mem_blk->dev.kobj,
 			 kobject_name(&node_devices[nid]->dev.kobj));
 	}
-	NODEMASK_FREE(unlinked_nodes);
-	return 0;
 }
 
 int link_mem_sections(int nid, unsigned long start_pfn, unsigned long end_pfn)
diff --git a/include/linux/node.h b/include/linux/node.h
index 02a29e71b175..548c226966a2 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -139,7 +139,7 @@ extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int register_mem_sect_under_node(struct memory_block *mem_blk,
 						void *arg);
-extern int unregister_memory_block_under_nodes(struct memory_block *mem_blk);
+extern void unregister_memory_block_under_nodes(struct memory_block *mem_blk);
 
 extern int register_memory_node_under_compute_node(unsigned int mem_nid,
 						   unsigned int cpu_nid,
@@ -175,9 +175,8 @@ static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
 {
 	return 0;
 }
-static inline int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
+static inline void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 {
-	return 0;
 }
 
 static inline void register_hugetlbfs_with_node(node_registration_func_t reg,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 11/11] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section
  2019-05-27 11:11 [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block device handling David Hildenbrand
                   ` (9 preceding siblings ...)
  2019-05-27 11:11 ` [PATCH v3 10/11] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail David Hildenbrand
@ 2019-05-27 11:11 ` David Hildenbrand
  2019-06-05 21:21   ` Wei Yang
                     ` (2 more replies)
  2019-06-03 21:21 ` [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block devicehandling Wei Yang
  11 siblings, 3 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-05-27 11:11 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	linux-arm-kernel, akpm, Dan Williams, Wei Yang, Igor Mammedov,
	David Hildenbrand

The parameter is unused, so let's drop it. Memory removal paths should
never care about zones. This is the job of memory offlining and will
require more refactorings.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/memory_hotplug.h | 2 +-
 mm/memory_hotplug.c            | 2 +-
 mm/sparse.c                    | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 2f1f87e13baa..1a4257c5f74c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -346,7 +346,7 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 extern bool is_memblock_offlined(struct memory_block *mem);
 extern int sparse_add_one_section(int nid, unsigned long start_pfn,
 				  struct vmem_altmap *altmap);
-extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
+extern void sparse_remove_one_section(struct mem_section *ms,
 		unsigned long map_offset, struct vmem_altmap *altmap);
 extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
 					  unsigned long pnum);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 82136c5b4c5f..e48ec7b9dee2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -524,7 +524,7 @@ static void __remove_section(struct zone *zone, struct mem_section *ms,
 	start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
 	__remove_zone(zone, start_pfn);
 
-	sparse_remove_one_section(zone, ms, map_offset, altmap);
+	sparse_remove_one_section(ms, map_offset, altmap);
 }
 
 /**
diff --git a/mm/sparse.c b/mm/sparse.c
index d1d5e05f5b8d..1552c855d62a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -800,8 +800,8 @@ static void free_section_usemap(struct page *memmap, unsigned long *usemap,
 		free_map_bootmem(memmap);
 }
 
-void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
-		unsigned long map_offset, struct vmem_altmap *altmap)
+void sparse_remove_one_section(struct mem_section *ms, unsigned long map_offset,
+			       struct vmem_altmap *altmap)
 {
 	struct page *memmap = NULL;
 	unsigned long *usemap = NULL;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 01/11] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
  2019-05-27 11:11 ` [PATCH v3 01/11] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
@ 2019-05-30 17:53   ` Pavel Tatashin
  2019-06-10 16:46   ` Oscar Salvador
  2019-07-01  7:42   ` Michal Hocko
  2 siblings, 0 replies; 68+ messages in thread
From: Pavel Tatashin @ 2019-05-30 17:53 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, LKML, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	Linux ARM, Andrew Morton, Dan Williams, Wei Yang, Igor Mammedov,
	Oscar Salvador, Michal Hocko, Qian Cai, Arun KS,
	Mathieu Malaterre, Wei Yang

On Mon, May 27, 2019 at 7:12 AM David Hildenbrand <david@redhat.com> wrote:
>
> By converting start and size to page granularity, we actually ignore
> unaligned parts within a page instead of properly bailing out with an
> error.
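
For reference, checking the raw byte values for block alignment (rather than
pfn-converted values) is what catches sub-page misalignment; after the patch
the check looks roughly like the sketch below (exact error message and
formatting may differ from the actual change):

    static int check_hotplug_memory_range(u64 start, u64 size)
    {
            /* memory range must be block size aligned */
            if (!size || !IS_ALIGNED(start, memory_block_size_bytes()) ||
                !IS_ALIGNED(size, memory_block_size_bytes())) {
                    pr_err("Block size [%#lx] unaligned hotplug range: start %#llx, size %#llx",
                           memory_block_size_bytes(), start, size);
                    return -EINVAL;
            }
            return 0;
    }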
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 06/11] mm/memory_hotplug: Allow arch_remove_pages() without CONFIG_MEMORY_HOTREMOVE
  2019-05-27 11:11 ` [PATCH v3 06/11] mm/memory_hotplug: Allow arch_remove_pages() without CONFIG_MEMORY_HOTREMOVE David Hildenbrand
@ 2019-05-30 17:56   ` Pavel Tatashin
  2019-06-03 22:15   ` Wei Yang
  2019-07-01  8:01   ` Michal Hocko
  2 siblings, 0 replies; 68+ messages in thread
From: Pavel Tatashin @ 2019-05-30 17:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, LKML, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	Linux ARM, Andrew Morton, Dan Williams, Wei Yang, Igor Mammedov,
	Tony Luck, Fenghua Yu, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Martin Schwidefsky, Heiko Carstens,
	Yoshinori Sato, Rich Felker, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Greg Kroah-Hartman, Rafael J. Wysocki,
	Michal Hocko, Mike Rapoport, Oscar Salvador, Kirill A. Shutemov,
	Alex Deucher, David S. Miller, Mark Brown, Chris Wilson,
	Christophe Leroy, Nicholas Piggin, Vasily Gorbik, Rob Herring,
	Masahiro Yamada, mike.travis, Andrew Banman, Wei Yang, Arun KS,
	Qian Cai, Mathieu Malaterre, Baoquan He, Logan Gunthorpe,
	Anshuman Khandual

On Mon, May 27, 2019 at 7:12 AM David Hildenbrand <david@redhat.com> wrote:
>
> We want to improve error handling while adding memory by allowing the
> use of arch_remove_memory() and __remove_pages() even if
> CONFIG_MEMORY_HOTREMOVE is not set, to e.g. implement something like:
>
>         arch_add_memory()
>         rc = do_something();
>         if (rc) {
>                 arch_remove_memory();
>         }
>
> We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require
> quite some dependencies for memory offlining.

I like this simplification; we should really get rid of CONFIG_MEMORY_HOTREMOVE.
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-27 11:11 ` [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
@ 2019-05-30 21:07   ` Pavel Tatashin
  2019-06-04 21:42   ` Wei Yang
  2019-07-01  8:14   ` Michal Hocko
  2 siblings, 0 replies; 68+ messages in thread
From: Pavel Tatashin @ 2019-05-30 21:07 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, LKML, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	Linux ARM, Andrew Morton, Dan Williams, Wei Yang, Igor Mammedov,
	Greg Kroah-Hartman, Rafael J. Wysocki, mike.travis, Ingo Molnar,
	Andrew Banman, Oscar Salvador, Michal Hocko, Qian Cai, Arun KS,
	Mathieu Malaterre

On Mon, May 27, 2019 at 7:12 AM David Hildenbrand <david@redhat.com> wrote:
>
> Only memory to be added to the buddy and to be onlined/offlined by
> user space using /sys/devices/system/memory/... needs (and should have!)
> memory block devices.
>
> Factor out creation of memory block devices. Create all devices after
> arch_add_memory() succeeded. We can later drop the want_memblock parameter,
> because it is now effectively stale.
>
> Only after memory block devices have been added, memory can be onlined
> by user space. This implies that memory is not visible to user space at
> all before arch_add_memory() succeeded.
>
> While at it
> - use WARN_ON_ONCE instead of BUG_ON in moved unregister_memory()
> - introduce find_memory_block_by_id() to search via block id
> - Use find_memory_block_by_id() in init_memory_block() to catch
>   duplicates
>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Andrew Banman <andrew.banman@hpe.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>

LGTM
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block devicehandling
  2019-05-27 11:11 [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block devicehandling David Hildenbrand
                   ` (10 preceding siblings ...)
  2019-05-27 11:11 ` [PATCH v3 11/11] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section David Hildenbrand
@ 2019-06-03 21:21 ` Wei Yang
  2019-06-03 21:40   ` David Hildenbrand
  11 siblings, 1 reply; 68+ messages in thread
From: Wei Yang @ 2019-06-03 21:21 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Alex Deucher, Andrew Banman, Andy Lutomirski,
	Anshuman Khandual, Ard Biesheuvel, Arun KS, Baoquan He,
	Benjamin Herrenschmidt, Borislav Petkov, Catalin Marinas,
	Chintan Pandya, Christophe Leroy, Chris Wilson, Dave Hansen,
	David S. Miller, Fenghua Yu, Greg Kroah-Hartman, Heiko Carstens,
	H. Peter Anvin, Ingo Molnar, Ingo Molnar, Jonathan Cameron,
	Joonsoo Kim, Jun Yao, Kirill A. Shutemov, Logan Gunthorpe,
	Mark Brown, Mark Rutland, Martin Schwidefsky, Masahiro Yamada,
	Mathieu Malaterre, Michael Ellerman, Michal Hocko, Mike Rapoport,
	Mike Rapoport, mike.travis, Nicholas Piggin, Oscar Salvador,
	Oscar Salvador, Paul Mackerras, Pavel Tatashin, Pavel Tatashin,
	Peter Zijlstra, Qian Cai, Rafael J. Wysocki, Rich Felker,
	Rob Herring, Robin Murphy, Thomas Gleixner, Tony Luck,
	Vasily Gorbik, Wei Yang, Will Deacon, Yoshinori Sato, Yu Zhao

IMHO, there are some typos.

s/devicehandling/device handling/

On Mon, May 27, 2019 at 01:11:41PM +0200, David Hildenbrand wrote:
>We only want memory block devices for memory to be onlined/offlined
>(add/remove from the buddy). This is required so user space can
>online/offline memory and kdump gets notified about newly onlined memory.
>
>Let's factor out creation/removal of memory block devices. This helps
>to further cleanup arch_add_memory/arch_remove_memory() and to make
>implementation of new features easier - especially sub-section
>memory hot add from Dan.
>
>Anshuman Khandual is currently working on arch_remove_memory(). I added
>a temporary solution via "arm64/mm: Add temporary arch_remove_memory()
>implementation", that is sufficient as a firsts tep in the context of

s/firsts tep/first step/

>this series. (we don't cleanup page tables in case anything goes
>wrong already)
>
>Did a quick sanity test with DIMM plug/unplug, making sure all devices
>and sysfs links properly get added/removed. Compile tested on s390x and
>x86-64.
>
>Based on next/master.
>
>Next refactoring on my list will be making sure that remove_memory()
>will never deal with zones / access "struct pages". Any kind of zone
>handling will have to be done when offlining system memory / before
>removing device memory. I am thinking about remove_pfn_range_from_zone()",
>du undo everything "move_pfn_range_to_zone()" did.

what is "du undo"? I may not get it.

>
>v2 -> v3:
>- Add "s390x/mm: Fail when an altmap is used for arch_add_memory()"
>- Add "arm64/mm: Add temporary arch_remove_memory() implementation"
>- Add "drivers/base/memory: Pass a block_id to init_memory_block()"
>- Various changes to "mm/memory_hotplug: Create memory block devices
>  after arch_add_memory()" and "mm/memory_hotplug: Create memory block
>  devices after arch_add_memory()" due to switching from sections to
>  block_id's.
>
>v1 -> v2:
>- s390x/mm: Implement arch_remove_memory()
>-- remove mapping after "__remove_pages"
>
>David Hildenbrand (11):
>  mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
>  s390x/mm: Fail when an altmap is used for arch_add_memory()
>  s390x/mm: Implement arch_remove_memory()
>  arm64/mm: Add temporary arch_remove_memory() implementation
>  drivers/base/memory: Pass a block_id to init_memory_block()
>  mm/memory_hotplug: Allow arch_remove_pages() without
>    CONFIG_MEMORY_HOTREMOVE
>  mm/memory_hotplug: Create memory block devices after arch_add_memory()
>  mm/memory_hotplug: Drop MHP_MEMBLOCK_API
>  mm/memory_hotplug: Remove memory block devices before
>    arch_remove_memory()
>  mm/memory_hotplug: Make unregister_memory_block_under_nodes() never
>    fail
>  mm/memory_hotplug: Remove "zone" parameter from
>    sparse_remove_one_section
>
> arch/arm64/mm/mmu.c            |  17 +++++
> arch/ia64/mm/init.c            |   2 -
> arch/powerpc/mm/mem.c          |   2 -
> arch/s390/mm/init.c            |  18 +++--
> arch/sh/mm/init.c              |   2 -
> arch/x86/mm/init_32.c          |   2 -
> arch/x86/mm/init_64.c          |   2 -
> drivers/base/memory.c          | 134 +++++++++++++++++++--------------
> drivers/base/node.c            |  27 +++----
> include/linux/memory.h         |   6 +-
> include/linux/memory_hotplug.h |  12 +--
> include/linux/node.h           |   7 +-
> mm/memory_hotplug.c            |  44 +++++------
> mm/sparse.c                    |  10 +--
> 14 files changed, 140 insertions(+), 145 deletions(-)
>
>-- 
>2.20.1

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block devicehandling
  2019-06-03 21:21 ` [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block devicehandling Wei Yang
@ 2019-06-03 21:40   ` David Hildenbrand
  0 siblings, 0 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-06-03 21:40 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Igor Mammedov,
	Alex Deucher, Andrew Banman, Andy Lutomirski, Anshuman Khandual,
	Ard Biesheuvel, Arun KS, Baoquan He, Benjamin Herrenschmidt,
	Borislav Petkov, Catalin Marinas, Chintan Pandya,
	Christophe Leroy, Chris Wilson, Dave Hansen, David S. Miller,
	Fenghua Yu, Greg Kroah-Hartman, Heiko Carstens, H. Peter Anvin,
	Ingo Molnar, Ingo Molnar, Jonathan Cameron, Joonsoo Kim, Jun Yao,
	Kirill A. Shutemov, Logan Gunthorpe, Mark Brown, Mark Rutland,
	Martin Schwidefsky, Masahiro Yamada, Mathieu Malaterre,
	Michael Ellerman, Michal Hocko, Mike Rapoport, Mike Rapoport,
	mike.travis, Nicholas Piggin, Oscar Salvador, Oscar Salvador,
	Paul Mackerras, Pavel Tatashin, Pavel Tatashin, Peter Zijlstra,
	Qian Cai, Rafael J. Wysocki, Rich Felker, Rob Herring,
	Robin Murphy, Thomas Gleixner, Tony Luck, Vasily Gorbik,
	Wei Yang, Will Deacon, Yoshinori Sato, Yu Zhao

On 03.06.19 23:21, Wei Yang wrote:
> IMHO, there are some typos.

Yes, thanks.

> 
> s/devicehandling/device handling/
> 
> On Mon, May 27, 2019 at 01:11:41PM +0200, David Hildenbrand wrote:
>> We only want memory block devices for memory to be onlined/offlined
>> (add/remove from the buddy). This is required so user space can
>> online/offline memory and kdump gets notified about newly onlined memory.
>>
>> Let's factor out creation/removal of memory block devices. This helps
>> to further cleanup arch_add_memory/arch_remove_memory() and to make
>> implementation of new features easier - especially sub-section
>> memory hot add from Dan.
>>
>> Anshuman Khandual is currently working on arch_remove_memory(). I added
>> a temporary solution via "arm64/mm: Add temporary arch_remove_memory()
>> implementation", that is sufficient as a firsts tep in the context of
> 
> s/firsts tep/first step/
> 
>> this series. (we don't cleanup page tables in case anything goes
>> wrong already)
>>
>> Did a quick sanity test with DIMM plug/unplug, making sure all devices
>> and sysfs links properly get added/removed. Compile tested on s390x and
>> x86-64.
>>
>> Based on next/master.
>>
>> Next refactoring on my list will be making sure that remove_memory()
>> will never deal with zones / access "struct pages". Any kind of zone
>> handling will have to be done when offlining system memory / before
>> removing device memory. I am thinking about remove_pfn_range_from_zone()",
>> du undo everything "move_pfn_range_to_zone()" did.
> 
> what is "du undo"? I may not get it.

to undo ;)

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 04/11] arm64/mm: Add temporary arch_remove_memory() implementation
  2019-05-27 11:11 ` [PATCH v3 04/11] arm64/mm: Add temporary arch_remove_memory() implementation David Hildenbrand
@ 2019-06-03 21:41   ` Wei Yang
  2019-06-04  6:56     ` David Hildenbrand
  2019-07-01 12:48   ` Michal Hocko
  1 sibling, 1 reply; 68+ messages in thread
From: Wei Yang @ 2019-06-03 21:41 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Catalin Marinas, Will Deacon, Mark Rutland,
	Ard Biesheuvel, Chintan Pandya, Mike Rapoport, Jun Yao, Yu Zhao,
	Robin Murphy, Anshuman Khandual

On Mon, May 27, 2019 at 01:11:45PM +0200, David Hildenbrand wrote:
>A proper arch_remove_memory() implementation is on its way, which also
>cleanly removes page tables in arch_add_memory() in case something goes
>wrong.

Would this be better to understand?

    removes page tables created in arch_add_memory

>
>As we want to use arch_remove_memory() in case something goes wrong
>during memory hotplug after arch_add_memory() finished, let's add
>a temporary hack that is sufficient enough until we get a proper
>implementation that cleans up page table entries.
>
>We will remove CONFIG_MEMORY_HOTREMOVE around this code in follow up
>patches.
>
>Cc: Catalin Marinas <catalin.marinas@arm.com>
>Cc: Will Deacon <will.deacon@arm.com>
>Cc: Mark Rutland <mark.rutland@arm.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>Cc: Chintan Pandya <cpandya@codeaurora.org>
>Cc: Mike Rapoport <rppt@linux.ibm.com>
>Cc: Jun Yao <yaojun8558363@gmail.com>
>Cc: Yu Zhao <yuzhao@google.com>
>Cc: Robin Murphy <robin.murphy@arm.com>
>Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> arch/arm64/mm/mmu.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
>diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>index a1bfc4413982..e569a543c384 100644
>--- a/arch/arm64/mm/mmu.c
>+++ b/arch/arm64/mm/mmu.c
>@@ -1084,4 +1084,23 @@ int arch_add_memory(int nid, u64 start, u64 size,
> 	return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
> 			   restrictions);
> }
>+#ifdef CONFIG_MEMORY_HOTREMOVE
>+void arch_remove_memory(int nid, u64 start, u64 size,
>+			struct vmem_altmap *altmap)
>+{
>+	unsigned long start_pfn = start >> PAGE_SHIFT;
>+	unsigned long nr_pages = size >> PAGE_SHIFT;
>+	struct zone *zone;
>+
>+	/*
>+	 * FIXME: Cleanup page tables (also in arch_add_memory() in case
>+	 * adding fails). Until then, this function should only be used
>+	 * during memory hotplug (adding memory), not for memory
>+	 * unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be
>+	 * unlocked yet.
>+	 */
>+	zone = page_zone(pfn_to_page(start_pfn));

Compared with arch_remove_memory in x86. If altmap is not NULL, zone will be
retrieved from page related to altmap. Not sure why this is not the same?
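
For comparison, the x86 variant being referred to looks roughly like the
sketch below at this point in time (details may differ):

    void arch_remove_memory(int nid, u64 start, u64 size,
                            struct vmem_altmap *altmap)
    {
            unsigned long start_pfn = start >> PAGE_SHIFT;
            unsigned long nr_pages = size >> PAGE_SHIFT;
            struct page *page = pfn_to_page(start_pfn);
            struct zone *zone;

            /* with an altmap, the first mapped page is offset into the range */
            if (altmap)
                    page += vmem_altmap_offset(altmap);
            zone = page_zone(page);
            __remove_pages(zone, start_pfn, nr_pages, altmap);
            kernel_physical_mapping_remove(start, start + size);
    }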

>+	__remove_pages(zone, start_pfn, nr_pages, altmap);
>+}
>+#endif
> #endif
>-- 
>2.20.1

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 05/11] drivers/base/memory: Pass a block_id to init_memory_block()
  2019-05-27 11:11 ` [PATCH v3 05/11] drivers/base/memory: Pass a block_id to init_memory_block() David Hildenbrand
@ 2019-06-03 21:49   ` Wei Yang
  2019-06-04  6:56     ` David Hildenbrand
  2019-07-01  7:56   ` Michal Hocko
  1 sibling, 1 reply; 68+ messages in thread
From: Wei Yang @ 2019-06-03 21:49 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Greg Kroah-Hartman, Rafael J. Wysocki

On Mon, May 27, 2019 at 01:11:46PM +0200, David Hildenbrand wrote:
>We'll rework hotplug_memory_register() shortly, so it no longer consumes
>a section.
>
>Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/base/memory.c | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index f180427e48f4..f914fa6fe350 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -651,21 +651,18 @@ int register_memory(struct memory_block *memory)
> 	return ret;
> }
> 
>-static int init_memory_block(struct memory_block **memory,
>-			     struct mem_section *section, unsigned long state)
>+static int init_memory_block(struct memory_block **memory, int block_id,
>+			     unsigned long state)
> {
> 	struct memory_block *mem;
> 	unsigned long start_pfn;
>-	int scn_nr;
> 	int ret = 0;
> 
> 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> 	if (!mem)
> 		return -ENOMEM;
> 
>-	scn_nr = __section_nr(section);
>-	mem->start_section_nr =
>-			base_memory_block_id(scn_nr) * sections_per_block;
>+	mem->start_section_nr = block_id * sections_per_block;
> 	mem->end_section_nr = mem->start_section_nr + sections_per_block - 1;
> 	mem->state = state;
> 	start_pfn = section_nr_to_pfn(mem->start_section_nr);
>@@ -694,7 +691,8 @@ static int add_memory_block(int base_section_nr)
> 
> 	if (section_count == 0)
> 		return 0;
>-	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE);
>+	ret = init_memory_block(&mem, base_memory_block_id(base_section_nr),
>+				MEM_ONLINE);

If my understanding is correct, section_nr could be removed too.

> 	if (ret)
> 		return ret;
> 	mem->section_count = section_count;
>@@ -707,6 +705,7 @@ static int add_memory_block(int base_section_nr)
>  */
> int hotplug_memory_register(int nid, struct mem_section *section)
> {
>+	int block_id = base_memory_block_id(__section_nr(section));
> 	int ret = 0;
> 	struct memory_block *mem;
> 
>@@ -717,7 +716,7 @@ int hotplug_memory_register(int nid, struct mem_section *section)
> 		mem->section_count++;
> 		put_device(&mem->dev);
> 	} else {
>-		ret = init_memory_block(&mem, section, MEM_OFFLINE);
>+		ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
> 		if (ret)
> 			goto out;
> 		mem->section_count++;
>-- 
>2.20.1

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 06/11] mm/memory_hotplug: Allow arch_remove_pages() without CONFIG_MEMORY_HOTREMOVE
  2019-05-27 11:11 ` [PATCH v3 06/11] mm/memory_hotplug: Allow arch_remove_pages() without CONFIG_MEMORY_HOTREMOVE David Hildenbrand
  2019-05-30 17:56   ` Pavel Tatashin
@ 2019-06-03 22:15   ` Wei Yang
  2019-06-04  6:59     ` David Hildenbrand
  2019-07-01  8:01   ` Michal Hocko
  2 siblings, 1 reply; 68+ messages in thread
From: Wei Yang @ 2019-06-03 22:15 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Tony Luck, Fenghua Yu, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Greg Kroah-Hartman,
	Rafael J. Wysocki, Michal Hocko, Mike Rapoport, Oscar Salvador,
	Kirill A. Shutemov, Alex Deucher, David S. Miller, Mark Brown,
	Chris Wilson, Christophe Leroy, Nicholas Piggin, Vasily Gorbik,
	Rob Herring, Masahiro Yamada, mike.travis, Andrew Banman,
	Pavel Tatashin, Wei Yang, Arun KS, Qian Cai, Mathieu Malaterre,
	Baoquan He, Logan Gunthorpe, Anshuman Khandual

Allow arch_remove_pages() or arch_remove_memory()?

And I want to confirm: does the kernel build succeed on the affected arches?

On Mon, May 27, 2019 at 01:11:47PM +0200, David Hildenbrand wrote:
>We want to improve error handling while adding memory by allowing the
>use of arch_remove_memory() and __remove_pages() even if
>CONFIG_MEMORY_HOTREMOVE is not set, to e.g. implement something like:
>
>	arch_add_memory()
>	rc = do_something();
>	if (rc) {
>		arch_remove_memory();
>	}
>
>We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require
>quite some dependencies for memory offlining.
>
>Cc: Tony Luck <tony.luck@intel.com>
>Cc: Fenghua Yu <fenghua.yu@intel.com>
>Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>Cc: Paul Mackerras <paulus@samba.org>
>Cc: Michael Ellerman <mpe@ellerman.id.au>
>Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
>Cc: Rich Felker <dalias@libc.org>
>Cc: Dave Hansen <dave.hansen@linux.intel.com>
>Cc: Andy Lutomirski <luto@kernel.org>
>Cc: Peter Zijlstra <peterz@infradead.org>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Cc: Ingo Molnar <mingo@redhat.com>
>Cc: Borislav Petkov <bp@alien8.de>
>Cc: "H. Peter Anvin" <hpa@zytor.com>
>Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Michal Hocko <mhocko@suse.com>
>Cc: Mike Rapoport <rppt@linux.ibm.com>
>Cc: David Hildenbrand <david@redhat.com>
>Cc: Oscar Salvador <osalvador@suse.com>
>Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>Cc: Alex Deucher <alexander.deucher@amd.com>
>Cc: "David S. Miller" <davem@davemloft.net>
>Cc: Mark Brown <broonie@kernel.org>
>Cc: Chris Wilson <chris@chris-wilson.co.uk>
>Cc: Christophe Leroy <christophe.leroy@c-s.fr>
>Cc: Nicholas Piggin <npiggin@gmail.com>
>Cc: Vasily Gorbik <gor@linux.ibm.com>
>Cc: Rob Herring <robh@kernel.org>
>Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
>Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>Cc: Andrew Banman <andrew.banman@hpe.com>
>Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>Cc: Wei Yang <richardw.yang@linux.intel.com>
>Cc: Arun KS <arunks@codeaurora.org>
>Cc: Qian Cai <cai@lca.pw>
>Cc: Mathieu Malaterre <malat@debian.org>
>Cc: Baoquan He <bhe@redhat.com>
>Cc: Logan Gunthorpe <logang@deltatee.com>
>Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> arch/arm64/mm/mmu.c            | 2 --
> arch/ia64/mm/init.c            | 2 --
> arch/powerpc/mm/mem.c          | 2 --
> arch/s390/mm/init.c            | 2 --
> arch/sh/mm/init.c              | 2 --
> arch/x86/mm/init_32.c          | 2 --
> arch/x86/mm/init_64.c          | 2 --
> drivers/base/memory.c          | 2 --
> include/linux/memory.h         | 2 --
> include/linux/memory_hotplug.h | 2 --
> mm/memory_hotplug.c            | 2 --
> mm/sparse.c                    | 6 ------
> 12 files changed, 28 deletions(-)
>
>diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>index e569a543c384..9ccd7539f2d4 100644
>--- a/arch/arm64/mm/mmu.c
>+++ b/arch/arm64/mm/mmu.c
>@@ -1084,7 +1084,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
> 	return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
> 			   restrictions);
> }
>-#ifdef CONFIG_MEMORY_HOTREMOVE
> void arch_remove_memory(int nid, u64 start, u64 size,
> 			struct vmem_altmap *altmap)
> {
>@@ -1103,4 +1102,3 @@ void arch_remove_memory(int nid, u64 start, u64 size,
> 	__remove_pages(zone, start_pfn, nr_pages, altmap);
> }
> #endif
>-#endif
>diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
>index d28e29103bdb..aae75fd7b810 100644
>--- a/arch/ia64/mm/init.c
>+++ b/arch/ia64/mm/init.c
>@@ -681,7 +681,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
> 	return ret;
> }
> 
>-#ifdef CONFIG_MEMORY_HOTREMOVE
> void arch_remove_memory(int nid, u64 start, u64 size,
> 			struct vmem_altmap *altmap)
> {
>@@ -693,4 +692,3 @@ void arch_remove_memory(int nid, u64 start, u64 size,
> 	__remove_pages(zone, start_pfn, nr_pages, altmap);
> }
> #endif
>-#endif
>diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
>index e885fe2aafcc..e4bc2dc3f593 100644
>--- a/arch/powerpc/mm/mem.c
>+++ b/arch/powerpc/mm/mem.c
>@@ -130,7 +130,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
> 	return __add_pages(nid, start_pfn, nr_pages, restrictions);
> }
> 
>-#ifdef CONFIG_MEMORY_HOTREMOVE
> void __ref arch_remove_memory(int nid, u64 start, u64 size,
> 			     struct vmem_altmap *altmap)
> {
>@@ -164,7 +163,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
> 		pr_warn("Hash collision while resizing HPT\n");
> }
> #endif
>-#endif /* CONFIG_MEMORY_HOTPLUG */
> 
> #ifndef CONFIG_NEED_MULTIPLE_NODES
> void __init mem_topology_setup(void)
>diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
>index 14955e0a9fcf..ffb81fe95c77 100644
>--- a/arch/s390/mm/init.c
>+++ b/arch/s390/mm/init.c
>@@ -239,7 +239,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
> 	return rc;
> }
> 
>-#ifdef CONFIG_MEMORY_HOTREMOVE
> void arch_remove_memory(int nid, u64 start, u64 size,
> 			struct vmem_altmap *altmap)
> {
>@@ -251,5 +250,4 @@ void arch_remove_memory(int nid, u64 start, u64 size,
> 	__remove_pages(zone, start_pfn, nr_pages, altmap);
> 	vmem_remove_mapping(start, size);
> }
>-#endif
> #endif /* CONFIG_MEMORY_HOTPLUG */
>diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
>index 13c6a6bb5fd9..dfdbaa50946e 100644
>--- a/arch/sh/mm/init.c
>+++ b/arch/sh/mm/init.c
>@@ -429,7 +429,6 @@ int memory_add_physaddr_to_nid(u64 addr)
> EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
> #endif
> 
>-#ifdef CONFIG_MEMORY_HOTREMOVE
> void arch_remove_memory(int nid, u64 start, u64 size,
> 			struct vmem_altmap *altmap)
> {
>@@ -440,5 +439,4 @@ void arch_remove_memory(int nid, u64 start, u64 size,
> 	zone = page_zone(pfn_to_page(start_pfn));
> 	__remove_pages(zone, start_pfn, nr_pages, altmap);
> }
>-#endif
> #endif /* CONFIG_MEMORY_HOTPLUG */
>diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
>index f265a4316179..4068abb9427f 100644
>--- a/arch/x86/mm/init_32.c
>+++ b/arch/x86/mm/init_32.c
>@@ -860,7 +860,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
> 	return __add_pages(nid, start_pfn, nr_pages, restrictions);
> }
> 
>-#ifdef CONFIG_MEMORY_HOTREMOVE
> void arch_remove_memory(int nid, u64 start, u64 size,
> 			struct vmem_altmap *altmap)
> {
>@@ -872,7 +871,6 @@ void arch_remove_memory(int nid, u64 start, u64 size,
> 	__remove_pages(zone, start_pfn, nr_pages, altmap);
> }
> #endif
>-#endif
> 
> int kernel_set_to_readonly __read_mostly;
> 
>diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>index 693aaf28d5fe..8335ac6e1112 100644
>--- a/arch/x86/mm/init_64.c
>+++ b/arch/x86/mm/init_64.c
>@@ -1196,7 +1196,6 @@ void __ref vmemmap_free(unsigned long start, unsigned long end,
> 	remove_pagetable(start, end, false, altmap);
> }
> 
>-#ifdef CONFIG_MEMORY_HOTREMOVE
> static void __meminit
> kernel_physical_mapping_remove(unsigned long start, unsigned long end)
> {
>@@ -1221,7 +1220,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
> 	__remove_pages(zone, start_pfn, nr_pages, altmap);
> 	kernel_physical_mapping_remove(start, start + size);
> }
>-#endif
> #endif /* CONFIG_MEMORY_HOTPLUG */
> 
> static struct kcore_list kcore_vsyscall;
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index f914fa6fe350..ac17c95a5f28 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -727,7 +727,6 @@ int hotplug_memory_register(int nid, struct mem_section *section)
> 	return ret;
> }
> 
>-#ifdef CONFIG_MEMORY_HOTREMOVE
> static void
> unregister_memory(struct memory_block *memory)
> {
>@@ -766,7 +765,6 @@ void unregister_memory_section(struct mem_section *section)
> out_unlock:
> 	mutex_unlock(&mem_sysfs_mutex);
> }
>-#endif /* CONFIG_MEMORY_HOTREMOVE */
> 
> /* return true if the memory block is offlined, otherwise, return false */
> bool is_memblock_offlined(struct memory_block *mem)
>diff --git a/include/linux/memory.h b/include/linux/memory.h
>index e1dc1bb2b787..474c7c60c8f2 100644
>--- a/include/linux/memory.h
>+++ b/include/linux/memory.h
>@@ -112,9 +112,7 @@ extern void unregister_memory_notifier(struct notifier_block *nb);
> extern int register_memory_isolate_notifier(struct notifier_block *nb);
> extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
> int hotplug_memory_register(int nid, struct mem_section *section);
>-#ifdef CONFIG_MEMORY_HOTREMOVE
> extern void unregister_memory_section(struct mem_section *);
>-#endif
> extern int memory_dev_init(void);
> extern int memory_notify(unsigned long val, void *v);
> extern int memory_isolate_notify(unsigned long val, void *v);
>diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>index ae892eef8b82..2d4de313926d 100644
>--- a/include/linux/memory_hotplug.h
>+++ b/include/linux/memory_hotplug.h
>@@ -123,12 +123,10 @@ static inline bool movable_node_is_enabled(void)
> 	return movable_node_enabled;
> }
> 
>-#ifdef CONFIG_MEMORY_HOTREMOVE
> extern void arch_remove_memory(int nid, u64 start, u64 size,
> 			       struct vmem_altmap *altmap);
> extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
> 			   unsigned long nr_pages, struct vmem_altmap *altmap);
>-#endif /* CONFIG_MEMORY_HOTREMOVE */
> 
> /*
>  * Do we want sysfs memblock files created. This will allow userspace to online
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index 762887b2358b..4b9d2974f86c 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -318,7 +318,6 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
> 	return err;
> }
> 
>-#ifdef CONFIG_MEMORY_HOTREMOVE
> /* find the smallest valid pfn in the range [start_pfn, end_pfn) */
> static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
> 				     unsigned long start_pfn,
>@@ -582,7 +581,6 @@ void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
> 
> 	set_zone_contiguous(zone);
> }
>-#endif /* CONFIG_MEMORY_HOTREMOVE */
> 
> int set_online_page_callback(online_page_callback_t callback)
> {
>diff --git a/mm/sparse.c b/mm/sparse.c
>index fd13166949b5..d1d5e05f5b8d 100644
>--- a/mm/sparse.c
>+++ b/mm/sparse.c
>@@ -604,7 +604,6 @@ static void __kfree_section_memmap(struct page *memmap,
> 
> 	vmemmap_free(start, end, altmap);
> }
>-#ifdef CONFIG_MEMORY_HOTREMOVE
> static void free_map_bootmem(struct page *memmap)
> {
> 	unsigned long start = (unsigned long)memmap;
>@@ -612,7 +611,6 @@ static void free_map_bootmem(struct page *memmap)
> 
> 	vmemmap_free(start, end, NULL);
> }
>-#endif /* CONFIG_MEMORY_HOTREMOVE */
> #else
> static struct page *__kmalloc_section_memmap(void)
> {
>@@ -651,7 +649,6 @@ static void __kfree_section_memmap(struct page *memmap,
> 			   get_order(sizeof(struct page) * PAGES_PER_SECTION));
> }
> 
>-#ifdef CONFIG_MEMORY_HOTREMOVE
> static void free_map_bootmem(struct page *memmap)
> {
> 	unsigned long maps_section_nr, removing_section_nr, i;
>@@ -681,7 +678,6 @@ static void free_map_bootmem(struct page *memmap)
> 			put_page_bootmem(page);
> 	}
> }
>-#endif /* CONFIG_MEMORY_HOTREMOVE */
> #endif /* CONFIG_SPARSEMEM_VMEMMAP */
> 
> /**
>@@ -746,7 +742,6 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn,
> 	return ret;
> }
> 
>-#ifdef CONFIG_MEMORY_HOTREMOVE
> #ifdef CONFIG_MEMORY_FAILURE
> static void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
> {
>@@ -823,5 +818,4 @@ void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
> 			PAGES_PER_SECTION - map_offset);
> 	free_section_usemap(memmap, usemap, altmap);
> }
>-#endif /* CONFIG_MEMORY_HOTREMOVE */
> #endif /* CONFIG_MEMORY_HOTPLUG */
>-- 
>2.20.1

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 04/11] arm64/mm: Add temporary arch_remove_memory() implementation
  2019-06-03 21:41   ` Wei Yang
@ 2019-06-04  6:56     ` David Hildenbrand
  2019-06-04 17:36       ` Robin Murphy
  0 siblings, 1 reply; 68+ messages in thread
From: David Hildenbrand @ 2019-06-04  6:56 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Igor Mammedov,
	Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
	Chintan Pandya, Mike Rapoport, Jun Yao, Yu Zhao, Robin Murphy,
	Anshuman Khandual

On 03.06.19 23:41, Wei Yang wrote:
> On Mon, May 27, 2019 at 01:11:45PM +0200, David Hildenbrand wrote:
>> A proper arch_remove_memory() implementation is on its way, which also
>> cleanly removes page tables in arch_add_memory() in case something goes
>> wrong.
> 
> Would this be better to understand?
> 
>     removes page tables created in arch_add_memory

That's not what this sentence expresses. Have a look at
arch_add_memory(), in case  __add_pages() fails, the page tables are not
removed. This will also be fixed by Anshuman in the same shot.
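
The failure path in question is roughly the following (a sketch of the
current arm64 arch_add_memory(); details may differ):

    int arch_add_memory(int nid, u64 start, u64 size,
                        struct mhp_restrictions *restrictions)
    {
            int flags = 0;

            if (rodata_full || debug_pagealloc_enabled())
                    flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;

            /* page tables for the hotplugged range are created here ... */
            __create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
                                 size, PAGE_KERNEL, __pgd_pgtable_alloc, flags);

            /* ... but nothing removes them again if __add_pages() fails */
            return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
                               restrictions);
    }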

> 
>>
>> As we want to use arch_remove_memory() in case something goes wrong
>> during memory hotplug after arch_add_memory() finished, let's add
>> a temporary hack that is sufficient enough until we get a proper
>> implementation that cleans up page table entries.
>>
>> We will remove CONFIG_MEMORY_HOTREMOVE around this code in follow up
>> patches.
>>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> Cc: Chintan Pandya <cpandya@codeaurora.org>
>> Cc: Mike Rapoport <rppt@linux.ibm.com>
>> Cc: Jun Yao <yaojun8558363@gmail.com>
>> Cc: Yu Zhao <yuzhao@google.com>
>> Cc: Robin Murphy <robin.murphy@arm.com>
>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> arch/arm64/mm/mmu.c | 19 +++++++++++++++++++
>> 1 file changed, 19 insertions(+)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index a1bfc4413982..e569a543c384 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -1084,4 +1084,23 @@ int arch_add_memory(int nid, u64 start, u64 size,
>> 	return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
>> 			   restrictions);
>> }
>> +#ifdef CONFIG_MEMORY_HOTREMOVE
>> +void arch_remove_memory(int nid, u64 start, u64 size,
>> +			struct vmem_altmap *altmap)
>> +{
>> +	unsigned long start_pfn = start >> PAGE_SHIFT;
>> +	unsigned long nr_pages = size >> PAGE_SHIFT;
>> +	struct zone *zone;
>> +
>> +	/*
>> +	 * FIXME: Cleanup page tables (also in arch_add_memory() in case
>> +	 * adding fails). Until then, this function should only be used
>> +	 * during memory hotplug (adding memory), not for memory
>> +	 * unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be
>> +	 * unlocked yet.
>> +	 */
>> +	zone = page_zone(pfn_to_page(start_pfn));
> 
> Compared with arch_remove_memory in x86. If altmap is not NULL, zone will be
> retrieved from page related to altmap. Not sure why this is not the same?

This is a minimal implementation, sufficient for this use case here. A
full implementation is in the works. For now, this function will not be
used with an altmap (ZONE_DEVICE is not supported for arm64 yet).

Thanks!

> 
>> +	__remove_pages(zone, start_pfn, nr_pages, altmap);
>> +}
>> +#endif
>> #endif
>> -- 
>> 2.20.1
> 


-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 05/11] drivers/base/memory: Pass a block_id to init_memory_block()
  2019-06-03 21:49   ` Wei Yang
@ 2019-06-04  6:56     ` David Hildenbrand
  0 siblings, 0 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-06-04  6:56 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Igor Mammedov,
	Greg Kroah-Hartman, Rafael J. Wysocki

On 03.06.19 23:49, Wei Yang wrote:
> On Mon, May 27, 2019 at 01:11:46PM +0200, David Hildenbrand wrote:
>> We'll rework hotplug_memory_register() shortly, so it no longer consumes
>> a section.
>>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> drivers/base/memory.c | 15 +++++++--------
>> 1 file changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index f180427e48f4..f914fa6fe350 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -651,21 +651,18 @@ int register_memory(struct memory_block *memory)
>> 	return ret;
>> }
>>
>> -static int init_memory_block(struct memory_block **memory,
>> -			     struct mem_section *section, unsigned long state)
>> +static int init_memory_block(struct memory_block **memory, int block_id,
>> +			     unsigned long state)
>> {
>> 	struct memory_block *mem;
>> 	unsigned long start_pfn;
>> -	int scn_nr;
>> 	int ret = 0;
>>
>> 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>> 	if (!mem)
>> 		return -ENOMEM;
>>
>> -	scn_nr = __section_nr(section);
>> -	mem->start_section_nr =
>> -			base_memory_block_id(scn_nr) * sections_per_block;
>> +	mem->start_section_nr = block_id * sections_per_block;
>> 	mem->end_section_nr = mem->start_section_nr + sections_per_block - 1;
>> 	mem->state = state;
>> 	start_pfn = section_nr_to_pfn(mem->start_section_nr);
>> @@ -694,7 +691,8 @@ static int add_memory_block(int base_section_nr)
>>
>> 	if (section_count == 0)
>> 		return 0;
>> -	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE);
>> +	ret = init_memory_block(&mem, base_memory_block_id(base_section_nr),
>> +				MEM_ONLINE);
> 
> If my understanding is correct, section_nr could be removed too.

Yes, you are right; this has already been addressed in linux-next.


-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 06/11] mm/memory_hotplug: Allow arch_remove_pages() without CONFIG_MEMORY_HOTREMOVE
  2019-06-03 22:15   ` Wei Yang
@ 2019-06-04  6:59     ` David Hildenbrand
  2019-06-04  8:31       ` Wei Yang
  0 siblings, 1 reply; 68+ messages in thread
From: David Hildenbrand @ 2019-06-04  6:59 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Igor Mammedov,
	Tony Luck, Fenghua Yu, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Martin Schwidefsky, Heiko Carstens,
	Yoshinori Sato, Rich Felker, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Greg Kroah-Hartman, Rafael J. Wysocki,
	Michal Hocko, Mike Rapoport, Oscar Salvador, Kirill A. Shutemov,
	Alex Deucher, David S. Miller, Mark Brown, Chris Wilson,
	Christophe Leroy, Nicholas Piggin, Vasily Gorbik, Rob Herring,
	Masahiro Yamada, mike.travis, Andrew Banman, Pavel Tatashin,
	Wei Yang, Arun KS, Qian Cai, Mathieu Malaterre, Baoquan He,
	Logan Gunthorpe, Anshuman Khandual

On 04.06.19 00:15, Wei Yang wrote:
> Allow arch_remove_pages() or arch_remove_memory()?

Looks like I merged __remove_pages() and arch_remove_memory().

@Andrew, can you fix this up to

"mm/memory_hotplug: Allow arch_remove_memory() without
CONFIG_MEMORY_HOTREMOVE"

? Thanks!

> 
> And I want to confirm: does the kernel build succeed on the affected arches?

I compile-tested on s390x and x86. As the patches have been in linux-next for
some time, I think the other builds are also fine.

Thanks!

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 06/11] mm/memory_hotplug: Allow arch_remove_pages() without CONFIG_MEMORY_HOTREMOVE
  2019-06-04  6:59     ` David Hildenbrand
@ 2019-06-04  8:31       ` Wei Yang
  2019-06-04  9:00         ` David Hildenbrand
  0 siblings, 1 reply; 68+ messages in thread
From: Wei Yang @ 2019-06-04  8:31 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, linux-mm, linux-kernel, linux-ia64, linuxppc-dev,
	linux-s390, linux-sh, linux-arm-kernel, akpm, Dan Williams,
	Igor Mammedov, Tony Luck, Fenghua Yu, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Greg Kroah-Hartman,
	Rafael J. Wysocki, Michal Hocko, Mike Rapoport, Oscar Salvador,
	Kirill A. Shutemov, Alex Deucher, David S. Miller, Mark Brown,
	Chris Wilson, Christophe Leroy, Nicholas Piggin, Vasily Gorbik,
	Rob Herring, Masahiro Yamada, mike.travis, Andrew Banman,
	Pavel Tatashin, Wei Yang, Arun KS, Qian Cai, Mathieu Malaterre,
	Baoquan He, Logan Gunthorpe, Anshuman Khandual

On Tue, Jun 04, 2019 at 08:59:43AM +0200, David Hildenbrand wrote:
>On 04.06.19 00:15, Wei Yang wrote:
>> Allow arch_remove_pages() or arch_remove_memory()?
>
>Looks like I merged __remove_pages() and arch_remove_memory().
>
>@Andrew, can you fix this up to
>
>"mm/memory_hotplug: Allow arch_remove_memory() without
>CONFIG_MEMORY_HOTREMOVE"
>
>? Thanks!
>

Already merged?

>> 
>> And I want to confirm: does the kernel build succeed on the affected arches?
>
>I compile-tested on s390x and x86. As the patches have been in linux-next for
>some time, I think the other builds are also fine.
>

Yep, sounds good~

>Thanks!
>
>-- 
>
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 06/11] mm/memory_hotplug: Allow arch_remove_pages() without CONFIG_MEMORY_HOTREMOVE
  2019-06-04  8:31       ` Wei Yang
@ 2019-06-04  9:00         ` David Hildenbrand
  0 siblings, 0 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-06-04  9:00 UTC (permalink / raw)
  To: Wei Yang
  Cc: Wei Yang, linux-mm, linux-kernel, linux-ia64, linuxppc-dev,
	linux-s390, linux-sh, linux-arm-kernel, akpm, Dan Williams,
	Igor Mammedov, Tony Luck, Fenghua Yu, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Greg Kroah-Hartman,
	Rafael J. Wysocki, Michal Hocko, Mike Rapoport, Oscar Salvador,
	Kirill A. Shutemov, Alex Deucher, David S. Miller, Mark Brown,
	Chris Wilson, Christophe Leroy, Nicholas Piggin, Vasily Gorbik,
	Rob Herring, Masahiro Yamada, mike.travis, Andrew Banman,
	Pavel Tatashin, Arun KS, Qian Cai, Mathieu Malaterre, Baoquan He,
	Logan Gunthorpe, Anshuman Khandual

On 04.06.19 10:31, Wei Yang wrote:
> On Tue, Jun 04, 2019 at 08:59:43AM +0200, David Hildenbrand wrote:
>> On 04.06.19 00:15, Wei Yang wrote:
>>> Allow arch_remove_pages() or arch_remove_memory()?
>>
>> Looks like I merged __remove_pages() and arch_remove_memory().
>>
>> @Andrew, can you fix this up to
>>
>> "mm/memory_hotplug: Allow arch_remove_memory() without
>> CONFIG_MEMORY_HOTREMOVE"
>>
>> ? Thanks!
>>
> 
> Already merged?

Andrew picked it up, but it's not in Linus' tree yet.

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 04/11] arm64/mm: Add temporary arch_remove_memory() implementation
  2019-06-04  6:56     ` David Hildenbrand
@ 2019-06-04 17:36       ` Robin Murphy
  2019-06-04 17:51         ` David Hildenbrand
  0 siblings, 1 reply; 68+ messages in thread
From: Robin Murphy @ 2019-06-04 17:36 UTC (permalink / raw)
  To: David Hildenbrand, Wei Yang
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Igor Mammedov,
	Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
	Chintan Pandya, Mike Rapoport, Jun Yao, Yu Zhao,
	Anshuman Khandual

On 04/06/2019 07:56, David Hildenbrand wrote:
> On 03.06.19 23:41, Wei Yang wrote:
>> On Mon, May 27, 2019 at 01:11:45PM +0200, David Hildenbrand wrote:
>>> A proper arch_remove_memory() implementation is on its way, which also
>>> cleanly removes page tables in arch_add_memory() in case something goes
>>> wrong.
>>
>> Would this be better to understand?
>>
>>      removes page tables created in arch_add_memory
> 
> That's not what this sentence expresses. Have a look at
> arch_add_memory(), in case  __add_pages() fails, the page tables are not
> removed. This will also be fixed by Anshuman in the same shot.
> 
>>
>>>
>>> As we want to use arch_remove_memory() in case something goes wrong
>>> during memory hotplug after arch_add_memory() finished, let's add
>>> a temporary hack that is sufficient enough until we get a proper
>>> implementation that cleans up page table entries.
>>>
>>> We will remove CONFIG_MEMORY_HOTREMOVE around this code in follow up
>>> patches.
>>>
>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Will Deacon <will.deacon@arm.com>
>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>> Cc: Chintan Pandya <cpandya@codeaurora.org>
>>> Cc: Mike Rapoport <rppt@linux.ibm.com>
>>> Cc: Jun Yao <yaojun8558363@gmail.com>
>>> Cc: Yu Zhao <yuzhao@google.com>
>>> Cc: Robin Murphy <robin.murphy@arm.com>
>>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>> arch/arm64/mm/mmu.c | 19 +++++++++++++++++++
>>> 1 file changed, 19 insertions(+)
>>>
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index a1bfc4413982..e569a543c384 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -1084,4 +1084,23 @@ int arch_add_memory(int nid, u64 start, u64 size,
>>> 	return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
>>> 			   restrictions);
>>> }
>>> +#ifdef CONFIG_MEMORY_HOTREMOVE
>>> +void arch_remove_memory(int nid, u64 start, u64 size,
>>> +			struct vmem_altmap *altmap)
>>> +{
>>> +	unsigned long start_pfn = start >> PAGE_SHIFT;
>>> +	unsigned long nr_pages = size >> PAGE_SHIFT;
>>> +	struct zone *zone;
>>> +
>>> +	/*
>>> +	 * FIXME: Cleanup page tables (also in arch_add_memory() in case
>>> +	 * adding fails). Until then, this function should only be used
>>> +	 * during memory hotplug (adding memory), not for memory
>>> +	 * unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be
>>> +	 * unlocked yet.
>>> +	 */
>>> +	zone = page_zone(pfn_to_page(start_pfn));
>>
>> Compared with arch_remove_memory in x86. If altmap is not NULL, zone will be
>> retrieved from page related to altmap. Not sure why this is not the same?
> 
> This is a minimal implementation, sufficient for this use case here. A
> full implementation is in the works. For now, this function will not be
>> used with an altmap (ZONE_DEVICE is not supported for arm64 yet).

FWIW the other pieces of ZONE_DEVICE are now due to land in parallel, 
but as long as we don't throw the ARCH_ENABLE_MEMORY_HOTREMOVE switch 
then there should still be no issue. Besides, given that we should 
consistently ignore the altmap everywhere at the moment, it may even 
work out regardless.

One thing stands out about the failure path thing, though - if 
__add_pages() did fail, can it still be guaranteed to have initialised 
the memmap such that page_zone() won't return nonsense? Last time I 
looked that was still a problem when removing memory which had been 
successfully added, but never onlined (although I do know that 
particular case was already being discussed at the time, and I've not 
been paying the greatest attention since).

Robin.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 04/11] arm64/mm: Add temporary arch_remove_memory() implementation
  2019-06-04 17:36       ` Robin Murphy
@ 2019-06-04 17:51         ` David Hildenbrand
  0 siblings, 0 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-06-04 17:51 UTC (permalink / raw)
  To: Robin Murphy, Wei Yang
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Igor Mammedov,
	Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
	Chintan Pandya, Mike Rapoport, Jun Yao, Yu Zhao,
	Anshuman Khandual

On 04.06.19 19:36, Robin Murphy wrote:
> On 04/06/2019 07:56, David Hildenbrand wrote:
>> On 03.06.19 23:41, Wei Yang wrote:
>>> On Mon, May 27, 2019 at 01:11:45PM +0200, David Hildenbrand wrote:
>>>> A proper arch_remove_memory() implementation is on its way, which also
>>>> cleanly removes page tables in arch_add_memory() in case something goes
>>>> wrong.
>>>
>>> Would this be better to understand?
>>>
>>>      removes page tables created in arch_add_memory
>>
>> That's not what this sentence expresses. Have a look at
>> arch_add_memory(), in case  __add_pages() fails, the page tables are not
>> removed. This will also be fixed by Anshuman in the same shot.
>>
>>>
>>>>
>>>> As we want to use arch_remove_memory() in case something goes wrong
>>>> during memory hotplug after arch_add_memory() finished, let's add
>>>> a temporary hack that is sufficient enough until we get a proper
>>>> implementation that cleans up page table entries.
>>>>
>>>> We will remove CONFIG_MEMORY_HOTREMOVE around this code in follow up
>>>> patches.
>>>>
>>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>>> Cc: Will Deacon <will.deacon@arm.com>
>>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>> Cc: Chintan Pandya <cpandya@codeaurora.org>
>>>> Cc: Mike Rapoport <rppt@linux.ibm.com>
>>>> Cc: Jun Yao <yaojun8558363@gmail.com>
>>>> Cc: Yu Zhao <yuzhao@google.com>
>>>> Cc: Robin Murphy <robin.murphy@arm.com>
>>>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>> arch/arm64/mm/mmu.c | 19 +++++++++++++++++++
>>>> 1 file changed, 19 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>>> index a1bfc4413982..e569a543c384 100644
>>>> --- a/arch/arm64/mm/mmu.c
>>>> +++ b/arch/arm64/mm/mmu.c
>>>> @@ -1084,4 +1084,23 @@ int arch_add_memory(int nid, u64 start, u64 size,
>>>> 	return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
>>>> 			   restrictions);
>>>> }
>>>> +#ifdef CONFIG_MEMORY_HOTREMOVE
>>>> +void arch_remove_memory(int nid, u64 start, u64 size,
>>>> +			struct vmem_altmap *altmap)
>>>> +{
>>>> +	unsigned long start_pfn = start >> PAGE_SHIFT;
>>>> +	unsigned long nr_pages = size >> PAGE_SHIFT;
>>>> +	struct zone *zone;
>>>> +
>>>> +	/*
>>>> +	 * FIXME: Cleanup page tables (also in arch_add_memory() in case
>>>> +	 * adding fails). Until then, this function should only be used
>>>> +	 * during memory hotplug (adding memory), not for memory
>>>> +	 * unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be
>>>> +	 * unlocked yet.
>>>> +	 */
>>>> +	zone = page_zone(pfn_to_page(start_pfn));
>>>
>>> Compared with arch_remove_memory in x86. If altmap is not NULL, zone will be
>>> retrieved from page related to altmap. Not sure why this is not the same?
>>
>> This is a minimal implementation, sufficient for this use case here. A
>> full implementation is in the works. For now, this function will not be
>> used with an altmap (ZONE_DEVICE is not supported for arm64 yet).
> 
> FWIW the other pieces of ZONE_DEVICE are now due to land in parallel, 
> but as long as we don't throw the ARCH_ENABLE_MEMORY_HOTREMOVE switch 
> then there should still be no issue. Besides, given that we should 
> consistently ignore the altmap everywhere at the moment, it may even 
> work out regardless.

Thanks for the info.

> 
> One thing stands out about the failure path thing, though - if 
> __add_pages() did fail, can it still be guaranteed to have initialised 
> the memmap such that page_zone() won't return nonsense? Last time I 

if __add_pages() fails, then arch_add_memory() fails and
arch_remove_memory() will not be called in the context of this series.
Only if it succeeded.
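
In the context of this series, the relevant call site in add_memory_resource()
becomes roughly the following (sketch; names may differ slightly):

    /* call arch's memory hotadd */
    ret = arch_add_memory(nid, start, size, &restrictions);
    if (ret < 0)
            goto error;

    /* create memory block devices after memory was added */
    ret = create_memory_block_devices(start, size);
    if (ret) {
            arch_remove_memory(nid, start, size, NULL);
            goto error;
    }

So arch_remove_memory() is only reached once arch_add_memory() has succeeded.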

> looked that was still a problem when removing memory which had been 
> successfully added, but never onlined (although I do know that 
> particular case was already being discussed at the time, and I've not 
> been paying the greatest attention since).

Yes, that part is next on my list. It works but is ugly. The memory
removal process should not care about zones at all.

Slowly moving in the right direction :)

> 
> Robin.
> 


-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-27 11:11 ` [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
  2019-05-30 21:07   ` Pavel Tatashin
@ 2019-06-04 21:42   ` Wei Yang
  2019-06-05  8:58     ` David Hildenbrand
  2019-07-01  8:14   ` Michal Hocko
  2 siblings, 1 reply; 68+ messages in thread
From: Wei Yang @ 2019-06-04 21:42 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Greg Kroah-Hartman, Rafael J. Wysocki,
	mike.travis, Ingo Molnar, Andrew Banman, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Qian Cai, Arun KS,
	Mathieu Malaterre

On Mon, May 27, 2019 at 01:11:48PM +0200, David Hildenbrand wrote:
>Only memory to be added to the buddy and to be onlined/offlined by
>user space using /sys/devices/system/memory/... needs (and should have!)
>memory block devices.
>
>Factor out creation of memory block devices. Create all devices after
>arch_add_memory() succeeded. We can later drop the want_memblock parameter,
>because it is now effectively stale.
>
>Only after memory block devices have been added can memory be onlined
>by user space. This implies that memory is not visible to user space at
>all before arch_add_memory() succeeded.
>
>While at it
>- use WARN_ON_ONCE instead of BUG_ON in the moved unregister_memory()
>- introduce find_memory_block_by_id() to search via block id
>- use find_memory_block_by_id() in init_memory_block() to catch
>  duplicates

Generally looks good to me besides two tiny comments.

>
>Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Cc: David Hildenbrand <david@redhat.com>
>Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Ingo Molnar <mingo@kernel.org>
>Cc: Andrew Banman <andrew.banman@hpe.com>
>Cc: Oscar Salvador <osalvador@suse.de>
>Cc: Michal Hocko <mhocko@suse.com>
>Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>Cc: Qian Cai <cai@lca.pw>
>Cc: Wei Yang <richard.weiyang@gmail.com>
>Cc: Arun KS <arunks@codeaurora.org>
>Cc: Mathieu Malaterre <malat@debian.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/base/memory.c  | 82 +++++++++++++++++++++++++++---------------
> include/linux/memory.h |  2 +-
> mm/memory_hotplug.c    | 15 ++++----
> 3 files changed, 63 insertions(+), 36 deletions(-)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index ac17c95a5f28..5a0370f0c506 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -39,6 +39,11 @@ static inline int base_memory_block_id(int section_nr)
> 	return section_nr / sections_per_block;
> }
> 
>+static inline int pfn_to_block_id(unsigned long pfn)
>+{
>+	return base_memory_block_id(pfn_to_section_nr(pfn));
>+}
>+
> static int memory_subsys_online(struct device *dev);
> static int memory_subsys_offline(struct device *dev);
> 
>@@ -582,10 +587,9 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn)
>  * A reference for the returned object is held and the reference for the
>  * hinted object is released.
>  */
>-struct memory_block *find_memory_block_hinted(struct mem_section *section,
>-					      struct memory_block *hint)
>+static struct memory_block *find_memory_block_by_id(int block_id,
>+						    struct memory_block *hint)
> {
>-	int block_id = base_memory_block_id(__section_nr(section));
> 	struct device *hintdev = hint ? &hint->dev : NULL;
> 	struct device *dev;
> 
>@@ -597,6 +601,14 @@ struct memory_block *find_memory_block_hinted(struct mem_section *section,
> 	return to_memory_block(dev);
> }
> 
>+struct memory_block *find_memory_block_hinted(struct mem_section *section,
>+					      struct memory_block *hint)
>+{
>+	int block_id = base_memory_block_id(__section_nr(section));
>+
>+	return find_memory_block_by_id(block_id, hint);
>+}
>+
> /*
>  * For now, we have a linear search to go find the appropriate
>  * memory_block corresponding to a particular phys_index. If
>@@ -658,6 +670,11 @@ static int init_memory_block(struct memory_block **memory, int block_id,
> 	unsigned long start_pfn;
> 	int ret = 0;
> 
>+	mem = find_memory_block_by_id(block_id, NULL);
>+	if (mem) {
>+		put_device(&mem->dev);
>+		return -EEXIST;
>+	}

find_memory_block_by_id() is not that close to the main idea in this patch.
Would it be better to split this part?

> 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> 	if (!mem)
> 		return -ENOMEM;
>@@ -699,44 +716,53 @@ static int add_memory_block(int base_section_nr)
> 	return 0;
> }
> 
>+static void unregister_memory(struct memory_block *memory)
>+{
>+	if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
>+		return;
>+
>+	/* drop the ref. we got via find_memory_block() */
>+	put_device(&memory->dev);
>+	device_unregister(&memory->dev);
>+}
>+
> /*
>- * need an interface for the VM to add new memory regions,
>- * but without onlining it.
>+ * Create memory block devices for the given memory area. Start and size
>+ * have to be aligned to memory block granularity. Memory block devices
>+ * will be initialized as offline.
>  */
>-int hotplug_memory_register(int nid, struct mem_section *section)
>+int create_memory_block_devices(unsigned long start, unsigned long size)
> {
>-	int block_id = base_memory_block_id(__section_nr(section));
>-	int ret = 0;
>+	const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
>+	int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
> 	struct memory_block *mem;
>+	unsigned long block_id;
>+	int ret = 0;
> 
>-	mutex_lock(&mem_sysfs_mutex);
>+	if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
>+			 !IS_ALIGNED(size, memory_block_size_bytes())))
>+		return -EINVAL;
> 
>-	mem = find_memory_block(section);
>-	if (mem) {
>-		mem->section_count++;
>-		put_device(&mem->dev);
>-	} else {
>+	mutex_lock(&mem_sysfs_mutex);
>+	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
> 		ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
> 		if (ret)
>-			goto out;
>-		mem->section_count++;
>+			break;
>+		mem->section_count = sections_per_block;
>+	}
>+	if (ret) {
>+		end_block_id = block_id;
>+		for (block_id = start_block_id; block_id != end_block_id;
>+		     block_id++) {
>+			mem = find_memory_block_by_id(block_id, NULL);
>+			mem->section_count = 0;
>+			unregister_memory(mem);
>+		}
> 	}

Would it be better to do this in reverse order?

And unregister_memory() would free mem, so is it still necessary to set
section_count to 0?

>-
>-out:
> 	mutex_unlock(&mem_sysfs_mutex);
> 	return ret;
> }
> 
>-static void
>-unregister_memory(struct memory_block *memory)
>-{
>-	BUG_ON(memory->dev.bus != &memory_subsys);
>-
>-	/* drop the ref. we got via find_memory_block() */
>-	put_device(&memory->dev);
>-	device_unregister(&memory->dev);
>-}
>-
> void unregister_memory_section(struct mem_section *section)
> {
> 	struct memory_block *mem;
>diff --git a/include/linux/memory.h b/include/linux/memory.h
>index 474c7c60c8f2..db3e8567f900 100644
>--- a/include/linux/memory.h
>+++ b/include/linux/memory.h
>@@ -111,7 +111,7 @@ extern int register_memory_notifier(struct notifier_block *nb);
> extern void unregister_memory_notifier(struct notifier_block *nb);
> extern int register_memory_isolate_notifier(struct notifier_block *nb);
> extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
>-int hotplug_memory_register(int nid, struct mem_section *section);
>+int create_memory_block_devices(unsigned long start, unsigned long size);
> extern void unregister_memory_section(struct mem_section *);
> extern int memory_dev_init(void);
> extern int memory_notify(unsigned long val, void *v);
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index 4b9d2974f86c..b1fde90bbf19 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -259,13 +259,7 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
> 		return -EEXIST;
> 
> 	ret = sparse_add_one_section(nid, phys_start_pfn, altmap);
>-	if (ret < 0)
>-		return ret;
>-
>-	if (!want_memblock)
>-		return 0;
>-
>-	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
>+	return ret < 0 ? ret : 0;
> }
> 
> /*
>@@ -1107,6 +1101,13 @@ int __ref add_memory_resource(int nid, struct resource *res)
> 	if (ret < 0)
> 		goto error;
> 
>+	/* create memory block devices after memory was added */
>+	ret = create_memory_block_devices(start, size);
>+	if (ret) {
>+		arch_remove_memory(nid, start, size, NULL);
>+		goto error;
>+	}
>+
> 	if (new_node) {
> 		/* If sysfs file of new node can't be created, cpu on the node
> 		 * can't be hot-added. There is no rollback way now.
>-- 
>2.20.1

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 08/11] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
  2019-05-27 11:11 ` [PATCH v3 08/11] mm/memory_hotplug: Drop MHP_MEMBLOCK_API David Hildenbrand
@ 2019-06-04 21:47   ` Wei Yang
  2019-07-01  8:15   ` Michal Hocko
  1 sibling, 0 replies; 68+ messages in thread
From: Wei Yang @ 2019-06-04 21:47 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Michal Hocko, Oscar Salvador, Pavel Tatashin,
	Joonsoo Kim, Qian Cai, Arun KS, Mathieu Malaterre

On Mon, May 27, 2019 at 01:11:49PM +0200, David Hildenbrand wrote:
>No longer needed, the callers of arch_add_memory() can handle this
>manually.
>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: David Hildenbrand <david@redhat.com>
>Cc: Michal Hocko <mhocko@suse.com>
>Cc: Oscar Salvador <osalvador@suse.com>
>Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>Cc: Wei Yang <richard.weiyang@gmail.com>
>Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>Cc: Qian Cai <cai@lca.pw>
>Cc: Arun KS <arunks@codeaurora.org>
>Cc: Mathieu Malaterre <malat@debian.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>

>---
> include/linux/memory_hotplug.h | 8 --------
> mm/memory_hotplug.c            | 9 +++------
> 2 files changed, 3 insertions(+), 14 deletions(-)
>
>diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>index 2d4de313926d..2f1f87e13baa 100644
>--- a/include/linux/memory_hotplug.h
>+++ b/include/linux/memory_hotplug.h
>@@ -128,14 +128,6 @@ extern void arch_remove_memory(int nid, u64 start, u64 size,
> extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
> 			   unsigned long nr_pages, struct vmem_altmap *altmap);
> 
>-/*
>- * Do we want sysfs memblock files created. This will allow userspace to online
>- * and offline memory explicitly. Lack of this bit means that the caller has to
>- * call move_pfn_range_to_zone to finish the initialization.
>- */
>-
>-#define MHP_MEMBLOCK_API               (1<<0)
>-
> /* reasonably generic interface to expand the physical pages */
> extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
> 		       struct mhp_restrictions *restrictions);
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index b1fde90bbf19..9a92549ef23b 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -251,7 +251,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
> #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
> 
> static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
>-		struct vmem_altmap *altmap, bool want_memblock)
>+				   struct vmem_altmap *altmap)
> {
> 	int ret;
> 
>@@ -294,8 +294,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
> 	}
> 
> 	for (i = start_sec; i <= end_sec; i++) {
>-		err = __add_section(nid, section_nr_to_pfn(i), altmap,
>-				restrictions->flags & MHP_MEMBLOCK_API);
>+		err = __add_section(nid, section_nr_to_pfn(i), altmap);
> 
> 		/*
> 		 * EEXIST is finally dealt with by ioresource collision
>@@ -1067,9 +1066,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
>  */
> int __ref add_memory_resource(int nid, struct resource *res)
> {
>-	struct mhp_restrictions restrictions = {
>-		.flags = MHP_MEMBLOCK_API,
>-	};
>+	struct mhp_restrictions restrictions = {};
> 	u64 start, size;
> 	bool new_node = false;
> 	int ret;
>-- 
>2.20.1

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 09/11] mm/memory_hotplug: Remove memory block devices before arch_remove_memory()
  2019-05-27 11:11 ` [PATCH v3 09/11] mm/memory_hotplug: Remove memory block devices before arch_remove_memory() David Hildenbrand
@ 2019-06-04 22:07   ` Wei Yang
  2019-06-05  9:00     ` David Hildenbrand
  2019-07-01  8:41   ` Michal Hocko
  1 sibling, 1 reply; 68+ messages in thread
From: Wei Yang @ 2019-06-04 22:07 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Greg Kroah-Hartman, Rafael J. Wysocki,
	mike.travis, Andrew Banman, Ingo Molnar, Alex Deucher,
	David S. Miller, Mark Brown, Chris Wilson, Oscar Salvador,
	Jonathan Cameron, Michal Hocko, Pavel Tatashin, Arun KS,
	Mathieu Malaterre

On Mon, May 27, 2019 at 01:11:50PM +0200, David Hildenbrand wrote:
>Let's factor out removing of memory block devices, which is only
>necessary for memory added via add_memory() and friends that created
>memory block devices. Remove the devices before calling
>arch_remove_memory().
>
>This finishes factoring out memory block device handling from
>arch_add_memory() and arch_remove_memory().
>
>Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Cc: David Hildenbrand <david@redhat.com>
>Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Andrew Banman <andrew.banman@hpe.com>
>Cc: Ingo Molnar <mingo@kernel.org>
>Cc: Alex Deucher <alexander.deucher@amd.com>
>Cc: "David S. Miller" <davem@davemloft.net>
>Cc: Mark Brown <broonie@kernel.org>
>Cc: Chris Wilson <chris@chris-wilson.co.uk>
>Cc: Oscar Salvador <osalvador@suse.de>
>Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>Cc: Michal Hocko <mhocko@suse.com>
>Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
>Cc: Arun KS <arunks@codeaurora.org>
>Cc: Mathieu Malaterre <malat@debian.org>
>Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/base/memory.c  | 37 ++++++++++++++++++-------------------
> drivers/base/node.c    | 11 ++++++-----
> include/linux/memory.h |  2 +-
> include/linux/node.h   |  6 ++----
> mm/memory_hotplug.c    |  5 +++--
> 5 files changed, 30 insertions(+), 31 deletions(-)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index 5a0370f0c506..f28efb0bf5c7 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -763,32 +763,31 @@ int create_memory_block_devices(unsigned long start, unsigned long size)
> 	return ret;
> }
> 
>-void unregister_memory_section(struct mem_section *section)
>+/*
>+ * Remove memory block devices for the given memory area. Start and size
>+ * have to be aligned to memory block granularity. Memory block devices
>+ * have to be offline.
>+ */
>+void remove_memory_block_devices(unsigned long start, unsigned long size)
> {
>+	const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
>+	const int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
> 	struct memory_block *mem;
>+	int block_id;
> 
>-	if (WARN_ON_ONCE(!present_section(section)))
>+	if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
>+			 !IS_ALIGNED(size, memory_block_size_bytes())))
> 		return;
> 
> 	mutex_lock(&mem_sysfs_mutex);
>-
>-	/*
>-	 * Some users of the memory hotplug do not want/need memblock to
>-	 * track all sections. Skip over those.
>-	 */
>-	mem = find_memory_block(section);
>-	if (!mem)
>-		goto out_unlock;
>-
>-	unregister_mem_sect_under_nodes(mem, __section_nr(section));
>-
>-	mem->section_count--;
>-	if (mem->section_count == 0)
>+	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
>+		mem = find_memory_block_by_id(block_id, NULL);
>+		if (WARN_ON_ONCE(!mem))
>+			continue;
>+		mem->section_count = 0;

Is this step necessary?

>+		unregister_memory_block_under_nodes(mem);
> 		unregister_memory(mem);
>-	else
>-		put_device(&mem->dev);
>-
>-out_unlock:
>+	}
> 	mutex_unlock(&mem_sysfs_mutex);
> }
> 
>diff --git a/drivers/base/node.c b/drivers/base/node.c
>index 8598fcbd2a17..04fdfa99b8bc 100644
>--- a/drivers/base/node.c
>+++ b/drivers/base/node.c
>@@ -801,9 +801,10 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
> 	return 0;
> }
> 
>-/* unregister memory section under all nodes that it spans */
>-int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
>-				    unsigned long phys_index)
>+/*
>+ * Unregister memory block device under all nodes that it spans.
>+ */
>+int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
> {
> 	NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
> 	unsigned long pfn, sect_start_pfn, sect_end_pfn;
>@@ -816,8 +817,8 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> 		return -ENOMEM;
> 	nodes_clear(*unlinked_nodes);
> 
>-	sect_start_pfn = section_nr_to_pfn(phys_index);
>-	sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
>+	sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
>+	sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
> 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
> 		int nid;
> 
>diff --git a/include/linux/memory.h b/include/linux/memory.h
>index db3e8567f900..f26a5417ec5d 100644
>--- a/include/linux/memory.h
>+++ b/include/linux/memory.h
>@@ -112,7 +112,7 @@ extern void unregister_memory_notifier(struct notifier_block *nb);
> extern int register_memory_isolate_notifier(struct notifier_block *nb);
> extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
> int create_memory_block_devices(unsigned long start, unsigned long size);
>-extern void unregister_memory_section(struct mem_section *);
>+void remove_memory_block_devices(unsigned long start, unsigned long size);
> extern int memory_dev_init(void);
> extern int memory_notify(unsigned long val, void *v);
> extern int memory_isolate_notify(unsigned long val, void *v);
>diff --git a/include/linux/node.h b/include/linux/node.h
>index 1a557c589ecb..02a29e71b175 100644
>--- a/include/linux/node.h
>+++ b/include/linux/node.h
>@@ -139,8 +139,7 @@ extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
> extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
> extern int register_mem_sect_under_node(struct memory_block *mem_blk,
> 						void *arg);
>-extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
>-					   unsigned long phys_index);
>+extern int unregister_memory_block_under_nodes(struct memory_block *mem_blk);
> 
> extern int register_memory_node_under_compute_node(unsigned int mem_nid,
> 						   unsigned int cpu_nid,
>@@ -176,8 +175,7 @@ static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
> {
> 	return 0;
> }
>-static inline int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
>-						  unsigned long phys_index)
>+static inline int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
> {
> 	return 0;
> }
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index 9a92549ef23b..82136c5b4c5f 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -520,8 +520,6 @@ static void __remove_section(struct zone *zone, struct mem_section *ms,
> 	if (WARN_ON_ONCE(!valid_section(ms)))
> 		return;
> 
>-	unregister_memory_section(ms);
>-
> 	scn_nr = __section_nr(ms);
> 	start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
> 	__remove_zone(zone, start_pfn);
>@@ -1845,6 +1843,9 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
> 	memblock_free(start, size);
> 	memblock_remove(start, size);
> 
>+	/* remove memory block devices before removing memory */
>+	remove_memory_block_devices(start, size);
>+
> 	arch_remove_memory(nid, start, size, NULL);
> 	__release_memory_resource(start, size);
> 
>-- 
>2.20.1

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-06-04 21:42   ` Wei Yang
@ 2019-06-05  8:58     ` David Hildenbrand
  2019-06-05 10:58       ` David Hildenbrand
  0 siblings, 1 reply; 68+ messages in thread
From: David Hildenbrand @ 2019-06-05  8:58 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Igor Mammedov,
	Greg Kroah-Hartman, Rafael J. Wysocki, mike.travis, Ingo Molnar,
	Andrew Banman, Oscar Salvador, Michal Hocko, Pavel Tatashin,
	Qian Cai, Arun KS, Mathieu Malaterre

>> /*
>>  * For now, we have a linear search to go find the appropriate
>>  * memory_block corresponding to a particular phys_index. If
>> @@ -658,6 +670,11 @@ static int init_memory_block(struct memory_block **memory, int block_id,
>> 	unsigned long start_pfn;
>> 	int ret = 0;
>>
>> +	mem = find_memory_block_by_id(block_id, NULL);
>> +	if (mem) {
>> +		put_device(&mem->dev);
>> +		return -EEXIST;
>> +	}
> 
> find_memory_block_by_id() is not that close to the main idea in this patch.
> Would it be better to split this part?

I played with that but didn't like the temporary results (e.g. having to
export find_memory_block_by_id()). I'll stick to this for now.
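
Roughly what splitting it out would have required, just to illustrate the
churn (hypothetical sketch; the helper stays static in this series):

	/* include/linux/memory.h - temporary export, dropped again later */
	struct memory_block *find_memory_block_by_id(int block_id,
						     struct memory_block *hint);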

> 
>> 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>> 	if (!mem)
>> 		return -ENOMEM;
>> @@ -699,44 +716,53 @@ static int add_memory_block(int base_section_nr)
>> 	return 0;
>> }
>>
>> +static void unregister_memory(struct memory_block *memory)
>> +{
>> +	if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
>> +		return;
>> +
>> +	/* drop the ref. we got via find_memory_block() */
>> +	put_device(&memory->dev);
>> +	device_unregister(&memory->dev);
>> +}
>> +
>> /*
>> - * need an interface for the VM to add new memory regions,
>> - * but without onlining it.
>> + * Create memory block devices for the given memory area. Start and size
>> + * have to be aligned to memory block granularity. Memory block devices
>> + * will be initialized as offline.
>>  */
>> -int hotplug_memory_register(int nid, struct mem_section *section)
>> +int create_memory_block_devices(unsigned long start, unsigned long size)
>> {
>> -	int block_id = base_memory_block_id(__section_nr(section));
>> -	int ret = 0;
>> +	const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
>> +	int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
>> 	struct memory_block *mem;
>> +	unsigned long block_id;
>> +	int ret = 0;
>>
>> -	mutex_lock(&mem_sysfs_mutex);
>> +	if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
>> +			 !IS_ALIGNED(size, memory_block_size_bytes())))
>> +		return -EINVAL;
>>
>> -	mem = find_memory_block(section);
>> -	if (mem) {
>> -		mem->section_count++;
>> -		put_device(&mem->dev);
>> -	} else {
>> +	mutex_lock(&mem_sysfs_mutex);
>> +	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
>> 		ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
>> 		if (ret)
>> -			goto out;
>> -		mem->section_count++;
>> +			break;
>> +		mem->section_count = sections_per_block;
>> +	}
>> +	if (ret) {
>> +		end_block_id = block_id;
>> +		for (block_id = start_block_id; block_id != end_block_id;
>> +		     block_id++) {
>> +			mem = find_memory_block_by_id(block_id, NULL);
>> +			mem->section_count = 0;
>> +			unregister_memory(mem);
>> +		}
>> 	}
> 
> Would it be better to do this in reverse order?
> 
> And unregister_memory() would free mem, so it is still necessary to set
> section_count to 0?

1. I kept the existing behavior (setting it to 0) for now. I am planning
to eventually remove the section count completely (it could be
beneficial to detect removing of partially populated memory blocks).

2. Reverse order: We would have to start with "block_id - 1", I don't
like that better.
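
For comparison, the reverse-order rollback under discussion would look
roughly like this (hypothetical sketch, not what the patch does):

	if (ret) {
		/* block_id points at the block whose init_memory_block() failed */
		while (block_id-- != start_block_id) {
			mem = find_memory_block_by_id(block_id, NULL);
			mem->section_count = 0;
			unregister_memory(mem);
		}
	}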

Thanks for having a look!

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 09/11] mm/memory_hotplug: Remove memory block devices before arch_remove_memory()
  2019-06-04 22:07   ` Wei Yang
@ 2019-06-05  9:00     ` David Hildenbrand
  0 siblings, 0 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-06-05  9:00 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Igor Mammedov,
	Greg Kroah-Hartman, Rafael J. Wysocki, mike.travis,
	Andrew Banman, Ingo Molnar, Alex Deucher, David S. Miller,
	Mark Brown, Chris Wilson, Oscar Salvador, Jonathan Cameron,
	Michal Hocko, Pavel Tatashin, Arun KS, Mathieu Malaterre

On 05.06.19 00:07, Wei Yang wrote:
> On Mon, May 27, 2019 at 01:11:50PM +0200, David Hildenbrand wrote:
>> Let's factor out removing of memory block devices, which is only
>> necessary for memory added via add_memory() and friends that created
>> memory block devices. Remove the devices before calling
>> arch_remove_memory().
>>
>> This finishes factoring out memory block device handling from
>> arch_add_memory() and arch_remove_memory().
>>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Andrew Banman <andrew.banman@hpe.com>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: Mark Brown <broonie@kernel.org>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
>> Cc: Arun KS <arunks@codeaurora.org>
>> Cc: Mathieu Malaterre <malat@debian.org>
>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> drivers/base/memory.c  | 37 ++++++++++++++++++-------------------
>> drivers/base/node.c    | 11 ++++++-----
>> include/linux/memory.h |  2 +-
>> include/linux/node.h   |  6 ++----
>> mm/memory_hotplug.c    |  5 +++--
>> 5 files changed, 30 insertions(+), 31 deletions(-)
>>
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index 5a0370f0c506..f28efb0bf5c7 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -763,32 +763,31 @@ int create_memory_block_devices(unsigned long start, unsigned long size)
>> 	return ret;
>> }
>>
>> -void unregister_memory_section(struct mem_section *section)
>> +/*
>> + * Remove memory block devices for the given memory area. Start and size
>> + * have to be aligned to memory block granularity. Memory block devices
>> + * have to be offline.
>> + */
>> +void remove_memory_block_devices(unsigned long start, unsigned long size)
>> {
>> +	const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
>> +	const int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
>> 	struct memory_block *mem;
>> +	int block_id;
>>
>> -	if (WARN_ON_ONCE(!present_section(section)))
>> +	if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
>> +			 !IS_ALIGNED(size, memory_block_size_bytes())))
>> 		return;
>>
>> 	mutex_lock(&mem_sysfs_mutex);
>> -
>> -	/*
>> -	 * Some users of the memory hotplug do not want/need memblock to
>> -	 * track all sections. Skip over those.
>> -	 */
>> -	mem = find_memory_block(section);
>> -	if (!mem)
>> -		goto out_unlock;
>> -
>> -	unregister_mem_sect_under_nodes(mem, __section_nr(section));
>> -
>> -	mem->section_count--;
>> -	if (mem->section_count == 0)
>> +	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
>> +		mem = find_memory_block_by_id(block_id, NULL);
>> +		if (WARN_ON_ONCE(!mem))
>> +			continue;
>> +		mem->section_count = 0;
> 
> Is this step necessary?

It's what the previous code did; it might not be necessary - I'll leave it like
that for now. As mentioned in another reply, I might remove the
section_count completely, eventually.

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-06-05  8:58     ` David Hildenbrand
@ 2019-06-05 10:58       ` David Hildenbrand
  2019-06-05 21:22         ` Wei Yang
  0 siblings, 1 reply; 68+ messages in thread
From: David Hildenbrand @ 2019-06-05 10:58 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Igor Mammedov,
	Greg Kroah-Hartman, Rafael J. Wysocki, mike.travis, Ingo Molnar,
	Andrew Banman, Oscar Salvador, Michal Hocko, Pavel Tatashin,
	Qian Cai, Arun KS, Mathieu Malaterre

On 05.06.19 10:58, David Hildenbrand wrote:
>>> /*
>>>  * For now, we have a linear search to go find the appropriate
>>>  * memory_block corresponding to a particular phys_index. If
>>> @@ -658,6 +670,11 @@ static int init_memory_block(struct memory_block **memory, int block_id,
>>> 	unsigned long start_pfn;
>>> 	int ret = 0;
>>>
>>> +	mem = find_memory_block_by_id(block_id, NULL);
>>> +	if (mem) {
>>> +		put_device(&mem->dev);
>>> +		return -EEXIST;
>>> +	}
>>
>> find_memory_block_by_id() is not that close to the main idea in this patch.
>> Would it be better to split this part?
> 
> I played with that but didn't like the temporary results (e.g. having to
> export find_memory_block_by_id()). I'll stick to this for now.
> 
>>
>>> 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>>> 	if (!mem)
>>> 		return -ENOMEM;
>>> @@ -699,44 +716,53 @@ static int add_memory_block(int base_section_nr)
>>> 	return 0;
>>> }
>>>
>>> +static void unregister_memory(struct memory_block *memory)
>>> +{
>>> +	if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
>>> +		return;
>>> +
>>> +	/* drop the ref. we got via find_memory_block() */
>>> +	put_device(&memory->dev);
>>> +	device_unregister(&memory->dev);
>>> +}
>>> +
>>> /*
>>> - * need an interface for the VM to add new memory regions,
>>> - * but without onlining it.
>>> + * Create memory block devices for the given memory area. Start and size
>>> + * have to be aligned to memory block granularity. Memory block devices
>>> + * will be initialized as offline.
>>>  */
>>> -int hotplug_memory_register(int nid, struct mem_section *section)
>>> +int create_memory_block_devices(unsigned long start, unsigned long size)
>>> {
>>> -	int block_id = base_memory_block_id(__section_nr(section));
>>> -	int ret = 0;
>>> +	const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
>>> +	int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
>>> 	struct memory_block *mem;
>>> +	unsigned long block_id;
>>> +	int ret = 0;
>>>
>>> -	mutex_lock(&mem_sysfs_mutex);
>>> +	if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
>>> +			 !IS_ALIGNED(size, memory_block_size_bytes())))
>>> +		return -EINVAL;
>>>
>>> -	mem = find_memory_block(section);
>>> -	if (mem) {
>>> -		mem->section_count++;
>>> -		put_device(&mem->dev);
>>> -	} else {
>>> +	mutex_lock(&mem_sysfs_mutex);
>>> +	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
>>> 		ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
>>> 		if (ret)
>>> -			goto out;
>>> -		mem->section_count++;
>>> +			break;
>>> +		mem->section_count = sections_per_block;
>>> +	}
>>> +	if (ret) {
>>> +		end_block_id = block_id;
>>> +		for (block_id = start_block_id; block_id != end_block_id;
>>> +		     block_id++) {
>>> +			mem = find_memory_block_by_id(block_id, NULL);
>>> +			mem->section_count = 0;
>>> +			unregister_memory(mem);
>>> +		}
>>> 	}
>>
>> Would it be better to do this in reverse order?
>>
>> And unregister_memory() would free mem, so it is still necessary to set
>> section_count to 0?
> 
> 1. I kept the existing behavior (setting it to 0) for now. I am planning
> to eventually remove the section count completely (it could be
> beneficial to detect removing of partially populated memory blocks).

Correction: We already use it to block offlining of partially populated
memory blocks \o/

> 
> 2. Reverse order: We would have to start with "block_id - 1", I don't
> like that better.
> 
> Thanks for having a look!
> 


-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 10/11] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail
  2019-05-27 11:11 ` [PATCH v3 10/11] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail David Hildenbrand
@ 2019-06-05 21:21   ` Wei Yang
  2019-06-10 16:56   ` Oscar Salvador
  2019-07-01  8:51   ` Michal Hocko
  2 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2019-06-05 21:21 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Greg Kroah-Hartman, Rafael J. Wysocki,
	Alex Deucher, David S. Miller, Mark Brown, Chris Wilson,
	Oscar Salvador, Jonathan Cameron

On Mon, May 27, 2019 at 01:11:51PM +0200, David Hildenbrand wrote:
>We really don't want anything during memory hotunplug to fail.
>We always pass a valid memory block device, so that check can go. Avoid
>allocating memory and eventually failing. As we are always called under
>lock, we can use a static piece of memory. This avoids having to put
>the structure onto the stack and having to guess about the stack size
>of callers.
>
>Patch inspired by a patch from Oscar Salvador.
>
>In the future, there might be no need to iterate over nodes at all.
>mem->nid should tell us exactly what to remove. Memory block devices
>with mixed nodes (added during boot) should be properly fenced off and never
>removed.
>
>Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Cc: Alex Deucher <alexander.deucher@amd.com>
>Cc: "David S. Miller" <davem@davemloft.net>
>Cc: Mark Brown <broonie@kernel.org>
>Cc: Chris Wilson <chris@chris-wilson.co.uk>
>Cc: David Hildenbrand <david@redhat.com>
>Cc: Oscar Salvador <osalvador@suse.de>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>

>---
> drivers/base/node.c  | 18 +++++-------------
> include/linux/node.h |  5 ++---
> 2 files changed, 7 insertions(+), 16 deletions(-)
>
>diff --git a/drivers/base/node.c b/drivers/base/node.c
>index 04fdfa99b8bc..9be88fd05147 100644
>--- a/drivers/base/node.c
>+++ b/drivers/base/node.c
>@@ -803,20 +803,14 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
> 
> /*
>  * Unregister memory block device under all nodes that it spans.
>+ * Has to be called with mem_sysfs_mutex held (due to unlinked_nodes).
>  */
>-int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>+void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
> {
>-	NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
> 	unsigned long pfn, sect_start_pfn, sect_end_pfn;
>+	static nodemask_t unlinked_nodes;
> 
>-	if (!mem_blk) {
>-		NODEMASK_FREE(unlinked_nodes);
>-		return -EFAULT;
>-	}
>-	if (!unlinked_nodes)
>-		return -ENOMEM;
>-	nodes_clear(*unlinked_nodes);
>-
>+	nodes_clear(unlinked_nodes);
> 	sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
> 	sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
> 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
>@@ -827,15 +821,13 @@ int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
> 			continue;
> 		if (!node_online(nid))
> 			continue;
>-		if (node_test_and_set(nid, *unlinked_nodes))
>+		if (node_test_and_set(nid, unlinked_nodes))
> 			continue;
> 		sysfs_remove_link(&node_devices[nid]->dev.kobj,
> 			 kobject_name(&mem_blk->dev.kobj));
> 		sysfs_remove_link(&mem_blk->dev.kobj,
> 			 kobject_name(&node_devices[nid]->dev.kobj));
> 	}
>-	NODEMASK_FREE(unlinked_nodes);
>-	return 0;
> }
> 
> int link_mem_sections(int nid, unsigned long start_pfn, unsigned long end_pfn)
>diff --git a/include/linux/node.h b/include/linux/node.h
>index 02a29e71b175..548c226966a2 100644
>--- a/include/linux/node.h
>+++ b/include/linux/node.h
>@@ -139,7 +139,7 @@ extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
> extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
> extern int register_mem_sect_under_node(struct memory_block *mem_blk,
> 						void *arg);
>-extern int unregister_memory_block_under_nodes(struct memory_block *mem_blk);
>+extern void unregister_memory_block_under_nodes(struct memory_block *mem_blk);
> 
> extern int register_memory_node_under_compute_node(unsigned int mem_nid,
> 						   unsigned int cpu_nid,
>@@ -175,9 +175,8 @@ static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
> {
> 	return 0;
> }
>-static inline int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>+static inline void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
> {
>-	return 0;
> }
> 
> static inline void register_hugetlbfs_with_node(node_registration_func_t reg,
>-- 
>2.20.1

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 11/11] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section
  2019-05-27 11:11 ` [PATCH v3 11/11] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section David Hildenbrand
@ 2019-06-05 21:21   ` Wei Yang
  2019-06-10 16:58   ` Oscar Salvador
  2019-07-01  8:52   ` Michal Hocko
  2 siblings, 0 replies; 68+ messages in thread
From: Wei Yang @ 2019-06-05 21:21 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov

On Mon, May 27, 2019 at 01:11:52PM +0200, David Hildenbrand wrote:
>The parameter is unused, so let's drop it. Memory removal paths should
>never care about zones. This is the job of memory offlining and will
>require more refactorings.
>
>Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>

>---
> include/linux/memory_hotplug.h | 2 +-
> mm/memory_hotplug.c            | 2 +-
> mm/sparse.c                    | 4 ++--
> 3 files changed, 4 insertions(+), 4 deletions(-)
>
>diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>index 2f1f87e13baa..1a4257c5f74c 100644
>--- a/include/linux/memory_hotplug.h
>+++ b/include/linux/memory_hotplug.h
>@@ -346,7 +346,7 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
> extern bool is_memblock_offlined(struct memory_block *mem);
> extern int sparse_add_one_section(int nid, unsigned long start_pfn,
> 				  struct vmem_altmap *altmap);
>-extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
>+extern void sparse_remove_one_section(struct mem_section *ms,
> 		unsigned long map_offset, struct vmem_altmap *altmap);
> extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
> 					  unsigned long pnum);
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index 82136c5b4c5f..e48ec7b9dee2 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -524,7 +524,7 @@ static void __remove_section(struct zone *zone, struct mem_section *ms,
> 	start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
> 	__remove_zone(zone, start_pfn);
> 
>-	sparse_remove_one_section(zone, ms, map_offset, altmap);
>+	sparse_remove_one_section(ms, map_offset, altmap);
> }
> 
> /**
>diff --git a/mm/sparse.c b/mm/sparse.c
>index d1d5e05f5b8d..1552c855d62a 100644
>--- a/mm/sparse.c
>+++ b/mm/sparse.c
>@@ -800,8 +800,8 @@ static void free_section_usemap(struct page *memmap, unsigned long *usemap,
> 		free_map_bootmem(memmap);
> }
> 
>-void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
>-		unsigned long map_offset, struct vmem_altmap *altmap)
>+void sparse_remove_one_section(struct mem_section *ms, unsigned long map_offset,
>+			       struct vmem_altmap *altmap)
> {
> 	struct page *memmap = NULL;
> 	unsigned long *usemap = NULL;
>-- 
>2.20.1

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-06-05 10:58       ` David Hildenbrand
@ 2019-06-05 21:22         ` Wei Yang
  2019-06-05 21:50           ` David Hildenbrand
  0 siblings, 1 reply; 68+ messages in thread
From: Wei Yang @ 2019-06-05 21:22 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, linux-mm, linux-kernel, linux-ia64, linuxppc-dev,
	linux-s390, linux-sh, linux-arm-kernel, akpm, Dan Williams,
	Igor Mammedov, Greg Kroah-Hartman, Rafael J. Wysocki,
	mike.travis, Ingo Molnar, Andrew Banman, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Qian Cai, Arun KS,
	Mathieu Malaterre

On Wed, Jun 05, 2019 at 12:58:46PM +0200, David Hildenbrand wrote:
>On 05.06.19 10:58, David Hildenbrand wrote:
>>>> /*
>>>>  * For now, we have a linear search to go find the appropriate
>>>>  * memory_block corresponding to a particular phys_index. If
>>>> @@ -658,6 +670,11 @@ static int init_memory_block(struct memory_block **memory, int block_id,
>>>> 	unsigned long start_pfn;
>>>> 	int ret = 0;
>>>>
>>>> +	mem = find_memory_block_by_id(block_id, NULL);
>>>> +	if (mem) {
>>>> +		put_device(&mem->dev);
>>>> +		return -EEXIST;
>>>> +	}
>>>
>>> find_memory_block_by_id() is not that close to the main idea in this patch.
>>> Would it be better to split this part?
>> 
>> I played with that but didn't like the temporary results (e.g. having to
>> export find_memory_block_by_id()). I'll stick to this for now.
>> 
>>>
>>>> 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>>>> 	if (!mem)
>>>> 		return -ENOMEM;
>>>> @@ -699,44 +716,53 @@ static int add_memory_block(int base_section_nr)
>>>> 	return 0;
>>>> }
>>>>
>>>> +static void unregister_memory(struct memory_block *memory)
>>>> +{
>>>> +	if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
>>>> +		return;
>>>> +
>>>> +	/* drop the ref. we got via find_memory_block() */
>>>> +	put_device(&memory->dev);
>>>> +	device_unregister(&memory->dev);
>>>> +}
>>>> +
>>>> /*
>>>> - * need an interface for the VM to add new memory regions,
>>>> - * but without onlining it.
>>>> + * Create memory block devices for the given memory area. Start and size
>>>> + * have to be aligned to memory block granularity. Memory block devices
>>>> + * will be initialized as offline.
>>>>  */
>>>> -int hotplug_memory_register(int nid, struct mem_section *section)
>>>> +int create_memory_block_devices(unsigned long start, unsigned long size)
>>>> {
>>>> -	int block_id = base_memory_block_id(__section_nr(section));
>>>> -	int ret = 0;
>>>> +	const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
>>>> +	int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
>>>> 	struct memory_block *mem;
>>>> +	unsigned long block_id;
>>>> +	int ret = 0;
>>>>
>>>> -	mutex_lock(&mem_sysfs_mutex);
>>>> +	if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
>>>> +			 !IS_ALIGNED(size, memory_block_size_bytes())))
>>>> +		return -EINVAL;
>>>>
>>>> -	mem = find_memory_block(section);
>>>> -	if (mem) {
>>>> -		mem->section_count++;
>>>> -		put_device(&mem->dev);
>>>> -	} else {
>>>> +	mutex_lock(&mem_sysfs_mutex);
>>>> +	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
>>>> 		ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
>>>> 		if (ret)
>>>> -			goto out;
>>>> -		mem->section_count++;
>>>> +			break;
>>>> +		mem->section_count = sections_per_block;
>>>> +	}
>>>> +	if (ret) {
>>>> +		end_block_id = block_id;
>>>> +		for (block_id = start_block_id; block_id != end_block_id;
>>>> +		     block_id++) {
>>>> +			mem = find_memory_block_by_id(block_id, NULL);
>>>> +			mem->section_count = 0;
>>>> +			unregister_memory(mem);
>>>> +		}
>>>> 	}
>>>
>>> Would it be better to do this in reverse order?
>>>
>>> And unregister_memory() would free mem, so it is still necessary to set
>>> section_count to 0?
>> 
>> 1. I kept the existing behavior (setting it to 0) for now. I am planning
>> to eventually remove the section count completely (it could be
>> beneficial to detect removing of partially populated memory blocks).
>
>Correction: We already use it to block offlining of partially populated
>memory blocks \o/

Would you mind letting me know where we leverage this?

>
>> 
>> 2. Reverse order: We would have to start with "block_id - 1", I don't
>> like that better.
>> 
>> Thanks for having a look!
>> 
>
>
>-- 
>
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-06-05 21:22         ` Wei Yang
@ 2019-06-05 21:50           ` David Hildenbrand
  0 siblings, 0 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-06-05 21:50 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Igor Mammedov,
	Greg Kroah-Hartman, Rafael J. Wysocki, mike.travis, Ingo Molnar,
	Andrew Banman, Oscar Salvador, Michal Hocko, Pavel Tatashin,
	Qian Cai, Arun KS, Mathieu Malaterre

On 05.06.19 23:22, Wei Yang wrote:
> On Wed, Jun 05, 2019 at 12:58:46PM +0200, David Hildenbrand wrote:
>> On 05.06.19 10:58, David Hildenbrand wrote:
>>>>> /*
>>>>>  * For now, we have a linear search to go find the appropriate
>>>>>  * memory_block corresponding to a particular phys_index. If
>>>>> @@ -658,6 +670,11 @@ static int init_memory_block(struct memory_block **memory, int block_id,
>>>>> 	unsigned long start_pfn;
>>>>> 	int ret = 0;
>>>>>
>>>>> +	mem = find_memory_block_by_id(block_id, NULL);
>>>>> +	if (mem) {
>>>>> +		put_device(&mem->dev);
>>>>> +		return -EEXIST;
>>>>> +	}
>>>>
>>>> find_memory_block_by_id() is not that close to the main idea in this patch.
>>>> Would it be better to split this part?
>>>
>>> I played with that but didn't like the temporary results (e.g. having to
>>> export find_memory_block_by_id()). I'll stick to this for now.
>>>
>>>>
>>>>> 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>>>>> 	if (!mem)
>>>>> 		return -ENOMEM;
>>>>> @@ -699,44 +716,53 @@ static int add_memory_block(int base_section_nr)
>>>>> 	return 0;
>>>>> }
>>>>>
>>>>> +static void unregister_memory(struct memory_block *memory)
>>>>> +{
>>>>> +	if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
>>>>> +		return;
>>>>> +
>>>>> +	/* drop the ref. we got via find_memory_block() */
>>>>> +	put_device(&memory->dev);
>>>>> +	device_unregister(&memory->dev);
>>>>> +}
>>>>> +
>>>>> /*
>>>>> - * need an interface for the VM to add new memory regions,
>>>>> - * but without onlining it.
>>>>> + * Create memory block devices for the given memory area. Start and size
>>>>> + * have to be aligned to memory block granularity. Memory block devices
>>>>> + * will be initialized as offline.
>>>>>  */
>>>>> -int hotplug_memory_register(int nid, struct mem_section *section)
>>>>> +int create_memory_block_devices(unsigned long start, unsigned long size)
>>>>> {
>>>>> -	int block_id = base_memory_block_id(__section_nr(section));
>>>>> -	int ret = 0;
>>>>> +	const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
>>>>> +	int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
>>>>> 	struct memory_block *mem;
>>>>> +	unsigned long block_id;
>>>>> +	int ret = 0;
>>>>>
>>>>> -	mutex_lock(&mem_sysfs_mutex);
>>>>> +	if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
>>>>> +			 !IS_ALIGNED(size, memory_block_size_bytes())))
>>>>> +		return -EINVAL;
>>>>>
>>>>> -	mem = find_memory_block(section);
>>>>> -	if (mem) {
>>>>> -		mem->section_count++;
>>>>> -		put_device(&mem->dev);
>>>>> -	} else {
>>>>> +	mutex_lock(&mem_sysfs_mutex);
>>>>> +	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
>>>>> 		ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
>>>>> 		if (ret)
>>>>> -			goto out;
>>>>> -		mem->section_count++;
>>>>> +			break;
>>>>> +		mem->section_count = sections_per_block;
>>>>> +	}
>>>>> +	if (ret) {
>>>>> +		end_block_id = block_id;
>>>>> +		for (block_id = start_block_id; block_id != end_block_id;
>>>>> +		     block_id++) {
>>>>> +			mem = find_memory_block_by_id(block_id, NULL);
>>>>> +			mem->section_count = 0;
>>>>> +			unregister_memory(mem);
>>>>> +		}
>>>>> 	}
>>>>
>>>> Would it be better to do this in reverse order?
>>>>
>>>> And unregister_memory() would free mem, so it is still necessary to set
>>>> section_count to 0?
>>>
>>> 1. I kept the existing behavior (setting it to 0) for now. I am planning
>>> to eventually remove the section count completely (it could be
>>> beneficial to detect removing of partially populated memory blocks).
>>
>> Correction: We already use it to block offlining of partially populated
>> memory blocks \o/
> 
> Would you mind letting me know where we leverage this?

Sure:

drivers/base/memory.c:memory_subsys_offline()

if (mem->section_count != sections_per_block)
	return -EINVAL;

I would have expected such checks in the offline_pages() function instead.
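
A rough idea of what such a check in offline_pages() could look like
(hypothetical sketch, not part of this series):

	/* refuse to offline memory blocks with non-present sections */
	for (pfn = start_pfn; pfn < start_pfn + nr_pages; pfn += PAGES_PER_SECTION)
		if (!present_section_nr(pfn_to_section_nr(pfn)))
			return -EINVAL;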

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 01/11] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
  2019-05-27 11:11 ` [PATCH v3 01/11] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
  2019-05-30 17:53   ` Pavel Tatashin
@ 2019-06-10 16:46   ` Oscar Salvador
  2019-07-01  7:42   ` Michal Hocko
  2 siblings, 0 replies; 68+ messages in thread
From: Oscar Salvador @ 2019-06-10 16:46 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Michal Hocko, Pavel Tatashin, Qian Cai, Arun KS,
	Mathieu Malaterre, Wei Yang

On Mon, May 27, 2019 at 01:11:42PM +0200, David Hildenbrand wrote:
> By converting start and size to page granularity, we actually ignore
> unaligned parts within a page instead of properly bailing out with an
> error.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 10/11] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail
  2019-05-27 11:11 ` [PATCH v3 10/11] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail David Hildenbrand
  2019-06-05 21:21   ` Wei Yang
@ 2019-06-10 16:56   ` Oscar Salvador
  2019-07-01  8:51   ` Michal Hocko
  2 siblings, 0 replies; 68+ messages in thread
From: Oscar Salvador @ 2019-06-10 16:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Greg Kroah-Hartman, Rafael J. Wysocki,
	Alex Deucher, David S. Miller, Mark Brown, Chris Wilson,
	Jonathan Cameron

On Mon, May 27, 2019 at 01:11:51PM +0200, David Hildenbrand wrote:
> We really don't want anything during memory hotunplug to fail.
> We always pass a valid memory block device, so that check can go. Avoid
> allocating memory and eventually failing. As we are always called under
> lock, we can use a static piece of memory. This avoids having to put
> the structure onto the stack and having to guess about the stack size
> of callers.
> 
> Patch inspired by a patch from Oscar Salvador.
> 
> In the future, there might be no need to iterate over nodes at all.
> mem->nid should tell us exactly what to remove. Memory block devices
> with mixed nodes (added during boot) should be properly fenced off and never
> removed.
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 11/11] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section
  2019-05-27 11:11 ` [PATCH v3 11/11] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section David Hildenbrand
  2019-06-05 21:21   ` Wei Yang
@ 2019-06-10 16:58   ` Oscar Salvador
  2019-07-01  8:52   ` Michal Hocko
  2 siblings, 0 replies; 68+ messages in thread
From: Oscar Salvador @ 2019-06-10 16:58 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov

On Mon, May 27, 2019 at 01:11:52PM +0200, David Hildenbrand wrote:
> The parameter is unused, so let's drop it. Memory removal paths should
> never care about zones. This is the job of memory offlining and will
> require more refactorings.
> 
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 02/11] s390x/mm: Fail when an altmap is used for arch_add_memory()
  2019-05-27 11:11 ` [PATCH v3 02/11] s390x/mm: Fail when an altmap is used for arch_add_memory() David Hildenbrand
@ 2019-06-10 17:07   ` Oscar Salvador
  2019-07-01  7:43   ` Michal Hocko
  1 sibling, 0 replies; 68+ messages in thread
From: Oscar Salvador @ 2019-06-10 17:07 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Martin Schwidefsky, Heiko Carstens, Michal Hocko,
	Mike Rapoport, Vasily Gorbik, Oscar Salvador

On Mon, May 27, 2019 at 01:11:43PM +0200, David Hildenbrand wrote:
> ZONE_DEVICE is not yet supported; fail if an altmap is passed, so we
> don't forget arch_add_memory()/arch_remove_memory() when unlocking
> support.
> 
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Oscar Salvador <osalvador@suse.com>
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 01/11] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
  2019-05-27 11:11 ` [PATCH v3 01/11] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
  2019-05-30 17:53   ` Pavel Tatashin
  2019-06-10 16:46   ` Oscar Salvador
@ 2019-07-01  7:42   ` Michal Hocko
  2 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2019-07-01  7:42 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Oscar Salvador, Pavel Tatashin, Qian Cai, Arun KS,
	Mathieu Malaterre, Wei Yang

[Sorry for a really late response]

On Mon 27-05-19 13:11:42, David Hildenbrand wrote:
> By converting start and size to page granularity, we actually ignore
> unaligned parts within a page instead of properly bailing out with an
> error.

I do not expect any code path to ever provide an unaligned address, and
even if one did, rounding it down to a pfn doesn't sound like a terrible
thing to do. Anyway, this removes a few lines, so why not.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/memory_hotplug.c | 11 +++--------
>  1 file changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index e096c987d261..762887b2358b 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1051,16 +1051,11 @@ int try_online_node(int nid)
>  
>  static int check_hotplug_memory_range(u64 start, u64 size)
>  {
> -	unsigned long block_sz = memory_block_size_bytes();
> -	u64 block_nr_pages = block_sz >> PAGE_SHIFT;
> -	u64 nr_pages = size >> PAGE_SHIFT;
> -	u64 start_pfn = PFN_DOWN(start);
> -
>  	/* memory range must be block size aligned */
> -	if (!nr_pages || !IS_ALIGNED(start_pfn, block_nr_pages) ||
> -	    !IS_ALIGNED(nr_pages, block_nr_pages)) {
> +	if (!size || !IS_ALIGNED(start, memory_block_size_bytes()) ||
> +	    !IS_ALIGNED(size, memory_block_size_bytes())) {
>  		pr_err("Block size [%#lx] unaligned hotplug range: start %#llx, size %#llx",
> -		       block_sz, start, size);
> +		       memory_block_size_bytes(), start, size);
>  		return -EINVAL;
>  	}
>  
> -- 
> 2.20.1
> 
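
For readers skimming the thread, this is how the helper reads with the
hunk above applied; the trailing "return 0" is assumed from the
surrounding code, it is not part of the quoted context:

	static int check_hotplug_memory_range(u64 start, u64 size)
	{
		/* memory range must be block size aligned */
		if (!size || !IS_ALIGNED(start, memory_block_size_bytes()) ||
		    !IS_ALIGNED(size, memory_block_size_bytes())) {
			pr_err("Block size [%#lx] unaligned hotplug range: start %#llx, size %#llx",
			       memory_block_size_bytes(), start, size);
			return -EINVAL;
		}

		return 0;
	}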

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v3 02/11] s390x/mm: Fail when an altmap is used for arch_add_memory()
  2019-05-27 11:11 ` [PATCH v3 02/11] s390x/mm: Fail when an altmap is used for arch_add_memory() David Hildenbrand
  2019-06-10 17:07   ` Oscar Salvador
@ 2019-07-01  7:43   ` Michal Hocko
  2019-07-01 12:46     ` Michal Hocko
  1 sibling, 1 reply; 68+ messages in thread
From: Michal Hocko @ 2019-07-01  7:43 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Martin Schwidefsky, Heiko Carstens, Mike Rapoport,
	Vasily Gorbik, Oscar Salvador

On Mon 27-05-19 13:11:43, David Hildenbrand wrote:
> ZONE_DEVICE is not yet supported, fail if an altmap is passed, so we
> don't forget arch_add_memory()/arch_remove_memory() when unlocking
> support.

Why do we need this? Sure, ZONE_DEVICE is not supported on s390, and the
same might be the case for other arches which support hotplug. I do not
see much point in adding a warning to each of them.

> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Oscar Salvador <osalvador@suse.com>
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/s390/mm/init.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index 14d1eae9fe43..d552e330fbcc 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -226,6 +226,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
>  	unsigned long size_pages = PFN_DOWN(size);
>  	int rc;
>  
> +	if (WARN_ON_ONCE(restrictions->altmap))
> +		return -EINVAL;
> +
>  	rc = vmem_add_mapping(start, size);
>  	if (rc)
>  		return rc;
> -- 
> 2.20.1
> 
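
For context on when the new check can trigger at all, a condensed
caller-side sketch (assembled from other patches in this series rather
than from this mail): ordinary memory hotplug passes an empty
mhp_restrictions, i.e. no altmap, so only a future ZONE_DEVICE-style
user would ever hit the WARN.

	/* add_memory_resource()-style caller: no altmap involved */
	struct mhp_restrictions restrictions = {};	/* .altmap == NULL */

	rc = arch_add_memory(nid, start, size, &restrictions);
	/* the new WARN_ON_ONCE(restrictions->altmap) stays silent here */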

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v3 03/11] s390x/mm: Implement arch_remove_memory()
  2019-05-27 11:11 ` [PATCH v3 03/11] s390x/mm: Implement arch_remove_memory() David Hildenbrand
@ 2019-07-01  7:45   ` Michal Hocko
  2019-07-01 12:47     ` Michal Hocko
  0 siblings, 1 reply; 68+ messages in thread
From: Michal Hocko @ 2019-07-01  7:45 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Martin Schwidefsky, Heiko Carstens, Mike Rapoport,
	Vasily Gorbik, Oscar Salvador

On Mon 27-05-19 13:11:44, David Hildenbrand wrote:
> Will come in handy when wanting to handle errors after
> arch_add_memory().

I do not understand this. Why do you add code for something that is not
possible on this HW (based on the comment - is it still valid, btw)?

> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Oscar Salvador <osalvador@suse.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/s390/mm/init.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index d552e330fbcc..14955e0a9fcf 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -243,12 +243,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
>  void arch_remove_memory(int nid, u64 start, u64 size,
>  			struct vmem_altmap *altmap)
>  {
> -	/*
> -	 * There is no hardware or firmware interface which could trigger a
> -	 * hot memory remove on s390. So there is nothing that needs to be
> -	 * implemented.
> -	 */
> -	BUG();
> +	unsigned long start_pfn = start >> PAGE_SHIFT;
> +	unsigned long nr_pages = size >> PAGE_SHIFT;
> +	struct zone *zone;
> +
> +	zone = page_zone(pfn_to_page(start_pfn));
> +	__remove_pages(zone, start_pfn, nr_pages, altmap);
> +	vmem_remove_mapping(start, size);
>  }
>  #endif
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> -- 
> 2.20.1
> 
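
As a concrete motivation, condensed from patch 07/11 later in this
thread (not from this mail): the hot-add path wants to be able to roll
back after arch_add_memory() succeeded, which only works if
arch_remove_memory() exists and does something sensible on every
architecture:

	/* in add_memory_resource(), after arch_add_memory() succeeded */
	ret = create_memory_block_devices(start, size);
	if (ret) {
		arch_remove_memory(nid, start, size, NULL);
		goto error;
	}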

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v3 05/11] drivers/base/memory: Pass a block_id to init_memory_block()
  2019-05-27 11:11 ` [PATCH v3 05/11] drivers/base/memory: Pass a block_id to init_memory_block() David Hildenbrand
  2019-06-03 21:49   ` Wei Yang
@ 2019-07-01  7:56   ` Michal Hocko
  1 sibling, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2019-07-01  7:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Greg Kroah-Hartman, Rafael J. Wysocki

On Mon 27-05-19 13:11:46, David Hildenbrand wrote:
> We'll rework hotplug_memory_register() shortly, so it no longer consumes
> a section.
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  drivers/base/memory.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index f180427e48f4..f914fa6fe350 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -651,21 +651,18 @@ int register_memory(struct memory_block *memory)
>  	return ret;
>  }
>  
> -static int init_memory_block(struct memory_block **memory,
> -			     struct mem_section *section, unsigned long state)
> +static int init_memory_block(struct memory_block **memory, int block_id,
> +			     unsigned long state)
>  {
>  	struct memory_block *mem;
>  	unsigned long start_pfn;
> -	int scn_nr;
>  	int ret = 0;
>  
>  	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>  	if (!mem)
>  		return -ENOMEM;
>  
> -	scn_nr = __section_nr(section);
> -	mem->start_section_nr =
> -			base_memory_block_id(scn_nr) * sections_per_block;
> +	mem->start_section_nr = block_id * sections_per_block;
>  	mem->end_section_nr = mem->start_section_nr + sections_per_block - 1;
>  	mem->state = state;
>  	start_pfn = section_nr_to_pfn(mem->start_section_nr);
> @@ -694,7 +691,8 @@ static int add_memory_block(int base_section_nr)
>  
>  	if (section_count == 0)
>  		return 0;
> -	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE);
> +	ret = init_memory_block(&mem, base_memory_block_id(base_section_nr),
> +				MEM_ONLINE);
>  	if (ret)
>  		return ret;
>  	mem->section_count = section_count;
> @@ -707,6 +705,7 @@ static int add_memory_block(int base_section_nr)
>   */
>  int hotplug_memory_register(int nid, struct mem_section *section)
>  {
> +	int block_id = base_memory_block_id(__section_nr(section));
>  	int ret = 0;
>  	struct memory_block *mem;
>  
> @@ -717,7 +716,7 @@ int hotplug_memory_register(int nid, struct mem_section *section)
>  		mem->section_count++;
>  		put_device(&mem->dev);
>  	} else {
> -		ret = init_memory_block(&mem, section, MEM_OFFLINE);
> +		ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
>  		if (ret)
>  			goto out;
>  		mem->section_count++;
> -- 
> 2.20.1
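
For orientation, the block_id <-> section mapping this patch starts
relying on, condensed from helpers quoted elsewhere in this series
(sections_per_block is the existing drivers/base/memory.c global):

	/* one memory block device spans sections_per_block sections */
	block_id = base_memory_block_id(__section_nr(section));
		 /* i.e. __section_nr(section) / sections_per_block */

	mem->start_section_nr = block_id * sections_per_block;
	mem->end_section_nr = mem->start_section_nr + sections_per_block - 1;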

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v3 06/11] mm/memory_hotplug: Allow arch_remove_pages() without CONFIG_MEMORY_HOTREMOVE
  2019-05-27 11:11 ` [PATCH v3 06/11] mm/memory_hotplug: Allow arch_remove_pages() without CONFIG_MEMORY_HOTREMOVE David Hildenbrand
  2019-05-30 17:56   ` Pavel Tatashin
  2019-06-03 22:15   ` Wei Yang
@ 2019-07-01  8:01   ` Michal Hocko
  2019-07-01 12:51     ` Michal Hocko
  2 siblings, 1 reply; 68+ messages in thread
From: Michal Hocko @ 2019-07-01  8:01 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Tony Luck, Fenghua Yu, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Greg Kroah-Hartman,
	Rafael J. Wysocki, Mike Rapoport, Oscar Salvador,
	Kirill A. Shutemov, Alex Deucher, David S. Miller, Mark Brown,
	Chris Wilson, Christophe Leroy, Nicholas Piggin, Vasily Gorbik,
	Rob Herring, Masahiro Yamada, mike.travis, Andrew Banman,
	Pavel Tatashin, Wei Yang, Arun KS, Qian Cai, Mathieu Malaterre,
	Baoquan He, Logan Gunthorpe, Anshuman Khandual

On Mon 27-05-19 13:11:47, David Hildenbrand wrote:
> We want to improve error handling while adding memory by allowing the use
> of arch_remove_memory() and __remove_pages() even if
> CONFIG_MEMORY_HOTREMOVE is not set, to e.g. implement something like:
> 
> 	arch_add_memory()
> 	rc = do_something();
> 	if (rc) {
> 		arch_remove_memory();
> 	}
> 
> We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as that would require
> quite a few dependencies for memory offlining.

If we cannot really remove CONFIG_MEMORY_HOTREMOVE altogether then why
not simply add an empty placeholder for arch_remove_memory when the
config is disabled?
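
To make the alternative concrete, one possible shape of such a
placeholder (illustrative only, this is not what the patch does):

	#ifdef CONFIG_MEMORY_HOTREMOVE
	extern void arch_remove_memory(int nid, u64 start, u64 size,
				       struct vmem_altmap *altmap);
	#else
	static inline void arch_remove_memory(int nid, u64 start, u64 size,
					      struct vmem_altmap *altmap)
	{
	}
	#endif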
 
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
> Cc: Rich Felker <dalias@libc.org>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.com>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Rob Herring <robh@kernel.org>
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
> Cc: Andrew Banman <andrew.banman@hpe.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Wei Yang <richardw.yang@linux.intel.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Mathieu Malaterre <malat@debian.org>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/arm64/mm/mmu.c            | 2 --
>  arch/ia64/mm/init.c            | 2 --
>  arch/powerpc/mm/mem.c          | 2 --
>  arch/s390/mm/init.c            | 2 --
>  arch/sh/mm/init.c              | 2 --
>  arch/x86/mm/init_32.c          | 2 --
>  arch/x86/mm/init_64.c          | 2 --
>  drivers/base/memory.c          | 2 --
>  include/linux/memory.h         | 2 --
>  include/linux/memory_hotplug.h | 2 --
>  mm/memory_hotplug.c            | 2 --
>  mm/sparse.c                    | 6 ------
>  12 files changed, 28 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index e569a543c384..9ccd7539f2d4 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1084,7 +1084,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
>  	return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
>  			   restrictions);
>  }
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  void arch_remove_memory(int nid, u64 start, u64 size,
>  			struct vmem_altmap *altmap)
>  {
> @@ -1103,4 +1102,3 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>  	__remove_pages(zone, start_pfn, nr_pages, altmap);
>  }
>  #endif
> -#endif
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index d28e29103bdb..aae75fd7b810 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -681,7 +681,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
>  	return ret;
>  }
>  
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  void arch_remove_memory(int nid, u64 start, u64 size,
>  			struct vmem_altmap *altmap)
>  {
> @@ -693,4 +692,3 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>  	__remove_pages(zone, start_pfn, nr_pages, altmap);
>  }
>  #endif
> -#endif
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index e885fe2aafcc..e4bc2dc3f593 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -130,7 +130,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
>  	return __add_pages(nid, start_pfn, nr_pages, restrictions);
>  }
>  
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  void __ref arch_remove_memory(int nid, u64 start, u64 size,
>  			     struct vmem_altmap *altmap)
>  {
> @@ -164,7 +163,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
>  		pr_warn("Hash collision while resizing HPT\n");
>  }
>  #endif
> -#endif /* CONFIG_MEMORY_HOTPLUG */
>  
>  #ifndef CONFIG_NEED_MULTIPLE_NODES
>  void __init mem_topology_setup(void)
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index 14955e0a9fcf..ffb81fe95c77 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -239,7 +239,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
>  	return rc;
>  }
>  
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  void arch_remove_memory(int nid, u64 start, u64 size,
>  			struct vmem_altmap *altmap)
>  {
> @@ -251,5 +250,4 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>  	__remove_pages(zone, start_pfn, nr_pages, altmap);
>  	vmem_remove_mapping(start, size);
>  }
> -#endif
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
> index 13c6a6bb5fd9..dfdbaa50946e 100644
> --- a/arch/sh/mm/init.c
> +++ b/arch/sh/mm/init.c
> @@ -429,7 +429,6 @@ int memory_add_physaddr_to_nid(u64 addr)
>  EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
>  #endif
>  
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  void arch_remove_memory(int nid, u64 start, u64 size,
>  			struct vmem_altmap *altmap)
>  {
> @@ -440,5 +439,4 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>  	zone = page_zone(pfn_to_page(start_pfn));
>  	__remove_pages(zone, start_pfn, nr_pages, altmap);
>  }
> -#endif
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index f265a4316179..4068abb9427f 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -860,7 +860,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
>  	return __add_pages(nid, start_pfn, nr_pages, restrictions);
>  }
>  
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  void arch_remove_memory(int nid, u64 start, u64 size,
>  			struct vmem_altmap *altmap)
>  {
> @@ -872,7 +871,6 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>  	__remove_pages(zone, start_pfn, nr_pages, altmap);
>  }
>  #endif
> -#endif
>  
>  int kernel_set_to_readonly __read_mostly;
>  
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 693aaf28d5fe..8335ac6e1112 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1196,7 +1196,6 @@ void __ref vmemmap_free(unsigned long start, unsigned long end,
>  	remove_pagetable(start, end, false, altmap);
>  }
>  
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  static void __meminit
>  kernel_physical_mapping_remove(unsigned long start, unsigned long end)
>  {
> @@ -1221,7 +1220,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
>  	__remove_pages(zone, start_pfn, nr_pages, altmap);
>  	kernel_physical_mapping_remove(start, start + size);
>  }
> -#endif
>  #endif /* CONFIG_MEMORY_HOTPLUG */
>  
>  static struct kcore_list kcore_vsyscall;
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index f914fa6fe350..ac17c95a5f28 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -727,7 +727,6 @@ int hotplug_memory_register(int nid, struct mem_section *section)
>  	return ret;
>  }
>  
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  static void
>  unregister_memory(struct memory_block *memory)
>  {
> @@ -766,7 +765,6 @@ void unregister_memory_section(struct mem_section *section)
>  out_unlock:
>  	mutex_unlock(&mem_sysfs_mutex);
>  }
> -#endif /* CONFIG_MEMORY_HOTREMOVE */
>  
>  /* return true if the memory block is offlined, otherwise, return false */
>  bool is_memblock_offlined(struct memory_block *mem)
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index e1dc1bb2b787..474c7c60c8f2 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -112,9 +112,7 @@ extern void unregister_memory_notifier(struct notifier_block *nb);
>  extern int register_memory_isolate_notifier(struct notifier_block *nb);
>  extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
>  int hotplug_memory_register(int nid, struct mem_section *section);
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  extern void unregister_memory_section(struct mem_section *);
> -#endif
>  extern int memory_dev_init(void);
>  extern int memory_notify(unsigned long val, void *v);
>  extern int memory_isolate_notify(unsigned long val, void *v);
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index ae892eef8b82..2d4de313926d 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -123,12 +123,10 @@ static inline bool movable_node_is_enabled(void)
>  	return movable_node_enabled;
>  }
>  
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  extern void arch_remove_memory(int nid, u64 start, u64 size,
>  			       struct vmem_altmap *altmap);
>  extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
>  			   unsigned long nr_pages, struct vmem_altmap *altmap);
> -#endif /* CONFIG_MEMORY_HOTREMOVE */
>  
>  /*
>   * Do we want sysfs memblock files created. This will allow userspace to online
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 762887b2358b..4b9d2974f86c 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -318,7 +318,6 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
>  	return err;
>  }
>  
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  /* find the smallest valid pfn in the range [start_pfn, end_pfn) */
>  static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
>  				     unsigned long start_pfn,
> @@ -582,7 +581,6 @@ void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
>  
>  	set_zone_contiguous(zone);
>  }
> -#endif /* CONFIG_MEMORY_HOTREMOVE */
>  
>  int set_online_page_callback(online_page_callback_t callback)
>  {
> diff --git a/mm/sparse.c b/mm/sparse.c
> index fd13166949b5..d1d5e05f5b8d 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -604,7 +604,6 @@ static void __kfree_section_memmap(struct page *memmap,
>  
>  	vmemmap_free(start, end, altmap);
>  }
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  static void free_map_bootmem(struct page *memmap)
>  {
>  	unsigned long start = (unsigned long)memmap;
> @@ -612,7 +611,6 @@ static void free_map_bootmem(struct page *memmap)
>  
>  	vmemmap_free(start, end, NULL);
>  }
> -#endif /* CONFIG_MEMORY_HOTREMOVE */
>  #else
>  static struct page *__kmalloc_section_memmap(void)
>  {
> @@ -651,7 +649,6 @@ static void __kfree_section_memmap(struct page *memmap,
>  			   get_order(sizeof(struct page) * PAGES_PER_SECTION));
>  }
>  
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  static void free_map_bootmem(struct page *memmap)
>  {
>  	unsigned long maps_section_nr, removing_section_nr, i;
> @@ -681,7 +678,6 @@ static void free_map_bootmem(struct page *memmap)
>  			put_page_bootmem(page);
>  	}
>  }
> -#endif /* CONFIG_MEMORY_HOTREMOVE */
>  #endif /* CONFIG_SPARSEMEM_VMEMMAP */
>  
>  /**
> @@ -746,7 +742,6 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn,
>  	return ret;
>  }
>  
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  #ifdef CONFIG_MEMORY_FAILURE
>  static void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
>  {
> @@ -823,5 +818,4 @@ void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
>  			PAGES_PER_SECTION - map_offset);
>  	free_section_usemap(memmap, usemap, altmap);
>  }
> -#endif /* CONFIG_MEMORY_HOTREMOVE */
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> -- 
> 2.20.1
> 

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-27 11:11 ` [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
  2019-05-30 21:07   ` Pavel Tatashin
  2019-06-04 21:42   ` Wei Yang
@ 2019-07-01  8:14   ` Michal Hocko
  2 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2019-07-01  8:14 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Greg Kroah-Hartman, Rafael J. Wysocki,
	mike.travis, Ingo Molnar, Andrew Banman, Oscar Salvador,
	Pavel Tatashin, Qian Cai, Arun KS, Mathieu Malaterre

On Mon 27-05-19 13:11:48, David Hildenbrand wrote:
> Only memory to be added to the buddy and to be onlined/offlined by
> user space using /sys/devices/system/memory/... needs (and should have!)
> memory block devices.
> 
> Factor out creation of memory block devices. Create all devices after
> arch_add_memory() succeeded. We can later drop the want_memblock parameter,
> because it is now effectively stale.
> 
> Only after memory block devices have been added, memory can be onlined
> by user space. This implies, that memory is not visible to user space at
> all before arch_add_memory() succeeded.

I like that the memblock API is moving away from the low-level hotplug
handling. The current implementation is just too convoluted, and I remember
fighting with subtle expectations wired deep into call chains when touching
that code in the past (some memblocks didn't get created etc.). Maybe those
have been addressed in the meantime.

> While at it
> - use WARN_ON_ONCE instead of BUG_ON in moved unregister_memory()

This would be better as a separate patch, with an explanation.

> - introduce find_memory_block_by_id() to search via block id
> - Use find_memory_block_by_id() in init_memory_block() to catch
>   duplicates
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Andrew Banman <andrew.banman@hpe.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Other than that looks good to me.
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  drivers/base/memory.c  | 82 +++++++++++++++++++++++++++---------------
>  include/linux/memory.h |  2 +-
>  mm/memory_hotplug.c    | 15 ++++----
>  3 files changed, 63 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index ac17c95a5f28..5a0370f0c506 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -39,6 +39,11 @@ static inline int base_memory_block_id(int section_nr)
>  	return section_nr / sections_per_block;
>  }
>  
> +static inline int pfn_to_block_id(unsigned long pfn)
> +{
> +	return base_memory_block_id(pfn_to_section_nr(pfn));
> +}
> +
>  static int memory_subsys_online(struct device *dev);
>  static int memory_subsys_offline(struct device *dev);
>  
> @@ -582,10 +587,9 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn)
>   * A reference for the returned object is held and the reference for the
>   * hinted object is released.
>   */
> -struct memory_block *find_memory_block_hinted(struct mem_section *section,
> -					      struct memory_block *hint)
> +static struct memory_block *find_memory_block_by_id(int block_id,
> +						    struct memory_block *hint)
>  {
> -	int block_id = base_memory_block_id(__section_nr(section));
>  	struct device *hintdev = hint ? &hint->dev : NULL;
>  	struct device *dev;
>  
> @@ -597,6 +601,14 @@ struct memory_block *find_memory_block_hinted(struct mem_section *section,
>  	return to_memory_block(dev);
>  }
>  
> +struct memory_block *find_memory_block_hinted(struct mem_section *section,
> +					      struct memory_block *hint)
> +{
> +	int block_id = base_memory_block_id(__section_nr(section));
> +
> +	return find_memory_block_by_id(block_id, hint);
> +}
> +
>  /*
>   * For now, we have a linear search to go find the appropriate
>   * memory_block corresponding to a particular phys_index. If
> @@ -658,6 +670,11 @@ static int init_memory_block(struct memory_block **memory, int block_id,
>  	unsigned long start_pfn;
>  	int ret = 0;
>  
> +	mem = find_memory_block_by_id(block_id, NULL);
> +	if (mem) {
> +		put_device(&mem->dev);
> +		return -EEXIST;
> +	}
>  	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>  	if (!mem)
>  		return -ENOMEM;
> @@ -699,44 +716,53 @@ static int add_memory_block(int base_section_nr)
>  	return 0;
>  }
>  
> +static void unregister_memory(struct memory_block *memory)
> +{
> +	if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
> +		return;
> +
> +	/* drop the ref. we got via find_memory_block() */
> +	put_device(&memory->dev);
> +	device_unregister(&memory->dev);
> +}
> +
>  /*
> - * need an interface for the VM to add new memory regions,
> - * but without onlining it.
> + * Create memory block devices for the given memory area. Start and size
> + * have to be aligned to memory block granularity. Memory block devices
> + * will be initialized as offline.
>   */
> -int hotplug_memory_register(int nid, struct mem_section *section)
> +int create_memory_block_devices(unsigned long start, unsigned long size)
>  {
> -	int block_id = base_memory_block_id(__section_nr(section));
> -	int ret = 0;
> +	const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
> +	int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
>  	struct memory_block *mem;
> +	unsigned long block_id;
> +	int ret = 0;
>  
> -	mutex_lock(&mem_sysfs_mutex);
> +	if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
> +			 !IS_ALIGNED(size, memory_block_size_bytes())))
> +		return -EINVAL;
>  
> -	mem = find_memory_block(section);
> -	if (mem) {
> -		mem->section_count++;
> -		put_device(&mem->dev);
> -	} else {
> +	mutex_lock(&mem_sysfs_mutex);
> +	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
>  		ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
>  		if (ret)
> -			goto out;
> -		mem->section_count++;
> +			break;
> +		mem->section_count = sections_per_block;
> +	}
> +	if (ret) {
> +		end_block_id = block_id;
> +		for (block_id = start_block_id; block_id != end_block_id;
> +		     block_id++) {
> +			mem = find_memory_block_by_id(block_id, NULL);
> +			mem->section_count = 0;
> +			unregister_memory(mem);
> +		}
>  	}
> -
> -out:
>  	mutex_unlock(&mem_sysfs_mutex);
>  	return ret;
>  }
>  
> -static void
> -unregister_memory(struct memory_block *memory)
> -{
> -	BUG_ON(memory->dev.bus != &memory_subsys);
> -
> -	/* drop the ref. we got via find_memory_block() */
> -	put_device(&memory->dev);
> -	device_unregister(&memory->dev);
> -}
> -
>  void unregister_memory_section(struct mem_section *section)
>  {
>  	struct memory_block *mem;
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index 474c7c60c8f2..db3e8567f900 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -111,7 +111,7 @@ extern int register_memory_notifier(struct notifier_block *nb);
>  extern void unregister_memory_notifier(struct notifier_block *nb);
>  extern int register_memory_isolate_notifier(struct notifier_block *nb);
>  extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
> -int hotplug_memory_register(int nid, struct mem_section *section);
> +int create_memory_block_devices(unsigned long start, unsigned long size);
>  extern void unregister_memory_section(struct mem_section *);
>  extern int memory_dev_init(void);
>  extern int memory_notify(unsigned long val, void *v);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 4b9d2974f86c..b1fde90bbf19 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -259,13 +259,7 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
>  		return -EEXIST;
>  
>  	ret = sparse_add_one_section(nid, phys_start_pfn, altmap);
> -	if (ret < 0)
> -		return ret;
> -
> -	if (!want_memblock)
> -		return 0;
> -
> -	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
> +	return ret < 0 ? ret : 0;
>  }
>  
>  /*
> @@ -1107,6 +1101,13 @@ int __ref add_memory_resource(int nid, struct resource *res)
>  	if (ret < 0)
>  		goto error;
>  
> +	/* create memory block devices after memory was added */
> +	ret = create_memory_block_devices(start, size);
> +	if (ret) {
> +		arch_remove_memory(nid, start, size, NULL);
> +		goto error;
> +	}
> +
>  	if (new_node) {
>  		/* If sysfs file of new node can't be created, cpu on the node
>  		 * can't be hot-added. There is no rollback way now.
> -- 
> 2.20.1
> 
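
Condensed, the hot-add flow after this patch looks roughly like the
following (labels and locking elided; the arch_add_memory() call itself
sits just above the quoted hunk and is reproduced here from the
surrounding context, so treat it as approximate):

	ret = arch_add_memory(nid, start, size, &restrictions);
	if (ret < 0)
		goto error;

	/* create memory block devices after memory was added */
	ret = create_memory_block_devices(start, size);
	if (ret) {
		arch_remove_memory(nid, start, size, NULL);
		goto error;
	}
	/* only now can user space see and online the new blocks */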

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v3 08/11] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
  2019-05-27 11:11 ` [PATCH v3 08/11] mm/memory_hotplug: Drop MHP_MEMBLOCK_API David Hildenbrand
  2019-06-04 21:47   ` Wei Yang
@ 2019-07-01  8:15   ` Michal Hocko
  1 sibling, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2019-07-01  8:15 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Oscar Salvador, Pavel Tatashin, Joonsoo Kim,
	Qian Cai, Arun KS, Mathieu Malaterre

On Mon 27-05-19 13:11:49, David Hildenbrand wrote:
> No longer needed; the callers of arch_add_memory() can handle this
> manually.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  include/linux/memory_hotplug.h | 8 --------
>  mm/memory_hotplug.c            | 9 +++------
>  2 files changed, 3 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 2d4de313926d..2f1f87e13baa 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -128,14 +128,6 @@ extern void arch_remove_memory(int nid, u64 start, u64 size,
>  extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
>  			   unsigned long nr_pages, struct vmem_altmap *altmap);
>  
> -/*
> - * Do we want sysfs memblock files created. This will allow userspace to online
> - * and offline memory explicitly. Lack of this bit means that the caller has to
> - * call move_pfn_range_to_zone to finish the initialization.
> - */
> -
> -#define MHP_MEMBLOCK_API               (1<<0)
> -
>  /* reasonably generic interface to expand the physical pages */
>  extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>  		       struct mhp_restrictions *restrictions);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index b1fde90bbf19..9a92549ef23b 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -251,7 +251,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
>  #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
>  
>  static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
> -		struct vmem_altmap *altmap, bool want_memblock)
> +				   struct vmem_altmap *altmap)
>  {
>  	int ret;
>  
> @@ -294,8 +294,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
>  	}
>  
>  	for (i = start_sec; i <= end_sec; i++) {
> -		err = __add_section(nid, section_nr_to_pfn(i), altmap,
> -				restrictions->flags & MHP_MEMBLOCK_API);
> +		err = __add_section(nid, section_nr_to_pfn(i), altmap);
>  
>  		/*
>  		 * EEXIST is finally dealt with by ioresource collision
> @@ -1067,9 +1066,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
>   */
>  int __ref add_memory_resource(int nid, struct resource *res)
>  {
> -	struct mhp_restrictions restrictions = {
> -		.flags = MHP_MEMBLOCK_API,
> -	};
> +	struct mhp_restrictions restrictions = {};
>  	u64 start, size;
>  	bool new_node = false;
>  	int ret;
> -- 
> 2.20.1
> 

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v3 09/11] mm/memory_hotplug: Remove memory block devices before arch_remove_memory()
  2019-05-27 11:11 ` [PATCH v3 09/11] mm/memory_hotplug: Remove memory block devices before arch_remove_memory() David Hildenbrand
  2019-06-04 22:07   ` Wei Yang
@ 2019-07-01  8:41   ` Michal Hocko
  2019-07-15 10:58     ` David Hildenbrand
  1 sibling, 1 reply; 68+ messages in thread
From: Michal Hocko @ 2019-07-01  8:41 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Greg Kroah-Hartman, Rafael J. Wysocki,
	mike.travis, Andrew Banman, Ingo Molnar, Alex Deucher,
	David S. Miller, Mark Brown, Chris Wilson, Oscar Salvador,
	Jonathan Cameron, Pavel Tatashin, Arun KS, Mathieu Malaterre

On Mon 27-05-19 13:11:50, David Hildenbrand wrote:
> Let's factor out removing of memory block devices, which is only
> necessary for memory added via add_memory() and friends that created
> memory block devices. Remove the devices before calling
> arch_remove_memory().
> 
> This finishes factoring out memory block device handling from
> arch_add_memory() and arch_remove_memory().

OK, this makes sense again. Just a nit: calling find_memory_block_by_id()
for each memory block looks a bit suboptimal, especially when we are
removing consecutive physical memblocks. I have to confess that I do not
know how expensive the search is, and I also expect that there won't be
that many memblocks in the removed range anyway, as large setups have
large memblocks.
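
(For a rough feel, assuming the default 128 MiB memory block size on
x86-64, removing a 2 GiB DIMM means 16 find_memory_block_by_id() calls,
i.e. 16 bus lookups; on machines that use 2 GiB blocks it would be a
single lookup per 2 GiB.)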

> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Andrew Banman <andrew.banman@hpe.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Other than that looks good to me.
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  drivers/base/memory.c  | 37 ++++++++++++++++++-------------------
>  drivers/base/node.c    | 11 ++++++-----
>  include/linux/memory.h |  2 +-
>  include/linux/node.h   |  6 ++----
>  mm/memory_hotplug.c    |  5 +++--
>  5 files changed, 30 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 5a0370f0c506..f28efb0bf5c7 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -763,32 +763,31 @@ int create_memory_block_devices(unsigned long start, unsigned long size)
>  	return ret;
>  }
>  
> -void unregister_memory_section(struct mem_section *section)
> +/*
> + * Remove memory block devices for the given memory area. Start and size
> + * have to be aligned to memory block granularity. Memory block devices
> + * have to be offline.
> + */
> +void remove_memory_block_devices(unsigned long start, unsigned long size)
>  {
> +	const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
> +	const int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
>  	struct memory_block *mem;
> +	int block_id;
>  
> -	if (WARN_ON_ONCE(!present_section(section)))
> +	if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
> +			 !IS_ALIGNED(size, memory_block_size_bytes())))
>  		return;
>  
>  	mutex_lock(&mem_sysfs_mutex);
> -
> -	/*
> -	 * Some users of the memory hotplug do not want/need memblock to
> -	 * track all sections. Skip over those.
> -	 */
> -	mem = find_memory_block(section);
> -	if (!mem)
> -		goto out_unlock;
> -
> -	unregister_mem_sect_under_nodes(mem, __section_nr(section));
> -
> -	mem->section_count--;
> -	if (mem->section_count == 0)
> +	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
> +		mem = find_memory_block_by_id(block_id, NULL);
> +		if (WARN_ON_ONCE(!mem))
> +			continue;
> +		mem->section_count = 0;
> +		unregister_memory_block_under_nodes(mem);
>  		unregister_memory(mem);
> -	else
> -		put_device(&mem->dev);
> -
> -out_unlock:
> +	}
>  	mutex_unlock(&mem_sysfs_mutex);
>  }
>  
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 8598fcbd2a17..04fdfa99b8bc 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -801,9 +801,10 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
>  	return 0;
>  }
>  
> -/* unregister memory section under all nodes that it spans */
> -int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> -				    unsigned long phys_index)
> +/*
> + * Unregister memory block device under all nodes that it spans.
> + */
> +int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>  {
>  	NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
>  	unsigned long pfn, sect_start_pfn, sect_end_pfn;
> @@ -816,8 +817,8 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
>  		return -ENOMEM;
>  	nodes_clear(*unlinked_nodes);
>  
> -	sect_start_pfn = section_nr_to_pfn(phys_index);
> -	sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
> +	sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
> +	sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
>  	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
>  		int nid;
>  
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index db3e8567f900..f26a5417ec5d 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -112,7 +112,7 @@ extern void unregister_memory_notifier(struct notifier_block *nb);
>  extern int register_memory_isolate_notifier(struct notifier_block *nb);
>  extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
>  int create_memory_block_devices(unsigned long start, unsigned long size);
> -extern void unregister_memory_section(struct mem_section *);
> +void remove_memory_block_devices(unsigned long start, unsigned long size);
>  extern int memory_dev_init(void);
>  extern int memory_notify(unsigned long val, void *v);
>  extern int memory_isolate_notify(unsigned long val, void *v);
> diff --git a/include/linux/node.h b/include/linux/node.h
> index 1a557c589ecb..02a29e71b175 100644
> --- a/include/linux/node.h
> +++ b/include/linux/node.h
> @@ -139,8 +139,7 @@ extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
>  extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
>  extern int register_mem_sect_under_node(struct memory_block *mem_blk,
>  						void *arg);
> -extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> -					   unsigned long phys_index);
> +extern int unregister_memory_block_under_nodes(struct memory_block *mem_blk);
>  
>  extern int register_memory_node_under_compute_node(unsigned int mem_nid,
>  						   unsigned int cpu_nid,
> @@ -176,8 +175,7 @@ static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
>  {
>  	return 0;
>  }
> -static inline int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> -						  unsigned long phys_index)
> +static inline int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>  {
>  	return 0;
>  }
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 9a92549ef23b..82136c5b4c5f 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -520,8 +520,6 @@ static void __remove_section(struct zone *zone, struct mem_section *ms,
>  	if (WARN_ON_ONCE(!valid_section(ms)))
>  		return;
>  
> -	unregister_memory_section(ms);
> -
>  	scn_nr = __section_nr(ms);
>  	start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
>  	__remove_zone(zone, start_pfn);
> @@ -1845,6 +1843,9 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
>  	memblock_free(start, size);
>  	memblock_remove(start, size);
>  
> +	/* remove memory block devices before removing memory */
> +	remove_memory_block_devices(start, size);
> +
>  	arch_remove_memory(nid, start, size, NULL);
>  	__release_memory_resource(start, size);
>  
> -- 
> 2.20.1
> 
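
Taken together with patch 07/11 earlier in the thread, the resulting
ordering (condensed from the quoted hunks) is:

	/* hot-add */
	arch_add_memory(nid, start, size, &restrictions);
	create_memory_block_devices(start, size);	/* user space may online now */

	/* hot-remove */
	remove_memory_block_devices(start, size);	/* devices go away first */
	arch_remove_memory(nid, start, size, NULL);
	__release_memory_resource(start, size);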

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v3 10/11] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail
  2019-05-27 11:11 ` [PATCH v3 10/11] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail David Hildenbrand
  2019-06-05 21:21   ` Wei Yang
  2019-06-10 16:56   ` Oscar Salvador
@ 2019-07-01  8:51   ` Michal Hocko
  2019-07-01  9:36     ` Oscar Salvador
  2 siblings, 1 reply; 68+ messages in thread
From: Michal Hocko @ 2019-07-01  8:51 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Greg Kroah-Hartman, Rafael J. Wysocki,
	Alex Deucher, David S. Miller, Mark Brown, Chris Wilson,
	Oscar Salvador, Jonathan Cameron

On Mon 27-05-19 13:11:51, David Hildenbrand wrote:
> We really don't want anything during memory hotunplug to fail.
> We always pass a valid memory block device, so that check can go. Avoid
> allocating memory and possibly failing. As we are always called under
> lock, we can use a static piece of memory. This avoids having to put
> the structure onto the stack and having to guess about the stack size
> of callers.
> 
> Patch inspired by a patch from Oscar Salvador.
> 
> In the future, there might be no need to iterate over nodes at all.
> mem->nid should tell us exactly what to remove. Memory block devices
> with mixed nodes (added during boot) should properly be fenced off and
> never be removed.

Yeah, we do not allow offlining multi-zone (node) ranges, so the current
code seems to be over-engineered.

Anyway, I am wondering why we have to strictly check for already removed
node links. Is the sysfs code going to complain if we try to remove them
again?
 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Anyway
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  drivers/base/node.c  | 18 +++++-------------
>  include/linux/node.h |  5 ++---
>  2 files changed, 7 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 04fdfa99b8bc..9be88fd05147 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -803,20 +803,14 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
>  
>  /*
>   * Unregister memory block device under all nodes that it spans.
> + * Has to be called with mem_sysfs_mutex held (due to unlinked_nodes).
>   */
> -int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
> +void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>  {
> -	NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
>  	unsigned long pfn, sect_start_pfn, sect_end_pfn;
> +	static nodemask_t unlinked_nodes;
>  
> -	if (!mem_blk) {
> -		NODEMASK_FREE(unlinked_nodes);
> -		return -EFAULT;
> -	}
> -	if (!unlinked_nodes)
> -		return -ENOMEM;
> -	nodes_clear(*unlinked_nodes);
> -
> +	nodes_clear(unlinked_nodes);
>  	sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
>  	sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
>  	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
> @@ -827,15 +821,13 @@ int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>  			continue;
>  		if (!node_online(nid))
>  			continue;
> -		if (node_test_and_set(nid, *unlinked_nodes))
> +		if (node_test_and_set(nid, unlinked_nodes))
>  			continue;
>  		sysfs_remove_link(&node_devices[nid]->dev.kobj,
>  			 kobject_name(&mem_blk->dev.kobj));
>  		sysfs_remove_link(&mem_blk->dev.kobj,
>  			 kobject_name(&node_devices[nid]->dev.kobj));
>  	}
> -	NODEMASK_FREE(unlinked_nodes);
> -	return 0;
>  }
>  
>  int link_mem_sections(int nid, unsigned long start_pfn, unsigned long end_pfn)
> diff --git a/include/linux/node.h b/include/linux/node.h
> index 02a29e71b175..548c226966a2 100644
> --- a/include/linux/node.h
> +++ b/include/linux/node.h
> @@ -139,7 +139,7 @@ extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
>  extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
>  extern int register_mem_sect_under_node(struct memory_block *mem_blk,
>  						void *arg);
> -extern int unregister_memory_block_under_nodes(struct memory_block *mem_blk);
> +extern void unregister_memory_block_under_nodes(struct memory_block *mem_blk);
>  
>  extern int register_memory_node_under_compute_node(unsigned int mem_nid,
>  						   unsigned int cpu_nid,
> @@ -175,9 +175,8 @@ static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
>  {
>  	return 0;
>  }
> -static inline int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
> +static inline void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>  {
> -	return 0;
>  }
>  
>  static inline void register_hugetlbfs_with_node(node_registration_func_t reg,
> -- 
> 2.20.1

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v3 11/11] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section
  2019-05-27 11:11 ` [PATCH v3 11/11] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section David Hildenbrand
  2019-06-05 21:21   ` Wei Yang
  2019-06-10 16:58   ` Oscar Salvador
@ 2019-07-01  8:52   ` Michal Hocko
  2 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2019-07-01  8:52 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov

On Mon 27-05-19 13:11:52, David Hildenbrand wrote:
> The parameter is unused, so let's drop it. Memory removal paths should
> never care about zones. This is the job of memory offlining and will
> require more refactorings.
> 
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  include/linux/memory_hotplug.h | 2 +-
>  mm/memory_hotplug.c            | 2 +-
>  mm/sparse.c                    | 4 ++--
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 2f1f87e13baa..1a4257c5f74c 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -346,7 +346,7 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>  extern bool is_memblock_offlined(struct memory_block *mem);
>  extern int sparse_add_one_section(int nid, unsigned long start_pfn,
>  				  struct vmem_altmap *altmap);
> -extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
> +extern void sparse_remove_one_section(struct mem_section *ms,
>  		unsigned long map_offset, struct vmem_altmap *altmap);
>  extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
>  					  unsigned long pnum);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 82136c5b4c5f..e48ec7b9dee2 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -524,7 +524,7 @@ static void __remove_section(struct zone *zone, struct mem_section *ms,
>  	start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
>  	__remove_zone(zone, start_pfn);
>  
> -	sparse_remove_one_section(zone, ms, map_offset, altmap);
> +	sparse_remove_one_section(ms, map_offset, altmap);
>  }
>  
>  /**
> diff --git a/mm/sparse.c b/mm/sparse.c
> index d1d5e05f5b8d..1552c855d62a 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -800,8 +800,8 @@ static void free_section_usemap(struct page *memmap, unsigned long *usemap,
>  		free_map_bootmem(memmap);
>  }
>  
> -void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
> -		unsigned long map_offset, struct vmem_altmap *altmap)
> +void sparse_remove_one_section(struct mem_section *ms, unsigned long map_offset,
> +			       struct vmem_altmap *altmap)
>  {
>  	struct page *memmap = NULL;
>  	unsigned long *usemap = NULL;
> -- 
> 2.20.1

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v3 10/11] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail
  2019-07-01  8:51   ` Michal Hocko
@ 2019-07-01  9:36     ` Oscar Salvador
  2019-07-01 10:27       ` Michal Hocko
  0 siblings, 1 reply; 68+ messages in thread
From: Oscar Salvador @ 2019-07-01  9:36 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Hildenbrand, linux-mm, linux-kernel, linux-ia64,
	linuxppc-dev, linux-s390, linux-sh, linux-arm-kernel, akpm,
	Dan Williams, Wei Yang, Igor Mammedov, Greg Kroah-Hartman,
	Rafael J. Wysocki, Alex Deucher, David S. Miller, Mark Brown,
	Chris Wilson, Jonathan Cameron

On Mon, Jul 01, 2019 at 10:51:44AM +0200, Michal Hocko wrote:
> Yeah, we do not allow offlining multi-zone (node) ranges, so the current
> code seems to be over-engineered.
> 
> Anyway, I am wondering why we have to strictly check for already removed
> node links. Is the sysfs code going to complain if we try to remove them
> again?

No, sysfs will silently "fail" if the symlink has already been removed.
At least that is what I saw last time I played with it.

I guess the question is what if sysfs handling changes in the future
and starts dropping warnings when trying to remove a symlink that is not there.
Maybe that is unlikely to happen?

-- 
Oscar Salvador
SUSE L3



* Re: [PATCH v3 10/11] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail
  2019-07-01  9:36     ` Oscar Salvador
@ 2019-07-01 10:27       ` Michal Hocko
  2019-07-15 11:10         ` David Hildenbrand
  0 siblings, 1 reply; 68+ messages in thread
From: Michal Hocko @ 2019-07-01 10:27 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: David Hildenbrand, linux-mm, linux-kernel, linux-ia64,
	linuxppc-dev, linux-s390, linux-sh, linux-arm-kernel, akpm,
	Dan Williams, Wei Yang, Igor Mammedov, Greg Kroah-Hartman,
	Rafael J. Wysocki, Alex Deucher, David S. Miller, Mark Brown,
	Chris Wilson, Jonathan Cameron

On Mon 01-07-19 11:36:44, Oscar Salvador wrote:
> On Mon, Jul 01, 2019 at 10:51:44AM +0200, Michal Hocko wrote:
> > Yeah, we do not allow offlining multi-zone (node) ranges, so the current
> > code seems to be over-engineered.
> > 
> > Anyway, I am wondering why we have to strictly check for already removed
> > node links. Is the sysfs code going to complain if we try to remove them
> > again?
> 
> No, sysfs will silently "fail" if the symlink has already been removed.
> At least that is what I saw last time I played with it.
> 
> I guess the question is what if sysfs handling changes in the future
> and starts dropping warnings when trying to remove a symlink that is not there.
> Maybe that is unlikely to happen?

And maybe we handle it then rather than have a static allocation that
everybody with hotremove configured has to pay for.
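
(For scale: nodemask_t is a MAX_NUMNODES-bit bitmap, so assuming the
common NODES_SHIFT=6 config the static mask costs 8 bytes of .bss, and
128 bytes on a 1024-node MAXSMP-style config; small, but indeed paid by
everybody once the code is built in.)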
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v3 02/11] s390x/mm: Fail when an altmap is used for arch_add_memory()
  2019-07-01  7:43   ` Michal Hocko
@ 2019-07-01 12:46     ` Michal Hocko
  2019-07-15 10:51       ` David Hildenbrand
  0 siblings, 1 reply; 68+ messages in thread
From: Michal Hocko @ 2019-07-01 12:46 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Martin Schwidefsky, Heiko Carstens, Mike Rapoport,
	Vasily Gorbik, Oscar Salvador

On Mon 01-07-19 09:43:06, Michal Hocko wrote:
> On Mon 27-05-19 13:11:43, David Hildenbrand wrote:
> > ZONE_DEVICE is not yet supported, fail if an altmap is passed, so we
> > don't forget arch_add_memory()/arch_remove_memory() when unlocking
> > support.
> 
> Why do we need this? Sure ZONE_DEVICE is not supported for s390 and so
> might be the case for other arches which support hotplug. I do not see
> much point in adding warning to each of them.

I would drop this one. If there is a strong reason to have something
like that it should come with a better explanation and it can be done on
top.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 03/11] s390x/mm: Implement arch_remove_memory()
  2019-07-01  7:45   ` Michal Hocko
@ 2019-07-01 12:47     ` Michal Hocko
  2019-07-15 10:45       ` David Hildenbrand
  0 siblings, 1 reply; 68+ messages in thread
From: Michal Hocko @ 2019-07-01 12:47 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Martin Schwidefsky, Heiko Carstens, Mike Rapoport,
	Vasily Gorbik, Oscar Salvador

On Mon 01-07-19 09:45:03, Michal Hocko wrote:
> On Mon 27-05-19 13:11:44, David Hildenbrand wrote:
> > Will come in handy when wanting to handle errors after
> > arch_add_memory().
> 
> I do not understand this. Why do you add code for something that is
> not possible on this HW (based on the comment - is it still valid btw?)

Same as the previous patch (drop it).

> > Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> > Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Vasily Gorbik <gor@linux.ibm.com>
> > Cc: Oscar Salvador <osalvador@suse.com>
> > Signed-off-by: David Hildenbrand <david@redhat.com>
> > ---
> >  arch/s390/mm/init.c | 13 +++++++------
> >  1 file changed, 7 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> > index d552e330fbcc..14955e0a9fcf 100644
> > --- a/arch/s390/mm/init.c
> > +++ b/arch/s390/mm/init.c
> > @@ -243,12 +243,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >  void arch_remove_memory(int nid, u64 start, u64 size,
> >  			struct vmem_altmap *altmap)
> >  {
> > -	/*
> > -	 * There is no hardware or firmware interface which could trigger a
> > -	 * hot memory remove on s390. So there is nothing that needs to be
> > -	 * implemented.
> > -	 */
> > -	BUG();
> > +	unsigned long start_pfn = start >> PAGE_SHIFT;
> > +	unsigned long nr_pages = size >> PAGE_SHIFT;
> > +	struct zone *zone;
> > +
> > +	zone = page_zone(pfn_to_page(start_pfn));
> > +	__remove_pages(zone, start_pfn, nr_pages, altmap);
> > +	vmem_remove_mapping(start, size);
> >  }
> >  #endif
> >  #endif /* CONFIG_MEMORY_HOTPLUG */
> > -- 
> > 2.20.1
> > 
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 04/11] arm64/mm: Add temporary arch_remove_memory() implementation
  2019-05-27 11:11 ` [PATCH v3 04/11] arm64/mm: Add temporary arch_remove_memory() implementation David Hildenbrand
  2019-06-03 21:41   ` Wei Yang
@ 2019-07-01 12:48   ` Michal Hocko
  1 sibling, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2019-07-01 12:48 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Catalin Marinas, Will Deacon, Mark Rutland,
	Ard Biesheuvel, Chintan Pandya, Mike Rapoport, Jun Yao, Yu Zhao,
	Robin Murphy, Anshuman Khandual

On Mon 27-05-19 13:11:45, David Hildenbrand wrote:
> A proper arch_remove_memory() implementation is on its way, which also
> cleanly removes page tables in arch_add_memory() in case something goes
> wrong.
> 
> As we want to use arch_remove_memory() in case something goes wrong
> during memory hotplug after arch_add_memory() finished, let's add
> a temporary hack that is sufficient until we get a proper
> implementation that cleans up page table entries.
> 
> We will remove CONFIG_MEMORY_HOTREMOVE around this code in follow up
> patches.

I would drop this one as well (like s390 counterpart).
 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Chintan Pandya <cpandya@codeaurora.org>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Cc: Jun Yao <yaojun8558363@gmail.com>
> Cc: Yu Zhao <yuzhao@google.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/arm64/mm/mmu.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index a1bfc4413982..e569a543c384 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1084,4 +1084,23 @@ int arch_add_memory(int nid, u64 start, u64 size,
>  	return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
>  			   restrictions);
>  }
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +void arch_remove_memory(int nid, u64 start, u64 size,
> +			struct vmem_altmap *altmap)
> +{
> +	unsigned long start_pfn = start >> PAGE_SHIFT;
> +	unsigned long nr_pages = size >> PAGE_SHIFT;
> +	struct zone *zone;
> +
> +	/*
> +	 * FIXME: Cleanup page tables (also in arch_add_memory() in case
> +	 * adding fails). Until then, this function should only be used
> +	 * during memory hotplug (adding memory), not for memory
> +	 * unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be
> +	 * unlocked yet.
> +	 */
> +	zone = page_zone(pfn_to_page(start_pfn));
> +	__remove_pages(zone, start_pfn, nr_pages, altmap);
> +}
> +#endif
>  #endif
> -- 
> 2.20.1

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 06/11] mm/memory_hotplug: Allow arch_remove_pages() without CONFIG_MEMORY_HOTREMOVE
  2019-07-01  8:01   ` Michal Hocko
@ 2019-07-01 12:51     ` Michal Hocko
  2019-07-15 10:54       ` David Hildenbrand
  0 siblings, 1 reply; 68+ messages in thread
From: Michal Hocko @ 2019-07-01 12:51 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Tony Luck, Fenghua Yu, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Greg Kroah-Hartman,
	Rafael J. Wysocki, Mike Rapoport, Oscar Salvador,
	Kirill A. Shutemov, Alex Deucher, David S. Miller, Mark Brown,
	Chris Wilson, Christophe Leroy, Nicholas Piggin, Vasily Gorbik,
	Rob Herring, Masahiro Yamada, mike.travis, Andrew Banman,
	Pavel Tatashin, Wei Yang, Arun KS, Qian Cai, Mathieu Malaterre,
	Baoquan He, Logan Gunthorpe, Anshuman Khandual

On Mon 01-07-19 10:01:41, Michal Hocko wrote:
> On Mon 27-05-19 13:11:47, David Hildenbrand wrote:
> > We want to improve error handling while adding memory by allowing
> > to use arch_remove_memory() and __remove_pages() even if
> > CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like:
> > 
> > 	arch_add_memory()
> > 	rc = do_something();
> > 	if (rc) {
> > 		arch_remove_memory();
> > 	}
> > 
> > We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require
> > quite some dependencies for memory offlining.
> 
> If we cannot really remove CONFIG_MEMORY_HOTREMOVE altogether then why
> not simply add an empty placeholder for arch_remove_memory when the
> config is disabled?

In other words, can we replace this by something as simple as:

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index ae892eef8b82..0329027fe740 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -128,6 +128,20 @@ extern void arch_remove_memory(int nid, u64 start, u64 size,
 			       struct vmem_altmap *altmap);
 extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
 			   unsigned long nr_pages, struct vmem_altmap *altmap);
+#else
+/*
+ * Allow code using
+ * arch_add_memory();
+ * rc = do_something();
+ * if (rc)
+ * 	arch_remove_memory();
+ *
+ * without ifdefery.
+ */
+static inline void arch_remove_memory(int nid, u64 start, u64 size,
+			       struct vmem_altmap *altmap)
+{
+}
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 /*
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 03/11] s390x/mm: Implement arch_remove_memory()
  2019-07-01 12:47     ` Michal Hocko
@ 2019-07-15 10:45       ` David Hildenbrand
  0 siblings, 0 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-07-15 10:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Martin Schwidefsky, Heiko Carstens, Mike Rapoport,
	Vasily Gorbik, Oscar Salvador

On 01.07.19 14:47, Michal Hocko wrote:
> On Mon 01-07-19 09:45:03, Michal Hocko wrote:
>> On Mon 27-05-19 13:11:44, David Hildenbrand wrote:
>>> Will come in handy when wanting to handle errors after
>>> arch_add_memory().
>>
>> I do not understand this. Why do you add code for something that is
>> not possible on this HW (based on the comment - is it still valid btw?)
> 
> Same as the previous patch (drop it).

No. As the description says, this will be needed to handle errors in
patch 6 cleanly.
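
To make the shape of that error handling concrete, the pattern is roughly
the following (a simplified sketch, not the exact code from the series;
create_memory_block_devices() stands for the helper this series introduces,
and the exact names may differ):

/*
 * Sketch only: why arch_remove_memory() is needed on the error path of
 * memory hotplug, even without CONFIG_MEMORY_HOTREMOVE.
 */
static int add_memory_sketch(int nid, u64 start, u64 size)
{
        struct mhp_restrictions restrictions = {};
        int rc;

        /* set up the direct mapping and the memmap */
        rc = arch_add_memory(nid, start, size, &restrictions);
        if (rc)
                return rc;

        /* create the memory block devices (factored out by this series) */
        rc = create_memory_block_devices(start, size);
        if (rc) {
                /* undo arch_add_memory() so nothing is left half-added */
                arch_remove_memory(nid, start, size, NULL);
                return rc;
        }
        return 0;
}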

And BTW, with paravirtualized devices like virtio-pmem and virtio-mem,
this will also see some other users in the future.

Thanks.

> 
>>> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>>> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Michal Hocko <mhocko@suse.com>
>>> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
>>> Cc: David Hildenbrand <david@redhat.com>
>>> Cc: Vasily Gorbik <gor@linux.ibm.com>
>>> Cc: Oscar Salvador <osalvador@suse.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>  arch/s390/mm/init.c | 13 +++++++------
>>>  1 file changed, 7 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
>>> index d552e330fbcc..14955e0a9fcf 100644
>>> --- a/arch/s390/mm/init.c
>>> +++ b/arch/s390/mm/init.c
>>> @@ -243,12 +243,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
>>>  void arch_remove_memory(int nid, u64 start, u64 size,
>>>  			struct vmem_altmap *altmap)
>>>  {
>>> -	/*
>>> -	 * There is no hardware or firmware interface which could trigger a
>>> -	 * hot memory remove on s390. So there is nothing that needs to be
>>> -	 * implemented.
>>> -	 */
>>> -	BUG();
>>> +	unsigned long start_pfn = start >> PAGE_SHIFT;
>>> +	unsigned long nr_pages = size >> PAGE_SHIFT;
>>> +	struct zone *zone;
>>> +
>>> +	zone = page_zone(pfn_to_page(start_pfn));
>>> +	__remove_pages(zone, start_pfn, nr_pages, altmap);
>>> +	vmem_remove_mapping(start, size);
>>>  }
>>>  #endif
>>>  #endif /* CONFIG_MEMORY_HOTPLUG */
>>> -- 
>>> 2.20.1
>>>
>>
>> -- 
>> Michal Hocko
>> SUSE Labs
> 


-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 02/11] s390x/mm: Fail when an altmap is used for arch_add_memory()
  2019-07-01 12:46     ` Michal Hocko
@ 2019-07-15 10:51       ` David Hildenbrand
  2019-07-19  6:45         ` Michal Hocko
  0 siblings, 1 reply; 68+ messages in thread
From: David Hildenbrand @ 2019-07-15 10:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Martin Schwidefsky, Heiko Carstens, Mike Rapoport,
	Vasily Gorbik, Oscar Salvador

On 01.07.19 14:46, Michal Hocko wrote:
> On Mon 01-07-19 09:43:06, Michal Hocko wrote:
>> On Mon 27-05-19 13:11:43, David Hildenbrand wrote:
>>> ZONE_DEVICE is not yet supported, fail if an altmap is passed, so we
>>> don't forget arch_add_memory()/arch_remove_memory() when unlocking
>>> support.
>>
>> Why do we need this? Sure ZONE_DEVICE is not supported for s390 and so
>> might be the case for other arches which support hotplug. I do not see
>> much point in adding warning to each of them.
> 
> I would drop this one. If there is a strong reason to have something
> like that it should come with a better explanation and it can be done on
> top.
> 

This was requested by Dan and I agree it is the right thing to do. In
the context of paravirtualized devices (e.g., virtio-pmem), it makes
sense to block functionality an arch does not support.
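
For reference, the guard under discussion boils down to something like this
(a reconstructed sketch, not necessarily the exact patch wording):

/* arch/s390/mm/init.c -- sketch of the discussed altmap guard */
int arch_add_memory(int nid, u64 start, u64 size,
                    struct mhp_restrictions *restrictions)
{
        unsigned long start_pfn = PFN_DOWN(start);
        unsigned long size_pages = PFN_DOWN(size);
        int rc;

        /* ZONE_DEVICE (and therefore an altmap) is not supported yet */
        if (WARN_ON_ONCE(restrictions->altmap))
                return -EINVAL;

        rc = vmem_add_mapping(start, size);
        if (rc)
                return rc;

        rc = __add_pages(nid, start_pfn, size_pages, restrictions);
        if (rc)
                vmem_remove_mapping(start, size);
        return rc;
}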

I'll leave the decision to Andrew.

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 06/11] mm/memory_hotplug: Allow arch_remove_pages() without CONFIG_MEMORY_HOTREMOVE
  2019-07-01 12:51     ` Michal Hocko
@ 2019-07-15 10:54       ` David Hildenbrand
  2019-07-19  6:06         ` Michal Hocko
  0 siblings, 1 reply; 68+ messages in thread
From: David Hildenbrand @ 2019-07-15 10:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Tony Luck, Fenghua Yu, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Greg Kroah-Hartman,
	Rafael J. Wysocki, Mike Rapoport, Oscar Salvador,
	Kirill A. Shutemov, Alex Deucher, David S. Miller, Mark Brown,
	Chris Wilson, Christophe Leroy, Nicholas Piggin, Vasily Gorbik,
	Rob Herring, Masahiro Yamada, mike.travis, Andrew Banman,
	Pavel Tatashin, Wei Yang, Arun KS, Qian Cai, Mathieu Malaterre,
	Baoquan He, Logan Gunthorpe, Anshuman Khandual

On 01.07.19 14:51, Michal Hocko wrote:
> On Mon 01-07-19 10:01:41, Michal Hocko wrote:
>> On Mon 27-05-19 13:11:47, David Hildenbrand wrote:
>>> We want to improve error handling while adding memory by allowing
>>> to use arch_remove_memory() and __remove_pages() even if
>>> CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like:
>>>
>>> 	arch_add_memory()
>>> 	rc = do_something();
>>> 	if (rc) {
>>> 		arch_remove_memory();
>>> 	}
>>>
>>> We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require
>>> quite some dependencies for memory offlining.
>>
>> If we cannot really remove CONFIG_MEMORY_HOTREMOVE altogether then why
>> not simply add an empty placeholder for arch_remove_memory when the
>> config is disabled?
> 
> In other words, can we replace this by something as simple as:
> 
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index ae892eef8b82..0329027fe740 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -128,6 +128,20 @@ extern void arch_remove_memory(int nid, u64 start, u64 size,
>  			       struct vmem_altmap *altmap);
>  extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
>  			   unsigned long nr_pages, struct vmem_altmap *altmap);
> +#else
> +/*
> + * Allow code using
> + * arch_add_memory();
> + * rc = do_something();
> + * if (rc)
> + * 	arch_remove_memory();
> + *
> + * without ifdefery.
> + */
> +static inline void arch_remove_memory(int nid, u64 start, u64 size,
> +			       struct vmem_altmap *altmap)
> +{
> +}
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>  
>  /*
> 

A system configured without CONFIG_MEMORY_HOTREMOVE should not suddenly
behave worse than before when adding memory fails. What you suggest would
result in exactly that.

The goal should be to force architectures to properly implement
arch_remove_memory() right from the start - which is the case for all
architectures after this patch set *except* arm, for which a proper
implementation is on the way.

So I'm leaving it like it is. arch_remove_memory() will be mandatory for
architectures implementing arch_add_memory().
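
To make the difference concrete, the approach taken here keeps the
declarations available unconditionally, along the lines of this simplified
sketch (not the exact hunk from the patch):

/* include/linux/memory_hotplug.h -- sketch: arch_remove_memory() and
 * __remove_pages() are usable even without CONFIG_MEMORY_HOTREMOVE so
 * that error paths after arch_add_memory() can always undo their work;
 * the offlining-only interfaces stay guarded.
 */
extern void arch_remove_memory(int nid, u64 start, u64 size,
                               struct vmem_altmap *altmap);
extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
                           unsigned long nr_pages, struct vmem_altmap *altmap);

#ifdef CONFIG_MEMORY_HOTREMOVE
extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
/* ... remaining hot-remove-only declarations ... */
#endif /* CONFIG_MEMORY_HOTREMOVE */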

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 09/11] mm/memory_hotplug: Remove memory block devices before arch_remove_memory()
  2019-07-01  8:41   ` Michal Hocko
@ 2019-07-15 10:58     ` David Hildenbrand
  0 siblings, 0 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-07-15 10:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Greg Kroah-Hartman, Rafael J. Wysocki,
	mike.travis, Andrew Banman, Ingo Molnar, Alex Deucher,
	David S. Miller, Mark Brown, Chris Wilson, Oscar Salvador,
	Jonathan Cameron, Pavel Tatashin, Arun KS, Mathieu Malaterre

On 01.07.19 10:41, Michal Hocko wrote:
> On Mon 27-05-19 13:11:50, David Hildenbrand wrote:
>> Let's factor out removing of memory block devices, which is only
>> necessary for memory added via add_memory() and friends that created
>> memory block devices. Remove the devices before calling
>> arch_remove_memory().
>>
>> This finishes factoring out memory block device handling from
>> arch_add_memory() and arch_remove_memory().
> 
> OK, this makes sense again. Just a nit. Calling find_memory_block_by_id
> for each memory block looks a bit suboptimal, especially when we are
> removing consequent physical memblocks. I have to confess that I do not
> know how expensive is the search and I also expect that there won't be
> that many memblocks in the removed range anyway as large setups have
> large memblocks.
> 

The devices are not allocated sequentially, so there is no easy way to
look them up.

There is a comment for find_memory_block():

"For now, we have a linear search to go find the appropriate
memory_block corresponding to a particular phys_index. If this gets to
be a real problem, we can always use a radix tree or something here."

So if this becomes a problem, we need a separate data structure to speed
up the lookup. (IOW, this was already the same in the old code)
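
For context, the loop in question has roughly the following shape (a sketch
from memory -- helper names like pfn_to_block_id() and
find_memory_block_by_id() come from this series and may differ slightly from
the final code):

/* drivers/base/memory.c -- sketch: one lookup per memory block device
 * in the removed range; the lookup itself is a linear subsystem search.
 */
static void remove_memory_block_devices(unsigned long start, unsigned long size)
{
        const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
        const int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
        struct memory_block *mem;
        int block_id;

        for (block_id = start_block_id; block_id != end_block_id; block_id++) {
                mem = find_memory_block_by_id(block_id);
                if (WARN_ON_ONCE(!mem))
                        continue;
                mem->section_count = 0;
                unregister_memory_block_under_nodes(mem);
                unregister_memory(mem);
        }
}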

Thanks!

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 10/11] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail
  2019-07-01 10:27       ` Michal Hocko
@ 2019-07-15 11:10         ` David Hildenbrand
  2019-07-16  8:46           ` Oscar Salvador
  2019-07-19  6:05           ` Michal Hocko
  0 siblings, 2 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-07-15 11:10 UTC (permalink / raw)
  To: Michal Hocko, Oscar Salvador
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Greg Kroah-Hartman, Rafael J. Wysocki,
	Alex Deucher, David S. Miller, Mark Brown, Chris Wilson,
	Jonathan Cameron

On 01.07.19 12:27, Michal Hocko wrote:
> On Mon 01-07-19 11:36:44, Oscar Salvador wrote:
>> On Mon, Jul 01, 2019 at 10:51:44AM +0200, Michal Hocko wrote:
>>> Yeah, we do not allow to offline multi zone (node) ranges so the current
>>> code seems to be over engineered.
>>>
>>> Anyway, I am wondering why do we have to strictly check for already
>>> removed node links. Is the sysfs code going to complain if we try to
>>> remove again?
>>
>> No, sysfs will silently "fail" if the symlink has already been removed.
>> At least that is what I saw last time I played with it.
>>
>> I guess the question is what if sysfs handling changes in the future
>> and starts dropping warnings when trying to remove a symlink that is not there.
>> Maybe that is unlikely to happen?
> 
> And maybe we handle it then rather than have a static allocation that
> everybody with hotremove configured has to pay for.
> 

So what's the suggestion? Dropping the nodemask_t completely and calling
sysfs_remove_link() on already potentially removed links?

Of course, we can also just use mem_blk->nid and rest assured that it
will never be called for memory blocks belonging to multiple nodes.
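
If we go that route, the whole thing could shrink to something like this
(sketch only, assuming exactly one node per memory block and relying on
sysfs_remove_link() silently ignoring links that are already gone):

/* drivers/base/node.c -- hypothetical simplification: no nodemask, no
 * return value; just drop the two symlinks for mem_blk->nid.
 */
void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
{
        if (mem_blk->nid == NUMA_NO_NODE)
                return;

        sysfs_remove_link(&node_devices[mem_blk->nid]->dev.kobj,
                          kobject_name(&mem_blk->dev.kobj));
        sysfs_remove_link(&mem_blk->dev.kobj,
                          kobject_name(&node_devices[mem_blk->nid]->dev.kobj));
}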

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 10/11] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail
  2019-07-15 11:10         ` David Hildenbrand
@ 2019-07-16  8:46           ` Oscar Salvador
  2019-07-16 11:08             ` David Hildenbrand
  2019-07-16 11:09             ` David Hildenbrand
  2019-07-19  6:05           ` Michal Hocko
  1 sibling, 2 replies; 68+ messages in thread
From: Oscar Salvador @ 2019-07-16  8:46 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Michal Hocko, linux-mm, linux-kernel, linux-ia64, linuxppc-dev,
	linux-s390, linux-sh, linux-arm-kernel, akpm, Dan Williams,
	Wei Yang, Igor Mammedov, Greg Kroah-Hartman, Rafael J. Wysocki,
	Alex Deucher, David S. Miller, Mark Brown, Chris Wilson,
	Jonathan Cameron

On Mon, Jul 15, 2019 at 01:10:33PM +0200, David Hildenbrand wrote:
> On 01.07.19 12:27, Michal Hocko wrote:
> > On Mon 01-07-19 11:36:44, Oscar Salvador wrote:
> >> On Mon, Jul 01, 2019 at 10:51:44AM +0200, Michal Hocko wrote:
> >>> Yeah, we do not allow to offline multi zone (node) ranges so the current
> >>> code seems to be over engineered.
> >>>
> >>> Anyway, I am wondering why do we have to strictly check for already
> >>> removed node links. Is the sysfs code going to complain if we try to
> >>> remove again?
> >>
> >> No, sysfs will silently "fail" if the symlink has already been removed.
> >> At least that is what I saw last time I played with it.
> >>
> >> I guess the question is what if sysfs handling changes in the future
> >> and starts dropping warnings when trying to remove a symlink that is not there.
> >> Maybe that is unlikely to happen?
> > 
> > And maybe we handle it then rather than have a static allocation that
> > everybody with hotremove configured has to pay for.
> > 
> 
> So what's the suggestion? Dropping the nodemask_t completely and calling
> sysfs_remove_link() on already potentially removed links?
> 
> Of course, we can also just use mem_blk->nid and rest assured that it
> will never be called for memory blocks belonging to multiple nodes.

Hi David,

While it is easy to construct a scenario where a memblock belongs to multiple
nodes, I have to confess that I have not yet seen that in a real-world scenario.

That being said, I think the less risky way is to just drop the nodemask_t
and not worry about calling sysfs_remove_link() for already removed links.
As I said, sysfs_remove_link() will silently fail when it fails to find the
symlink, so I do not think it is a big deal.


-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 10/11] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail
  2019-07-16  8:46           ` Oscar Salvador
@ 2019-07-16 11:08             ` David Hildenbrand
  2019-07-16 11:09             ` David Hildenbrand
  1 sibling, 0 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-07-16 11:08 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Michal Hocko, linux-mm, linux-kernel, linux-ia64, linuxppc-dev,
	linux-s390, linux-sh, linux-arm-kernel, akpm, Dan Williams,
	Wei Yang, Igor Mammedov, Greg Kroah-Hartman, Rafael J. Wysocki,
	Alex Deucher, David S. Miller, Mark Brown, Chris Wilson,
	Jonathan Cameron

On 16.07.19 10:46, Oscar Salvador wrote:
> On Mon, Jul 15, 2019 at 01:10:33PM +0200, David Hildenbrand wrote:
>> On 01.07.19 12:27, Michal Hocko wrote:
>>> On Mon 01-07-19 11:36:44, Oscar Salvador wrote:
>>>> On Mon, Jul 01, 2019 at 10:51:44AM +0200, Michal Hocko wrote:
>>>>> Yeah, we do not allow to offline multi zone (node) ranges so the current
>>>>> code seems to be over engineered.
>>>>>
>>>>> Anyway, I am wondering why do we have to strictly check for already
>>>>> removed node links. Is the sysfs code going to complain if we try to
>>>>> remove again?
>>>>
>>>> No, sysfs will silently "fail" if the symlink has already been removed.
>>>> At least that is what I saw last time I played with it.
>>>>
>>>> I guess the question is what if sysfs handling changes in the future
>>>> and starts dropping warnings when trying to remove a symlink that is not there.
>>>> Maybe that is unlikely to happen?
>>>
>>> And maybe we handle it then rather than have a static allocation that
>>> everybody with hotremove configured has to pay for.
>>>
>>
>> So what's the suggestion? Dropping the nodemask_t completely and calling
>> sysfs_remove_link() on already potentially removed links?
>>
>> Of course, we can also just use mem_blk->nid and rest assured that it
>> will never be called for memory blocks belonging to multiple nodes.
> 
> Hi David,
> 
> While it is easy to construct a scenario where a memblock belongs to multiple
> nodes, I have to confess that I have not yet seen that in a real-world scenario.
> 
> That being said, I think the less risky way is to just drop the nodemask_t
> and not worry about calling sysfs_remove_link() for already removed links.
> As I said, sysfs_remove_link() will silently fail when it fails to find the
> symlink, so I do not think it is a big deal.
> 
> 

As far as I can tell we

a) don't allow offlining of memory that belongs to multiple nodes
already (as pointed out by Michal recently)

b) users cannot add memory blocks that belong to multiple nodes via
add_memory()

So I don't see how remove_memory() (or even offline_pages()) could ever
succeed on such memory blocks.

I think it should be fine to limit it to one node here. (If not, I guess
we would have a different BUG that would actually allow removing such
memory blocks.)

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 10/11] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail
  2019-07-16  8:46           ` Oscar Salvador
  2019-07-16 11:08             ` David Hildenbrand
@ 2019-07-16 11:09             ` David Hildenbrand
  1 sibling, 0 replies; 68+ messages in thread
From: David Hildenbrand @ 2019-07-16 11:09 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Michal Hocko, linux-mm, linux-kernel, linux-ia64, linuxppc-dev,
	linux-s390, linux-sh, linux-arm-kernel, akpm, Dan Williams,
	Wei Yang, Igor Mammedov, Greg Kroah-Hartman, Rafael J. Wysocki,
	Alex Deucher, David S. Miller, Mark Brown, Chris Wilson,
	Jonathan Cameron

On 16.07.19 10:46, Oscar Salvador wrote:
> On Mon, Jul 15, 2019 at 01:10:33PM +0200, David Hildenbrand wrote:
>> On 01.07.19 12:27, Michal Hocko wrote:
>>> On Mon 01-07-19 11:36:44, Oscar Salvador wrote:
>>>> On Mon, Jul 01, 2019 at 10:51:44AM +0200, Michal Hocko wrote:
>>>>> Yeah, we do not allow to offline multi zone (node) ranges so the current
>>>>> code seems to be over engineered.
>>>>>
>>>>> Anyway, I am wondering why do we have to strictly check for already
>>>>> removed node links. Is the sysfs code going to complain if we try to
>>>>> remove again?
>>>>
>>>> No, sysfs will silently "fail" if the symlink has already been removed.
>>>> At least that is what I saw last time I played with it.
>>>>
>>>> I guess the question is what if sysfs handling changes in the future
>>>> and starts dropping warnings when trying to remove a symlink that is not there.
>>>> Maybe that is unlikely to happen?
>>>
>>> And maybe we handle it then rather than have a static allocation that
>>> everybody with hotremove configured has to pay for.
>>>
>>
>> So what's the suggestion? Dropping the nodemask_t completely and calling
>> sysfs_remove_link() on already potentially removed links?
>>
>> Of course, we can also just use mem_blk->nid and rest assured that it
>> will never be called for memory blocks belonging to multiple nodes.
> 
> Hi David,
> 
> While it is easy to construct a scenario where a memblock belongs to multiple
> nodes, I have to confess that I have not yet seen that in a real-world scenario.
> 
> That being said, I think the less risky way is to just drop the nodemask_t
> and not worry about calling sysfs_remove_link() for already removed links.
> As I said, sysfs_remove_link() will silently fail when it fails to find the
> symlink, so I do not think it is a big deal.
> 
> 

As far as I can tell we

a) don't allow offlining of memory that belongs to multiple nodes
already (as pointed out by Michal recently)

b) users cannot add memory blocks that belong to multiple nodes via
add_memory()

So I don't see how remove_memory() (or even offline_pages()) could ever
succeed on such memory blocks.

I think it should be fine to limit it to one node here. (If not, I guess
we would have a different BUG that would actually allow removing such
memory blocks.)

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 10/11] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail
  2019-07-15 11:10         ` David Hildenbrand
  2019-07-16  8:46           ` Oscar Salvador
@ 2019-07-19  6:05           ` Michal Hocko
  1 sibling, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2019-07-19  6:05 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Oscar Salvador, linux-mm, linux-kernel, linux-ia64, linuxppc-dev,
	linux-s390, linux-sh, linux-arm-kernel, akpm, Dan Williams,
	Wei Yang, Igor Mammedov, Greg Kroah-Hartman, Rafael J. Wysocki,
	Alex Deucher, David S. Miller, Mark Brown, Chris Wilson,
	Jonathan Cameron

On Mon 15-07-19 13:10:33, David Hildenbrand wrote:
> On 01.07.19 12:27, Michal Hocko wrote:
> > On Mon 01-07-19 11:36:44, Oscar Salvador wrote:
> >> On Mon, Jul 01, 2019 at 10:51:44AM +0200, Michal Hocko wrote:
> >>> Yeah, we do not allow to offline multi zone (node) ranges so the current
> >>> code seems to be over engineered.
> >>>
> >>> Anyway, I am wondering why do we have to strictly check for already
> >>> removed node links. Is the sysfs code going to complain if we try to
> >>> remove again?
> >>
> >> No, sysfs will silently "fail" if the symlink has already been removed.
> >> At least that is what I saw last time I played with it.
> >>
> >> I guess the question is what if sysfs handling changes in the future
> >> and starts dropping warnings when trying to remove a symlink that is not there.
> >> Maybe that is unlikely to happen?
> > 
> > And maybe we handle it then rather than have a static allocation that
> > everybody with hotremove configured has to pay for.
> > 
> 
> So what's the suggestion? Dropping the nodemask_t completely and calling
> sysfs_remove_link() on already potentially removed links?

Yes. In a follow up patch.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 06/11] mm/memory_hotplug: Allow arch_remove_pages() without CONFIG_MEMORY_HOTREMOVE
  2019-07-15 10:54       ` David Hildenbrand
@ 2019-07-19  6:06         ` Michal Hocko
  0 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2019-07-19  6:06 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Tony Luck, Fenghua Yu, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Greg Kroah-Hartman,
	Rafael J. Wysocki, Mike Rapoport, Oscar Salvador,
	Kirill A. Shutemov, Alex Deucher, David S. Miller, Mark Brown,
	Chris Wilson, Christophe Leroy, Nicholas Piggin, Vasily Gorbik,
	Rob Herring, Masahiro Yamada, mike.travis, Andrew Banman,
	Pavel Tatashin, Wei Yang, Arun KS, Qian Cai, Mathieu Malaterre,
	Baoquan He, Logan Gunthorpe, Anshuman Khandual

On Mon 15-07-19 12:54:20, David Hildenbrand wrote:
[...]
> So I'm leaving it like it is. arch_remove_memory() will be mandatory for
> architectures implementing arch_add_memory().

I do agree that removing CONFIG_MEMORY_HOTREMOVE makes some sense. But
this patch, being an intermediate step, should be simpler rather than going
halfway there. For the purpose of this patch I would have preferred the
above, followed by another patch to remove the config altogether. But
Andrew has already sent his patch bomb including this series to Linus, so
this is all moot.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 02/11] s390x/mm: Fail when an altmap is used for arch_add_memory()
  2019-07-15 10:51       ` David Hildenbrand
@ 2019-07-19  6:45         ` Michal Hocko
  0 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2019-07-19  6:45 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-arm-kernel, akpm, Dan Williams, Wei Yang,
	Igor Mammedov, Martin Schwidefsky, Heiko Carstens, Mike Rapoport,
	Vasily Gorbik, Oscar Salvador

On Mon 15-07-19 12:51:27, David Hildenbrand wrote:
> On 01.07.19 14:46, Michal Hocko wrote:
> > On Mon 01-07-19 09:43:06, Michal Hocko wrote:
> >> On Mon 27-05-19 13:11:43, David Hildenbrand wrote:
> >>> ZONE_DEVICE is not yet supported, fail if an altmap is passed, so we
> >>> don't forget arch_add_memory()/arch_remove_memory() when unlocking
> >>> support.
> >>
> >> Why do we need this? Sure ZONE_DEVICE is not supported for s390 and so
> >> might be the case for other arches which support hotplug. I do not see
> >> much point in adding warning to each of them.
> > 
> > I would drop this one. If there is a strong reason to have something
> > like that it should come with a better explanation and it can be done on
> > top.
> > 
> 
> This was requested by Dan and I agree it is the right thing to do.

This is probably a matter of taste. I would argue that an altmap doesn't
really equal ZONE_DEVICE. It is more a mechanism to use an alternative
memmap allocator. Sure, ZONE_DEVICE is the only in-tree user of the
feature, but I really do not see why the arch-specific code should care
about it. The lack of an altmap allocator is handled in the sparse code,
so this is just adding an early check which might confuse people in the
future.

> In
> the context of paravirtualized devices (e.g., virtio-pmem), it makes
> sense to block functionality an arch does not support.

Then block it on the config dependencies.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2019-07-19  6:45 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-27 11:11 [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block devicehandling David Hildenbrand
2019-05-27 11:11 ` [PATCH v3 01/11] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
2019-05-30 17:53   ` Pavel Tatashin
2019-06-10 16:46   ` Oscar Salvador
2019-07-01  7:42   ` Michal Hocko
2019-05-27 11:11 ` [PATCH v3 02/11] s390x/mm: Fail when an altmap is used for arch_add_memory() David Hildenbrand
2019-06-10 17:07   ` Oscar Salvador
2019-07-01  7:43   ` Michal Hocko
2019-07-01 12:46     ` Michal Hocko
2019-07-15 10:51       ` David Hildenbrand
2019-07-19  6:45         ` Michal Hocko
2019-05-27 11:11 ` [PATCH v3 03/11] s390x/mm: Implement arch_remove_memory() David Hildenbrand
2019-07-01  7:45   ` Michal Hocko
2019-07-01 12:47     ` Michal Hocko
2019-07-15 10:45       ` David Hildenbrand
2019-05-27 11:11 ` [PATCH v3 04/11] arm64/mm: Add temporary arch_remove_memory() implementation David Hildenbrand
2019-06-03 21:41   ` Wei Yang
2019-06-04  6:56     ` David Hildenbrand
2019-06-04 17:36       ` Robin Murphy
2019-06-04 17:51         ` David Hildenbrand
2019-07-01 12:48   ` Michal Hocko
2019-05-27 11:11 ` [PATCH v3 05/11] drivers/base/memory: Pass a block_id to init_memory_block() David Hildenbrand
2019-06-03 21:49   ` Wei Yang
2019-06-04  6:56     ` David Hildenbrand
2019-07-01  7:56   ` Michal Hocko
2019-05-27 11:11 ` [PATCH v3 06/11] mm/memory_hotplug: Allow arch_remove_pages() without CONFIG_MEMORY_HOTREMOVE David Hildenbrand
2019-05-30 17:56   ` Pavel Tatashin
2019-06-03 22:15   ` Wei Yang
2019-06-04  6:59     ` David Hildenbrand
2019-06-04  8:31       ` Wei Yang
2019-06-04  9:00         ` David Hildenbrand
2019-07-01  8:01   ` Michal Hocko
2019-07-01 12:51     ` Michal Hocko
2019-07-15 10:54       ` David Hildenbrand
2019-07-19  6:06         ` Michal Hocko
2019-05-27 11:11 ` [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
2019-05-30 21:07   ` Pavel Tatashin
2019-06-04 21:42   ` Wei Yang
2019-06-05  8:58     ` David Hildenbrand
2019-06-05 10:58       ` David Hildenbrand
2019-06-05 21:22         ` Wei Yang
2019-06-05 21:50           ` David Hildenbrand
2019-07-01  8:14   ` Michal Hocko
2019-05-27 11:11 ` [PATCH v3 08/11] mm/memory_hotplug: Drop MHP_MEMBLOCK_API David Hildenbrand
2019-06-04 21:47   ` Wei Yang
2019-07-01  8:15   ` Michal Hocko
2019-05-27 11:11 ` [PATCH v3 09/11] mm/memory_hotplug: Remove memory block devices before arch_remove_memory() David Hildenbrand
2019-06-04 22:07   ` Wei Yang
2019-06-05  9:00     ` David Hildenbrand
2019-07-01  8:41   ` Michal Hocko
2019-07-15 10:58     ` David Hildenbrand
2019-05-27 11:11 ` [PATCH v3 10/11] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail David Hildenbrand
2019-06-05 21:21   ` Wei Yang
2019-06-10 16:56   ` Oscar Salvador
2019-07-01  8:51   ` Michal Hocko
2019-07-01  9:36     ` Oscar Salvador
2019-07-01 10:27       ` Michal Hocko
2019-07-15 11:10         ` David Hildenbrand
2019-07-16  8:46           ` Oscar Salvador
2019-07-16 11:08             ` David Hildenbrand
2019-07-16 11:09             ` David Hildenbrand
2019-07-19  6:05           ` Michal Hocko
2019-05-27 11:11 ` [PATCH v3 11/11] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section David Hildenbrand
2019-06-05 21:21   ` Wei Yang
2019-06-10 16:58   ` Oscar Salvador
2019-07-01  8:52   ` Michal Hocko
2019-06-03 21:21 ` [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block devicehandling Wei Yang
2019-06-03 21:40   ` David Hildenbrand
