linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 1/8] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
       [not found] <20190507183804.5512-1-david@redhat.com>
@ 2019-05-07 18:37 ` David Hildenbrand
  2019-05-07 20:38   ` Dan Williams
  2019-05-09 12:23   ` Wei Yang
  2019-05-07 18:37 ` [PATCH v2 2/8] s390x/mm: Implement arch_remove_memory() David Hildenbrand
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 35+ messages in thread
From: David Hildenbrand @ 2019-05-07 18:37 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	akpm, Dan Williams, David Hildenbrand, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Qian Cai, Wei Yang, Arun KS,
	Mathieu Malaterre

By converting start and size to page granularity, we actually ignore
unaligned parts within a page instead of properly bailing out with an
error.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 328878b6799d..202febe88b58 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1050,16 +1050,11 @@ int try_online_node(int nid)
 
 static int check_hotplug_memory_range(u64 start, u64 size)
 {
-	unsigned long block_sz = memory_block_size_bytes();
-	u64 block_nr_pages = block_sz >> PAGE_SHIFT;
-	u64 nr_pages = size >> PAGE_SHIFT;
-	u64 start_pfn = PFN_DOWN(start);
-
 	/* memory range must be block size aligned */
-	if (!nr_pages || !IS_ALIGNED(start_pfn, block_nr_pages) ||
-	    !IS_ALIGNED(nr_pages, block_nr_pages)) {
+	if (!size || !IS_ALIGNED(start, memory_block_size_bytes()) ||
+	    !IS_ALIGNED(size, memory_block_size_bytes())) {
 		pr_err("Block size [%#lx] unaligned hotplug range: start %#llx, size %#llx",
-		       block_sz, start, size);
+		       memory_block_size_bytes(), start, size);
 		return -EINVAL;
 	}
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 2/8] s390x/mm: Implement arch_remove_memory()
       [not found] <20190507183804.5512-1-david@redhat.com>
  2019-05-07 18:37 ` [PATCH v2 1/8] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
@ 2019-05-07 18:37 ` David Hildenbrand
  2019-05-07 20:46   ` Dan Williams
  2019-05-07 18:38 ` [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2019-05-07 18:37 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	akpm, Dan Williams, David Hildenbrand, Martin Schwidefsky,
	Heiko Carstens, Michal Hocko, Mike Rapoport, Vasily Gorbik,
	Oscar Salvador

Will come in handy when wanting to handle errors after
arch_add_memory().

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/mm/init.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 31b1071315d7..1e0cbae69f12 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -237,12 +237,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
 void arch_remove_memory(int nid, u64 start, u64 size,
 			struct vmem_altmap *altmap)
 {
-	/*
-	 * There is no hardware or firmware interface which could trigger a
-	 * hot memory remove on s390. So there is nothing that needs to be
-	 * implemented.
-	 */
-	BUG();
+	unsigned long start_pfn = start >> PAGE_SHIFT;
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	struct zone *zone;
+
+	zone = page_zone(pfn_to_page(start_pfn));
+	__remove_pages(zone, start_pfn, nr_pages, altmap);
+	vmem_remove_mapping(start, size);
 }
 #endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
       [not found] <20190507183804.5512-1-david@redhat.com>
  2019-05-07 18:37 ` [PATCH v2 1/8] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
  2019-05-07 18:37 ` [PATCH v2 2/8] s390x/mm: Implement arch_remove_memory() David Hildenbrand
@ 2019-05-07 18:38 ` David Hildenbrand
  2019-05-07 21:17   ` Dan Williams
                     ` (4 more replies)
  2019-05-07 18:38 ` [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API David Hildenbrand
                   ` (3 subsequent siblings)
  6 siblings, 5 replies; 35+ messages in thread
From: David Hildenbrand @ 2019-05-07 18:38 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	akpm, Dan Williams, David Hildenbrand, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Wei Yang,
	Arun KS, Mathieu Malaterre

Only memory to be added to the buddy and to be onlined/offlined by
user space using memory block devices needs (and should have!) memory
block devices.

Factor out creation of memory block devices Create all devices after
arch_add_memory() succeeded. We can later drop the want_memblock parameter,
because it is now effectively stale.

Only after memory block devices have been added, memory can be onlined
by user space. This implies, that memory is not visible to user space at
all before arch_add_memory() succeeded.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
 include/linux/memory.h |  2 +-
 mm/memory_hotplug.c    | 15 ++++-----
 3 files changed, 53 insertions(+), 34 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 6e0cb4fda179..862c202a18ca 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
 	return 0;
 }
 
+static void unregister_memory(struct memory_block *memory)
+{
+	BUG_ON(memory->dev.bus != &memory_subsys);
+
+	/* drop the ref. we got via find_memory_block() */
+	put_device(&memory->dev);
+	device_unregister(&memory->dev);
+}
+
 /*
- * need an interface for the VM to add new memory regions,
- * but without onlining it.
+ * Create memory block devices for the given memory area. Start and size
+ * have to be aligned to memory block granularity. Memory block devices
+ * will be initialized as offline.
  */
-int hotplug_memory_register(int nid, struct mem_section *section)
+int hotplug_memory_register(unsigned long start, unsigned long size)
 {
-	int ret = 0;
+	unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
+	unsigned long start_pfn = PFN_DOWN(start);
+	unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
+	unsigned long pfn;
 	struct memory_block *mem;
+	int ret = 0;
 
-	mutex_lock(&mem_sysfs_mutex);
+	BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
+	BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));
 
-	mem = find_memory_block(section);
-	if (mem) {
-		mem->section_count++;
-		put_device(&mem->dev);
-	} else {
-		ret = init_memory_block(&mem, section, MEM_OFFLINE);
+	mutex_lock(&mem_sysfs_mutex);
+	for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
+		mem = find_memory_block(__pfn_to_section(pfn));
+		if (mem) {
+			WARN_ON_ONCE(false);
+			put_device(&mem->dev);
+			continue;
+		}
+		ret = init_memory_block(&mem, __pfn_to_section(pfn),
+					MEM_OFFLINE);
 		if (ret)
-			goto out;
-		mem->section_count++;
+			break;
+		mem->section_count = memory_block_size_bytes() /
+				     MIN_MEMORY_BLOCK_SIZE;
+	}
+	if (ret) {
+		end_pfn = pfn;
+		for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
+			mem = find_memory_block(__pfn_to_section(pfn));
+			if (!mem)
+				continue;
+			mem->section_count = 0;
+			unregister_memory(mem);
+		}
 	}
-
-out:
 	mutex_unlock(&mem_sysfs_mutex);
 	return ret;
 }
 
-static void
-unregister_memory(struct memory_block *memory)
-{
-	BUG_ON(memory->dev.bus != &memory_subsys);
-
-	/* drop the ref. we got via find_memory_block() */
-	put_device(&memory->dev);
-	device_unregister(&memory->dev);
-}
-
-void unregister_memory_section(struct mem_section *section)
+static int remove_memory_section(struct mem_section *section)
 {
 	struct memory_block *mem;
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 474c7c60c8f2..95505fbb5f85 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -111,7 +111,7 @@ extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
 extern int register_memory_isolate_notifier(struct notifier_block *nb);
 extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
-int hotplug_memory_register(int nid, struct mem_section *section);
+int hotplug_memory_register(unsigned long start, unsigned long size);
 extern void unregister_memory_section(struct mem_section *);
 extern int memory_dev_init(void);
 extern int memory_notify(unsigned long val, void *v);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7b5439839d67..e1637c8a0723 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -258,13 +258,7 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
 		return -EEXIST;
 
 	ret = sparse_add_one_section(nid, phys_start_pfn, altmap);
-	if (ret < 0)
-		return ret;
-
-	if (!want_memblock)
-		return 0;
-
-	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
+	return ret < 0 ? ret : 0;
 }
 
 /*
@@ -1106,6 +1100,13 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	if (ret < 0)
 		goto error;
 
+	/* create memory block devices after memory was added */
+	ret = hotplug_memory_register(start, size);
+	if (ret) {
+		arch_remove_memory(nid, start, size, NULL);
+		goto error;
+	}
+
 	if (new_node) {
 		/* If sysfs file of new node can't be created, cpu on the node
 		 * can't be hot-added. There is no rollback way now.
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
       [not found] <20190507183804.5512-1-david@redhat.com>
                   ` (2 preceding siblings ...)
  2019-05-07 18:38 ` [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
@ 2019-05-07 18:38 ` David Hildenbrand
  2019-05-07 21:19   ` Dan Williams
  2019-05-07 18:38 ` [PATCH v2 6/8] mm/memory_hotplug: Remove memory block devices before arch_remove_memory() David Hildenbrand
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2019-05-07 18:38 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	akpm, Dan Williams, David Hildenbrand, Michal Hocko,
	Oscar Salvador, Pavel Tatashin, Wei Yang, Joonsoo Kim, Qian Cai,
	Arun KS, Mathieu Malaterre

No longer needed, the callers of arch_add_memory() can handle this
manually.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/memory_hotplug.h | 8 --------
 mm/memory_hotplug.c            | 9 +++------
 2 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 2d4de313926d..2f1f87e13baa 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -128,14 +128,6 @@ extern void arch_remove_memory(int nid, u64 start, u64 size,
 extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
 			   unsigned long nr_pages, struct vmem_altmap *altmap);
 
-/*
- * Do we want sysfs memblock files created. This will allow userspace to online
- * and offline memory explicitly. Lack of this bit means that the caller has to
- * call move_pfn_range_to_zone to finish the initialization.
- */
-
-#define MHP_MEMBLOCK_API               (1<<0)
-
 /* reasonably generic interface to expand the physical pages */
 extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
 		       struct mhp_restrictions *restrictions);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e1637c8a0723..107f72952347 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -250,7 +250,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
 #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
 
 static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
-		struct vmem_altmap *altmap, bool want_memblock)
+				   struct vmem_altmap *altmap)
 {
 	int ret;
 
@@ -293,8 +293,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 	}
 
 	for (i = start_sec; i <= end_sec; i++) {
-		err = __add_section(nid, section_nr_to_pfn(i), altmap,
-				restrictions->flags & MHP_MEMBLOCK_API);
+		err = __add_section(nid, section_nr_to_pfn(i), altmap);
 
 		/*
 		 * EEXIST is finally dealt with by ioresource collision
@@ -1066,9 +1065,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
  */
 int __ref add_memory_resource(int nid, struct resource *res)
 {
-	struct mhp_restrictions restrictions = {
-		.flags = MHP_MEMBLOCK_API,
-	};
+	struct mhp_restrictions restrictions = {};
 	u64 start, size;
 	bool new_node = false;
 	int ret;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 6/8] mm/memory_hotplug: Remove memory block devices before arch_remove_memory()
       [not found] <20190507183804.5512-1-david@redhat.com>
                   ` (3 preceding siblings ...)
  2019-05-07 18:38 ` [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API David Hildenbrand
@ 2019-05-07 18:38 ` David Hildenbrand
  2019-05-07 21:27   ` Dan Williams
  2019-05-07 18:38 ` [PATCH v2 7/8] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail David Hildenbrand
  2019-05-07 18:38 ` [PATCH v2 8/8] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section David Hildenbrand
  6 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2019-05-07 18:38 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	akpm, Dan Williams, David Hildenbrand, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Andrew Banman, Ingo Molnar,
	Alex Deucher, David S. Miller, Mark Brown, Chris Wilson,
	Oscar Salvador, Jonathan Cameron, Michal Hocko, Pavel Tatashin,
	Arun KS, Mathieu Malaterre

Let's factor out removing of memory block devices, which is only
necessary for memory added via add_memory() and friends that created
memory block devices. Remove the devices before calling
arch_remove_memory().

This finishes factoring out memory block device handling from
arch_add_memory() and arch_remove_memory().

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mark Brown <broonie@kernel.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c  | 39 +++++++++++++++++++--------------------
 drivers/base/node.c    | 11 ++++++-----
 include/linux/memory.h |  2 +-
 include/linux/node.h   |  6 ++----
 mm/memory_hotplug.c    |  5 +++--
 5 files changed, 31 insertions(+), 32 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 862c202a18ca..47ff49058d1f 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -756,32 +756,31 @@ int hotplug_memory_register(unsigned long start, unsigned long size)
 	return ret;
 }
 
-static int remove_memory_section(struct mem_section *section)
+/*
+ * Remove memory block devices for the given memory area. Start and size
+ * have to be aligned to memory block granularity. Memory block devices
+ * have to be offline.
+ */
+void hotplug_memory_unregister(unsigned long start, unsigned long size)
 {
+	unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
+	unsigned long start_pfn = PFN_DOWN(start);
+	unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
 	struct memory_block *mem;
+	unsigned long pfn;
 
-	if (WARN_ON_ONCE(!present_section(section)))
-		return;
+	BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
+	BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));
 
 	mutex_lock(&mem_sysfs_mutex);
-
-	/*
-	 * Some users of the memory hotplug do not want/need memblock to
-	 * track all sections. Skip over those.
-	 */
-	mem = find_memory_block(section);
-	if (!mem)
-		goto out_unlock;
-
-	unregister_mem_sect_under_nodes(mem, __section_nr(section));
-
-	mem->section_count--;
-	if (mem->section_count == 0)
+	for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
+		mem = find_memory_block(__pfn_to_section(pfn));
+		if (!mem)
+			continue;
+		mem->section_count = 0;
+		unregister_memory_block_under_nodes(mem);
 		unregister_memory(mem);
-	else
-		put_device(&mem->dev);
-
-out_unlock:
+	}
 	mutex_unlock(&mem_sysfs_mutex);
 }
 
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 8598fcbd2a17..04fdfa99b8bc 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -801,9 +801,10 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
 	return 0;
 }
 
-/* unregister memory section under all nodes that it spans */
-int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
-				    unsigned long phys_index)
+/*
+ * Unregister memory block device under all nodes that it spans.
+ */
+int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 {
 	NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
 	unsigned long pfn, sect_start_pfn, sect_end_pfn;
@@ -816,8 +817,8 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
 		return -ENOMEM;
 	nodes_clear(*unlinked_nodes);
 
-	sect_start_pfn = section_nr_to_pfn(phys_index);
-	sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
+	sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
+	sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
 		int nid;
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 95505fbb5f85..aa236c2a0466 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -112,7 +112,7 @@ extern void unregister_memory_notifier(struct notifier_block *nb);
 extern int register_memory_isolate_notifier(struct notifier_block *nb);
 extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
 int hotplug_memory_register(unsigned long start, unsigned long size);
-extern void unregister_memory_section(struct mem_section *);
+void hotplug_memory_unregister(unsigned long start, unsigned long size);
 extern int memory_dev_init(void);
 extern int memory_notify(unsigned long val, void *v);
 extern int memory_isolate_notify(unsigned long val, void *v);
diff --git a/include/linux/node.h b/include/linux/node.h
index 1a557c589ecb..02a29e71b175 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -139,8 +139,7 @@ extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int register_mem_sect_under_node(struct memory_block *mem_blk,
 						void *arg);
-extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
-					   unsigned long phys_index);
+extern int unregister_memory_block_under_nodes(struct memory_block *mem_blk);
 
 extern int register_memory_node_under_compute_node(unsigned int mem_nid,
 						   unsigned int cpu_nid,
@@ -176,8 +175,7 @@ static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
 {
 	return 0;
 }
-static inline int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
-						  unsigned long phys_index)
+static inline int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 {
 	return 0;
 }
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 107f72952347..527fe4f9c620 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -519,8 +519,6 @@ static void __remove_section(struct zone *zone, struct mem_section *ms,
 	if (WARN_ON_ONCE(!valid_section(ms)))
 		return;
 
-	unregister_memory_section(ms);
-
 	scn_nr = __section_nr(ms);
 	start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
 	__remove_zone(zone, start_pfn);
@@ -1844,6 +1842,9 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
 	memblock_free(start, size);
 	memblock_remove(start, size);
 
+	/* remove memory block devices before removing memory */
+	hotplug_memory_unregister(start, size);
+
 	arch_remove_memory(nid, start, size, NULL);
 	__release_memory_resource(start, size);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 7/8] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail
       [not found] <20190507183804.5512-1-david@redhat.com>
                   ` (4 preceding siblings ...)
  2019-05-07 18:38 ` [PATCH v2 6/8] mm/memory_hotplug: Remove memory block devices before arch_remove_memory() David Hildenbrand
@ 2019-05-07 18:38 ` David Hildenbrand
  2019-05-08  0:15   ` Dan Williams
  2019-05-07 18:38 ` [PATCH v2 8/8] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section David Hildenbrand
  6 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2019-05-07 18:38 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	akpm, Dan Williams, David Hildenbrand, Greg Kroah-Hartman,
	Rafael J. Wysocki, Alex Deucher, David S. Miller, Mark Brown,
	Chris Wilson, Oscar Salvador, Jonathan Cameron

We really don't want anything during memory hotunplug to fail.
We always pass a valid memory block device, that check can go. Avoid
allocating memory and eventually failing. As we are always called under
lock, we can use a static piece of memory. This avoids having to put
the structure onto the stack, having to guess about the stack size
of callers.

Patch inspired by a patch from Oscar Salvador.

In the future, there might be no need to iterate over nodes at all.
mem->nid should tell us exactly what to remove. Memory block devices
with mixed nodes (added during boot) should properly fenced off and never
removed.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mark Brown <broonie@kernel.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/node.c  | 18 +++++-------------
 include/linux/node.h |  5 ++---
 2 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 04fdfa99b8bc..9be88fd05147 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -803,20 +803,14 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
 
 /*
  * Unregister memory block device under all nodes that it spans.
+ * Has to be called with mem_sysfs_mutex held (due to unlinked_nodes).
  */
-int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
+void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 {
-	NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
 	unsigned long pfn, sect_start_pfn, sect_end_pfn;
+	static nodemask_t unlinked_nodes;
 
-	if (!mem_blk) {
-		NODEMASK_FREE(unlinked_nodes);
-		return -EFAULT;
-	}
-	if (!unlinked_nodes)
-		return -ENOMEM;
-	nodes_clear(*unlinked_nodes);
-
+	nodes_clear(unlinked_nodes);
 	sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
 	sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
@@ -827,15 +821,13 @@ int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 			continue;
 		if (!node_online(nid))
 			continue;
-		if (node_test_and_set(nid, *unlinked_nodes))
+		if (node_test_and_set(nid, unlinked_nodes))
 			continue;
 		sysfs_remove_link(&node_devices[nid]->dev.kobj,
 			 kobject_name(&mem_blk->dev.kobj));
 		sysfs_remove_link(&mem_blk->dev.kobj,
 			 kobject_name(&node_devices[nid]->dev.kobj));
 	}
-	NODEMASK_FREE(unlinked_nodes);
-	return 0;
 }
 
 int link_mem_sections(int nid, unsigned long start_pfn, unsigned long end_pfn)
diff --git a/include/linux/node.h b/include/linux/node.h
index 02a29e71b175..548c226966a2 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -139,7 +139,7 @@ extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int register_mem_sect_under_node(struct memory_block *mem_blk,
 						void *arg);
-extern int unregister_memory_block_under_nodes(struct memory_block *mem_blk);
+extern void unregister_memory_block_under_nodes(struct memory_block *mem_blk);
 
 extern int register_memory_node_under_compute_node(unsigned int mem_nid,
 						   unsigned int cpu_nid,
@@ -175,9 +175,8 @@ static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
 {
 	return 0;
 }
-static inline int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
+static inline void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 {
-	return 0;
 }
 
 static inline void register_hugetlbfs_with_node(node_registration_func_t reg,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 8/8] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section
       [not found] <20190507183804.5512-1-david@redhat.com>
                   ` (5 preceding siblings ...)
  2019-05-07 18:38 ` [PATCH v2 7/8] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail David Hildenbrand
@ 2019-05-07 18:38 ` David Hildenbrand
  2019-05-08  0:30   ` Dan Williams
  6 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2019-05-07 18:38 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	akpm, Dan Williams, David Hildenbrand

Unused, and memory unplug path should never care about zones. This is
the job of memory offlining. ZONE_DEVICE might require special care -
the caller of arch_remove_memory() should handle this.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/memory_hotplug.h | 2 +-
 mm/memory_hotplug.c            | 2 +-
 mm/sparse.c                    | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 2f1f87e13baa..1a4257c5f74c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -346,7 +346,7 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 extern bool is_memblock_offlined(struct memory_block *mem);
 extern int sparse_add_one_section(int nid, unsigned long start_pfn,
 				  struct vmem_altmap *altmap);
-extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
+extern void sparse_remove_one_section(struct mem_section *ms,
 		unsigned long map_offset, struct vmem_altmap *altmap);
 extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
 					  unsigned long pnum);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 527fe4f9c620..e0340c8f6df4 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -523,7 +523,7 @@ static void __remove_section(struct zone *zone, struct mem_section *ms,
 	start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
 	__remove_zone(zone, start_pfn);
 
-	sparse_remove_one_section(zone, ms, map_offset, altmap);
+	sparse_remove_one_section(ms, map_offset, altmap);
 }
 
 /**
diff --git a/mm/sparse.c b/mm/sparse.c
index d1d5e05f5b8d..1552c855d62a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -800,8 +800,8 @@ static void free_section_usemap(struct page *memmap, unsigned long *usemap,
 		free_map_bootmem(memmap);
 }
 
-void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
-		unsigned long map_offset, struct vmem_altmap *altmap)
+void sparse_remove_one_section(struct mem_section *ms, unsigned long map_offset,
+			       struct vmem_altmap *altmap)
 {
 	struct page *memmap = NULL;
 	unsigned long *usemap = NULL;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/8] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
  2019-05-07 18:37 ` [PATCH v2 1/8] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
@ 2019-05-07 20:38   ` Dan Williams
  2019-05-09 12:23   ` Wei Yang
  1 sibling, 0 replies; 35+ messages in thread
From: Dan Williams @ 2019-05-07 20:38 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Qian Cai, Wei Yang, Arun KS,
	Mathieu Malaterre

On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>
> By converting start and size to page granularity, we actually ignore
> unaligned parts within a page instead of properly bailing out with an
> error.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 2/8] s390x/mm: Implement arch_remove_memory()
  2019-05-07 18:37 ` [PATCH v2 2/8] s390x/mm: Implement arch_remove_memory() David Hildenbrand
@ 2019-05-07 20:46   ` Dan Williams
  2019-05-07 20:47     ` David Hildenbrand
  0 siblings, 1 reply; 35+ messages in thread
From: Dan Williams @ 2019-05-07 20:46 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Martin Schwidefsky,
	Heiko Carstens, Michal Hocko, Mike Rapoport, Vasily Gorbik,
	Oscar Salvador

On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>
> Will come in handy when wanting to handle errors after
> arch_add_memory().
>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Oscar Salvador <osalvador@suse.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/s390/mm/init.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index 31b1071315d7..1e0cbae69f12 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -237,12 +237,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
>  void arch_remove_memory(int nid, u64 start, u64 size,
>                         struct vmem_altmap *altmap)
>  {
> -       /*
> -        * There is no hardware or firmware interface which could trigger a
> -        * hot memory remove on s390. So there is nothing that needs to be
> -        * implemented.
> -        */
> -       BUG();
> +       unsigned long start_pfn = start >> PAGE_SHIFT;
> +       unsigned long nr_pages = size >> PAGE_SHIFT;
> +       struct zone *zone;
> +
> +       zone = page_zone(pfn_to_page(start_pfn));

Does s390 actually support passing in an altmap? If 'yes', I think it
also needs the vmem_altmap_offset() fixup like x86-64:

        /* With altmap the first mapped page is offset from @start */
        if (altmap)
                page += vmem_altmap_offset(altmap);

...but I suspect it does not support altmap since
arch/s390/mm/vmem.c::vmemmap_populate() does not arrange for 'struct
page' capacity to be allocated out of an altmap defined page pool.

I think it would be enough to disallow any arch_add_memory() on s390
where @altmap is non-NULL. At least until s390 gains ZONE_DEVICE
support and can enable the pmem use case.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 2/8] s390x/mm: Implement arch_remove_memory()
  2019-05-07 20:46   ` Dan Williams
@ 2019-05-07 20:47     ` David Hildenbrand
  2019-05-07 20:57       ` Dan Williams
  0 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2019-05-07 20:47 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Martin Schwidefsky,
	Heiko Carstens, Michal Hocko, Mike Rapoport, Vasily Gorbik,
	Oscar Salvador

On 07.05.19 22:46, Dan Williams wrote:
> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> Will come in handy when wanting to handle errors after
>> arch_add_memory().
>>
>> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Vasily Gorbik <gor@linux.ibm.com>
>> Cc: Oscar Salvador <osalvador@suse.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  arch/s390/mm/init.c | 13 +++++++------
>>  1 file changed, 7 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
>> index 31b1071315d7..1e0cbae69f12 100644
>> --- a/arch/s390/mm/init.c
>> +++ b/arch/s390/mm/init.c
>> @@ -237,12 +237,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
>>  void arch_remove_memory(int nid, u64 start, u64 size,
>>                         struct vmem_altmap *altmap)
>>  {
>> -       /*
>> -        * There is no hardware or firmware interface which could trigger a
>> -        * hot memory remove on s390. So there is nothing that needs to be
>> -        * implemented.
>> -        */
>> -       BUG();
>> +       unsigned long start_pfn = start >> PAGE_SHIFT;
>> +       unsigned long nr_pages = size >> PAGE_SHIFT;
>> +       struct zone *zone;
>> +
>> +       zone = page_zone(pfn_to_page(start_pfn));
> 
> Does s390 actually support passing in an altmap? If 'yes', I think it
> also needs the vmem_altmap_offset() fixup like x86-64:
> 
>         /* With altmap the first mapped page is offset from @start */
>         if (altmap)
>                 page += vmem_altmap_offset(altmap);
> 
> ...but I suspect it does not support altmap since
> arch/s390/mm/vmem.c::vmemmap_populate() does not arrange for 'struct
> page' capacity to be allocated out of an altmap defined page pool.
> 
> I think it would be enough to disallow any arch_add_memory() on s390
> where @altmap is non-NULL. At least until s390 gains ZONE_DEVICE
> support and can enable the pmem use case.
> 

As far as I know, it doesn't yet, however I guess this could change once
virtio-pmem is supported?

Thanks!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 2/8] s390x/mm: Implement arch_remove_memory()
  2019-05-07 20:47     ` David Hildenbrand
@ 2019-05-07 20:57       ` Dan Williams
  2019-05-07 21:13         ` David Hildenbrand
  0 siblings, 1 reply; 35+ messages in thread
From: Dan Williams @ 2019-05-07 20:57 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Martin Schwidefsky,
	Heiko Carstens, Michal Hocko, Mike Rapoport, Vasily Gorbik,
	Oscar Salvador

On Tue, May 7, 2019 at 1:47 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 07.05.19 22:46, Dan Williams wrote:
> > On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> Will come in handy when wanting to handle errors after
> >> arch_add_memory().
> >>
> >> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> >> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Cc: Michal Hocko <mhocko@suse.com>
> >> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> >> Cc: David Hildenbrand <david@redhat.com>
> >> Cc: Vasily Gorbik <gor@linux.ibm.com>
> >> Cc: Oscar Salvador <osalvador@suse.com>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >>  arch/s390/mm/init.c | 13 +++++++------
> >>  1 file changed, 7 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> >> index 31b1071315d7..1e0cbae69f12 100644
> >> --- a/arch/s390/mm/init.c
> >> +++ b/arch/s390/mm/init.c
> >> @@ -237,12 +237,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>  void arch_remove_memory(int nid, u64 start, u64 size,
> >>                         struct vmem_altmap *altmap)
> >>  {
> >> -       /*
> >> -        * There is no hardware or firmware interface which could trigger a
> >> -        * hot memory remove on s390. So there is nothing that needs to be
> >> -        * implemented.
> >> -        */
> >> -       BUG();
> >> +       unsigned long start_pfn = start >> PAGE_SHIFT;
> >> +       unsigned long nr_pages = size >> PAGE_SHIFT;
> >> +       struct zone *zone;
> >> +
> >> +       zone = page_zone(pfn_to_page(start_pfn));
> >
> > Does s390 actually support passing in an altmap? If 'yes', I think it
> > also needs the vmem_altmap_offset() fixup like x86-64:
> >
> >         /* With altmap the first mapped page is offset from @start */
> >         if (altmap)
> >                 page += vmem_altmap_offset(altmap);
> >
> > ...but I suspect it does not support altmap since
> > arch/s390/mm/vmem.c::vmemmap_populate() does not arrange for 'struct
> > page' capacity to be allocated out of an altmap defined page pool.
> >
> > I think it would be enough to disallow any arch_add_memory() on s390
> > where @altmap is non-NULL. At least until s390 gains ZONE_DEVICE
> > support and can enable the pmem use case.
> >
>
> As far as I know, it doesn't yet, however I guess this could change once
> virtio-pmem is supported?

I would expect and request virtio-pmem remain a non-starter on s390
until s390 gains ZONE_DEVICE support. As it stands virtio-pmem is just
another flavor of the general pmem driver and the pmem driver
currently only exports ZONE_DEVICE pfns tagged by the PTE_DEVMAP
pte-flag and PFN_DEV+PFN_MAP pfn_t-flags.

A hamstrung version of DAX (CONFIG_FS_DAX_LIMITED) is enabled for the
s390/dcssblk driver, but that requires that the driver indicate this
limited use case via the PTE_SPECIAL pte-flag and PFN_SPECIAL
pfn_t-flag. I otherwise do not want to see CONFIG_FS_DAX_LIMITED
spread outside of the s390/dcssblk use case.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 2/8] s390x/mm: Implement arch_remove_memory()
  2019-05-07 20:57       ` Dan Williams
@ 2019-05-07 21:13         ` David Hildenbrand
  0 siblings, 0 replies; 35+ messages in thread
From: David Hildenbrand @ 2019-05-07 21:13 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Martin Schwidefsky,
	Heiko Carstens, Michal Hocko, Mike Rapoport, Vasily Gorbik,
	Oscar Salvador

On 07.05.19 22:57, Dan Williams wrote:
> On Tue, May 7, 2019 at 1:47 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 07.05.19 22:46, Dan Williams wrote:
>>> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> Will come in handy when wanting to handle errors after
>>>> arch_add_memory().
>>>>
>>>> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>>>> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: Michal Hocko <mhocko@suse.com>
>>>> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
>>>> Cc: David Hildenbrand <david@redhat.com>
>>>> Cc: Vasily Gorbik <gor@linux.ibm.com>
>>>> Cc: Oscar Salvador <osalvador@suse.com>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>  arch/s390/mm/init.c | 13 +++++++------
>>>>  1 file changed, 7 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
>>>> index 31b1071315d7..1e0cbae69f12 100644
>>>> --- a/arch/s390/mm/init.c
>>>> +++ b/arch/s390/mm/init.c
>>>> @@ -237,12 +237,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
>>>>  void arch_remove_memory(int nid, u64 start, u64 size,
>>>>                         struct vmem_altmap *altmap)
>>>>  {
>>>> -       /*
>>>> -        * There is no hardware or firmware interface which could trigger a
>>>> -        * hot memory remove on s390. So there is nothing that needs to be
>>>> -        * implemented.
>>>> -        */
>>>> -       BUG();
>>>> +       unsigned long start_pfn = start >> PAGE_SHIFT;
>>>> +       unsigned long nr_pages = size >> PAGE_SHIFT;
>>>> +       struct zone *zone;
>>>> +
>>>> +       zone = page_zone(pfn_to_page(start_pfn));
>>>
>>> Does s390 actually support passing in an altmap? If 'yes', I think it
>>> also needs the vmem_altmap_offset() fixup like x86-64:
>>>
>>>         /* With altmap the first mapped page is offset from @start */
>>>         if (altmap)
>>>                 page += vmem_altmap_offset(altmap);
>>>
>>> ...but I suspect it does not support altmap since
>>> arch/s390/mm/vmem.c::vmemmap_populate() does not arrange for 'struct
>>> page' capacity to be allocated out of an altmap defined page pool.
>>>
>>> I think it would be enough to disallow any arch_add_memory() on s390
>>> where @altmap is non-NULL. At least until s390 gains ZONE_DEVICE
>>> support and can enable the pmem use case.
>>>
>>
>> As far as I know, it doesn't yet, however I guess this could change once
>> virtio-pmem is supported?
> 
> I would expect and request virtio-pmem remain a non-starter on s390
> until s390 gains ZONE_DEVICE support. As it stands virtio-pmem is just
> another flavor of the general pmem driver and the pmem driver
> currently only exports ZONE_DEVICE pfns tagged by the PTE_DEVMAP
> pte-flag and PFN_DEV+PFN_MAP pfn_t-flags.

Yes, I think ZONE_DEVICE will be the way to go. On real HW, there will
never be anything mapped into the physical address space besides system
ram. However with virtio-pmem in virtual environments, we have the
option to change that.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-07 18:38 ` [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
@ 2019-05-07 21:17   ` Dan Williams
  2019-05-07 21:27     ` David Hildenbrand
  2019-05-08  8:35   ` David Hildenbrand
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 35+ messages in thread
From: Dan Williams @ 2019-05-07 21:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Wei Yang,
	Arun KS, Mathieu Malaterre

On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>
> Only memory to be added to the buddy and to be onlined/offlined by
> user space using memory block devices needs (and should have!) memory
> block devices.
>
> Factor out creation of memory block devices Create all devices after
> arch_add_memory() succeeded. We can later drop the want_memblock parameter,
> because it is now effectively stale.
>
> Only after memory block devices have been added, memory can be onlined
> by user space. This implies, that memory is not visible to user space at
> all before arch_add_memory() succeeded.

Nice!

>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Andrew Banman <andrew.banman@hpe.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
>  include/linux/memory.h |  2 +-
>  mm/memory_hotplug.c    | 15 ++++-----
>  3 files changed, 53 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 6e0cb4fda179..862c202a18ca 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
>         return 0;
>  }
>
> +static void unregister_memory(struct memory_block *memory)
> +{
> +       BUG_ON(memory->dev.bus != &memory_subsys);

Given this should never happen and only a future kernel developer
might trip over it, do we really need to kill that developer's
machine? I.e. s/BUG/WARN/? I guess an argument can be made to move
such a change that to a follow-on patch since you're just preserving
existing behavior, but I figure might as well address these as the
code is refactored.

> +
> +       /* drop the ref. we got via find_memory_block() */
> +       put_device(&memory->dev);
> +       device_unregister(&memory->dev);
> +}
> +
>  /*
> - * need an interface for the VM to add new memory regions,
> - * but without onlining it.
> + * Create memory block devices for the given memory area. Start and size
> + * have to be aligned to memory block granularity. Memory block devices
> + * will be initialized as offline.
>   */
> -int hotplug_memory_register(int nid, struct mem_section *section)
> +int hotplug_memory_register(unsigned long start, unsigned long size)
>  {
> -       int ret = 0;
> +       unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
> +       unsigned long start_pfn = PFN_DOWN(start);
> +       unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
> +       unsigned long pfn;
>         struct memory_block *mem;
> +       int ret = 0;
>
> -       mutex_lock(&mem_sysfs_mutex);
> +       BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
> +       BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));

Perhaps:

    if (WARN_ON(...))
        return -EINVAL;

>
> -       mem = find_memory_block(section);
> -       if (mem) {
> -               mem->section_count++;
> -               put_device(&mem->dev);
> -       } else {
> -               ret = init_memory_block(&mem, section, MEM_OFFLINE);
> +       mutex_lock(&mem_sysfs_mutex);
> +       for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
> +               mem = find_memory_block(__pfn_to_section(pfn));
> +               if (mem) {
> +                       WARN_ON_ONCE(false);

?? Isn't that a nop?

> +                       put_device(&mem->dev);
> +                       continue;
> +               }
> +               ret = init_memory_block(&mem, __pfn_to_section(pfn),
> +                                       MEM_OFFLINE);
>                 if (ret)
> -                       goto out;
> -               mem->section_count++;
> +                       break;
> +               mem->section_count = memory_block_size_bytes() /
> +                                    MIN_MEMORY_BLOCK_SIZE;
> +       }
> +       if (ret) {
> +               end_pfn = pfn;
> +               for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
> +                       mem = find_memory_block(__pfn_to_section(pfn));
> +                       if (!mem)
> +                               continue;
> +                       mem->section_count = 0;
> +                       unregister_memory(mem);
> +               }
>         }
> -
> -out:
>         mutex_unlock(&mem_sysfs_mutex);
>         return ret;
>  }
>
> -static void
> -unregister_memory(struct memory_block *memory)
> -{
> -       BUG_ON(memory->dev.bus != &memory_subsys);
> -
> -       /* drop the ref. we got via find_memory_block() */
> -       put_device(&memory->dev);
> -       device_unregister(&memory->dev);
> -}
> -
> -void unregister_memory_section(struct mem_section *section)
> +static int remove_memory_section(struct mem_section *section)
>  {
>         struct memory_block *mem;
>
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index 474c7c60c8f2..95505fbb5f85 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -111,7 +111,7 @@ extern int register_memory_notifier(struct notifier_block *nb);
>  extern void unregister_memory_notifier(struct notifier_block *nb);
>  extern int register_memory_isolate_notifier(struct notifier_block *nb);
>  extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
> -int hotplug_memory_register(int nid, struct mem_section *section);
> +int hotplug_memory_register(unsigned long start, unsigned long size);
>  extern void unregister_memory_section(struct mem_section *);
>  extern int memory_dev_init(void);
>  extern int memory_notify(unsigned long val, void *v);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 7b5439839d67..e1637c8a0723 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -258,13 +258,7 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
>                 return -EEXIST;
>
>         ret = sparse_add_one_section(nid, phys_start_pfn, altmap);
> -       if (ret < 0)
> -               return ret;
> -
> -       if (!want_memblock)
> -               return 0;
> -
> -       return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
> +       return ret < 0 ? ret : 0;
>  }
>
>  /*
> @@ -1106,6 +1100,13 @@ int __ref add_memory_resource(int nid, struct resource *res)
>         if (ret < 0)
>                 goto error;
>
> +       /* create memory block devices after memory was added */
> +       ret = hotplug_memory_register(start, size);
> +       if (ret) {
> +               arch_remove_memory(nid, start, size, NULL);
> +               goto error;
> +       }
> +
>         if (new_node) {
>                 /* If sysfs file of new node can't be created, cpu on the node
>                  * can't be hot-added. There is no rollback way now.
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
  2019-05-07 18:38 ` [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API David Hildenbrand
@ 2019-05-07 21:19   ` Dan Williams
  2019-05-07 21:24     ` David Hildenbrand
  0 siblings, 1 reply; 35+ messages in thread
From: Dan Williams @ 2019-05-07 21:19 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Michal Hocko,
	Oscar Salvador, Pavel Tatashin, Wei Yang, Joonsoo Kim, Qian Cai,
	Arun KS, Mathieu Malaterre

On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>
> No longer needed, the callers of arch_add_memory() can handle this
> manually.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  include/linux/memory_hotplug.h | 8 --------
>  mm/memory_hotplug.c            | 9 +++------
>  2 files changed, 3 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 2d4de313926d..2f1f87e13baa 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -128,14 +128,6 @@ extern void arch_remove_memory(int nid, u64 start, u64 size,
>  extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
>                            unsigned long nr_pages, struct vmem_altmap *altmap);
>
> -/*
> - * Do we want sysfs memblock files created. This will allow userspace to online
> - * and offline memory explicitly. Lack of this bit means that the caller has to
> - * call move_pfn_range_to_zone to finish the initialization.
> - */
> -
> -#define MHP_MEMBLOCK_API               (1<<0)
> -
>  /* reasonably generic interface to expand the physical pages */
>  extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>                        struct mhp_restrictions *restrictions);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index e1637c8a0723..107f72952347 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -250,7 +250,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
>  #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
>
>  static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
> -               struct vmem_altmap *altmap, bool want_memblock)
> +                                  struct vmem_altmap *altmap)
>  {
>         int ret;
>
> @@ -293,8 +293,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
>         }
>
>         for (i = start_sec; i <= end_sec; i++) {
> -               err = __add_section(nid, section_nr_to_pfn(i), altmap,
> -                               restrictions->flags & MHP_MEMBLOCK_API);
> +               err = __add_section(nid, section_nr_to_pfn(i), altmap);
>
>                 /*
>                  * EEXIST is finally dealt with by ioresource collision
> @@ -1066,9 +1065,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
>   */
>  int __ref add_memory_resource(int nid, struct resource *res)
>  {
> -       struct mhp_restrictions restrictions = {
> -               .flags = MHP_MEMBLOCK_API,
> -       };
> +       struct mhp_restrictions restrictions = {};

With mhp_restrictions.flags no longer needed, can we drop
mhp_restrictions and just go back to a plain @altmap argument?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
  2019-05-07 21:19   ` Dan Williams
@ 2019-05-07 21:24     ` David Hildenbrand
  2019-05-07 21:25       ` Dan Williams
  0 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2019-05-07 21:24 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Michal Hocko,
	Oscar Salvador, Pavel Tatashin, Wei Yang, Joonsoo Kim, Qian Cai,
	Arun KS, Mathieu Malaterre

On 07.05.19 23:19, Dan Williams wrote:
> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> No longer needed, the callers of arch_add_memory() can handle this
>> manually.
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Oscar Salvador <osalvador@suse.com>
>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>> Cc: Wei Yang <richard.weiyang@gmail.com>
>> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> Cc: Qian Cai <cai@lca.pw>
>> Cc: Arun KS <arunks@codeaurora.org>
>> Cc: Mathieu Malaterre <malat@debian.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  include/linux/memory_hotplug.h | 8 --------
>>  mm/memory_hotplug.c            | 9 +++------
>>  2 files changed, 3 insertions(+), 14 deletions(-)
>>
>> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>> index 2d4de313926d..2f1f87e13baa 100644
>> --- a/include/linux/memory_hotplug.h
>> +++ b/include/linux/memory_hotplug.h
>> @@ -128,14 +128,6 @@ extern void arch_remove_memory(int nid, u64 start, u64 size,
>>  extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
>>                            unsigned long nr_pages, struct vmem_altmap *altmap);
>>
>> -/*
>> - * Do we want sysfs memblock files created. This will allow userspace to online
>> - * and offline memory explicitly. Lack of this bit means that the caller has to
>> - * call move_pfn_range_to_zone to finish the initialization.
>> - */
>> -
>> -#define MHP_MEMBLOCK_API               (1<<0)
>> -
>>  /* reasonably generic interface to expand the physical pages */
>>  extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>>                        struct mhp_restrictions *restrictions);
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index e1637c8a0723..107f72952347 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -250,7 +250,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
>>  #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
>>
>>  static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
>> -               struct vmem_altmap *altmap, bool want_memblock)
>> +                                  struct vmem_altmap *altmap)
>>  {
>>         int ret;
>>
>> @@ -293,8 +293,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
>>         }
>>
>>         for (i = start_sec; i <= end_sec; i++) {
>> -               err = __add_section(nid, section_nr_to_pfn(i), altmap,
>> -                               restrictions->flags & MHP_MEMBLOCK_API);
>> +               err = __add_section(nid, section_nr_to_pfn(i), altmap);
>>
>>                 /*
>>                  * EEXIST is finally dealt with by ioresource collision
>> @@ -1066,9 +1065,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
>>   */
>>  int __ref add_memory_resource(int nid, struct resource *res)
>>  {
>> -       struct mhp_restrictions restrictions = {
>> -               .flags = MHP_MEMBLOCK_API,
>> -       };
>> +       struct mhp_restrictions restrictions = {};
> 
> With mhp_restrictions.flags no longer needed, can we drop
> mhp_restrictions and just go back to a plain @altmap argument?
> 

Oscar wants to use it to configure from where to allocate vmemmaps. That
was the original driver behind it.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
  2019-05-07 21:24     ` David Hildenbrand
@ 2019-05-07 21:25       ` Dan Williams
  2019-05-08  7:39         ` David Hildenbrand
  0 siblings, 1 reply; 35+ messages in thread
From: Dan Williams @ 2019-05-07 21:25 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Michal Hocko,
	Oscar Salvador, Pavel Tatashin, Wei Yang, Joonsoo Kim, Qian Cai,
	Arun KS, Mathieu Malaterre

On Tue, May 7, 2019 at 2:24 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 07.05.19 23:19, Dan Williams wrote:
> > On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> No longer needed, the callers of arch_add_memory() can handle this
> >> manually.
> >>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Cc: David Hildenbrand <david@redhat.com>
> >> Cc: Michal Hocko <mhocko@suse.com>
> >> Cc: Oscar Salvador <osalvador@suse.com>
> >> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> >> Cc: Wei Yang <richard.weiyang@gmail.com>
> >> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >> Cc: Qian Cai <cai@lca.pw>
> >> Cc: Arun KS <arunks@codeaurora.org>
> >> Cc: Mathieu Malaterre <malat@debian.org>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >>  include/linux/memory_hotplug.h | 8 --------
> >>  mm/memory_hotplug.c            | 9 +++------
> >>  2 files changed, 3 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> >> index 2d4de313926d..2f1f87e13baa 100644
> >> --- a/include/linux/memory_hotplug.h
> >> +++ b/include/linux/memory_hotplug.h
> >> @@ -128,14 +128,6 @@ extern void arch_remove_memory(int nid, u64 start, u64 size,
> >>  extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
> >>                            unsigned long nr_pages, struct vmem_altmap *altmap);
> >>
> >> -/*
> >> - * Do we want sysfs memblock files created. This will allow userspace to online
> >> - * and offline memory explicitly. Lack of this bit means that the caller has to
> >> - * call move_pfn_range_to_zone to finish the initialization.
> >> - */
> >> -
> >> -#define MHP_MEMBLOCK_API               (1<<0)
> >> -
> >>  /* reasonably generic interface to expand the physical pages */
> >>  extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
> >>                        struct mhp_restrictions *restrictions);
> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> >> index e1637c8a0723..107f72952347 100644
> >> --- a/mm/memory_hotplug.c
> >> +++ b/mm/memory_hotplug.c
> >> @@ -250,7 +250,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
> >>  #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
> >>
> >>  static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
> >> -               struct vmem_altmap *altmap, bool want_memblock)
> >> +                                  struct vmem_altmap *altmap)
> >>  {
> >>         int ret;
> >>
> >> @@ -293,8 +293,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
> >>         }
> >>
> >>         for (i = start_sec; i <= end_sec; i++) {
> >> -               err = __add_section(nid, section_nr_to_pfn(i), altmap,
> >> -                               restrictions->flags & MHP_MEMBLOCK_API);
> >> +               err = __add_section(nid, section_nr_to_pfn(i), altmap);
> >>
> >>                 /*
> >>                  * EEXIST is finally dealt with by ioresource collision
> >> @@ -1066,9 +1065,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
> >>   */
> >>  int __ref add_memory_resource(int nid, struct resource *res)
> >>  {
> >> -       struct mhp_restrictions restrictions = {
> >> -               .flags = MHP_MEMBLOCK_API,
> >> -       };
> >> +       struct mhp_restrictions restrictions = {};
> >
> > With mhp_restrictions.flags no longer needed, can we drop
> > mhp_restrictions and just go back to a plain @altmap argument?
> >
>
> Oscar wants to use it to configure from where to allocate vmemmaps. That
> was the original driver behind it.
>

Ah, ok, makes sense.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 6/8] mm/memory_hotplug: Remove memory block devices before arch_remove_memory()
  2019-05-07 18:38 ` [PATCH v2 6/8] mm/memory_hotplug: Remove memory block devices before arch_remove_memory() David Hildenbrand
@ 2019-05-07 21:27   ` Dan Williams
  0 siblings, 0 replies; 35+ messages in thread
From: Dan Williams @ 2019-05-07 21:27 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Andrew Banman, Ingo Molnar,
	Alex Deucher, David S. Miller, Mark Brown, Chris Wilson,
	Oscar Salvador, Jonathan Cameron, Michal Hocko, Pavel Tatashin,
	Arun KS, Mathieu Malaterre

On Tue, May 7, 2019 at 11:39 AM David Hildenbrand <david@redhat.com> wrote:
>
> Let's factor out removing of memory block devices, which is only
> necessary for memory added via add_memory() and friends that created
> memory block devices. Remove the devices before calling
> arch_remove_memory().
>
> This finishes factoring out memory block device handling from
> arch_add_memory() and arch_remove_memory().

Also nice! makes it easier in the future for the "device-memory" use
case to not avoid messing up the typical memory hotplug flow.

>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Andrew Banman <andrew.banman@hpe.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/base/memory.c  | 39 +++++++++++++++++++--------------------
>  drivers/base/node.c    | 11 ++++++-----
>  include/linux/memory.h |  2 +-
>  include/linux/node.h   |  6 ++----
>  mm/memory_hotplug.c    |  5 +++--
>  5 files changed, 31 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 862c202a18ca..47ff49058d1f 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -756,32 +756,31 @@ int hotplug_memory_register(unsigned long start, unsigned long size)
>         return ret;
>  }
>
> -static int remove_memory_section(struct mem_section *section)
> +/*
> + * Remove memory block devices for the given memory area. Start and size
> + * have to be aligned to memory block granularity. Memory block devices
> + * have to be offline.
> + */
> +void hotplug_memory_unregister(unsigned long start, unsigned long size)
>  {
> +       unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
> +       unsigned long start_pfn = PFN_DOWN(start);
> +       unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
>         struct memory_block *mem;
> +       unsigned long pfn;
>
> -       if (WARN_ON_ONCE(!present_section(section)))
> -               return;
> +       BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
> +       BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));

Similar BUG_ON vs comments WARN_ON comments as the previous patch.

>
>         mutex_lock(&mem_sysfs_mutex);
> -
> -       /*
> -        * Some users of the memory hotplug do not want/need memblock to
> -        * track all sections. Skip over those.
> -        */
> -       mem = find_memory_block(section);
> -       if (!mem)
> -               goto out_unlock;
> -
> -       unregister_mem_sect_under_nodes(mem, __section_nr(section));
> -
> -       mem->section_count--;
> -       if (mem->section_count == 0)
> +       for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
> +               mem = find_memory_block(__pfn_to_section(pfn));
> +               if (!mem)
> +                       continue;
> +               mem->section_count = 0;
> +               unregister_memory_block_under_nodes(mem);
>                 unregister_memory(mem);
> -       else
> -               put_device(&mem->dev);
> -
> -out_unlock:
> +       }
>         mutex_unlock(&mem_sysfs_mutex);
>  }
>
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 8598fcbd2a17..04fdfa99b8bc 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -801,9 +801,10 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
>         return 0;
>  }
>
> -/* unregister memory section under all nodes that it spans */
> -int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> -                                   unsigned long phys_index)
> +/*
> + * Unregister memory block device under all nodes that it spans.
> + */
> +int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>  {
>         NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
>         unsigned long pfn, sect_start_pfn, sect_end_pfn;
> @@ -816,8 +817,8 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
>                 return -ENOMEM;
>         nodes_clear(*unlinked_nodes);
>
> -       sect_start_pfn = section_nr_to_pfn(phys_index);
> -       sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
> +       sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
> +       sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
>         for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
>                 int nid;
>
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index 95505fbb5f85..aa236c2a0466 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -112,7 +112,7 @@ extern void unregister_memory_notifier(struct notifier_block *nb);
>  extern int register_memory_isolate_notifier(struct notifier_block *nb);
>  extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
>  int hotplug_memory_register(unsigned long start, unsigned long size);
> -extern void unregister_memory_section(struct mem_section *);
> +void hotplug_memory_unregister(unsigned long start, unsigned long size);
>  extern int memory_dev_init(void);
>  extern int memory_notify(unsigned long val, void *v);
>  extern int memory_isolate_notify(unsigned long val, void *v);
> diff --git a/include/linux/node.h b/include/linux/node.h
> index 1a557c589ecb..02a29e71b175 100644
> --- a/include/linux/node.h
> +++ b/include/linux/node.h
> @@ -139,8 +139,7 @@ extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
>  extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
>  extern int register_mem_sect_under_node(struct memory_block *mem_blk,
>                                                 void *arg);
> -extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> -                                          unsigned long phys_index);
> +extern int unregister_memory_block_under_nodes(struct memory_block *mem_blk);
>
>  extern int register_memory_node_under_compute_node(unsigned int mem_nid,
>                                                    unsigned int cpu_nid,
> @@ -176,8 +175,7 @@ static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
>  {
>         return 0;
>  }
> -static inline int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> -                                                 unsigned long phys_index)
> +static inline int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>  {
>         return 0;
>  }
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 107f72952347..527fe4f9c620 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -519,8 +519,6 @@ static void __remove_section(struct zone *zone, struct mem_section *ms,
>         if (WARN_ON_ONCE(!valid_section(ms)))
>                 return;
>
> -       unregister_memory_section(ms);
> -
>         scn_nr = __section_nr(ms);
>         start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
>         __remove_zone(zone, start_pfn);
> @@ -1844,6 +1842,9 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
>         memblock_free(start, size);
>         memblock_remove(start, size);
>
> +       /* remove memory block devices before removing memory */
> +       hotplug_memory_unregister(start, size);
> +
>         arch_remove_memory(nid, start, size, NULL);
>         __release_memory_resource(start, size);

Other than the BUG_ON concern you can add

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-07 21:17   ` Dan Williams
@ 2019-05-07 21:27     ` David Hildenbrand
  0 siblings, 0 replies; 35+ messages in thread
From: David Hildenbrand @ 2019-05-07 21:27 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Wei Yang,
	Arun KS, Mathieu Malaterre

>> +static void unregister_memory(struct memory_block *memory)
>> +{
>> +       BUG_ON(memory->dev.bus != &memory_subsys);
> 
> Given this should never happen and only a future kernel developer
> might trip over it, do we really need to kill that developer's
> machine? I.e. s/BUG/WARN/? I guess an argument can be made to move
> such a change that to a follow-on patch since you're just preserving
> existing behavior, but I figure might as well address these as the
> code is refactored.

I assume only

if (WARN ...)
	return;

makes sense then, right?

> 
>> +
>> +       /* drop the ref. we got via find_memory_block() */
>> +       put_device(&memory->dev);
>> +       device_unregister(&memory->dev);
>> +}
>> +
>>  /*
>> - * need an interface for the VM to add new memory regions,
>> - * but without onlining it.
>> + * Create memory block devices for the given memory area. Start and size
>> + * have to be aligned to memory block granularity. Memory block devices
>> + * will be initialized as offline.
>>   */
>> -int hotplug_memory_register(int nid, struct mem_section *section)
>> +int hotplug_memory_register(unsigned long start, unsigned long size)
>>  {
>> -       int ret = 0;
>> +       unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
>> +       unsigned long start_pfn = PFN_DOWN(start);
>> +       unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
>> +       unsigned long pfn;
>>         struct memory_block *mem;
>> +       int ret = 0;
>>
>> -       mutex_lock(&mem_sysfs_mutex);
>> +       BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
>> +       BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));
> 
> Perhaps:
> 
>     if (WARN_ON(...))
>         return -EINVAL;
> 

Yes, guess this souldn't hurt.

>>
>> -       mem = find_memory_block(section);
>> -       if (mem) {
>> -               mem->section_count++;
>> -               put_device(&mem->dev);
>> -       } else {
>> -               ret = init_memory_block(&mem, section, MEM_OFFLINE);
>> +       mutex_lock(&mem_sysfs_mutex);
>> +       for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
>> +               mem = find_memory_block(__pfn_to_section(pfn));
>> +               if (mem) {
>> +                       WARN_ON_ONCE(false);
> 
> ?? Isn't that a nop?

Yes, that makes no sense :)

Thanks!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 7/8] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail
  2019-05-07 18:38 ` [PATCH v2 7/8] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail David Hildenbrand
@ 2019-05-08  0:15   ` Dan Williams
  2019-05-08  7:21     ` David Hildenbrand
  0 siblings, 1 reply; 35+ messages in thread
From: Dan Williams @ 2019-05-08  0:15 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Greg Kroah-Hartman,
	Rafael J. Wysocki, Alex Deucher, David S. Miller, Mark Brown,
	Chris Wilson, Oscar Salvador, Jonathan Cameron

On Tue, May 7, 2019 at 11:39 AM David Hildenbrand <david@redhat.com> wrote:
>
> We really don't want anything during memory hotunplug to fail.
> We always pass a valid memory block device, that check can go. Avoid
> allocating memory and eventually failing. As we are always called under
> lock, we can use a static piece of memory. This avoids having to put
> the structure onto the stack, having to guess about the stack size
> of callers.
>
> Patch inspired by a patch from Oscar Salvador.
>
> In the future, there might be no need to iterate over nodes at all.
> mem->nid should tell us exactly what to remove. Memory block devices
> with mixed nodes (added during boot) should properly fenced off and never
> removed.
>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/base/node.c  | 18 +++++-------------
>  include/linux/node.h |  5 ++---
>  2 files changed, 7 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 04fdfa99b8bc..9be88fd05147 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -803,20 +803,14 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
>
>  /*
>   * Unregister memory block device under all nodes that it spans.
> + * Has to be called with mem_sysfs_mutex held (due to unlinked_nodes).

Given this comment can bitrot relative to the implementation lets
instead add an explicit:

    lockdep_assert_held(&mem_sysfs_mutex);

With that you can add:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 8/8] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section
  2019-05-07 18:38 ` [PATCH v2 8/8] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section David Hildenbrand
@ 2019-05-08  0:30   ` Dan Williams
  0 siblings, 0 replies; 35+ messages in thread
From: Dan Williams @ 2019-05-08  0:30 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton

On Tue, May 7, 2019 at 11:39 AM David Hildenbrand <david@redhat.com> wrote:
>
> Unused, and memory unplug path should never care about zones. This is
> the job of memory offlining. ZONE_DEVICE might require special care -
> the caller of arch_remove_memory() should handle this.

The ZONE_DEVICE usage does not require special care so you can drop
that comment. The only place it's used in the subsection patches is to
lookup the node-id, but it turns out that the resulting node-id is
then never used.

With the ZONE_DEVICE mention dropped out of changelog you can add:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 7/8] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail
  2019-05-08  0:15   ` Dan Williams
@ 2019-05-08  7:21     ` David Hildenbrand
  2019-05-08 13:50       ` Dan Williams
  0 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2019-05-08  7:21 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Greg Kroah-Hartman,
	Rafael J. Wysocki, Alex Deucher, David S. Miller, Mark Brown,
	Chris Wilson, Oscar Salvador, Jonathan Cameron


>>  drivers/base/node.c  | 18 +++++-------------
>>  include/linux/node.h |  5 ++---
>>  2 files changed, 7 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>> index 04fdfa99b8bc..9be88fd05147 100644
>> --- a/drivers/base/node.c
>> +++ b/drivers/base/node.c
>> @@ -803,20 +803,14 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
>>
>>  /*
>>   * Unregister memory block device under all nodes that it spans.
>> + * Has to be called with mem_sysfs_mutex held (due to unlinked_nodes).
> 
> Given this comment can bitrot relative to the implementation lets
> instead add an explicit:
> 
>     lockdep_assert_held(&mem_sysfs_mutex);

That would require to make the mutex non-static. Is that what you
suggest, or any other alternative?

Thanks Dan!

> 
> With that you can add:
> 
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
  2019-05-07 21:25       ` Dan Williams
@ 2019-05-08  7:39         ` David Hildenbrand
  2019-05-08 23:08           ` osalvador
  0 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2019-05-08  7:39 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Michal Hocko,
	Oscar Salvador, Pavel Tatashin, Wei Yang, Joonsoo Kim, Qian Cai,
	Arun KS, Mathieu Malaterre

On 07.05.19 23:25, Dan Williams wrote:
> On Tue, May 7, 2019 at 2:24 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 07.05.19 23:19, Dan Williams wrote:
>>> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> No longer needed, the callers of arch_add_memory() can handle this
>>>> manually.
>>>>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: David Hildenbrand <david@redhat.com>
>>>> Cc: Michal Hocko <mhocko@suse.com>
>>>> Cc: Oscar Salvador <osalvador@suse.com>
>>>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>>>> Cc: Wei Yang <richard.weiyang@gmail.com>
>>>> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>>>> Cc: Qian Cai <cai@lca.pw>
>>>> Cc: Arun KS <arunks@codeaurora.org>
>>>> Cc: Mathieu Malaterre <malat@debian.org>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>  include/linux/memory_hotplug.h | 8 --------
>>>>  mm/memory_hotplug.c            | 9 +++------
>>>>  2 files changed, 3 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>>>> index 2d4de313926d..2f1f87e13baa 100644
>>>> --- a/include/linux/memory_hotplug.h
>>>> +++ b/include/linux/memory_hotplug.h
>>>> @@ -128,14 +128,6 @@ extern void arch_remove_memory(int nid, u64 start, u64 size,
>>>>  extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
>>>>                            unsigned long nr_pages, struct vmem_altmap *altmap);
>>>>
>>>> -/*
>>>> - * Do we want sysfs memblock files created. This will allow userspace to online
>>>> - * and offline memory explicitly. Lack of this bit means that the caller has to
>>>> - * call move_pfn_range_to_zone to finish the initialization.
>>>> - */
>>>> -
>>>> -#define MHP_MEMBLOCK_API               (1<<0)
>>>> -
>>>>  /* reasonably generic interface to expand the physical pages */
>>>>  extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>>>>                        struct mhp_restrictions *restrictions);
>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>> index e1637c8a0723..107f72952347 100644
>>>> --- a/mm/memory_hotplug.c
>>>> +++ b/mm/memory_hotplug.c
>>>> @@ -250,7 +250,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
>>>>  #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
>>>>
>>>>  static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
>>>> -               struct vmem_altmap *altmap, bool want_memblock)
>>>> +                                  struct vmem_altmap *altmap)
>>>>  {
>>>>         int ret;
>>>>
>>>> @@ -293,8 +293,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
>>>>         }
>>>>
>>>>         for (i = start_sec; i <= end_sec; i++) {
>>>> -               err = __add_section(nid, section_nr_to_pfn(i), altmap,
>>>> -                               restrictions->flags & MHP_MEMBLOCK_API);
>>>> +               err = __add_section(nid, section_nr_to_pfn(i), altmap);
>>>>
>>>>                 /*
>>>>                  * EEXIST is finally dealt with by ioresource collision
>>>> @@ -1066,9 +1065,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
>>>>   */
>>>>  int __ref add_memory_resource(int nid, struct resource *res)
>>>>  {
>>>> -       struct mhp_restrictions restrictions = {
>>>> -               .flags = MHP_MEMBLOCK_API,
>>>> -       };
>>>> +       struct mhp_restrictions restrictions = {};
>>>
>>> With mhp_restrictions.flags no longer needed, can we drop
>>> mhp_restrictions and just go back to a plain @altmap argument?
>>>
>>
>> Oscar wants to use it to configure from where to allocate vmemmaps. That
>> was the original driver behind it.
>>
> 
> Ah, ok, makes sense.
> 

However I haven't really thought it through yet, smalles like that could
as well just be handled by the caller of
arch_add_memory()/arch_remove_memory() eventually, passing it via
something like the altmap parameter.

Let's leave it in place for now, we can talk about that once we have new
patches from Oscar.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-07 18:38 ` [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
  2019-05-07 21:17   ` Dan Williams
@ 2019-05-08  8:35   ` David Hildenbrand
  2019-05-09 12:43   ` Wei Yang
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 35+ messages in thread
From: David Hildenbrand @ 2019-05-08  8:35 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	akpm, Dan Williams, Greg Kroah-Hartman, Rafael J. Wysocki,
	mike.travis, Ingo Molnar, Andrew Banman, Oscar Salvador,
	Michal Hocko, Pavel Tatashin, Qian Cai, Wei Yang, Arun KS,
	Mathieu Malaterre

On 07.05.19 20:38, David Hildenbrand wrote:
> Only memory to be added to the buddy and to be onlined/offlined by
> user space using memory block devices needs (and should have!) memory
> block devices.
> 
> Factor out creation of memory block devices Create all devices after
> arch_add_memory() succeeded. We can later drop the want_memblock parameter,
> because it is now effectively stale.
> 
> Only after memory block devices have been added, memory can be onlined
> by user space. This implies, that memory is not visible to user space at
> all before arch_add_memory() succeeded.
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Andrew Banman <andrew.banman@hpe.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
>  include/linux/memory.h |  2 +-
>  mm/memory_hotplug.c    | 15 ++++-----
>  3 files changed, 53 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 6e0cb4fda179..862c202a18ca 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
>  	return 0;
>  }
>  
> +static void unregister_memory(struct memory_block *memory)
> +{
> +	BUG_ON(memory->dev.bus != &memory_subsys);
> +
> +	/* drop the ref. we got via find_memory_block() */
> +	put_device(&memory->dev);
> +	device_unregister(&memory->dev);
> +}
> +
>  /*
> - * need an interface for the VM to add new memory regions,
> - * but without onlining it.
> + * Create memory block devices for the given memory area. Start and size
> + * have to be aligned to memory block granularity. Memory block devices
> + * will be initialized as offline.
>   */
> -int hotplug_memory_register(int nid, struct mem_section *section)
> +int hotplug_memory_register(unsigned long start, unsigned long size)
>  {
> -	int ret = 0;
> +	unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
> +	unsigned long start_pfn = PFN_DOWN(start);
> +	unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
> +	unsigned long pfn;
>  	struct memory_block *mem;
> +	int ret = 0;
>  
> -	mutex_lock(&mem_sysfs_mutex);
> +	BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
> +	BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));
>  
> -	mem = find_memory_block(section);
> -	if (mem) {
> -		mem->section_count++;
> -		put_device(&mem->dev);
> -	} else {
> -		ret = init_memory_block(&mem, section, MEM_OFFLINE);
> +	mutex_lock(&mem_sysfs_mutex);
> +	for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
> +		mem = find_memory_block(__pfn_to_section(pfn));
> +		if (mem) {
> +			WARN_ON_ONCE(false);
> +			put_device(&mem->dev);
> +			continue;
> +		}
> +		ret = init_memory_block(&mem, __pfn_to_section(pfn),
> +					MEM_OFFLINE);
>  		if (ret)
> -			goto out;
> -		mem->section_count++;
> +			break;
> +		mem->section_count = memory_block_size_bytes() /
> +				     MIN_MEMORY_BLOCK_SIZE;
> +	}
> +	if (ret) {
> +		end_pfn = pfn;
> +		for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
> +			mem = find_memory_block(__pfn_to_section(pfn));
> +			if (!mem)
> +				continue;
> +			mem->section_count = 0;
> +			unregister_memory(mem);
> +		}
>  	}
> -
> -out:
>  	mutex_unlock(&mem_sysfs_mutex);
>  	return ret;
>  }
>  
> -static void
> -unregister_memory(struct memory_block *memory)
> -{
> -	BUG_ON(memory->dev.bus != &memory_subsys);
> -
> -	/* drop the ref. we got via find_memory_block() */
> -	put_device(&memory->dev);
> -	device_unregister(&memory->dev);
> -}
> -
> -void unregister_memory_section(struct mem_section *section)
> +static int remove_memory_section(struct mem_section *section)
>  {

The function change is misplaces in this patch will drop it so this
patch compiles without the other patches.


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 7/8] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail
  2019-05-08  7:21     ` David Hildenbrand
@ 2019-05-08 13:50       ` Dan Williams
  0 siblings, 0 replies; 35+ messages in thread
From: Dan Williams @ 2019-05-08 13:50 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Greg Kroah-Hartman,
	Rafael J. Wysocki, Alex Deucher, David S. Miller, Mark Brown,
	Chris Wilson, Oscar Salvador, Jonathan Cameron

On Wed, May 8, 2019 at 12:22 AM David Hildenbrand <david@redhat.com> wrote:
>
>
> >>  drivers/base/node.c  | 18 +++++-------------
> >>  include/linux/node.h |  5 ++---
> >>  2 files changed, 7 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/drivers/base/node.c b/drivers/base/node.c
> >> index 04fdfa99b8bc..9be88fd05147 100644
> >> --- a/drivers/base/node.c
> >> +++ b/drivers/base/node.c
> >> @@ -803,20 +803,14 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
> >>
> >>  /*
> >>   * Unregister memory block device under all nodes that it spans.
> >> + * Has to be called with mem_sysfs_mutex held (due to unlinked_nodes).
> >
> > Given this comment can bitrot relative to the implementation lets
> > instead add an explicit:
> >
> >     lockdep_assert_held(&mem_sysfs_mutex);
>
> That would require to make the mutex non-static. Is that what you
> suggest, or any other alternative?

If the concern is other code paths taking the lock when they shouldn't
then you could make a public "lockdep_assert_mem_sysfs_held()" to do
the same, but I otherwise think the benefit of inline lock validation
is worth the price of adding a new non-static symbol.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
  2019-05-08  7:39         ` David Hildenbrand
@ 2019-05-08 23:08           ` osalvador
  2019-05-09  7:05             ` David Hildenbrand
  0 siblings, 1 reply; 35+ messages in thread
From: osalvador @ 2019-05-08 23:08 UTC (permalink / raw)
  To: David Hildenbrand, Dan Williams
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Michal Hocko,
	Pavel Tatashin, Wei Yang, Joonsoo Kim, Qian Cai, Arun KS,
	Mathieu Malaterre

On Wed, 2019-05-08 at 09:39 +0200, David Hildenbrand wrote:
> However I haven't really thought it through yet, smalles like that
> could
> as well just be handled by the caller of
> arch_add_memory()/arch_remove_memory() eventually, passing it via
> something like the altmap parameter.
> 
> Let's leave it in place for now, we can talk about that once we have
> new
> patches from Oscar.
Hi David,

I plan to send a new patchset once this is and Dan's are merged,
otherwise I will have a mayhem with the conflicts.

I also plan/want to review this patchset, but time is tight this week.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
  2019-05-08 23:08           ` osalvador
@ 2019-05-09  7:05             ` David Hildenbrand
  0 siblings, 0 replies; 35+ messages in thread
From: David Hildenbrand @ 2019-05-09  7:05 UTC (permalink / raw)
  To: osalvador, Dan Williams
  Cc: Linux MM, Linux Kernel Mailing List, linux-ia64, linuxppc-dev,
	linux-s390, Linux-sh, Andrew Morton, Michal Hocko,
	Pavel Tatashin, Wei Yang, Joonsoo Kim, Qian Cai, Arun KS,
	Mathieu Malaterre

On 09.05.19 01:08, osalvador wrote:
> On Wed, 2019-05-08 at 09:39 +0200, David Hildenbrand wrote:
>> However I haven't really thought it through yet, smalles like that
>> could
>> as well just be handled by the caller of
>> arch_add_memory()/arch_remove_memory() eventually, passing it via
>> something like the altmap parameter.
>>
>> Let's leave it in place for now, we can talk about that once we have
>> new
>> patches from Oscar.
> Hi David,
> 
> I plan to send a new patchset once this is and Dan's are merged,
> otherwise I will have a mayhem with the conflicts.
> 
> I also plan/want to review this patchset, but time is tight this week.
> 

Sure, take your time. I'll resend this patch set most probably tomorrow
or early next week. Cheers!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/8] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
  2019-05-07 18:37 ` [PATCH v2 1/8] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
  2019-05-07 20:38   ` Dan Williams
@ 2019-05-09 12:23   ` Wei Yang
  1 sibling, 0 replies; 35+ messages in thread
From: Wei Yang @ 2019-05-09 12:23 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, akpm, Dan Williams, Oscar Salvador, Michal Hocko,
	Pavel Tatashin, Qian Cai, Wei Yang, Arun KS, Mathieu Malaterre

On Tue, May 07, 2019 at 08:37:57PM +0200, David Hildenbrand wrote:
>By converting start and size to page granularity, we actually ignore
>unaligned parts within a page instead of properly bailing out with an
>error.
>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Oscar Salvador <osalvador@suse.de>
>Cc: Michal Hocko <mhocko@suse.com>
>Cc: David Hildenbrand <david@redhat.com>
>Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>Cc: Qian Cai <cai@lca.pw>
>Cc: Wei Yang <richard.weiyang@gmail.com>
>Cc: Arun KS <arunks@codeaurora.org>
>Cc: Mathieu Malaterre <malat@debian.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>


-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-07 18:38 ` [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
  2019-05-07 21:17   ` Dan Williams
  2019-05-08  8:35   ` David Hildenbrand
@ 2019-05-09 12:43   ` Wei Yang
  2019-05-09 12:50     ` David Hildenbrand
  2019-05-09 13:55   ` Wei Yang
  2019-05-09 14:31   ` Wei Yang
  4 siblings, 1 reply; 35+ messages in thread
From: Wei Yang @ 2019-05-09 12:43 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, akpm, Dan Williams, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Wei Yang,
	Arun KS, Mathieu Malaterre

On Tue, May 07, 2019 at 08:38:00PM +0200, David Hildenbrand wrote:
>Only memory to be added to the buddy and to be onlined/offlined by
>user space using memory block devices needs (and should have!) memory
>block devices.
>
>Factor out creation of memory block devices Create all devices after
>arch_add_memory() succeeded. We can later drop the want_memblock parameter,
>because it is now effectively stale.
>
>Only after memory block devices have been added, memory can be onlined
>by user space. This implies, that memory is not visible to user space at
>all before arch_add_memory() succeeded.
>
>Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Cc: David Hildenbrand <david@redhat.com>
>Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Ingo Molnar <mingo@kernel.org>
>Cc: Andrew Banman <andrew.banman@hpe.com>
>Cc: Oscar Salvador <osalvador@suse.de>
>Cc: Michal Hocko <mhocko@suse.com>
>Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>Cc: Qian Cai <cai@lca.pw>
>Cc: Wei Yang <richard.weiyang@gmail.com>
>Cc: Arun KS <arunks@codeaurora.org>
>Cc: Mathieu Malaterre <malat@debian.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
> include/linux/memory.h |  2 +-
> mm/memory_hotplug.c    | 15 ++++-----
> 3 files changed, 53 insertions(+), 34 deletions(-)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index 6e0cb4fda179..862c202a18ca 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
> 	return 0;
> }
> 
>+static void unregister_memory(struct memory_block *memory)
>+{
>+	BUG_ON(memory->dev.bus != &memory_subsys);
>+
>+	/* drop the ref. we got via find_memory_block() */
>+	put_device(&memory->dev);
>+	device_unregister(&memory->dev);
>+}
>+
> /*
>- * need an interface for the VM to add new memory regions,
>- * but without onlining it.
>+ * Create memory block devices for the given memory area. Start and size
>+ * have to be aligned to memory block granularity. Memory block devices
>+ * will be initialized as offline.
>  */
>-int hotplug_memory_register(int nid, struct mem_section *section)
>+int hotplug_memory_register(unsigned long start, unsigned long size)
> {
>-	int ret = 0;
>+	unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
>+	unsigned long start_pfn = PFN_DOWN(start);
>+	unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
>+	unsigned long pfn;
> 	struct memory_block *mem;
>+	int ret = 0;
> 
>-	mutex_lock(&mem_sysfs_mutex);
>+	BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
>+	BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));

After this change, the call flow looks like this:

add_memory_resource
    check_hotplug_memory_range
    hotplug_memory_register

Since in check_hotplug_memory_range() has checked the boundary, do we need to
check here again?

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-09 12:43   ` Wei Yang
@ 2019-05-09 12:50     ` David Hildenbrand
  0 siblings, 0 replies; 35+ messages in thread
From: David Hildenbrand @ 2019-05-09 12:50 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, akpm, Dan Williams, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Arun KS,
	Mathieu Malaterre

On 09.05.19 14:43, Wei Yang wrote:
> On Tue, May 07, 2019 at 08:38:00PM +0200, David Hildenbrand wrote:
>> Only memory to be added to the buddy and to be onlined/offlined by
>> user space using memory block devices needs (and should have!) memory
>> block devices.
>>
>> Factor out creation of memory block devices Create all devices after
>> arch_add_memory() succeeded. We can later drop the want_memblock parameter,
>> because it is now effectively stale.
>>
>> Only after memory block devices have been added, memory can be onlined
>> by user space. This implies, that memory is not visible to user space at
>> all before arch_add_memory() succeeded.
>>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Andrew Banman <andrew.banman@hpe.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>> Cc: Qian Cai <cai@lca.pw>
>> Cc: Wei Yang <richard.weiyang@gmail.com>
>> Cc: Arun KS <arunks@codeaurora.org>
>> Cc: Mathieu Malaterre <malat@debian.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
>> include/linux/memory.h |  2 +-
>> mm/memory_hotplug.c    | 15 ++++-----
>> 3 files changed, 53 insertions(+), 34 deletions(-)
>>
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index 6e0cb4fda179..862c202a18ca 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
>> 	return 0;
>> }
>>
>> +static void unregister_memory(struct memory_block *memory)
>> +{
>> +	BUG_ON(memory->dev.bus != &memory_subsys);
>> +
>> +	/* drop the ref. we got via find_memory_block() */
>> +	put_device(&memory->dev);
>> +	device_unregister(&memory->dev);
>> +}
>> +
>> /*
>> - * need an interface for the VM to add new memory regions,
>> - * but without onlining it.
>> + * Create memory block devices for the given memory area. Start and size
>> + * have to be aligned to memory block granularity. Memory block devices
>> + * will be initialized as offline.
>>  */
>> -int hotplug_memory_register(int nid, struct mem_section *section)
>> +int hotplug_memory_register(unsigned long start, unsigned long size)
>> {
>> -	int ret = 0;
>> +	unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
>> +	unsigned long start_pfn = PFN_DOWN(start);
>> +	unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
>> +	unsigned long pfn;
>> 	struct memory_block *mem;
>> +	int ret = 0;
>>
>> -	mutex_lock(&mem_sysfs_mutex);
>> +	BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
>> +	BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));
> 
> After this change, the call flow looks like this:
> 
> add_memory_resource
>     check_hotplug_memory_range
>     hotplug_memory_register
> 
> Since in check_hotplug_memory_range() has checked the boundary, do we need to
> check here again?
> 

I prefer to check for such requirements explicitly in applicable places,
especially if they are placed in different files. Makes code easier to
get. WARN_ON_ONCE will indicate that this has to be assured by the caller.

Thanks!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-07 18:38 ` [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
                     ` (2 preceding siblings ...)
  2019-05-09 12:43   ` Wei Yang
@ 2019-05-09 13:55   ` Wei Yang
  2019-05-09 14:05     ` David Hildenbrand
  2019-05-09 14:31   ` Wei Yang
  4 siblings, 1 reply; 35+ messages in thread
From: Wei Yang @ 2019-05-09 13:55 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, akpm, Dan Williams, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Wei Yang,
	Arun KS, Mathieu Malaterre

On Tue, May 07, 2019 at 08:38:00PM +0200, David Hildenbrand wrote:
>Only memory to be added to the buddy and to be onlined/offlined by
>user space using memory block devices needs (and should have!) memory
>block devices.
>
>Factor out creation of memory block devices Create all devices after
>arch_add_memory() succeeded. We can later drop the want_memblock parameter,
>because it is now effectively stale.
>
>Only after memory block devices have been added, memory can be onlined
>by user space. This implies, that memory is not visible to user space at
>all before arch_add_memory() succeeded.
>
>Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Cc: David Hildenbrand <david@redhat.com>
>Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Ingo Molnar <mingo@kernel.org>
>Cc: Andrew Banman <andrew.banman@hpe.com>
>Cc: Oscar Salvador <osalvador@suse.de>
>Cc: Michal Hocko <mhocko@suse.com>
>Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>Cc: Qian Cai <cai@lca.pw>
>Cc: Wei Yang <richard.weiyang@gmail.com>
>Cc: Arun KS <arunks@codeaurora.org>
>Cc: Mathieu Malaterre <malat@debian.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
> include/linux/memory.h |  2 +-
> mm/memory_hotplug.c    | 15 ++++-----
> 3 files changed, 53 insertions(+), 34 deletions(-)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index 6e0cb4fda179..862c202a18ca 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
> 	return 0;
> }
> 
>+static void unregister_memory(struct memory_block *memory)
>+{
>+	BUG_ON(memory->dev.bus != &memory_subsys);
>+
>+	/* drop the ref. we got via find_memory_block() */
>+	put_device(&memory->dev);
>+	device_unregister(&memory->dev);
>+}
>+
> /*
>- * need an interface for the VM to add new memory regions,
>- * but without onlining it.
>+ * Create memory block devices for the given memory area. Start and size
>+ * have to be aligned to memory block granularity. Memory block devices
>+ * will be initialized as offline.
>  */
>-int hotplug_memory_register(int nid, struct mem_section *section)
>+int hotplug_memory_register(unsigned long start, unsigned long size)
> {
>-	int ret = 0;
>+	unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
>+	unsigned long start_pfn = PFN_DOWN(start);
>+	unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
>+	unsigned long pfn;
> 	struct memory_block *mem;
>+	int ret = 0;
> 
>-	mutex_lock(&mem_sysfs_mutex);
>+	BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
>+	BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));
> 
>-	mem = find_memory_block(section);
>-	if (mem) {
>-		mem->section_count++;
>-		put_device(&mem->dev);
>-	} else {
>-		ret = init_memory_block(&mem, section, MEM_OFFLINE);
>+	mutex_lock(&mem_sysfs_mutex);
>+	for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
>+		mem = find_memory_block(__pfn_to_section(pfn));
>+		if (mem) {
>+			WARN_ON_ONCE(false);

One question here, the purpose of WARN_ON_ONCE(false) is? Would we trigger
this?

>+			put_device(&mem->dev);
>+			continue;
>+		}
>+		ret = init_memory_block(&mem, __pfn_to_section(pfn),
>+					MEM_OFFLINE);
> 		if (ret)
>-			goto out;
>-		mem->section_count++;
>+			break;
>+		mem->section_count = memory_block_size_bytes() /
>+				     MIN_MEMORY_BLOCK_SIZE;

Maybe we can leverage sections_per_block variable.

                mem->section_count = sections_per_block;

>+	}
>+	if (ret) {
>+		end_pfn = pfn;
>+		for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
>+			mem = find_memory_block(__pfn_to_section(pfn));
>+			if (!mem)
>+				continue;
>+			mem->section_count = 0;
>+			unregister_memory(mem);
>+		}
> 	}

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-09 13:55   ` Wei Yang
@ 2019-05-09 14:05     ` David Hildenbrand
  0 siblings, 0 replies; 35+ messages in thread
From: David Hildenbrand @ 2019-05-09 14:05 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, akpm, Dan Williams, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Arun KS,
	Mathieu Malaterre

On 09.05.19 15:55, Wei Yang wrote:
> On Tue, May 07, 2019 at 08:38:00PM +0200, David Hildenbrand wrote:
>> Only memory to be added to the buddy and to be onlined/offlined by
>> user space using memory block devices needs (and should have!) memory
>> block devices.
>>
>> Factor out creation of memory block devices Create all devices after
>> arch_add_memory() succeeded. We can later drop the want_memblock parameter,
>> because it is now effectively stale.
>>
>> Only after memory block devices have been added, memory can be onlined
>> by user space. This implies, that memory is not visible to user space at
>> all before arch_add_memory() succeeded.
>>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Andrew Banman <andrew.banman@hpe.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>> Cc: Qian Cai <cai@lca.pw>
>> Cc: Wei Yang <richard.weiyang@gmail.com>
>> Cc: Arun KS <arunks@codeaurora.org>
>> Cc: Mathieu Malaterre <malat@debian.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
>> include/linux/memory.h |  2 +-
>> mm/memory_hotplug.c    | 15 ++++-----
>> 3 files changed, 53 insertions(+), 34 deletions(-)
>>
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index 6e0cb4fda179..862c202a18ca 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
>> 	return 0;
>> }
>>
>> +static void unregister_memory(struct memory_block *memory)
>> +{
>> +	BUG_ON(memory->dev.bus != &memory_subsys);
>> +
>> +	/* drop the ref. we got via find_memory_block() */
>> +	put_device(&memory->dev);
>> +	device_unregister(&memory->dev);
>> +}
>> +
>> /*
>> - * need an interface for the VM to add new memory regions,
>> - * but without onlining it.
>> + * Create memory block devices for the given memory area. Start and size
>> + * have to be aligned to memory block granularity. Memory block devices
>> + * will be initialized as offline.
>>  */
>> -int hotplug_memory_register(int nid, struct mem_section *section)
>> +int hotplug_memory_register(unsigned long start, unsigned long size)
>> {
>> -	int ret = 0;
>> +	unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
>> +	unsigned long start_pfn = PFN_DOWN(start);
>> +	unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
>> +	unsigned long pfn;
>> 	struct memory_block *mem;
>> +	int ret = 0;
>>
>> -	mutex_lock(&mem_sysfs_mutex);
>> +	BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
>> +	BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));
>>
>> -	mem = find_memory_block(section);
>> -	if (mem) {
>> -		mem->section_count++;
>> -		put_device(&mem->dev);
>> -	} else {
>> -		ret = init_memory_block(&mem, section, MEM_OFFLINE);
>> +	mutex_lock(&mem_sysfs_mutex);
>> +	for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
>> +		mem = find_memory_block(__pfn_to_section(pfn));
>> +		if (mem) {
>> +			WARN_ON_ONCE(false);
> 
> One question here, the purpose of WARN_ON_ONCE(false) is? Would we trigger
> this?

Would happen if something goes terribly wrong. We might want to remove
this once we are sure this will not happen.

I replaced it in the meantime by a

if (WARN_ON_ONCE(mem)) {
	put_device(&mem->dev);
	ret = -EEXIST;
	break;
}

> 
>> +			put_device(&mem->dev);
>> +			continue;
>> +		}
>> +		ret = init_memory_block(&mem, __pfn_to_section(pfn),
>> +					MEM_OFFLINE);
>> 		if (ret)
>> -			goto out;
>> -		mem->section_count++;
>> +			break;
>> +		mem->section_count = memory_block_size_bytes() /
>> +				     MIN_MEMORY_BLOCK_SIZE;
> 
> Maybe we can leverage sections_per_block variable.

Most certainly if it does what I think it does :) thanks!


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-07 18:38 ` [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
                     ` (3 preceding siblings ...)
  2019-05-09 13:55   ` Wei Yang
@ 2019-05-09 14:31   ` Wei Yang
  2019-05-09 14:58     ` David Hildenbrand
  4 siblings, 1 reply; 35+ messages in thread
From: Wei Yang @ 2019-05-09 14:31 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, akpm, Dan Williams, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Wei Yang,
	Arun KS, Mathieu Malaterre

On Tue, May 07, 2019 at 08:38:00PM +0200, David Hildenbrand wrote:
>Only memory to be added to the buddy and to be onlined/offlined by
>user space using memory block devices needs (and should have!) memory
>block devices.
>
>Factor out creation of memory block devices Create all devices after
>arch_add_memory() succeeded. We can later drop the want_memblock parameter,
>because it is now effectively stale.
>
>Only after memory block devices have been added, memory can be onlined
>by user space. This implies, that memory is not visible to user space at
>all before arch_add_memory() succeeded.
>
>Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Cc: David Hildenbrand <david@redhat.com>
>Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Ingo Molnar <mingo@kernel.org>
>Cc: Andrew Banman <andrew.banman@hpe.com>
>Cc: Oscar Salvador <osalvador@suse.de>
>Cc: Michal Hocko <mhocko@suse.com>
>Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>Cc: Qian Cai <cai@lca.pw>
>Cc: Wei Yang <richard.weiyang@gmail.com>
>Cc: Arun KS <arunks@codeaurora.org>
>Cc: Mathieu Malaterre <malat@debian.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
> include/linux/memory.h |  2 +-
> mm/memory_hotplug.c    | 15 ++++-----
> 3 files changed, 53 insertions(+), 34 deletions(-)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index 6e0cb4fda179..862c202a18ca 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
> 	return 0;
> }
> 
>+static void unregister_memory(struct memory_block *memory)
>+{
>+	BUG_ON(memory->dev.bus != &memory_subsys);
>+
>+	/* drop the ref. we got via find_memory_block() */
>+	put_device(&memory->dev);
>+	device_unregister(&memory->dev);
>+}
>+
> /*
>- * need an interface for the VM to add new memory regions,
>- * but without onlining it.
>+ * Create memory block devices for the given memory area. Start and size
>+ * have to be aligned to memory block granularity. Memory block devices
>+ * will be initialized as offline.
>  */
>-int hotplug_memory_register(int nid, struct mem_section *section)
>+int hotplug_memory_register(unsigned long start, unsigned long size)

One trivial suggestion about the function name.

For memory_block device, sometimes we use the full name

    find_memory_block
    init_memory_block
    add_memory_block

But sometimes we use *nick* name

    hotplug_memory_register
    register_memory
    unregister_memory

This is a little bit confusion.

Can we use one name convention here? 

[...]

> /*
>@@ -1106,6 +1100,13 @@ int __ref add_memory_resource(int nid, struct resource *res)
> 	if (ret < 0)
> 		goto error;
> 
>+	/* create memory block devices after memory was added */
>+	ret = hotplug_memory_register(start, size);
>+	if (ret) {
>+		arch_remove_memory(nid, start, size, NULL);

Functionally, it works I think.

But arch_remove_memory() would remove pages from zone. At this point, we just
allocate section/mmap for pages, the zones are empty and pages are not
connected to zone.

Function  zone = page_zone(page); always gets zone #0, since pages->flags is 0
at  this point. This is not exact.

Would we add some comment to mention this? Or we need to clean up
arch_remove_memory() to take out __remove_zone()?


>+		goto error;
>+	}
>+
> 	if (new_node) {
> 		/* If sysfs file of new node can't be created, cpu on the node
> 		 * can't be hot-added. There is no rollback way now.
>-- 
>2.20.1

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-09 14:31   ` Wei Yang
@ 2019-05-09 14:58     ` David Hildenbrand
  2019-05-09 21:50       ` Wei Yang
  0 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand @ 2019-05-09 14:58 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, akpm, Dan Williams, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Arun KS,
	Mathieu Malaterre

On 09.05.19 16:31, Wei Yang wrote:
> On Tue, May 07, 2019 at 08:38:00PM +0200, David Hildenbrand wrote:
>> Only memory to be added to the buddy and to be onlined/offlined by
>> user space using memory block devices needs (and should have!) memory
>> block devices.
>>
>> Factor out creation of memory block devices Create all devices after
>> arch_add_memory() succeeded. We can later drop the want_memblock parameter,
>> because it is now effectively stale.
>>
>> Only after memory block devices have been added, memory can be onlined
>> by user space. This implies, that memory is not visible to user space at
>> all before arch_add_memory() succeeded.
>>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Andrew Banman <andrew.banman@hpe.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>> Cc: Qian Cai <cai@lca.pw>
>> Cc: Wei Yang <richard.weiyang@gmail.com>
>> Cc: Arun KS <arunks@codeaurora.org>
>> Cc: Mathieu Malaterre <malat@debian.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
>> include/linux/memory.h |  2 +-
>> mm/memory_hotplug.c    | 15 ++++-----
>> 3 files changed, 53 insertions(+), 34 deletions(-)
>>
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index 6e0cb4fda179..862c202a18ca 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
>> 	return 0;
>> }
>>
>> +static void unregister_memory(struct memory_block *memory)
>> +{
>> +	BUG_ON(memory->dev.bus != &memory_subsys);
>> +
>> +	/* drop the ref. we got via find_memory_block() */
>> +	put_device(&memory->dev);
>> +	device_unregister(&memory->dev);
>> +}
>> +
>> /*
>> - * need an interface for the VM to add new memory regions,
>> - * but without onlining it.
>> + * Create memory block devices for the given memory area. Start and size
>> + * have to be aligned to memory block granularity. Memory block devices
>> + * will be initialized as offline.
>>  */
>> -int hotplug_memory_register(int nid, struct mem_section *section)
>> +int hotplug_memory_register(unsigned long start, unsigned long size)
> 
> One trivial suggestion about the function name.
> 
> For memory_block device, sometimes we use the full name
> 
>     find_memory_block
>     init_memory_block
>     add_memory_block
> 
> But sometimes we use *nick* name
> 
>     hotplug_memory_register
>     register_memory
>     unregister_memory
> 
> This is a little bit confusion.
> 
> Can we use one name convention here?

We can just go for

crate_memory_blocks() and free_memory_blocks(). Or do
you have better suggestions?

(I would actually even prefer "memory_block_devices", because memory
blocks have different meanins)

> 
> [...]
> 
>> /*
>> @@ -1106,6 +1100,13 @@ int __ref add_memory_resource(int nid, struct resource *res)
>> 	if (ret < 0)
>> 		goto error;
>>
>> +	/* create memory block devices after memory was added */
>> +	ret = hotplug_memory_register(start, size);
>> +	if (ret) {
>> +		arch_remove_memory(nid, start, size, NULL);
> 
> Functionally, it works I think.
> 
> But arch_remove_memory() would remove pages from zone. At this point, we just
> allocate section/mmap for pages, the zones are empty and pages are not
> connected to zone.
> 
> Function  zone = page_zone(page); always gets zone #0, since pages->flags is 0
> at  this point. This is not exact.
> 
> Would we add some comment to mention this? Or we need to clean up
> arch_remove_memory() to take out __remove_zone()?

That is precisely what is on my list next (see cover letter).This is
already broken when memory that was never onlined is removed again.
So I am planning to fix that independently.


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-09 14:58     ` David Hildenbrand
@ 2019-05-09 21:50       ` Wei Yang
  2019-05-09 22:18         ` David Hildenbrand
  0 siblings, 1 reply; 35+ messages in thread
From: Wei Yang @ 2019-05-09 21:50 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, linux-mm, linux-kernel, linux-ia64, linuxppc-dev,
	linux-s390, linux-sh, akpm, Dan Williams, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Arun KS,
	Mathieu Malaterre

On Thu, May 09, 2019 at 04:58:56PM +0200, David Hildenbrand wrote:
>On 09.05.19 16:31, Wei Yang wrote:
>> On Tue, May 07, 2019 at 08:38:00PM +0200, David Hildenbrand wrote:
>>> Only memory to be added to the buddy and to be onlined/offlined by
>>> user space using memory block devices needs (and should have!) memory
>>> block devices.
>>>
>>> Factor out creation of memory block devices Create all devices after
>>> arch_add_memory() succeeded. We can later drop the want_memblock parameter,
>>> because it is now effectively stale.
>>>
>>> Only after memory block devices have been added, memory can be onlined
>>> by user space. This implies, that memory is not visible to user space at
>>> all before arch_add_memory() succeeded.
>>>
>>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>>> Cc: David Hildenbrand <david@redhat.com>
>>> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Ingo Molnar <mingo@kernel.org>
>>> Cc: Andrew Banman <andrew.banman@hpe.com>
>>> Cc: Oscar Salvador <osalvador@suse.de>
>>> Cc: Michal Hocko <mhocko@suse.com>
>>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>>> Cc: Qian Cai <cai@lca.pw>
>>> Cc: Wei Yang <richard.weiyang@gmail.com>
>>> Cc: Arun KS <arunks@codeaurora.org>
>>> Cc: Mathieu Malaterre <malat@debian.org>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>> drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
>>> include/linux/memory.h |  2 +-
>>> mm/memory_hotplug.c    | 15 ++++-----
>>> 3 files changed, 53 insertions(+), 34 deletions(-)
>>>
>>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>>> index 6e0cb4fda179..862c202a18ca 100644
>>> --- a/drivers/base/memory.c
>>> +++ b/drivers/base/memory.c
>>> @@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
>>> 	return 0;
>>> }
>>>
>>> +static void unregister_memory(struct memory_block *memory)
>>> +{
>>> +	BUG_ON(memory->dev.bus != &memory_subsys);
>>> +
>>> +	/* drop the ref. we got via find_memory_block() */
>>> +	put_device(&memory->dev);
>>> +	device_unregister(&memory->dev);
>>> +}
>>> +
>>> /*
>>> - * need an interface for the VM to add new memory regions,
>>> - * but without onlining it.
>>> + * Create memory block devices for the given memory area. Start and size
>>> + * have to be aligned to memory block granularity. Memory block devices
>>> + * will be initialized as offline.
>>>  */
>>> -int hotplug_memory_register(int nid, struct mem_section *section)
>>> +int hotplug_memory_register(unsigned long start, unsigned long size)
>> 
>> One trivial suggestion about the function name.
>> 
>> For memory_block device, sometimes we use the full name
>> 
>>     find_memory_block
>>     init_memory_block
>>     add_memory_block
>> 
>> But sometimes we use *nick* name
>> 
>>     hotplug_memory_register
>>     register_memory
>>     unregister_memory
>> 
>> This is a little bit confusion.
>> 
>> Can we use one name convention here?
>
>We can just go for
>
>crate_memory_blocks() and free_memory_blocks(). Or do
>you have better suggestions?

s/crate/create/

Looks good to me.

>
>(I would actually even prefer "memory_block_devices", because memory
>blocks have different meanins)
>

Agree with you, this comes to my mind sometime ago :-)

>> 
>> [...]
>> 
>>> /*
>>> @@ -1106,6 +1100,13 @@ int __ref add_memory_resource(int nid, struct resource *res)
>>> 	if (ret < 0)
>>> 		goto error;
>>>
>>> +	/* create memory block devices after memory was added */
>>> +	ret = hotplug_memory_register(start, size);
>>> +	if (ret) {
>>> +		arch_remove_memory(nid, start, size, NULL);
>> 
>> Functionally, it works I think.
>> 
>> But arch_remove_memory() would remove pages from zone. At this point, we just
>> allocate section/mmap for pages, the zones are empty and pages are not
>> connected to zone.
>> 
>> Function  zone = page_zone(page); always gets zone #0, since pages->flags is 0
>> at  this point. This is not exact.
>> 
>> Would we add some comment to mention this? Or we need to clean up
>> arch_remove_memory() to take out __remove_zone()?
>
>That is precisely what is on my list next (see cover letter).This is
>already broken when memory that was never onlined is removed again.
>So I am planning to fix that independently.
>

Sounds great :-)

Hope you would cc me in the following series.

>
>-- 
>
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
  2019-05-09 21:50       ` Wei Yang
@ 2019-05-09 22:18         ` David Hildenbrand
  0 siblings, 0 replies; 35+ messages in thread
From: David Hildenbrand @ 2019-05-09 22:18 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, akpm, Dan Williams, Greg Kroah-Hartman,
	Rafael J. Wysocki, mike.travis, Ingo Molnar, Andrew Banman,
	Oscar Salvador, Michal Hocko, Pavel Tatashin, Qian Cai, Arun KS,
	Mathieu Malaterre

> Looks good to me.
> 
>>
>> (I would actually even prefer "memory_block_devices", because memory
>> blocks have different meanins)
>>
> 
> Agree with you, this comes to my mind sometime ago :-)

We have memblocks, memory_blocks  ... I guess memory_block_device is
unique :)

> 
>>>
>>> [...]
>>>
>>>> /*
>>>> @@ -1106,6 +1100,13 @@ int __ref add_memory_resource(int nid, struct resource *res)
>>>> 	if (ret < 0)
>>>> 		goto error;
>>>>
>>>> +	/* create memory block devices after memory was added */
>>>> +	ret = hotplug_memory_register(start, size);
>>>> +	if (ret) {
>>>> +		arch_remove_memory(nid, start, size, NULL);
>>>
>>> Functionally, it works I think.
>>>
>>> But arch_remove_memory() would remove pages from zone. At this point, we just
>>> allocate section/mmap for pages, the zones are empty and pages are not
>>> connected to zone.
>>>
>>> Function  zone = page_zone(page); always gets zone #0, since pages->flags is 0
>>> at  this point. This is not exact.
>>>
>>> Would we add some comment to mention this? Or we need to clean up
>>> arch_remove_memory() to take out __remove_zone()?
>>
>> That is precisely what is on my list next (see cover letter).This is
>> already broken when memory that was never onlined is removed again.
>> So I am planning to fix that independently.
>>
> 
> Sounds great :-)

Especially, I suspect a lot of bugs in the area of

1. Remove memory that has never been onlined
2. Remove memory that has been onlined/offlined a couple of times
3. Remove memory that has been onlined to different zones.

Will see when refactoring if my intuition is right :)

> 
> Hope you would cc me in the following series.


Sure! Thanks!


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2019-05-09 22:18 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20190507183804.5512-1-david@redhat.com>
2019-05-07 18:37 ` [PATCH v2 1/8] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() David Hildenbrand
2019-05-07 20:38   ` Dan Williams
2019-05-09 12:23   ` Wei Yang
2019-05-07 18:37 ` [PATCH v2 2/8] s390x/mm: Implement arch_remove_memory() David Hildenbrand
2019-05-07 20:46   ` Dan Williams
2019-05-07 20:47     ` David Hildenbrand
2019-05-07 20:57       ` Dan Williams
2019-05-07 21:13         ` David Hildenbrand
2019-05-07 18:38 ` [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory() David Hildenbrand
2019-05-07 21:17   ` Dan Williams
2019-05-07 21:27     ` David Hildenbrand
2019-05-08  8:35   ` David Hildenbrand
2019-05-09 12:43   ` Wei Yang
2019-05-09 12:50     ` David Hildenbrand
2019-05-09 13:55   ` Wei Yang
2019-05-09 14:05     ` David Hildenbrand
2019-05-09 14:31   ` Wei Yang
2019-05-09 14:58     ` David Hildenbrand
2019-05-09 21:50       ` Wei Yang
2019-05-09 22:18         ` David Hildenbrand
2019-05-07 18:38 ` [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API David Hildenbrand
2019-05-07 21:19   ` Dan Williams
2019-05-07 21:24     ` David Hildenbrand
2019-05-07 21:25       ` Dan Williams
2019-05-08  7:39         ` David Hildenbrand
2019-05-08 23:08           ` osalvador
2019-05-09  7:05             ` David Hildenbrand
2019-05-07 18:38 ` [PATCH v2 6/8] mm/memory_hotplug: Remove memory block devices before arch_remove_memory() David Hildenbrand
2019-05-07 21:27   ` Dan Williams
2019-05-07 18:38 ` [PATCH v2 7/8] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail David Hildenbrand
2019-05-08  0:15   ` Dan Williams
2019-05-08  7:21     ` David Hildenbrand
2019-05-08 13:50       ` Dan Williams
2019-05-07 18:38 ` [PATCH v2 8/8] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section David Hildenbrand
2019-05-08  0:30   ` Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).