linux-acpi.vger.kernel.org archive mirror
* [PATCH v3 0/6] mm: Further memory block device cleanups
@ 2019-06-20 18:31 David Hildenbrand
  2019-06-20 18:31 ` [PATCH v3 1/6] mm: Section numbers use the type "unsigned long" David Hildenbrand
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: David Hildenbrand @ 2019-06-20 18:31 UTC
  To: linux-kernel
  Cc: Dan Williams, Andrew Morton, linuxppc-dev, linux-acpi, linux-mm,
	David Hildenbrand, Andrew Banman, Anshuman Khandual, Arun KS,
	Baoquan He, Benjamin Herrenschmidt, Greg Kroah-Hartman,
	Johannes Weiner, Juergen Gross, Keith Busch, Len Brown,
	Mel Gorman, Michael Ellerman, Michael Neuling, Michal Hocko,
	Mike Rapoport, mike.travis, Oscar Salvador, Oscar Salvador,
	Paul Mackerras, Pavel Tatashin, Pavel Tatashin, Pavel Tatashin,
	Qian Cai, Rafael J. Wysocki, Rafael J. Wysocki, Rashmica Gupta,
	Stephen Rothwell, Thomas Gleixner, Vlastimil Babka, Wei Yang

@Andrew: Only patches 1, 4 and 6 changed compared to v1.

Some further cleanups around memory block devices. Most notably, clean up
and simplify walk_memory_range(), plus some other minor cleanups.

Compiled + tested on x86 with DIMMs under QEMU. Compile-tested on ppc64.

v2 -> v3:
- "mm/memory_hotplug: Rename walk_memory_range() and pass start+size .."
-- Avoid warning on ppc.
- "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
-- Fixup a comment regarding hinted devices.

v1 -> v2:
- "mm: Section numbers use the type "unsigned long""
-- "unsigned long i" -> "unsigned long nr", in one case -> "int i"
- "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
-- Fix compilation error
-- Get rid of the "hint" parameter completely

David Hildenbrand (6):
  mm: Section numbers use the type "unsigned long"
  drivers/base/memory: Use "unsigned long" for block ids
  mm: Make register_mem_sect_under_node() static
  mm/memory_hotplug: Rename walk_memory_range() and pass start+size
    instead of pfns
  mm/memory_hotplug: Move and simplify walk_memory_blocks()
  drivers/base/memory.c: Get rid of find_memory_block_hinted()

 arch/powerpc/platforms/powernv/memtrace.c |  23 ++---
 drivers/acpi/acpi_memhotplug.c            |  19 +---
 drivers/base/memory.c                     | 120 +++++++++++++---------
 drivers/base/node.c                       |   8 +-
 include/linux/memory.h                    |   5 +-
 include/linux/memory_hotplug.h            |   2 -
 include/linux/mmzone.h                    |   4 +-
 include/linux/node.h                      |   7 --
 mm/memory_hotplug.c                       |  57 +---------
 mm/sparse.c                               |  12 +--
 10 files changed, 106 insertions(+), 151 deletions(-)

-- 
2.21.0



* [PATCH v3 1/6] mm: Section numbers use the type "unsigned long"
  2019-06-20 18:31 [PATCH v3 0/6] mm: Further memory block device cleanups David Hildenbrand
@ 2019-06-20 18:31 ` David Hildenbrand
  2019-06-20 18:31 ` [PATCH v3 2/6] drivers/base/memory: Use "unsigned long" for block ids David Hildenbrand
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2019-06-20 18:31 UTC
  To: linux-kernel
  Cc: Dan Williams, Andrew Morton, linuxppc-dev, linux-acpi, linux-mm,
	David Hildenbrand, Greg Kroah-Hartman, Rafael J. Wysocki,
	Vlastimil Babka, Michal Hocko, Mel Gorman, Wei Yang,
	Johannes Weiner, Arun KS, Pavel Tatashin, Oscar Salvador,
	Stephen Rothwell, Mike Rapoport, Baoquan He

We are using a mixture of "int" and "unsigned long" for section numbers.
Let's make this consistent by using "unsigned long" everywhere. We'll do
the same with memory block ids next.

While at it, turn the "unsigned long i" in removable_show() into an
int - sections_per_block is an int.
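For illustration only, the loop shape add_memory_block() ends up with can
be modelled in user space as below. This is a sketch, not kernel code:
SECTIONS_PER_BLOCK, NR_SECTIONS and the section_present[] array are
made-up stand-ins, and the alignment assert is an assumption of the model.
Only the "unsigned long nr" iteration and the present_section_nr() check
mirror the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values; the kernel computes sections_per_block at init. */
#define SECTIONS_PER_BLOCK	4UL
#define NR_SECTIONS		16UL

/* Stand-in for the kernel's section presence tracking. */
static bool section_present[NR_SECTIONS];

static bool present_section_nr(unsigned long nr)
{
	return nr < NR_SECTIONS && section_present[nr];
}

/* Count the present sections of the block starting at base_section_nr. */
static int count_present_sections(unsigned long base_section_nr)
{
	int section_count = 0;
	unsigned long nr;

	/* Model assumption: callers pass a block-aligned section number. */
	assert(base_section_nr % SECTIONS_PER_BLOCK == 0);

	for (nr = base_section_nr; nr < base_section_nr + SECTIONS_PER_BLOCK;
	     nr++)
		if (present_section_nr(nr))
			section_count++;

	return section_count;
}
```

The section number stays "unsigned long" from the loop variable down to
the presence check, while the count remains an int because it can never
exceed the number of sections per block.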

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c  | 27 +++++++++++++--------------
 include/linux/mmzone.h |  4 ++--
 mm/sparse.c            | 12 ++++++------
 3 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 826dd76f662e..5947b5a5686d 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -34,7 +34,7 @@ static DEFINE_MUTEX(mem_sysfs_mutex);
 
 static int sections_per_block;
 
-static inline int base_memory_block_id(int section_nr)
+static inline int base_memory_block_id(unsigned long section_nr)
 {
 	return section_nr / sections_per_block;
 }
@@ -131,9 +131,9 @@ static ssize_t phys_index_show(struct device *dev,
 static ssize_t removable_show(struct device *dev, struct device_attribute *attr,
 			      char *buf)
 {
-	unsigned long i, pfn;
-	int ret = 1;
 	struct memory_block *mem = to_memory_block(dev);
+	unsigned long pfn;
+	int ret = 1, i;
 
 	if (mem->state != MEM_ONLINE)
 		goto out;
@@ -691,15 +691,15 @@ static int init_memory_block(struct memory_block **memory, int block_id,
 	return ret;
 }
 
-static int add_memory_block(int base_section_nr)
+static int add_memory_block(unsigned long base_section_nr)
 {
+	int ret, section_count = 0;
 	struct memory_block *mem;
-	int i, ret, section_count = 0;
+	unsigned long nr;
 
-	for (i = base_section_nr;
-	     i < base_section_nr + sections_per_block;
-	     i++)
-		if (present_section_nr(i))
+	for (nr = base_section_nr; nr < base_section_nr + sections_per_block;
+	     nr++)
+		if (present_section_nr(nr))
 			section_count++;
 
 	if (section_count == 0)
@@ -822,10 +822,9 @@ static const struct attribute_group *memory_root_attr_groups[] = {
  */
 int __init memory_dev_init(void)
 {
-	unsigned int i;
 	int ret;
 	int err;
-	unsigned long block_sz;
+	unsigned long block_sz, nr;
 
 	ret = subsys_system_register(&memory_subsys, memory_root_attr_groups);
 	if (ret)
@@ -839,9 +838,9 @@ int __init memory_dev_init(void)
 	 * during boot and have been initialized
 	 */
 	mutex_lock(&mem_sysfs_mutex);
-	for (i = 0; i <= __highest_present_section_nr;
-		i += sections_per_block) {
-		err = add_memory_block(i);
+	for (nr = 0; nr <= __highest_present_section_nr;
+	     nr += sections_per_block) {
+		err = add_memory_block(nr);
 		if (!ret)
 			ret = err;
 	}
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 427b79c39b3c..83b6aae16f13 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1220,7 +1220,7 @@ static inline struct mem_section *__nr_to_section(unsigned long nr)
 		return NULL;
 	return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
 }
-extern int __section_nr(struct mem_section* ms);
+extern unsigned long __section_nr(struct mem_section *ms);
 extern unsigned long usemap_size(void);
 
 /*
@@ -1292,7 +1292,7 @@ static inline struct mem_section *__pfn_to_section(unsigned long pfn)
 	return __nr_to_section(pfn_to_section_nr(pfn));
 }
 
-extern int __highest_present_section_nr;
+extern unsigned long __highest_present_section_nr;
 
 #ifndef CONFIG_HAVE_ARCH_PFN_VALID
 static inline int pfn_valid(unsigned long pfn)
diff --git a/mm/sparse.c b/mm/sparse.c
index 1552c855d62a..e8c57e039be8 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -102,7 +102,7 @@ static inline int sparse_index_init(unsigned long section_nr, int nid)
 #endif
 
 #ifdef CONFIG_SPARSEMEM_EXTREME
-int __section_nr(struct mem_section* ms)
+unsigned long __section_nr(struct mem_section *ms)
 {
 	unsigned long root_nr;
 	struct mem_section *root = NULL;
@@ -121,9 +121,9 @@ int __section_nr(struct mem_section* ms)
 	return (root_nr * SECTIONS_PER_ROOT) + (ms - root);
 }
 #else
-int __section_nr(struct mem_section* ms)
+unsigned long __section_nr(struct mem_section *ms)
 {
-	return (int)(ms - mem_section[0]);
+	return (unsigned long)(ms - mem_section[0]);
 }
 #endif
 
@@ -178,10 +178,10 @@ void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn,
  * Keeping track of this gives us an easy way to break out of
  * those loops early.
  */
-int __highest_present_section_nr;
+unsigned long __highest_present_section_nr;
 static void section_mark_present(struct mem_section *ms)
 {
-	int section_nr = __section_nr(ms);
+	unsigned long section_nr = __section_nr(ms);
 
 	if (section_nr > __highest_present_section_nr)
 		__highest_present_section_nr = section_nr;
@@ -189,7 +189,7 @@ static void section_mark_present(struct mem_section *ms)
 	ms->section_mem_map |= SECTION_MARKED_PRESENT;
 }
 
-static inline int next_present_section_nr(int section_nr)
+static inline unsigned long next_present_section_nr(unsigned long section_nr)
 {
 	do {
 		section_nr++;
-- 
2.21.0



* [PATCH v3 2/6] drivers/base/memory: Use "unsigned long" for block ids
  2019-06-20 18:31 [PATCH v3 0/6] mm: Further memory block device cleanups David Hildenbrand
  2019-06-20 18:31 ` [PATCH v3 1/6] mm: Section numbers use the type "unsigned long" David Hildenbrand
@ 2019-06-20 18:31 ` David Hildenbrand
  2019-06-20 18:31 ` [PATCH v3 3/6] mm: Make register_mem_sect_under_node() static David Hildenbrand
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2019-06-20 18:31 UTC
  To: linux-kernel
  Cc: Dan Williams, Andrew Morton, linuxppc-dev, linux-acpi, linux-mm,
	David Hildenbrand, Greg Kroah-Hartman, Rafael J. Wysocki

Block ids are just shifted section numbers, so let's use
"unsigned long" for them, too.
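The relationship between PFNs, section numbers and block ids can be
sketched in user space as follows. The helper names match the ones in
drivers/base/memory.c, but PAGE_SHIFT, SECTION_SIZE_BITS and the value of
sections_per_block are illustrative assumptions (4 KiB pages, 128 MiB
sections, 8 sections per block), not values taken from this patch.

```c
#include <assert.h>

/* Illustrative values only: 4 KiB pages and 128 MiB sections. */
#define PAGE_SHIFT		12
#define SECTION_SIZE_BITS	27
#define PFN_SECTION_SHIFT	(SECTION_SIZE_BITS - PAGE_SHIFT)

/* Hypothetical value; the kernel derives it from the memory block size. */
static int sections_per_block = 8;

static inline unsigned long pfn_to_section_nr(unsigned long pfn)
{
	return pfn >> PFN_SECTION_SHIFT;
}

/* A block id is the section number scaled down by sections_per_block. */
static inline unsigned long base_memory_block_id(unsigned long section_nr)
{
	assert(sections_per_block > 0);
	return section_nr / sections_per_block;
}

static inline unsigned long pfn_to_block_id(unsigned long pfn)
{
	return base_memory_block_id(pfn_to_section_nr(pfn));
}
```

With these assumed values, every PFN below 8 << 15 maps to block 0, and
the next block starts exactly at PFN 8 << 15. No value is narrowed to int
anywhere along the chain.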

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 5947b5a5686d..c54e80fd25a8 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -34,12 +34,12 @@ static DEFINE_MUTEX(mem_sysfs_mutex);
 
 static int sections_per_block;
 
-static inline int base_memory_block_id(unsigned long section_nr)
+static inline unsigned long base_memory_block_id(unsigned long section_nr)
 {
 	return section_nr / sections_per_block;
 }
 
-static inline int pfn_to_block_id(unsigned long pfn)
+static inline unsigned long pfn_to_block_id(unsigned long pfn)
 {
 	return base_memory_block_id(pfn_to_section_nr(pfn));
 }
@@ -587,7 +587,7 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn)
  * A reference for the returned object is held and the reference for the
  * hinted object is released.
  */
-static struct memory_block *find_memory_block_by_id(int block_id,
+static struct memory_block *find_memory_block_by_id(unsigned long block_id,
 						    struct memory_block *hint)
 {
 	struct device *hintdev = hint ? &hint->dev : NULL;
@@ -604,7 +604,7 @@ static struct memory_block *find_memory_block_by_id(int block_id,
 struct memory_block *find_memory_block_hinted(struct mem_section *section,
 					      struct memory_block *hint)
 {
-	int block_id = base_memory_block_id(__section_nr(section));
+	unsigned long block_id = base_memory_block_id(__section_nr(section));
 
 	return find_memory_block_by_id(block_id, hint);
 }
@@ -663,8 +663,8 @@ int register_memory(struct memory_block *memory)
 	return ret;
 }
 
-static int init_memory_block(struct memory_block **memory, int block_id,
-			     unsigned long state)
+static int init_memory_block(struct memory_block **memory,
+			     unsigned long block_id, unsigned long state)
 {
 	struct memory_block *mem;
 	unsigned long start_pfn;
@@ -729,8 +729,8 @@ static void unregister_memory(struct memory_block *memory)
  */
 int create_memory_block_devices(unsigned long start, unsigned long size)
 {
-	const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
-	int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
+	const unsigned long start_block_id = pfn_to_block_id(PFN_DOWN(start));
+	unsigned long end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
 	struct memory_block *mem;
 	unsigned long block_id;
 	int ret = 0;
@@ -766,10 +766,10 @@ int create_memory_block_devices(unsigned long start, unsigned long size)
  */
 void remove_memory_block_devices(unsigned long start, unsigned long size)
 {
-	const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
-	const int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
+	const unsigned long start_block_id = pfn_to_block_id(PFN_DOWN(start));
+	const unsigned long end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
 	struct memory_block *mem;
-	int block_id;
+	unsigned long block_id;
 
 	if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
 			 !IS_ALIGNED(size, memory_block_size_bytes())))
-- 
2.21.0



* [PATCH v3 3/6] mm: Make register_mem_sect_under_node() static
  2019-06-20 18:31 [PATCH v3 0/6] mm: Further memory block device cleanups David Hildenbrand
  2019-06-20 18:31 ` [PATCH v3 1/6] mm: Section numbers use the type "unsigned long" David Hildenbrand
  2019-06-20 18:31 ` [PATCH v3 2/6] drivers/base/memory: Use "unsigned long" for block ids David Hildenbrand
@ 2019-06-20 18:31 ` David Hildenbrand
  2019-06-20 18:31 ` [PATCH v3 4/6] mm/memory_hotplug: Rename walk_memory_range() and pass start+size instead of pfns David Hildenbrand
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2019-06-20 18:31 UTC
  To: linux-kernel
  Cc: Dan Williams, Andrew Morton, linuxppc-dev, linux-acpi, linux-mm,
	David Hildenbrand, Greg Kroah-Hartman, Rafael J. Wysocki,
	Keith Busch, Oscar Salvador

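register_mem_sect_under_node() is only used in drivers/base/node.c, so
make it static and drop the declaration and the stub from
include/linux/node.h.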

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/node.c  | 3 ++-
 include/linux/node.h | 7 -------
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 9be88fd05147..e6364e3e3e31 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -752,7 +752,8 @@ static int __ref get_nid_for_pfn(unsigned long pfn)
 }
 
 /* register memory section under specified node if it spans that node */
-int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
+static int register_mem_sect_under_node(struct memory_block *mem_blk,
+					 void *arg)
 {
 	int ret, nid = *(int *)arg;
 	unsigned long pfn, sect_start_pfn, sect_end_pfn;
diff --git a/include/linux/node.h b/include/linux/node.h
index 548c226966a2..4866f32a02d8 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -137,8 +137,6 @@ static inline int register_one_node(int nid)
 extern void unregister_one_node(int nid);
 extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
-extern int register_mem_sect_under_node(struct memory_block *mem_blk,
-						void *arg);
 extern void unregister_memory_block_under_nodes(struct memory_block *mem_blk);
 
 extern int register_memory_node_under_compute_node(unsigned int mem_nid,
@@ -170,11 +168,6 @@ static inline int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
 {
 	return 0;
 }
-static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
-							void *arg)
-{
-	return 0;
-}
 static inline void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 {
 }
-- 
2.21.0



* [PATCH v3 4/6] mm/memory_hotplug: Rename walk_memory_range() and pass start+size instead of pfns
  2019-06-20 18:31 [PATCH v3 0/6] mm: Further memory block device cleanups David Hildenbrand
                   ` (2 preceding siblings ...)
  2019-06-20 18:31 ` [PATCH v3 3/6] mm: Make register_mem_sect_under_node() static David Hildenbrand
@ 2019-06-20 18:31 ` David Hildenbrand
  2019-06-20 18:31 ` [PATCH v3 5/6] mm/memory_hotplug: Move and simplify walk_memory_blocks() David Hildenbrand
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2019-06-20 18:31 UTC
  To: linux-kernel
  Cc: Dan Williams, Andrew Morton, linuxppc-dev, linux-acpi, linux-mm,
	David Hildenbrand, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Rafael J. Wysocki, Len Brown,
	Greg Kroah-Hartman, Rashmica Gupta, Pavel Tatashin,
	Anshuman Khandual, Michael Neuling, Thomas Gleixner,
	Oscar Salvador, Michal Hocko, Wei Yang, Juergen Gross, Qian Cai,
	Arun KS

walk_memory_range() was once used to iterate over sections. Now, it
iterates over memory blocks. Rename the function and fix up the
documentation. Also, pass start+size instead of PFNs, which is what most
callers already have at hand. (We will most probably rework
link_mem_sections() soon.)

Follow-up patches will rework, simplify, and move walk_memory_blocks() to
drivers/base/memory.c.

Note: walk_memory_blocks() only works correctly right now if the
start_pfn is aligned to a section start. This is the case right now,
but we'll generalize the function in a follow-up patch so the semantics
match the documentation.
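The conversions between a PFN range and start+size that this patch moves
around can be checked with a small user-space sketch. The PFN_* macros
have the same shape as the kernel's helpers in include/linux/pfn.h;
PAGE_SHIFT = 12, struct mem_range and the three function names are
assumptions made up for this example.

```c
#include <assert.h>

#define PAGE_SHIFT	12	/* assumption: 4 KiB pages */

/* Same shape as the kernel helpers, spelled out for user space. */
#define PFN_UP(x)	(((x) + (1UL << PAGE_SHIFT) - 1) >> PAGE_SHIFT)
#define PFN_DOWN(x)	((x) >> PAGE_SHIFT)
#define PFN_PHYS(x)	((unsigned long)(x) << PAGE_SHIFT)

/* Hypothetical container for a start+size pair. */
struct mem_range {
	unsigned long start;	/* physical start address */
	unsigned long size;	/* size in bytes */
};

/* The conversion link_mem_sections() does before walking: [start_pfn, end_pfn) */
static struct mem_range pfn_range_to_mem_range(unsigned long start_pfn,
					       unsigned long end_pfn)
{
	struct mem_range r;

	assert(end_pfn >= start_pfn);
	r.start = PFN_PHYS(start_pfn);
	r.size = PFN_PHYS(end_pfn - start_pfn);
	return r;
}

/* The conversions walk_memory_blocks() does internally in this patch. */
static unsigned long range_start_pfn(unsigned long start)
{
	return PFN_DOWN(start);
}

static unsigned long range_end_pfn(unsigned long start, unsigned long size)
{
	assert(size > 0);	/* start + size - 1 would wrap otherwise */
	return PFN_UP(start + size - 1);
}
```

For a page-aligned range the round trip gives back the original
[start_pfn, end_pfn) pair, which is why callers that used to compute
PFN_DOWN(start) and PFN_UP(start + size - 1) themselves can simply pass
start and size.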

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Arun KS <arunks@codeaurora.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/platforms/powernv/memtrace.c | 23 +++++++++++-----------
 drivers/acpi/acpi_memhotplug.c            | 19 ++++--------------
 drivers/base/node.c                       |  5 +++--
 include/linux/memory_hotplug.h            |  2 +-
 mm/memory_hotplug.c                       | 24 ++++++++++++-----------
 5 files changed, 32 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 5e53c1392d3b..eb2e75dac369 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -70,23 +70,23 @@ static int change_memblock_state(struct memory_block *mem, void *arg)
 /* called with device_hotplug_lock held */
 static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
 {
-	u64 end_pfn = start_pfn + nr_pages - 1;
+	const unsigned long start = PFN_PHYS(start_pfn);
+	const unsigned long size = PFN_PHYS(nr_pages);
 
-	if (walk_memory_range(start_pfn, end_pfn, NULL,
-	    check_memblock_online))
+	if (walk_memory_blocks(start, size, NULL, check_memblock_online))
 		return false;
 
-	walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
-			  change_memblock_state);
+	walk_memory_blocks(start, size, (void *)MEM_GOING_OFFLINE,
+			   change_memblock_state);
 
 	if (offline_pages(start_pfn, nr_pages)) {
-		walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE,
-				  change_memblock_state);
+		walk_memory_blocks(start, size, (void *)MEM_ONLINE,
+				   change_memblock_state);
 		return false;
 	}
 
-	walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
-			  change_memblock_state);
+	walk_memory_blocks(start, size, (void *)MEM_OFFLINE,
+			   change_memblock_state);
 
 
 	return true;
@@ -242,9 +242,8 @@ static int memtrace_online(void)
 		 */
 		if (!memhp_auto_online) {
 			lock_device_hotplug();
-			walk_memory_range(PFN_DOWN(ent->start),
-					  PFN_UP(ent->start + ent->size - 1),
-					  NULL, online_mem_block);
+			walk_memory_blocks(ent->start, ent->size, NULL,
+					   online_mem_block);
 			unlock_device_hotplug();
 		}
 
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index db013dc21c02..e294f44a7850 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -155,16 +155,6 @@ static int acpi_memory_check_device(struct acpi_memory_device *mem_device)
 	return 0;
 }
 
-static unsigned long acpi_meminfo_start_pfn(struct acpi_memory_info *info)
-{
-	return PFN_DOWN(info->start_addr);
-}
-
-static unsigned long acpi_meminfo_end_pfn(struct acpi_memory_info *info)
-{
-	return PFN_UP(info->start_addr + info->length-1);
-}
-
 static int acpi_bind_memblk(struct memory_block *mem, void *arg)
 {
 	return acpi_bind_one(&mem->dev, arg);
@@ -173,9 +163,8 @@ static int acpi_bind_memblk(struct memory_block *mem, void *arg)
 static int acpi_bind_memory_blocks(struct acpi_memory_info *info,
 				   struct acpi_device *adev)
 {
-	return walk_memory_range(acpi_meminfo_start_pfn(info),
-				 acpi_meminfo_end_pfn(info), adev,
-				 acpi_bind_memblk);
+	return walk_memory_blocks(info->start_addr, info->length, adev,
+				  acpi_bind_memblk);
 }
 
 static int acpi_unbind_memblk(struct memory_block *mem, void *arg)
@@ -186,8 +175,8 @@ static int acpi_unbind_memblk(struct memory_block *mem, void *arg)
 
 static void acpi_unbind_memory_blocks(struct acpi_memory_info *info)
 {
-	walk_memory_range(acpi_meminfo_start_pfn(info),
-			  acpi_meminfo_end_pfn(info), NULL, acpi_unbind_memblk);
+	walk_memory_blocks(info->start_addr, info->length, NULL,
+			   acpi_unbind_memblk);
 }
 
 static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
diff --git a/drivers/base/node.c b/drivers/base/node.c
index e6364e3e3e31..d8c02e65df68 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -833,8 +833,9 @@ void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 
 int link_mem_sections(int nid, unsigned long start_pfn, unsigned long end_pfn)
 {
-	return walk_memory_range(start_pfn, end_pfn, (void *)&nid,
-					register_mem_sect_under_node);
+	return walk_memory_blocks(PFN_PHYS(start_pfn),
+				  PFN_PHYS(end_pfn - start_pfn), (void *)&nid,
+				  register_mem_sect_under_node);
 }
 
 #ifdef CONFIG_HUGETLBFS
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 79e0add6a597..d9fffc34949f 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -340,7 +340,7 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 extern void __ref free_area_init_core_hotplug(int nid);
-extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
+extern int walk_memory_blocks(unsigned long start, unsigned long size,
 		void *arg, int (*func)(struct memory_block *, void *));
 extern int __add_memory(int nid, u64 start, u64 size);
 extern int add_memory(int nid, u64 start, u64 size);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a88c5f334e5a..122a7d31efdd 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1126,8 +1126,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 
 	/* online pages if requested */
 	if (memhp_auto_online)
-		walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1),
-				  NULL, online_memory_block);
+		walk_memory_blocks(start, size, NULL, online_memory_block);
 
 	return ret;
 error:
@@ -1665,20 +1664,24 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 /**
- * walk_memory_range - walks through all mem sections in [start_pfn, end_pfn)
- * @start_pfn: start pfn of the memory range
- * @end_pfn: end pfn of the memory range
+ * walk_memory_blocks - walk through all present memory blocks overlapped
+ *			by the range [start, start + size)
+ *
+ * @start: start address of the memory range
+ * @size: size of the memory range
  * @arg: argument passed to func
- * @func: callback for each memory section walked
+ * @func: callback for each memory block walked
  *
- * This function walks through all present mem sections in range
- * [start_pfn, end_pfn) and call func on each mem section.
+ * This function walks through all present memory blocks overlapped by the
+ * range [start, start + size), calling func on each memory block.
  *
  * Returns the return value of func.
  */
-int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
+int walk_memory_blocks(unsigned long start, unsigned long size,
 		void *arg, int (*func)(struct memory_block *, void *))
 {
+	const unsigned long start_pfn = PFN_DOWN(start);
+	const unsigned long end_pfn = PFN_UP(start + size - 1);
 	struct memory_block *mem = NULL;
 	struct mem_section *section;
 	unsigned long pfn, section_nr;
@@ -1824,8 +1827,7 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
 	 * whether all memory blocks in question are offline and return error
 	 * if this is not the case.
 	 */
-	rc = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
-			       check_memblock_offlined_cb);
+	rc = walk_memory_blocks(start, size, NULL, check_memblock_offlined_cb);
 	if (rc)
 		goto done;
 
-- 
2.21.0



* [PATCH v3 5/6] mm/memory_hotplug: Move and simplify walk_memory_blocks()
  2019-06-20 18:31 [PATCH v3 0/6] mm: Further memory block device cleanups David Hildenbrand
                   ` (3 preceding siblings ...)
  2019-06-20 18:31 ` [PATCH v3 4/6] mm/memory_hotplug: Rename walk_memory_range() and pass start+size instead of pfns David Hildenbrand
@ 2019-06-20 18:31 ` David Hildenbrand
  2019-06-21 15:26   ` David Hildenbrand
  2019-06-20 18:31 ` [PATCH v3 6/6] drivers/base/memory.c: Get rid of find_memory_block_hinted() David Hildenbrand
  2019-06-21 15:15 ` [PATCH v3 0/6] mm: Further memory block device cleanups Qian Cai
  6 siblings, 1 reply; 16+ messages in thread
From: David Hildenbrand @ 2019-06-20 18:31 UTC
  To: linux-kernel
  Cc: Dan Williams, Andrew Morton, linuxppc-dev, linux-acpi, linux-mm,
	David Hildenbrand, Greg Kroah-Hartman, Rafael J. Wysocki,
	Stephen Rothwell, Pavel Tatashin, Andrew Banman, mike.travis,
	Oscar Salvador, Michal Hocko, Wei Yang, Arun KS, Qian Cai

Let's move walk_memory_blocks() to the place where memory block logic
resides and simplify it. While at it, add a type for the callback function.
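A rough user-space model of the simplified walk is sketched below. It is
not the kernel implementation: BLOCK_SIZE, NR_BLOCKS, the blocks[] array,
the reduced struct memory_block and the two callbacks are made up for the
example, and there is no device reference counting (the real code drops
the reference with put_device() after each callback). Only the callback
type, the block id iteration, the skipping of absent blocks and the abort
on error follow the patch.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE	(128UL << 20)	/* hypothetical block size */
#define NR_BLOCKS	8UL		/* hypothetical number of blocks */

/* Reduced stand-in for the kernel's struct memory_block. */
struct memory_block {
	int present;
};

typedef int (*walk_memory_blocks_func_t)(struct memory_block *, void *);

static struct memory_block blocks[NR_BLOCKS];

/* Model lookup: returns NULL for absent or out-of-range blocks. */
static struct memory_block *find_memory_block_by_id(unsigned long block_id)
{
	if (block_id >= NR_BLOCKS || !blocks[block_id].present)
		return NULL;
	return &blocks[block_id];
}

static int walk_memory_blocks(unsigned long start, unsigned long size,
			      void *arg, walk_memory_blocks_func_t func)
{
	unsigned long start_block_id, end_block_id, block_id;
	struct memory_block *mem;
	int ret = 0;

	assert(size > 0);	/* model assumption: non-empty range */
	start_block_id = start / BLOCK_SIZE;
	end_block_id = (start + size - 1) / BLOCK_SIZE;

	for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
		mem = find_memory_block_by_id(block_id);
		if (!mem)
			continue;	/* holes are skipped, not errors */

		ret = func(mem, arg);
		if (ret)
			break;		/* first error aborts the walk */
	}
	return ret;
}

/* Example callback: counts the blocks visited, never fails. */
static int count_block(struct memory_block *mem, void *arg)
{
	(void)mem;
	(*(int *)arg)++;
	return 0;
}

/* Example callback: counts the block, then reports an error. */
static int fail_on_first_block(struct memory_block *mem, void *arg)
{
	(void)mem;
	(*(int *)arg)++;
	return -1;
}
```

Iterating over block ids directly means each present block is visited
exactly once, so the "same memblock?" check and the hint handling of the
section-based loop are no longer needed.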

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c          | 42 ++++++++++++++++++++++++++
 include/linux/memory.h         |  3 ++
 include/linux/memory_hotplug.h |  2 --
 mm/memory_hotplug.c            | 55 ----------------------------------
 4 files changed, 45 insertions(+), 57 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c54e80fd25a8..0204384b4d1d 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -44,6 +44,11 @@ static inline unsigned long pfn_to_block_id(unsigned long pfn)
 	return base_memory_block_id(pfn_to_section_nr(pfn));
 }
 
+static inline unsigned long phys_to_block_id(unsigned long phys)
+{
+	return pfn_to_block_id(PFN_DOWN(phys));
+}
+
 static int memory_subsys_online(struct device *dev);
 static int memory_subsys_offline(struct device *dev);
 
@@ -851,3 +856,40 @@ int __init memory_dev_init(void)
 		printk(KERN_ERR "%s() failed: %d\n", __func__, ret);
 	return ret;
 }
+
+/**
+ * walk_memory_blocks - walk through all present memory blocks overlapped
+ *			by the range [start, start + size)
+ *
+ * @start: start address of the memory range
+ * @size: size of the memory range
+ * @arg: argument passed to func
+ * @func: callback for each memory block walked
+ *
+ * This function walks through all present memory blocks overlapped by the
+ * range [start, start + size), calling func on each memory block.
+ *
+ * In case func() returns an error, walking is aborted and the error is
+ * returned.
+ */
+int walk_memory_blocks(unsigned long start, unsigned long size,
+		       void *arg, walk_memory_blocks_func_t func)
+{
+	const unsigned long start_block_id = phys_to_block_id(start);
+	const unsigned long end_block_id = phys_to_block_id(start + size - 1);
+	struct memory_block *mem;
+	unsigned long block_id;
+	int ret = 0;
+
+	for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
+		mem = find_memory_block_by_id(block_id, NULL);
+		if (!mem)
+			continue;
+
+		ret = func(mem, arg);
+		put_device(&mem->dev);
+		if (ret)
+			break;
+	}
+	return ret;
+}
diff --git a/include/linux/memory.h b/include/linux/memory.h
index f26a5417ec5d..b3b388775a30 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -119,6 +119,9 @@ extern int memory_isolate_notify(unsigned long val, void *v);
 extern struct memory_block *find_memory_block_hinted(struct mem_section *,
 							struct memory_block *);
 extern struct memory_block *find_memory_block(struct mem_section *);
+typedef int (*walk_memory_blocks_func_t)(struct memory_block *, void *);
+extern int walk_memory_blocks(unsigned long start, unsigned long size,
+			      void *arg, walk_memory_blocks_func_t func);
 #define CONFIG_MEM_BLOCK_SIZE	(PAGES_PER_SECTION<<PAGE_SHIFT)
 #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index d9fffc34949f..475aff8efbf8 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -340,8 +340,6 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 extern void __ref free_area_init_core_hotplug(int nid);
-extern int walk_memory_blocks(unsigned long start, unsigned long size,
-		void *arg, int (*func)(struct memory_block *, void *));
 extern int __add_memory(int nid, u64 start, u64 size);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int add_memory_resource(int nid, struct resource *resource);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 122a7d31efdd..fc558e9ff939 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1661,62 +1661,7 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 {
 	return __offline_pages(start_pfn, start_pfn + nr_pages);
 }
-#endif /* CONFIG_MEMORY_HOTREMOVE */
 
-/**
- * walk_memory_blocks - walk through all present memory blocks overlapped
- *			by the range [start, start + size)
- *
- * @start: start address of the memory range
- * @size: size of the memory range
- * @arg: argument passed to func
- * @func: callback for each memory block walked
- *
- * This function walks through all present memory blocks overlapped by the
- * range [start, start + size), calling func on each memory block.
- *
- * Returns the return value of func.
- */
-int walk_memory_blocks(unsigned long start, unsigned long size,
-		void *arg, int (*func)(struct memory_block *, void *))
-{
-	const unsigned long start_pfn = PFN_DOWN(start);
-	const unsigned long end_pfn = PFN_UP(start + size - 1);
-	struct memory_block *mem = NULL;
-	struct mem_section *section;
-	unsigned long pfn, section_nr;
-	int ret;
-
-	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
-		section_nr = pfn_to_section_nr(pfn);
-		if (!present_section_nr(section_nr))
-			continue;
-
-		section = __nr_to_section(section_nr);
-		/* same memblock? */
-		if (mem)
-			if ((section_nr >= mem->start_section_nr) &&
-			    (section_nr <= mem->end_section_nr))
-				continue;
-
-		mem = find_memory_block_hinted(section, mem);
-		if (!mem)
-			continue;
-
-		ret = func(mem, arg);
-		if (ret) {
-			kobject_put(&mem->dev.kobj);
-			return ret;
-		}
-	}
-
-	if (mem)
-		kobject_put(&mem->dev.kobj);
-
-	return 0;
-}
-
-#ifdef CONFIG_MEMORY_HOTREMOVE
 static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
 {
 	int ret = !is_memblock_offlined(mem);
-- 
2.21.0



* [PATCH v3 6/6] drivers/base/memory.c: Get rid of find_memory_block_hinted()
  2019-06-20 18:31 [PATCH v3 0/6] mm: Further memory block device cleanups David Hildenbrand
                   ` (4 preceding siblings ...)
  2019-06-20 18:31 ` [PATCH v3 5/6] mm/memory_hotplug: Move and simplify walk_memory_blocks() David Hildenbrand
@ 2019-06-20 18:31 ` David Hildenbrand
  2019-06-21 15:15 ` [PATCH v3 0/6] mm: Further memory block device cleanups Qian Cai
  6 siblings, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2019-06-20 18:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dan Williams, Andrew Morton, linuxppc-dev, linux-acpi, linux-mm,
	David Hildenbrand, Greg Kroah-Hartman, Rafael J. Wysocki,
	Stephen Rothwell, Pavel Tatashin, mike.travis

No longer needed, let's remove it. Also, drop the "hint" parameter
completely from "find_memory_block_by_id", as nobody needs it anymore.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c  | 37 +++++++++++--------------------------
 include/linux/memory.h |  2 --
 2 files changed, 11 insertions(+), 28 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 0204384b4d1d..195dbcb8e8a8 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -588,30 +588,13 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn)
 	return 0;
 }
 
-/*
- * A reference for the returned object is held and the reference for the
- * hinted object is released.
- */
-static struct memory_block *find_memory_block_by_id(unsigned long block_id,
-						    struct memory_block *hint)
+/* A reference for the returned memory block device is acquired. */
+static struct memory_block *find_memory_block_by_id(unsigned long block_id)
 {
-	struct device *hintdev = hint ? &hint->dev : NULL;
 	struct device *dev;
 
-	dev = subsys_find_device_by_id(&memory_subsys, block_id, hintdev);
-	if (hint)
-		put_device(&hint->dev);
-	if (!dev)
-		return NULL;
-	return to_memory_block(dev);
-}
-
-struct memory_block *find_memory_block_hinted(struct mem_section *section,
-					      struct memory_block *hint)
-{
-	unsigned long block_id = base_memory_block_id(__section_nr(section));
-
-	return find_memory_block_by_id(block_id, hint);
+	dev = subsys_find_device_by_id(&memory_subsys, block_id, NULL);
+	return dev ? to_memory_block(dev) : NULL;
 }
 
 /*
@@ -624,7 +607,9 @@ struct memory_block *find_memory_block_hinted(struct mem_section *section,
  */
 struct memory_block *find_memory_block(struct mem_section *section)
 {
-	return find_memory_block_hinted(section, NULL);
+	unsigned long block_id = base_memory_block_id(__section_nr(section));
+
+	return find_memory_block_by_id(block_id);
 }
 
 static struct attribute *memory_memblk_attrs[] = {
@@ -675,7 +660,7 @@ static int init_memory_block(struct memory_block **memory,
 	unsigned long start_pfn;
 	int ret = 0;
 
-	mem = find_memory_block_by_id(block_id, NULL);
+	mem = find_memory_block_by_id(block_id);
 	if (mem) {
 		put_device(&mem->dev);
 		return -EEXIST;
@@ -755,7 +740,7 @@ int create_memory_block_devices(unsigned long start, unsigned long size)
 		end_block_id = block_id;
 		for (block_id = start_block_id; block_id != end_block_id;
 		     block_id++) {
-			mem = find_memory_block_by_id(block_id, NULL);
+			mem = find_memory_block_by_id(block_id);
 			mem->section_count = 0;
 			unregister_memory(mem);
 		}
@@ -782,7 +767,7 @@ void remove_memory_block_devices(unsigned long start, unsigned long size)
 
 	mutex_lock(&mem_sysfs_mutex);
 	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
-		mem = find_memory_block_by_id(block_id, NULL);
+		mem = find_memory_block_by_id(block_id);
 		if (WARN_ON_ONCE(!mem))
 			continue;
 		mem->section_count = 0;
@@ -882,7 +867,7 @@ int walk_memory_blocks(unsigned long start, unsigned long size,
 	int ret = 0;
 
 	for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
-		mem = find_memory_block_by_id(block_id, NULL);
+		mem = find_memory_block_by_id(block_id);
 		if (!mem)
 			continue;
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index b3b388775a30..02e633f3ede0 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -116,8 +116,6 @@ void remove_memory_block_devices(unsigned long start, unsigned long size);
 extern int memory_dev_init(void);
 extern int memory_notify(unsigned long val, void *v);
 extern int memory_isolate_notify(unsigned long val, void *v);
-extern struct memory_block *find_memory_block_hinted(struct mem_section *,
-							struct memory_block *);
 extern struct memory_block *find_memory_block(struct mem_section *);
 typedef int (*walk_memory_blocks_func_t)(struct memory_block *, void *);
 extern int walk_memory_blocks(unsigned long start, unsigned long size,
-- 
2.21.0



* Re: [PATCH v3 0/6] mm: Further memory block device cleanups
  2019-06-20 18:31 [PATCH v3 0/6] mm: Further memory block device cleanups David Hildenbrand
                   ` (5 preceding siblings ...)
  2019-06-20 18:31 ` [PATCH v3 6/6] drivers/base/memory.c: Get rid of find_memory_block_hinted() David Hildenbrand
@ 2019-06-21 15:15 ` Qian Cai
  2019-06-21 15:22   ` David Hildenbrand
  2019-06-21 18:24   ` David Hildenbrand
  6 siblings, 2 replies; 16+ messages in thread
From: Qian Cai @ 2019-06-21 15:15 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Dan Williams, Andrew Morton, linuxppc-dev, linux-acpi, linux-mm,
	Andrew Banman, Anshuman Khandual, Arun KS, Baoquan He,
	Benjamin Herrenschmidt, Greg Kroah-Hartman, Johannes Weiner,
	Juergen Gross, Keith Busch, Len Brown, Mel Gorman,
	Michael Ellerman, Michael Neuling, Michal Hocko, Mike Rapoport,
	mike.travis, Oscar Salvador, Oscar Salvador, Paul Mackerras,
	Pavel Tatashin, Pavel Tatashin, Pavel Tatashin,
	Rafael J. Wysocki, Rafael J. Wysocki, Rashmica Gupta,
	Stephen Rothwell, Thomas Gleixner, Vlastimil Babka, Wei Yang

On Thu, 2019-06-20 at 20:31 +0200, David Hildenbrand wrote:
> @Andrew: Only patch 1, 4 and 6 changed compared to v1.
> 
> Some further cleanups around memory block devices. Especially, clean up
> and simplify walk_memory_range(). Including some other minor cleanups.
> 
> Compiled + tested on x86 with DIMMs under QEMU. Compile-tested on ppc64.
> 
> v2 -> v3:
> - "mm/memory_hotplug: Rename walk_memory_range() and pass start+size .."
> -- Avoid warning on ppc.
> - "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
> -- Fixup a comment regarding hinted devices.
> 
> v1 -> v2:
> - "mm: Section numbers use the type "unsigned long""
> -- "unsigned long i" -> "unsigned long nr", in one case -> "int i"
> - "drivers/base/memory.c: Get rid of find_memory_block_hinted("
> -- Fix compilation error
> -- Get rid of the "hint" parameter completely
> 
> David Hildenbrand (6):
>   mm: Section numbers use the type "unsigned long"
>   drivers/base/memory: Use "unsigned long" for block ids
>   mm: Make register_mem_sect_under_node() static
>   mm/memory_hotplug: Rename walk_memory_range() and pass start+size
>     instead of pfns
>   mm/memory_hotplug: Move and simplify walk_memory_blocks()
>   drivers/base/memory.c: Get rid of find_memory_block_hinted()
> 
>  arch/powerpc/platforms/powernv/memtrace.c |  23 ++---
>  drivers/acpi/acpi_memhotplug.c            |  19 +---
>  drivers/base/memory.c                     | 120 +++++++++++++---------
>  drivers/base/node.c                       |   8 +-
>  include/linux/memory.h                    |   5 +-
>  include/linux/memory_hotplug.h            |   2 -
>  include/linux/mmzone.h                    |   4 +-
>  include/linux/node.h                      |   7 --
>  mm/memory_hotplug.c                       |  57 +---------
>  mm/sparse.c                               |  12 +--
>  10 files changed, 106 insertions(+), 151 deletions(-)
> 

This series causes a few machines to fail to boot, triggering endless soft
lockups. Reverting these commits fixed the issue.

97f4217d1da0 Revert "mm/memory_hotplug: rename walk_memory_range() and pass
start+size instead of pfns"
c608eebf33c6 Revert "mm-memory_hotplug-rename-walk_memory_range-and-pass-
startsize-instead-of-pfns-fix"
34b5e4ab7558 Revert "mm/memory_hotplug: move and simplify walk_memory_blocks()"
59a9f3eec5d1 Revert "drivers/base/memory.c: Get rid of
find_memory_block_hinted()"
5cfcd52288b6 Revert "drivers-base-memoryc-get-rid-of-find_memory_block_hinted-
v3"

[    4.582081][    T1] ACPI FADT declares the system doesn't support PCIe ASPM,
so disable it
[    4.590405][    T1] ACPI: bus type PCI registered
[    4.592908][    T1] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem
0x80000000-0x8fffffff] (base 0x80000000)
[    4.601860][    T1] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in
E820
[    4.601860][    T1] PCI: Using configuration type 1 for base access
[   28.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
[swapper/0:1]
[   28.671351][   C16] Modules linked in:
[   28.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-
next-20190621+ #1
[   28.681366][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 03/09/2018
[   28.691334][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
[   28.701334][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 48 8b
55 08 48 89 fb 48 8d 7f 18 e8 4c 89 7d ff 48 89 df e8 94 f9 7d ff 41 54 9d <65>
ff 0d c2 44 8d 48 5b 41 5c 5d c3 0f 1f 44 00 00 0f 1f 44 00 00
[   28.711354][   C16] RSP: 0018:ffff888205b27bf8 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffff13
[   28.721372][   C16] RAX: 0000000000000000 RBX: ffff8882053d6138 RCX:
ffffffffb6f2a3b8
[   28.731371][   C16] RDX: 1ffff11040a7ac27 RSI: dffffc0000000000 RDI:
ffff8882053d6138
[   28.741371][   C16] RBP: ffff888205b27c08 R08: ffffed1040a7ac28 R09:
ffffed1040a7ac27
[   28.751334][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
0000000000000246
[   28.751370][   C16] R13: ffff888205b27c98 R14: ffff8884504d0a20 R15:
0000000000000000
[   28.761368][   C16] FS:  0000000000000000(0000) GS:ffff888454500000(0000)
knlGS:0000000000000000
[   28.771373][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   28.781334][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
00000000001406a0
[   28.791333][   C16] Call Trace:
[   28.791374][   C16]  klist_next+0xd8/0x1c0
[   28.791374][   C16]  subsys_find_device_by_id+0x13b/0x1f0
[   28.801334][   C16]  ? bus_find_device_by_name+0x20/0x20
[   28.801370][   C16]  ? kobject_put+0x23/0x250
[   28.811333][   C16]  walk_memory_blocks+0x6c/0xb8
[   28.811353][   C16]  ? write_policy_show+0x40/0x40
[   28.821334][   C16]  link_mem_sections+0x7e/0xa0
[   28.821369][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
[   28.831353][   C16]  ? __register_one_node+0x3bd/0x600
[   28.831353][   C16]  topology_init+0xbf/0x126
[   28.841364][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
[   28.841368][   C16]  do_one_initcall+0xfe/0x45a
[   28.851334][   C16]  ? initcall_blacklisted+0x150/0x150
[   28.851353][   C16]  ? kasan_check_write+0x14/0x20
[   28.861333][   C16]  ? up_write+0x75/0x140
[   28.861369][   C16]  kernel_init_freeable+0x619/0x6ac
[   28.871333][   C16]  ? rest_init+0x188/0x188
[   28.871353][   C16]  kernel_init+0x11/0x138
[   28.881363][   C16]  ? rest_init+0x188/0x188
[   28.881363][   C16]  ret_from_fork+0x22/0x40
[   56.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
[swapper/0:1]
[   56.671352][   C16] Modules linked in:
[   56.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
G             L    5.2.0-rc5-next-20190621+ #1
[   56.681357][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 03/09/2018
[   56.691356][   C16] RIP: 0010:subsys_find_device_by_id+0x168/0x1f0
[   56.701334][   C16] Code: 48 85 c0 74 3e 48 8d 78 58 e8 14 77 ca ff 4d 8b 7e
58 4d 85 ff 74 2c 49 8d bf a0 03 00 00 e8 bf 75 ca ff 45 39 a7 a0 03 00 00 <75>
c9 4c 89 ff e8 0e 89 ff ff 48 85 c0 74 bc 48 89 df e8 21 3b 24
[   56.721333][   C16] RSP: 0018:ffff888205b27c68 EFLAGS: 00000287 ORIG_RAX:
ffffffffffffff13
[   56.721370][   C16] RAX: 0000000000000000 RBX: ffff888205b27c90 RCX:
ffffffffb74c9dc1
[   56.731370][   C16] RDX: 0000000000000003 RSI: dffffc0000000000 RDI:
ffff8888774ec3e0
[   56.741371][   C16] RBP: ffff888205b27cf8 R08: ffffed1040a7ac28 R09:
ffffed1040a7ac27
[   56.751335][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
0000000000085c1b
[   56.761334][   C16] R13: 1ffff11040b64f8e R14: ffff888450de4a20 R15:
ffff8888774ec040
[   56.761372][   C16] FS:  0000000000000000(0000) GS:ffff888454500000(0000)
knlGS:0000000000000000
[   56.771374][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   56.781370][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
00000000001406a0
[   56.791373][   C16] Call Trace:
[   56.791373][   C16]  ? bus_find_device_by_name+0x20/0x20
[   56.801334][   C16]  ? kobject_put+0x23/0x250
[   56.801334][   C16]  walk_memory_blocks+0x6c/0xb8
[   56.811333][   C16]  ? write_policy_show+0x40/0x40
[   56.811353][   C16]  link_mem_sections+0x7e/0xa0
[   56.811353][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
[   56.821333][   C16]  ? __register_one_node+0x3bd/0x600
[   56.831333][   C16]  topology_init+0xbf/0x126
[   56.831355][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
[   56.841334][   C16]  do_one_initcall+0xfe/0x45a
[   56.841334][   C16]  ? initcall_blacklisted+0x150/0x150
[   56.851333][   C16]  ? kasan_check_write+0x14/0x20
[   56.851354][   C16]  ? up_write+0x75/0x140
[   56.861333][   C16]  kernel_init_freeable+0x619/0x6ac
[   56.861333][   C16]  ? rest_init+0x188/0x188
[   56.861369][   C16]  kernel_init+0x11/0x138
[   56.871333][   C16]  ? rest_init+0x188/0x188
[   56.871354][   C16]  ret_from_fork+0x22/0x40
[   64.601362][   C16] rcu: INFO: rcu_sched self-detected stall on CPU
[   64.611335][   C16] rcu: 	16-....: (5958 ticks this GP)
idle=37e/1/0x4000000000000002 softirq=27/27 fqs=3000 
[   64.621334][   C16] 	(t=6002 jiffies g=-1079 q=25)
[   64.621334][   C16] NMI backtrace for cpu 16
[   64.621374][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
G             L    5.2.0-rc5-next-20190621+ #1
[   64.631372][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 03/09/2018
[   64.641371][   C16] Call Trace:
[   64.651337][   C16]  <IRQ>
[   64.651376][   C16]  dump_stack+0x62/0x9a
[   64.651376][   C16]  nmi_cpu_backtrace.cold.0+0x2e/0x33
[   64.661337][   C16]  ? nmi_cpu_backtrace_handler+0x20/0x20
[   64.661337][   C16]  nmi_trigger_cpumask_backtrace+0x1a6/0x1b9
[   64.671353][   C16]  arch_trigger_cpumask_backtrace+0x19/0x20
[   64.681366][   C16]  rcu_dump_cpu_stacks+0x18b/0x1d6
[   64.681366][   C16]  rcu_sched_clock_irq.cold.64+0x368/0x791
[   64.691336][   C16]  ? kasan_check_read+0x11/0x20
[   64.691354][   C16]  ? __raise_softirq_irqoff+0x66/0x150
[   64.701336][   C16]  update_process_times+0x2f/0x60
[   64.701362][   C16]  tick_periodic+0x38/0xe0
[   64.711334][   C16]  tick_handle_periodic+0x2e/0x80
[   64.711353][   C16]  smp_apic_timer_interrupt+0xfb/0x370
[   64.721367][   C16]  apic_timer_interrupt+0xf/0x20
[   64.721367][   C16]  </IRQ>
[   64.721367][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
[   64.731370][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 


* Re: [PATCH v3 0/6] mm: Further memory block device cleanups
  2019-06-21 15:15 ` [PATCH v3 0/6] mm: Further memory block device cleanups Qian Cai
@ 2019-06-21 15:22   ` David Hildenbrand
  2019-06-21 18:24   ` David Hildenbrand
  1 sibling, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2019-06-21 15:22 UTC (permalink / raw)
  To: Qian Cai, linux-kernel
  Cc: Dan Williams, Andrew Morton, linuxppc-dev, linux-acpi, linux-mm,
	Andrew Banman, Anshuman Khandual, Arun KS, Baoquan He,
	Benjamin Herrenschmidt, Greg Kroah-Hartman, Johannes Weiner,
	Juergen Gross, Keith Busch, Len Brown, Mel Gorman,
	Michael Ellerman, Michael Neuling, Michal Hocko, Mike Rapoport,
	mike.travis, Oscar Salvador, Oscar Salvador, Paul Mackerras,
	Pavel Tatashin, Pavel Tatashin, Pavel Tatashin,
	Rafael J. Wysocki, Rafael J. Wysocki, Rashmica Gupta,
	Stephen Rothwell, Thomas Gleixner, Vlastimil Babka, Wei Yang

On 21.06.19 17:15, Qian Cai wrote:
> On Thu, 2019-06-20 at 20:31 +0200, David Hildenbrand wrote:
>> @Andrew: Only patch 1, 4 and 6 changed compared to v1.
>>
>> Some further cleanups around memory block devices. Especially, clean up
>> and simplify walk_memory_range(). Including some other minor cleanups.
>>
>> Compiled + tested on x86 with DIMMs under QEMU. Compile-tested on ppc64.
>>
>> v2 -> v3:
>> - "mm/memory_hotplug: Rename walk_memory_range() and pass start+size .."
>> -- Avoid warning on ppc.
>> - "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
>> -- Fixup a comment regarding hinted devices.
>>
>> v1 -> v2:
>> - "mm: Section numbers use the type "unsigned long""
>> -- "unsigned long i" -> "unsigned long nr", in one case -> "int i"
>> - "drivers/base/memory.c: Get rid of find_memory_block_hinted("
>> -- Fix compilation error
>> -- Get rid of the "hint" parameter completely
>>
>> David Hildenbrand (6):
>>   mm: Section numbers use the type "unsigned long"
>>   drivers/base/memory: Use "unsigned long" for block ids
>>   mm: Make register_mem_sect_under_node() static
>>   mm/memory_hotplug: Rename walk_memory_range() and pass start+size
>>     instead of pfns
>>   mm/memory_hotplug: Move and simplify walk_memory_blocks()
>>   drivers/base/memory.c: Get rid of find_memory_block_hinted()
>>
>>  arch/powerpc/platforms/powernv/memtrace.c |  23 ++---
>>  drivers/acpi/acpi_memhotplug.c            |  19 +---
>>  drivers/base/memory.c                     | 120 +++++++++++++---------
>>  drivers/base/node.c                       |   8 +-
>>  include/linux/memory.h                    |   5 +-
>>  include/linux/memory_hotplug.h            |   2 -
>>  include/linux/mmzone.h                    |   4 +-
>>  include/linux/node.h                      |   7 --
>>  mm/memory_hotplug.c                       |  57 +---------
>>  mm/sparse.c                               |  12 +--
>>  10 files changed, 106 insertions(+), 151 deletions(-)
>>
> 
> This series causes a few machines to fail to boot, triggering endless soft
> lockups. Reverting these commits fixed the issue.
> 
> 97f4217d1da0 Revert "mm/memory_hotplug: rename walk_memory_range() and pass
> start+size instead of pfns"
> c608eebf33c6 Revert "mm-memory_hotplug-rename-walk_memory_range-and-pass-
> startsize-instead-of-pfns-fix"
> 34b5e4ab7558 Revert "mm/memory_hotplug: move and simplify walk_memory_blocks()"
> 59a9f3eec5d1 Revert "drivers/base/memory.c: Get rid of
> find_memory_block_hinted()"
> 5cfcd52288b6 Revert "drivers-base-memoryc-get-rid-of-find_memory_block_hinted-
> v3"
> 
> [    4.582081][    T1] ACPI FADT declares the system doesn't support PCIe ASPM,
> so disable it
> [    4.590405][    T1] ACPI: bus type PCI registered
> [    4.592908][    T1] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem
> 0x80000000-0x8fffffff] (base 0x80000000)
> [    4.601860][    T1] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in
> E820
> [    4.601860][    T1] PCI: Using configuration type 1 for base access
> [   28.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
> [swapper/0:1]
> [   28.671351][   C16] Modules linked in:
> [   28.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-
> next-20190621+ #1
> [   28.681366][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 03/09/2018
> [   28.691334][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> [   28.701334][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 48 8b
> 55 08 48 89 fb 48 8d 7f 18 e8 4c 89 7d ff 48 89 df e8 94 f9 7d ff 41 54 9d <65>
> ff 0d c2 44 8d 48 5b 41 5c 5d c3 0f 1f 44 00 00 0f 1f 44 00 00
> [   28.711354][   C16] RSP: 0018:ffff888205b27bf8 EFLAGS: 00000246 ORIG_RAX:
> ffffffffffffff13
> [   28.721372][   C16] RAX: 0000000000000000 RBX: ffff8882053d6138 RCX:
> ffffffffb6f2a3b8
> [   28.731371][   C16] RDX: 1ffff11040a7ac27 RSI: dffffc0000000000 RDI:
> ffff8882053d6138
> [   28.741371][   C16] RBP: ffff888205b27c08 R08: ffffed1040a7ac28 R09:
> ffffed1040a7ac27
> [   28.751334][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
> 0000000000000246
> [   28.751370][   C16] R13: ffff888205b27c98 R14: ffff8884504d0a20 R15:
> 0000000000000000
> [   28.761368][   C16] FS:  0000000000000000(0000) GS:ffff888454500000(0000)
> knlGS:0000000000000000
> [   28.771373][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   28.781334][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
> 00000000001406a0
> [   28.791333][   C16] Call Trace:
> [   28.791374][   C16]  klist_next+0xd8/0x1c0
> [   28.791374][   C16]  subsys_find_device_by_id+0x13b/0x1f0
> [   28.801334][   C16]  ? bus_find_device_by_name+0x20/0x20
> [   28.801370][   C16]  ? kobject_put+0x23/0x250
> [   28.811333][   C16]  walk_memory_blocks+0x6c/0xb8
> [   28.811353][   C16]  ? write_policy_show+0x40/0x40
> [   28.821334][   C16]  link_mem_sections+0x7e/0xa0
> [   28.821369][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
> [   28.831353][   C16]  ? __register_one_node+0x3bd/0x600
> [   28.831353][   C16]  topology_init+0xbf/0x126
> [   28.841364][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
> [   28.841368][   C16]  do_one_initcall+0xfe/0x45a
> [   28.851334][   C16]  ? initcall_blacklisted+0x150/0x150
> [   28.851353][   C16]  ? kasan_check_write+0x14/0x20
> [   28.861333][   C16]  ? up_write+0x75/0x140
> [   28.861369][   C16]  kernel_init_freeable+0x619/0x6ac
> [   28.871333][   C16]  ? rest_init+0x188/0x188
> [   28.871353][   C16]  kernel_init+0x11/0x138
> [   28.881363][   C16]  ? rest_init+0x188/0x188
> [   28.881363][   C16]  ret_from_fork+0x22/0x40
> [   56.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
> [swapper/0:1]
> [   56.671352][   C16] Modules linked in:
> [   56.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
> G             L    5.2.0-rc5-next-20190621+ #1
> [   56.681357][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 03/09/2018
> [   56.691356][   C16] RIP: 0010:subsys_find_device_by_id+0x168/0x1f0
> [   56.701334][   C16] Code: 48 85 c0 74 3e 48 8d 78 58 e8 14 77 ca ff 4d 8b 7e
> 58 4d 85 ff 74 2c 49 8d bf a0 03 00 00 e8 bf 75 ca ff 45 39 a7 a0 03 00 00 <75>
> c9 4c 89 ff e8 0e 89 ff ff 48 85 c0 74 bc 48 89 df e8 21 3b 24
> [   56.721333][   C16] RSP: 0018:ffff888205b27c68 EFLAGS: 00000287 ORIG_RAX:
> ffffffffffffff13
> [   56.721370][   C16] RAX: 0000000000000000 RBX: ffff888205b27c90 RCX:
> ffffffffb74c9dc1
> [   56.731370][   C16] RDX: 0000000000000003 RSI: dffffc0000000000 RDI:
> ffff8888774ec3e0
> [   56.741371][   C16] RBP: ffff888205b27cf8 R08: ffffed1040a7ac28 R09:
> ffffed1040a7ac27
> [   56.751335][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
> 0000000000085c1b
> [   56.761334][   C16] R13: 1ffff11040b64f8e R14: ffff888450de4a20 R15:
> ffff8888774ec040
> [   56.761372][   C16] FS:  0000000000000000(0000) GS:ffff888454500000(0000)
> knlGS:0000000000000000
> [   56.771374][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   56.781370][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
> 00000000001406a0
> [   56.791373][   C16] Call Trace:
> [   56.791373][   C16]  ? bus_find_device_by_name+0x20/0x20
> [   56.801334][   C16]  ? kobject_put+0x23/0x250
> [   56.801334][   C16]  walk_memory_blocks+0x6c/0xb8
> [   56.811333][   C16]  ? write_policy_show+0x40/0x40
> [   56.811353][   C16]  link_mem_sections+0x7e/0xa0
> [   56.811353][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
> [   56.821333][   C16]  ? __register_one_node+0x3bd/0x600
> [   56.831333][   C16]  topology_init+0xbf/0x126
> [   56.831355][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
> [   56.841334][   C16]  do_one_initcall+0xfe/0x45a
> [   56.841334][   C16]  ? initcall_blacklisted+0x150/0x150
> [   56.851333][   C16]  ? kasan_check_write+0x14/0x20
> [   56.851354][   C16]  ? up_write+0x75/0x140
> [   56.861333][   C16]  kernel_init_freeable+0x619/0x6ac
> [   56.861333][   C16]  ? rest_init+0x188/0x188
> [   56.861369][   C16]  kernel_init+0x11/0x138
> [   56.871333][   C16]  ? rest_init+0x188/0x188
> [   56.871354][   C16]  ret_from_fork+0x22/0x40
> [   64.601362][   C16] rcu: INFO: rcu_sched self-detected stall on CPU
> [   64.611335][   C16] rcu: 	16-....: (5958 ticks this GP)
> idle=37e/1/0x4000000000000002 softirq=27/27 fqs=3000 
> [   64.621334][   C16] 	(t=6002 jiffies g=-1079 q=25)
> [   64.621334][   C16] NMI backtrace for cpu 16
> [   64.621374][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
> G             L    5.2.0-rc5-next-20190621+ #1
> [   64.631372][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 03/09/2018
> [   64.641371][   C16] Call Trace:
> [   64.651337][   C16]  <IRQ>
> [   64.651376][   C16]  dump_stack+0x62/0x9a
> [   64.651376][   C16]  nmi_cpu_backtrace.cold.0+0x2e/0x33
> [   64.661337][   C16]  ? nmi_cpu_backtrace_handler+0x20/0x20
> [   64.661337][   C16]  nmi_trigger_cpumask_backtrace+0x1a6/0x1b9
> [   64.671353][   C16]  arch_trigger_cpumask_backtrace+0x19/0x20
> [   64.681366][   C16]  rcu_dump_cpu_stacks+0x18b/0x1d6
> [   64.681366][   C16]  rcu_sched_clock_irq.cold.64+0x368/0x791
> [   64.691336][   C16]  ? kasan_check_read+0x11/0x20
> [   64.691354][   C16]  ? __raise_softirq_irqoff+0x66/0x150
> [   64.701336][   C16]  update_process_times+0x2f/0x60
> [   64.701362][   C16]  tick_periodic+0x38/0xe0
> [   64.711334][   C16]  tick_handle_periodic+0x2e/0x80
> [   64.711353][   C16]  smp_apic_timer_interrupt+0xfb/0x370
> [   64.721367][   C16]  apic_timer_interrupt+0xf/0x20
> [   64.721367][   C16]  </IRQ>
> [   64.721367][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> [   64.731370][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 
> 

Thanks for the report. Man, this series is nastier than I thought. This
is making more noise than I was hoping for.

@Andrew can you revert patches 4-6 for now? I'll be on vacation soon and
don't want cleanups to constantly break things. Just nasty.

-- 

Thanks,

David / dhildenb


* Re: [PATCH v3 5/6] mm/memory_hotplug: Move and simplify walk_memory_blocks()
  2019-06-20 18:31 ` [PATCH v3 5/6] mm/memory_hotplug: Move and simplify walk_memory_blocks() David Hildenbrand
@ 2019-06-21 15:26   ` David Hildenbrand
  0 siblings, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2019-06-21 15:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dan Williams, Andrew Morton, linuxppc-dev, linux-acpi, linux-mm,
	Greg Kroah-Hartman, Rafael J. Wysocki, Stephen Rothwell,
	Pavel Tatashin, Andrew Banman, mike.travis, Oscar Salvador,
	Michal Hocko, Wei Yang, Arun KS, Qian Cai

On 20.06.19 20:31, David Hildenbrand wrote:
> Let's move walk_memory_blocks() to the place where memory block logic
> resides and simplify it. While at it, add a type for the callback function.
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Andrew Banman <andrew.banman@hpe.com>
> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
> Cc: Oscar Salvador <osalvador@suse.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Qian Cai <cai@lca.pw>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/base/memory.c          | 42 ++++++++++++++++++++++++++
>  include/linux/memory.h         |  3 ++
>  include/linux/memory_hotplug.h |  2 --
>  mm/memory_hotplug.c            | 55 ----------------------------------
>  4 files changed, 45 insertions(+), 57 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index c54e80fd25a8..0204384b4d1d 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -44,6 +44,11 @@ static inline unsigned long pfn_to_block_id(unsigned long pfn)
>  	return base_memory_block_id(pfn_to_section_nr(pfn));
>  }
>  
> +static inline unsigned long phys_to_block_id(unsigned long phys)
> +{
> +	return pfn_to_block_id(PFN_DOWN(phys));
> +}
> +
>  static int memory_subsys_online(struct device *dev);
>  static int memory_subsys_offline(struct device *dev);
>  
> @@ -851,3 +856,40 @@ int __init memory_dev_init(void)
>  		printk(KERN_ERR "%s() failed: %d\n", __func__, ret);
>  	return ret;
>  }
> +
> +/**
> + * walk_memory_blocks - walk through all present memory blocks overlapped
> + *			by the range [start, start + size)
> + *
> + * @start: start address of the memory range
> + * @size: size of the memory range
> + * @arg: argument passed to func
> + * @func: callback for each memory section walked
> + *
> + * This function walks through all present memory blocks overlapped by the
> + * range [start, start + size), calling func on each memory block.
> + *
> + * In case func() returns an error, walking is aborted and the error is
> + * returned.
> + */
> +int walk_memory_blocks(unsigned long start, unsigned long size,
> +		       void *arg, walk_memory_blocks_func_t func)
> +{
> +	const unsigned long start_block_id = phys_to_block_id(start);
> +	const unsigned long end_block_id = phys_to_block_id(start + size - 1);
> +	struct memory_block *mem;
> +	unsigned long block_id;
> +	int ret = 0;

I *guess* the stall we are seeing is when size = 0.

(via ACPI, if info->length is 0)

if (!size)
	return 0;

... but that is just a wild guess. Will have a look after my vacation.
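
To illustrate why a zero size could matter, here is a small user-space
sketch (not kernel code) of the end-block computation quoted above. It
assumes a hypothetical fixed block shift (DEMO_BLOCK_SHIFT) and hypothetical
demo_* helpers in place of the real phys_to_block_id(). With start == 0 and
size == 0, "start + size - 1" wraps around to ULONG_MAX, so a loop of the
form "block_id <= end_block_id" would visit an enormous number of block ids,
each costing one device lookup. Whether this is what actually happens on the
reporting machine is unconfirmed.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical: 128 MiB memory blocks. The kernel derives the real value. */
#define DEMO_BLOCK_SHIFT 27

/* Stand-in for phys_to_block_id(): physical address -> block id. */
unsigned long demo_phys_to_block_id(unsigned long phys)
{
	return phys >> DEMO_BLOCK_SHIFT;
}

/*
 * Number of ids that "for (id = start_id; id <= end_id; id++)" would visit,
 * using the same start/end computation as the quoted walk_memory_blocks().
 */
unsigned long demo_blocks_visited(unsigned long start, unsigned long size)
{
	unsigned long start_id = demo_phys_to_block_id(start);
	/* Unsigned arithmetic: wraps to ULONG_MAX when start == 0 and size == 0. */
	unsigned long end_id = demo_phys_to_block_id(start + size - 1);

	if (end_id < start_id)
		return 0;
	return end_id - start_id + 1;
}

/* Same computation with the early return proposed in this mail. */
unsigned long demo_blocks_visited_guarded(unsigned long start,
					  unsigned long size)
{
	if (!size)
		return 0;
	return demo_blocks_visited(start, size);
}

/* Small self-check of the two cases discussed here. */
void demo_self_check(void)
{
	/* Empty range at address 0: the unguarded walk covers every block id. */
	assert(demo_blocks_visited(0, 0) == (ULONG_MAX >> DEMO_BLOCK_SHIFT) + 1);
	/* With the guard, an empty range visits nothing. */
	assert(demo_blocks_visited_guarded(0, 0) == 0);
}
```

Note that in this sketch an empty range starting at a non-zero, block-aligned
address visits no blocks at all, so the wraparound only bites for start == 0.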

-- 

Thanks,

David / dhildenb


* Re: [PATCH v3 0/6] mm: Further memory block device cleanups
  2019-06-21 15:15 ` [PATCH v3 0/6] mm: Further memory block device cleanups Qian Cai
  2019-06-21 15:22   ` David Hildenbrand
@ 2019-06-21 18:24   ` David Hildenbrand
  2019-06-21 18:56     ` David Hildenbrand
                       ` (2 more replies)
  1 sibling, 3 replies; 16+ messages in thread
From: David Hildenbrand @ 2019-06-21 18:24 UTC (permalink / raw)
  To: Qian Cai, linux-kernel
  Cc: Dan Williams, Andrew Morton, linuxppc-dev, linux-acpi, linux-mm,
	Andrew Banman, Anshuman Khandual, Arun KS, Baoquan He,
	Benjamin Herrenschmidt, Greg Kroah-Hartman, Johannes Weiner,
	Juergen Gross, Keith Busch, Len Brown, Mel Gorman,
	Michael Ellerman, Michael Neuling, Michal Hocko, Mike Rapoport,
	mike.travis, Oscar Salvador, Oscar Salvador, Paul Mackerras,
	Pavel Tatashin, Pavel Tatashin, Pavel Tatashin,
	Rafael J. Wysocki, Rafael J. Wysocki, Rashmica Gupta,
	Stephen Rothwell, Thomas Gleixner, Vlastimil Babka, Wei Yang

On 21.06.19 17:15, Qian Cai wrote:
> On Thu, 2019-06-20 at 20:31 +0200, David Hildenbrand wrote:
>> @Andrew: Only patch 1, 4 and 6 changed compared to v1.
>>
>> Some further cleanups around memory block devices. Especially, clean up
>> and simplify walk_memory_range(). Including some other minor cleanups.
>>
>> Compiled + tested on x86 with DIMMs under QEMU. Compile-tested on ppc64.
>>
>> v2 -> v3:
>> - "mm/memory_hotplug: Rename walk_memory_range() and pass start+size .."
>> -- Avoid warning on ppc.
>> - "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
>> -- Fixup a comment regarding hinted devices.
>>
>> v1 -> v2:
>> - "mm: Section numbers use the type "unsigned long""
>> -- "unsigned long i" -> "unsigned long nr", in one case -> "int i"
>> - "drivers/base/memory.c: Get rid of find_memory_block_hinted("
>> -- Fix compilation error
>> -- Get rid of the "hint" parameter completely
>>
>> David Hildenbrand (6):
>>   mm: Section numbers use the type "unsigned long"
>>   drivers/base/memory: Use "unsigned long" for block ids
>>   mm: Make register_mem_sect_under_node() static
>>   mm/memory_hotplug: Rename walk_memory_range() and pass start+size
>>     instead of pfns
>>   mm/memory_hotplug: Move and simplify walk_memory_blocks()
>>   drivers/base/memory.c: Get rid of find_memory_block_hinted()
>>
>>  arch/powerpc/platforms/powernv/memtrace.c |  23 ++---
>>  drivers/acpi/acpi_memhotplug.c            |  19 +---
>>  drivers/base/memory.c                     | 120 +++++++++++++---------
>>  drivers/base/node.c                       |   8 +-
>>  include/linux/memory.h                    |   5 +-
>>  include/linux/memory_hotplug.h            |   2 -
>>  include/linux/mmzone.h                    |   4 +-
>>  include/linux/node.h                      |   7 --
>>  mm/memory_hotplug.c                       |  57 +---------
>>  mm/sparse.c                               |  12 +--
>>  10 files changed, 106 insertions(+), 151 deletions(-)
>>
> 
> This series causes a few machines to fail to boot, triggering endless soft
> lockups. Reverting those commits fixed the issue.
> 
> 97f4217d1da0 Revert "mm/memory_hotplug: rename walk_memory_range() and pass
> start+size instead of pfns"
> c608eebf33c6 Revert "mm-memory_hotplug-rename-walk_memory_range-and-pass-
> startsize-instead-of-pfns-fix"
> 34b5e4ab7558 Revert "mm/memory_hotplug: move and simplify walk_memory_blocks()"
> 59a9f3eec5d1 Revert "drivers/base/memory.c: Get rid of
> find_memory_block_hinted()"
> 5cfcd52288b6 Revert "drivers-base-memoryc-get-rid-of-find_memory_block_hinted-
> v3"
> 
> [    4.582081][    T1] ACPI FADT declares the system doesn't support PCIe ASPM,
> so disable it
> [    4.590405][    T1] ACPI: bus type PCI registered
> [    4.592908][    T1] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem
> 0x80000000-0x8fffffff] (base 0x80000000)
> [    4.601860][    T1] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in
> E820
> [    4.601860][    T1] PCI: Using configuration type 1 for base access
> [   28.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
> [swapper/0:1]
> [   28.671351][   C16] Modules linked in:
> [   28.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-
> next-20190621+ #1
> [   28.681366][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 03/09/2018
> [   28.691334][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> [   28.701334][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 48 8b
> 55 08 48 89 fb 48 8d 7f 18 e8 4c 89 7d ff 48 89 df e8 94 f9 7d ff 41 54 9d <65>
> ff 0d c2 44 8d 48 5b 41 5c 5d c3 0f 1f 44 00 00 0f 1f 44 00 00
> [   28.711354][   C16] RSP: 0018:ffff888205b27bf8 EFLAGS: 00000246 ORIG_RAX:
> ffffffffffffff13
> [   28.721372][   C16] RAX: 0000000000000000 RBX: ffff8882053d6138 RCX:
> ffffffffb6f2a3b8
> [   28.731371][   C16] RDX: 1ffff11040a7ac27 RSI: dffffc0000000000 RDI:
> ffff8882053d6138
> [   28.741371][   C16] RBP: ffff888205b27c08 R08: ffffed1040a7ac28 R09:
> ffffed1040a7ac27
> [   28.751334][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
> 0000000000000246
> [   28.751370][   C16] R13: ffff888205b27c98 R14: ffff8884504d0a20 R15:
> 0000000000000000
> [   28.761368][   C16] FS:  0000000000000000(0000) GS:ffff888454500000(0000)
> knlGS:0000000000000000
> [   28.771373][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   28.781334][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
> 00000000001406a0
> [   28.791333][   C16] Call Trace:
> [   28.791374][   C16]  klist_next+0xd8/0x1c0
> [   28.791374][   C16]  subsys_find_device_by_id+0x13b/0x1f0
> [   28.801334][   C16]  ? bus_find_device_by_name+0x20/0x20
> [   28.801370][   C16]  ? kobject_put+0x23/0x250
> [   28.811333][   C16]  walk_memory_blocks+0x6c/0xb8
> [   28.811353][   C16]  ? write_policy_show+0x40/0x40
> [   28.821334][   C16]  link_mem_sections+0x7e/0xa0
> [   28.821369][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
> [   28.831353][   C16]  ? __register_one_node+0x3bd/0x600
> [   28.831353][   C16]  topology_init+0xbf/0x126
> [   28.841364][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
> [   28.841368][   C16]  do_one_initcall+0xfe/0x45a
> [   28.851334][   C16]  ? initcall_blacklisted+0x150/0x150
> [   28.851353][   C16]  ? kasan_check_write+0x14/0x20
> [   28.861333][   C16]  ? up_write+0x75/0x140
> [   28.861369][   C16]  kernel_init_freeable+0x619/0x6ac
> [   28.871333][   C16]  ? rest_init+0x188/0x188
> [   28.871353][   C16]  kernel_init+0x11/0x138
> [   28.881363][   C16]  ? rest_init+0x188/0x188
> [   28.881363][   C16]  ret_from_fork+0x22/0x40
> [   56.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
> [swapper/0:1]
> [   56.671352][   C16] Modules linked in:
> [   56.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
> G             L    5.2.0-rc5-next-20190621+ #1
> [   56.681357][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 03/09/2018
> [   56.691356][   C16] RIP: 0010:subsys_find_device_by_id+0x168/0x1f0
> [   56.701334][   C16] Code: 48 85 c0 74 3e 48 8d 78 58 e8 14 77 ca ff 4d 8b 7e
> 58 4d 85 ff 74 2c 49 8d bf a0 03 00 00 e8 bf 75 ca ff 45 39 a7 a0 03 00 00 <75>
> c9 4c 89 ff e8 0e 89 ff ff 48 85 c0 74 bc 48 89 df e8 21 3b 24
> [   56.721333][   C16] RSP: 0018:ffff888205b27c68 EFLAGS: 00000287 ORIG_RAX:
> ffffffffffffff13
> [   56.721370][   C16] RAX: 0000000000000000 RBX: ffff888205b27c90 RCX:
> ffffffffb74c9dc1
> [   56.731370][   C16] RDX: 0000000000000003 RSI: dffffc0000000000 RDI:
> ffff8888774ec3e0
> [   56.741371][   C16] RBP: ffff888205b27cf8 R08: ffffed1040a7ac28 R09:
> ffffed1040a7ac27
> [   56.751335][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
> 0000000000085c1b
> [   56.761334][   C16] R13: 1ffff11040b64f8e R14: ffff888450de4a20 R15:
> ffff8888774ec040
> [   56.761372][   C16] FS:  0000000000000000(0000) GS:ffff888454500000(0000)
> knlGS:0000000000000000
> [   56.771374][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   56.781370][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
> 00000000001406a0
> [   56.791373][   C16] Call Trace:
> [   56.791373][   C16]  ? bus_find_device_by_name+0x20/0x20
> [   56.801334][   C16]  ? kobject_put+0x23/0x250
> [   56.801334][   C16]  walk_memory_blocks+0x6c/0xb8
> [   56.811333][   C16]  ? write_policy_show+0x40/0x40
> [   56.811353][   C16]  link_mem_sections+0x7e/0xa0
> [   56.811353][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
> [   56.821333][   C16]  ? __register_one_node+0x3bd/0x600
> [   56.831333][   C16]  topology_init+0xbf/0x126
> [   56.831355][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
> [   56.841334][   C16]  do_one_initcall+0xfe/0x45a
> [   56.841334][   C16]  ? initcall_blacklisted+0x150/0x150
> [   56.851333][   C16]  ? kasan_check_write+0x14/0x20
> [   56.851354][   C16]  ? up_write+0x75/0x140
> [   56.861333][   C16]  kernel_init_freeable+0x619/0x6ac
> [   56.861333][   C16]  ? rest_init+0x188/0x188
> [   56.861369][   C16]  kernel_init+0x11/0x138
> [   56.871333][   C16]  ? rest_init+0x188/0x188
> [   56.871354][   C16]  ret_from_fork+0x22/0x40
> [   64.601362][   C16] rcu: INFO: rcu_sched self-detected stall on CPU
> [   64.611335][   C16] rcu: 	16-....: (5958 ticks this GP)
> idle=37e/1/0x4000000000000002 softirq=27/27 fqs=3000 
> [   64.621334][   C16] 	(t=6002 jiffies g=-1079 q=25)
> [   64.621334][   C16] NMI backtrace for cpu 16
> [   64.621374][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
> G             L    5.2.0-rc5-next-20190621+ #1
> [   64.631372][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 03/09/2018
> [   64.641371][   C16] Call Trace:
> [   64.651337][   C16]  <IRQ>
> [   64.651376][   C16]  dump_stack+0x62/0x9a
> [   64.651376][   C16]  nmi_cpu_backtrace.cold.0+0x2e/0x33
> [   64.661337][   C16]  ? nmi_cpu_backtrace_handler+0x20/0x20
> [   64.661337][   C16]  nmi_trigger_cpumask_backtrace+0x1a6/0x1b9
> [   64.671353][   C16]  arch_trigger_cpumask_backtrace+0x19/0x20
> [   64.681366][   C16]  rcu_dump_cpu_stacks+0x18b/0x1d6
> [   64.681366][   C16]  rcu_sched_clock_irq.cold.64+0x368/0x791
> [   64.691336][   C16]  ? kasan_check_read+0x11/0x20
> [   64.691354][   C16]  ? __raise_softirq_irqoff+0x66/0x150
> [   64.701336][   C16]  update_process_times+0x2f/0x60
> [   64.701362][   C16]  tick_periodic+0x38/0xe0
> [   64.711334][   C16]  tick_handle_periodic+0x2e/0x80
> [   64.711353][   C16]  smp_apic_timer_interrupt+0xfb/0x370
> [   64.721367][   C16]  apic_timer_interrupt+0xf/0x20
> [   64.721367][   C16]  </IRQ>
> [   64.721367][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> [   64.731370][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 
> 

@Qian Cai, unfortunately I can't reproduce.

If you get the chance, it would be great if you could retry with

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 972c5336bebf..742f99ddd148 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -868,6 +868,9 @@ int walk_memory_blocks(unsigned long start, unsigned long size,
        unsigned long block_id;
        int ret = 0;

+       if (!size)
+               return 0;
+
        for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
                mem = find_memory_block_by_id(block_id);
                if (!mem)



If both start and size are 0, "start + size - 1" wraps around to
ULONG_MAX, so end_block_id becomes huge and we would get a very long
loop. This would mean that we have an online node that does not span any
pages at all (pgdat->node_start_pfn == 0,
start_pfn + pgdat->node_spanned_pages == 0).
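
To put a number on "very long", here is a small user-space model of the loop bounds. It is a sketch under assumptions: 64-bit arithmetic, a 128 MiB memory block size, and the hypothetical helper name model_iterations_without_guard(); the real block size depends on the architecture and configuration.

```c
#include <stdint.h>

#define MODEL_BLOCK_SHIFT 27	/* assumption: 128 MiB memory blocks */

/*
 * Number of iterations of
 *   for (id = start_id; id <= end_id; id++)
 * when no zero-size guard is present, modelled with 64-bit arithmetic.
 */
static uint64_t model_iterations_without_guard(uint64_t start, uint64_t size)
{
	uint64_t start_id = start >> MODEL_BLOCK_SHIFT;
	/* For start == 0 and size == 0 this is UINT64_MAX >> MODEL_BLOCK_SHIFT. */
	uint64_t end_id = (start + size - 1) >> MODEL_BLOCK_SHIFT;

	if (end_id < start_id)
		return 0;
	return end_id - start_id + 1;
}
```

Under these assumptions, start == 0 and size == 0 gives 2^37 iterations, each of which does a device lookup (the subsys_find_device_by_id() frames in the trace). A zero size with a non-zero start is harmless by comparison: the loop runs zero times for a block-aligned start and once otherwise.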

-- 

Thanks,

David / dhildenb


* Re: [PATCH v3 0/6] mm: Further memory block device cleanups
  2019-06-21 18:24   ` David Hildenbrand
@ 2019-06-21 18:56     ` David Hildenbrand
  2019-06-21 19:07       ` Qian Cai
  2019-06-21 19:29     ` Qian Cai
  2019-06-21 23:42     ` Andrew Morton
  2 siblings, 1 reply; 16+ messages in thread
From: David Hildenbrand @ 2019-06-21 18:56 UTC (permalink / raw)
  To: Qian Cai, linux-kernel
  Cc: Dan Williams, Andrew Morton, linuxppc-dev, linux-acpi, linux-mm,
	Andrew Banman, Anshuman Khandual, Arun KS, Baoquan He,
	Benjamin Herrenschmidt, Greg Kroah-Hartman, Johannes Weiner,
	Juergen Gross, Keith Busch, Len Brown, Mel Gorman,
	Michael Ellerman, Michael Neuling, Michal Hocko, Mike Rapoport,
	mike.travis, Oscar Salvador, Oscar Salvador, Paul Mackerras,
	Pavel Tatashin, Pavel Tatashin, Pavel Tatashin,
	Rafael J. Wysocki, Rafael J. Wysocki, Rashmica Gupta,
	Stephen Rothwell, Thomas Gleixner, Vlastimil Babka, Wei Yang

On 21.06.19 20:24, David Hildenbrand wrote:
> @Qian Cai, unfortunately I can't reproduce.
> 
> If you get the chance, it would be great if you could retry with
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 972c5336bebf..742f99ddd148 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -868,6 +868,9 @@ int walk_memory_blocks(unsigned long start, unsigned long size,
>         unsigned long block_id;
>         int ret = 0;
> 
> +       if (!size)
> +               return 0;
> +
>         for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
>                 mem = find_memory_block_by_id(block_id);
>                 if (!mem)
> 
> 
> 
> If both start and size are 0, "start + size - 1" wraps around to
> ULONG_MAX, so end_block_id becomes huge and we would get a very long
> loop. This would mean that we have an online node that does not span any
> pages at all (pgdat->node_start_pfn == 0,
> start_pfn + pgdat->node_spanned_pages == 0).
> 


...trying to reproduce with QEMU (setting 0MB for the second node):

qemu-system-x86_64 --enable-kvm -m 4G,maxmem=20G,slots=2 \
	-smp sockets=2,cores=1 \
	-numa node,nodeid=0,cpus=0,mem=4G \
	-numa node,nodeid=1,cpus=1,mem=0 ...

I can indeed see that the node is online and
"pgdat->node_start_pfn == 0 && start_pfn + pgdat->node_spanned_pages == 0".

However, the kernel oopses with a NULL pointer dereference in an
unrelated code path, so I can't verify whether the change above fixes
the problem:

[    0.313284] BUG: kernel NULL pointer dereference, address: 00000000000000a0
[    0.313479] #PF: supervisor read access in kernel mode
[    0.313479] #PF: error_code(0x0000) - not-present page
[    0.313479] PGD 0 P4D 0 
[    0.313479] Oops: 0000 [#1] SMP PTI
[    0.313479] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-next-20190620+ #56
[    0.313479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
[    0.313479] RIP: 0010:bus_add_device+0x59/0x110
[    0.313479] Code: 20 48 89 df e8 f8 b4 ff ff 41 89 c4 85 c0 0f 85 81 00 00 00 48 8b 53 50 48 85 d2 75 03 48 8b 135
[    0.313479] RSP: 0000:ffffb4a6c0013e20 EFLAGS: 00010246
[    0.313479] RAX: 0000000000000000 RBX: ffff8b61bac23800 RCX: 0000000000000000
[    0.313479] RDX: ffff8b61bac29038 RSI: ffff8b61bac23800 RDI: ffff8b61bac23800
[    0.313479] RBP: ffffffff9d2f4500 R08: 0000000000000000 R09: 0000000000000001
[    0.313479] R10: 0000000000000000 R11: ffff8b61bad20878 R12: 0000000000000000
[    0.313479] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    0.313479] FS:  0000000000000000(0000) GS:ffff8b61bba00000(0000) knlGS:0000000000000000
[    0.313479] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.313479] CR2: 00000000000000a0 CR3: 0000000013c24000 CR4: 00000000000006f0
[    0.313479] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.313479] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    0.313479] Call Trace:
[    0.313479]  device_add+0x304/0x660
[    0.313479]  ? __init_waitqueue_head+0x31/0x50
[    0.313479]  __register_one_node+0x67/0x170
[    0.313479]  __try_online_node.cold+0x3e/0x78
[    0.313479]  try_online_node+0x25/0x40
[    0.313479]  do_cpu_up+0x36/0xc0
[    0.313479]  smp_init+0x59/0xb3
[    0.313479]  kernel_init_freeable+0x11a/0x247
[    0.313479]  ? rest_init+0x23f/0x23f
[    0.313479]  kernel_init+0x5/0xf1
[    0.313479]  ret_from_fork+0x3a/0x50
[    0.313479] Modules linked in:

Figuring out what goes wrong here (maybe QEMU creating a weird
system configuration) is a different journey :)

-- 

Thanks,

David / dhildenb


* Re: [PATCH v3 0/6] mm: Further memory block device cleanups
  2019-06-21 18:56     ` David Hildenbrand
@ 2019-06-21 19:07       ` Qian Cai
  2019-06-21 19:25         ` David Hildenbrand
  0 siblings, 1 reply; 16+ messages in thread
From: Qian Cai @ 2019-06-21 19:07 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Dan Williams, Andrew Morton, linuxppc-dev, linux-acpi, linux-mm,
	Andrew Banman, Anshuman Khandual, Arun KS, Baoquan He,
	Benjamin Herrenschmidt, Greg Kroah-Hartman, Johannes Weiner,
	Juergen Gross, Keith Busch, Len Brown, Mel Gorman,
	Michael Ellerman, Michael Neuling, Michal Hocko, Mike Rapoport,
	mike.travis, Oscar Salvador, Oscar Salvador, Paul Mackerras,
	Pavel Tatashin, Pavel Tatashin, Pavel Tatashin,
	Rafael J. Wysocki, Rafael J. Wysocki, Rashmica Gupta,
	Stephen Rothwell, Thomas Gleixner, Vlastimil Babka, Wei Yang

On Fri, 2019-06-21 at 20:56 +0200, David Hildenbrand wrote:
> On 21.06.19 20:24, David Hildenbrand wrote:
> > On 21.06.19 17:15, Qian Cai wrote:
> > > On Thu, 2019-06-20 at 20:31 +0200, David Hildenbrand wrote:
> > > > @Andrew: Only patch 1, 4 and 6 changed compared to v1.
> > > > 
> > > > Some further cleanups around memory block devices. Especially, clean up
> > > > and simplify walk_memory_range(). Including some other minor cleanups.
> > > > 
> > > > Compiled + tested on x86 with DIMMs under QEMU. Compile-tested on ppc64.
> > > > 
> > > > v2 -> v3:
> > > > - "mm/memory_hotplug: Rename walk_memory_range() and pass start+size .."
> > > > -- Avoid warning on ppc.
> > > > - "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
> > > > -- Fixup a comment regarding hinted devices.
> > > > 
> > > > v1 -> v2:
> > > > - "mm: Section numbers use the type "unsigned long""
> > > > -- "unsigned long i" -> "unsigned long nr", in one case -> "int i"
> > > > - "drivers/base/memory.c: Get rid of find_memory_block_hinted("
> > > > -- Fix compilation error
> > > > -- Get rid of the "hint" parameter completely
> > > > 
> > > > David Hildenbrand (6):
> > > >   mm: Section numbers use the type "unsigned long"
> > > >   drivers/base/memory: Use "unsigned long" for block ids
> > > >   mm: Make register_mem_sect_under_node() static
> > > >   mm/memory_hotplug: Rename walk_memory_range() and pass start+size
> > > >     instead of pfns
> > > >   mm/memory_hotplug: Move and simplify walk_memory_blocks()
> > > >   drivers/base/memory.c: Get rid of find_memory_block_hinted()
> > > > 
> > > >  arch/powerpc/platforms/powernv/memtrace.c |  23 ++---
> > > >  drivers/acpi/acpi_memhotplug.c            |  19 +---
> > > >  drivers/base/memory.c                     | 120 +++++++++++++---------
> > > >  drivers/base/node.c                       |   8 +-
> > > >  include/linux/memory.h                    |   5 +-
> > > >  include/linux/memory_hotplug.h            |   2 -
> > > >  include/linux/mmzone.h                    |   4 +-
> > > >  include/linux/node.h                      |   7 --
> > > >  mm/memory_hotplug.c                       |  57 +---------
> > > >  mm/sparse.c                               |  12 +--
> > > >  10 files changed, 106 insertions(+), 151 deletions(-)
> > > > 
> > > 
> > > This series causes a few machines to fail to boot, triggering endless
> > > soft lockups. Reverting those commits fixed the issue.
> > > 
> > > 97f4217d1da0 Revert "mm/memory_hotplug: rename walk_memory_range() and
> > > pass
> > > start+size instead of pfns"
> > > c608eebf33c6 Revert "mm-memory_hotplug-rename-walk_memory_range-and-pass-
> > > startsize-instead-of-pfns-fix"
> > > 34b5e4ab7558 Revert "mm/memory_hotplug: move and simplify
> > > walk_memory_blocks()"
> > > 59a9f3eec5d1 Revert "drivers/base/memory.c: Get rid of
> > > find_memory_block_hinted()"
> > > 5cfcd52288b6 Revert "drivers-base-memoryc-get-rid-of-
> > > find_memory_block_hinted-
> > > v3"
> > > 
> > > [    4.582081][    T1] ACPI FADT declares the system doesn't support PCIe
> > > ASPM,
> > > so disable it
> > > [    4.590405][    T1] ACPI: bus type PCI registered
> > > [    4.592908][    T1] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem
> > > 0x80000000-0x8fffffff] (base 0x80000000)
> > > [    4.601860][    T1] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff]
> > > reserved in
> > > E820
> > > [    4.601860][    T1] PCI: Using configuration type 1 for base access
> > > [   28.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
> > > [swapper/0:1]
> > > [   28.671351][   C16] Modules linked in:
> > > [   28.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Not tainted 5.2.0-
> > > rc5-
> > > next-20190621+ #1
> > > [   28.681366][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
> > > DL385
> > > Gen10, BIOS A40 03/09/2018
> > > [   28.691334][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> > > [   28.701334][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53
> > > 48 8b
> > > 55 08 48 89 fb 48 8d 7f 18 e8 4c 89 7d ff 48 89 df e8 94 f9 7d ff 41 54 9d
> > > <65>
> > > ff 0d c2 44 8d 48 5b 41 5c 5d c3 0f 1f 44 00 00 0f 1f 44 00 00
> > > [   28.711354][   C16] RSP: 0018:ffff888205b27bf8 EFLAGS: 00000246
> > > ORIG_RAX:
> > > ffffffffffffff13
> > > [   28.721372][   C16] RAX: 0000000000000000 RBX: ffff8882053d6138 RCX:
> > > ffffffffb6f2a3b8
> > > [   28.731371][   C16] RDX: 1ffff11040a7ac27 RSI: dffffc0000000000 RDI:
> > > ffff8882053d6138
> > > [   28.741371][   C16] RBP: ffff888205b27c08 R08: ffffed1040a7ac28 R09:
> > > ffffed1040a7ac27
> > > [   28.751334][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
> > > 0000000000000246
> > > [   28.751370][   C16] R13: ffff888205b27c98 R14: ffff8884504d0a20 R15:
> > > 0000000000000000
> > > [   28.761368][   C16] FS:  0000000000000000(0000)
> > > GS:ffff888454500000(0000)
> > > knlGS:0000000000000000
> > > [   28.771373][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [   28.781334][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
> > > 00000000001406a0
> > > [   28.791333][   C16] Call Trace:
> > > [   28.791374][   C16]  klist_next+0xd8/0x1c0
> > > [   28.791374][   C16]  subsys_find_device_by_id+0x13b/0x1f0
> > > [   28.801334][   C16]  ? bus_find_device_by_name+0x20/0x20
> > > [   28.801370][   C16]  ? kobject_put+0x23/0x250
> > > [   28.811333][   C16]  walk_memory_blocks+0x6c/0xb8
> > > [   28.811353][   C16]  ? write_policy_show+0x40/0x40
> > > [   28.821334][   C16]  link_mem_sections+0x7e/0xa0
> > > [   28.821369][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
> > > [   28.831353][   C16]  ? __register_one_node+0x3bd/0x600
> > > [   28.831353][   C16]  topology_init+0xbf/0x126
> > > [   28.841364][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
> > > [   28.841368][   C16]  do_one_initcall+0xfe/0x45a
> > > [   28.851334][   C16]  ? initcall_blacklisted+0x150/0x150
> > > [   28.851353][   C16]  ? kasan_check_write+0x14/0x20
> > > [   28.861333][   C16]  ? up_write+0x75/0x140
> > > [   28.861369][   C16]  kernel_init_freeable+0x619/0x6ac
> > > [   28.871333][   C16]  ? rest_init+0x188/0x188
> > > [   28.871353][   C16]  kernel_init+0x11/0x138
> > > [   28.881363][   C16]  ? rest_init+0x188/0x188
> > > [   28.881363][   C16]  ret_from_fork+0x22/0x40
> > > [   56.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
> > > [swapper/0:1]
> > > [   56.671352][   C16] Modules linked in:
> > > [   56.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
> > > G             L    5.2.0-rc5-next-20190621+ #1
> > > [   56.681357][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
> > > DL385
> > > Gen10, BIOS A40 03/09/2018
> > > [   56.691356][   C16] RIP: 0010:subsys_find_device_by_id+0x168/0x1f0
> > > [   56.701334][   C16] Code: 48 85 c0 74 3e 48 8d 78 58 e8 14 77 ca ff 4d
> > > 8b 7e
> > > 58 4d 85 ff 74 2c 49 8d bf a0 03 00 00 e8 bf 75 ca ff 45 39 a7 a0 03 00 00
> > > <75>
> > > c9 4c 89 ff e8 0e 89 ff ff 48 85 c0 74 bc 48 89 df e8 21 3b 24
> > > [   56.721333][   C16] RSP: 0018:ffff888205b27c68 EFLAGS: 00000287
> > > ORIG_RAX:
> > > ffffffffffffff13
> > > [   56.721370][   C16] RAX: 0000000000000000 RBX: ffff888205b27c90 RCX:
> > > ffffffffb74c9dc1
> > > [   56.731370][   C16] RDX: 0000000000000003 RSI: dffffc0000000000 RDI:
> > > ffff8888774ec3e0
> > > [   56.741371][   C16] RBP: ffff888205b27cf8 R08: ffffed1040a7ac28 R09:
> > > ffffed1040a7ac27
> > > [   56.751335][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
> > > 0000000000085c1b
> > > [   56.761334][   C16] R13: 1ffff11040b64f8e R14: ffff888450de4a20 R15:
> > > ffff8888774ec040
> > > [   56.761372][   C16] FS:  0000000000000000(0000)
> > > GS:ffff888454500000(0000)
> > > knlGS:0000000000000000
> > > [   56.771374][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [   56.781370][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
> > > 00000000001406a0
> > > [   56.791373][   C16] Call Trace:
> > > [   56.791373][   C16]  ? bus_find_device_by_name+0x20/0x20
> > > [   56.801334][   C16]  ? kobject_put+0x23/0x250
> > > [   56.801334][   C16]  walk_memory_blocks+0x6c/0xb8
> > > [   56.811333][   C16]  ? write_policy_show+0x40/0x40
> > > [   56.811353][   C16]  link_mem_sections+0x7e/0xa0
> > > [   56.811353][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
> > > [   56.821333][   C16]  ? __register_one_node+0x3bd/0x600
> > > [   56.831333][   C16]  topology_init+0xbf/0x126
> > > [   56.831355][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
> > > [   56.841334][   C16]  do_one_initcall+0xfe/0x45a
> > > [   56.841334][   C16]  ? initcall_blacklisted+0x150/0x150
> > > [   56.851333][   C16]  ? kasan_check_write+0x14/0x20
> > > [   56.851354][   C16]  ? up_write+0x75/0x140
> > > [   56.861333][   C16]  kernel_init_freeable+0x619/0x6ac
> > > [   56.861333][   C16]  ? rest_init+0x188/0x188
> > > [   56.861369][   C16]  kernel_init+0x11/0x138
> > > [   56.871333][   C16]  ? rest_init+0x188/0x188
> > > [   56.871354][   C16]  ret_from_fork+0x22/0x40
> > > [   64.601362][   C16] rcu: INFO: rcu_sched self-detected stall on CPU
> > > [   64.611335][   C16] rcu: 	16-....: (5958 ticks this GP)
> > > idle=37e/1/0x4000000000000002 softirq=27/27 fqs=3000 
> > > [   64.621334][   C16] 	(t=6002 jiffies g=-1079 q=25)
> > > [   64.621334][   C16] NMI backtrace for cpu 16
> > > [   64.621374][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
> > > G             L    5.2.0-rc5-next-20190621+ #1
> > > [   64.631372][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
> > > DL385
> > > Gen10, BIOS A40 03/09/2018
> > > [   64.641371][   C16] Call Trace:
> > > [   64.651337][   C16]  <IRQ>
> > > [   64.651376][   C16]  dump_stack+0x62/0x9a
> > > [   64.651376][   C16]  nmi_cpu_backtrace.cold.0+0x2e/0x33
> > > [   64.661337][   C16]  ? nmi_cpu_backtrace_handler+0x20/0x20
> > > [   64.661337][   C16]  nmi_trigger_cpumask_backtrace+0x1a6/0x1b9
> > > [   64.671353][   C16]  arch_trigger_cpumask_backtrace+0x19/0x20
> > > [   64.681366][   C16]  rcu_dump_cpu_stacks+0x18b/0x1d6
> > > [   64.681366][   C16]  rcu_sched_clock_irq.cold.64+0x368/0x791
> > > [   64.691336][   C16]  ? kasan_check_read+0x11/0x20
> > > [   64.691354][   C16]  ? __raise_softirq_irqoff+0x66/0x150
> > > [   64.701336][   C16]  update_process_times+0x2f/0x60
> > > [   64.701362][   C16]  tick_periodic+0x38/0xe0
> > > [   64.711334][   C16]  tick_handle_periodic+0x2e/0x80
> > > [   64.711353][   C16]  smp_apic_timer_interrupt+0xfb/0x370
> > > [   64.721367][   C16]  apic_timer_interrupt+0xf/0x20
> > > [   64.721367][   C16]  </IRQ>
> > > [   64.721367][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> > > [   64.731370][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 
> > > 
> > 
> > @Qian Cai, unfortunately I can't reproduce.
> > 
> > If you get the chance, it would be great if you could retry with
> > 
> > diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> > index 972c5336bebf..742f99ddd148 100644
> > --- a/drivers/base/memory.c
> > +++ b/drivers/base/memory.c
> > @@ -868,6 +868,9 @@ int walk_memory_blocks(unsigned long start, unsigned
> > long size,
> >         unsigned long block_id;
> >         int ret = 0;
> > 
> > +       if (!size)
> > +               return 0;
> > +
> >         for (block_id = start_block_id; block_id <= end_block_id;
> > block_id++) {
> >                 mem = find_memory_block_by_id(block_id);
> >                 if (!mem)
> > 
> > 
> > 
> > If both start and size are 0, we would get a veeeery long loop. This
> > would mean that we have an online node that does not span any pages at
> > all (pgdat->node_start_pfn = 0, start_pfn + pgdat->node_spanned_pages = 0).
> > 
> 
> 
> ...trying to reproduce with QEMU (setting 0MB for the second node):
> 
> qemu-system-x86_64 --enable-kvm -m 4G,maxmem=20G,slots=2 \
> 	-smp sockets=2,cores=1 \
> 	-numa node,nodeid=0,cpus=0,mem=4G \
> 	-numa node,nodeid=1,cpus=1,mem=0 ...
> 
> I can indeed see that the node is online and
> "pgdat->node_start_pfn == 0 && start_pfn + pgdat->node_spanned_pages == 0".
> 
> However, the kernel segfaults in an unrelated code path, so I can't
> verify if this solves this problem:
> 
> [    0.313284] BUG: kernel NULL pointer dereference, address: 00000000000000a0
> [    0.313479] #PF: supervisor read access in kernel mode
> [    0.313479] #PF: error_code(0x0000) - not-present page
> [    0.313479] PGD 0 P4D 0 
> [    0.313479] Oops: 0000 [#1] SMP PTI
> [    0.313479] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-next-
> 20190620+ #56
> [    0.313479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
> [    0.313479] RIP: 0010:bus_add_device+0x59/0x110
> [    0.313479] Code: 20 48 89 df e8 f8 b4 ff ff 41 89 c4 85 c0 0f 85 81 00 00
> 00 48 8b 53 50 48 85 d2 75 03 48 8b 135
> [    0.313479] RSP: 0000:ffffb4a6c0013e20 EFLAGS: 00010246
> [    0.313479] RAX: 0000000000000000 RBX: ffff8b61bac23800 RCX:
> 0000000000000000
> [    0.313479] RDX: ffff8b61bac29038 RSI: ffff8b61bac23800 RDI:
> ffff8b61bac23800
> [    0.313479] RBP: ffffffff9d2f4500 R08: 0000000000000000 R09:
> 0000000000000001
> [    0.313479] R10: 0000000000000000 R11: ffff8b61bad20878 R12:
> 0000000000000000
> [    0.313479] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000000
> [    0.313479] FS:  0000000000000000(0000) GS:ffff8b61bba00000(0000)
> knlGS:0000000000000000
> [    0.313479] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.313479] CR2: 00000000000000a0 CR3: 0000000013c24000 CR4:
> 00000000000006f0
> [    0.313479] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [    0.313479] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [    0.313479] Call Trace:
> [    0.313479]  device_add+0x304/0x660
> [    0.313479]  ? __init_waitqueue_head+0x31/0x50
> [    0.313479]  __register_one_node+0x67/0x170
> [    0.313479]  __try_online_node.cold+0x3e/0x78
> [    0.313479]  try_online_node+0x25/0x40
> [    0.313479]  do_cpu_up+0x36/0xc0
> [    0.313479]  smp_init+0x59/0xb3
> [    0.313479]  kernel_init_freeable+0x11a/0x247
> [    0.313479]  ? rest_init+0x23f/0x23f
> [    0.313479]  kernel_init+0x5/0xf1
> [    0.313479]  ret_from_fork+0x3a/0x50
> [    0.313479] Modules linked in:
> 
> Figuring out what goes wrong here (maybe QEMU creating a weird
> system configuration) is a different journey :)
> 

That is a separate issue. You need to revert

"x86, numa: always initialize all possible nodes"

and then, you should be able to reproduce.



* Re: [PATCH v3 0/6] mm: Further memory block device cleanups
  2019-06-21 19:07       ` Qian Cai
@ 2019-06-21 19:25         ` David Hildenbrand
  0 siblings, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2019-06-21 19:25 UTC (permalink / raw)
  To: Qian Cai, linux-kernel
  Cc: Dan Williams, Andrew Morton, linuxppc-dev, linux-acpi, linux-mm,
	Andrew Banman, Anshuman Khandual, Arun KS, Baoquan He,
	Benjamin Herrenschmidt, Greg Kroah-Hartman, Johannes Weiner,
	Juergen Gross, Keith Busch, Len Brown, Mel Gorman,
	Michael Ellerman, Michael Neuling, Michal Hocko, Mike Rapoport,
	mike.travis, Oscar Salvador, Oscar Salvador, Paul Mackerras,
	Pavel Tatashin, Pavel Tatashin, Pavel Tatashin,
	Rafael J. Wysocki, Rafael J. Wysocki, Rashmica Gupta,
	Stephen Rothwell, Thomas Gleixner, Vlastimil Babka, Wei Yang

On 21.06.19 21:07, Qian Cai wrote:
> On Fri, 2019-06-21 at 20:56 +0200, David Hildenbrand wrote:
>> On 21.06.19 20:24, David Hildenbrand wrote:
>>> On 21.06.19 17:15, Qian Cai wrote:
>>>> On Thu, 2019-06-20 at 20:31 +0200, David Hildenbrand wrote:
>>>>> @Andrew: Only patch 1, 4 and 6 changed compared to v1.
>>>>>
>>>>> Some further cleanups around memory block devices. Especially, clean up
>>>>> and simplify walk_memory_range(). Including some other minor cleanups.
>>>>>
>>>>> Compiled + tested on x86 with DIMMs under QEMU. Compile-tested on ppc64.
>>>>>
>>>>> v2 -> v3:
>>>>> - "mm/memory_hotplug: Rename walk_memory_range() and pass start+size .."
>>>>> -- Avoid warning on ppc.
>>>>> - "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
>>>>> -- Fixup a comment regarding hinted devices.
>>>>>
>>>>> v1 -> v2:
>>>>> - "mm: Section numbers use the type "unsigned long""
>>>>> -- "unsigned long i" -> "unsigned long nr", in one case -> "int i"
>>>>> - "drivers/base/memory.c: Get rid of find_memory_block_hinted("
>>>>> -- Fix compilation error
>>>>> -- Get rid of the "hint" parameter completely
>>>>>
>>>>> David Hildenbrand (6):
>>>>>   mm: Section numbers use the type "unsigned long"
>>>>>   drivers/base/memory: Use "unsigned long" for block ids
>>>>>   mm: Make register_mem_sect_under_node() static
>>>>>   mm/memory_hotplug: Rename walk_memory_range() and pass start+size
>>>>>     instead of pfns
>>>>>   mm/memory_hotplug: Move and simplify walk_memory_blocks()
>>>>>   drivers/base/memory.c: Get rid of find_memory_block_hinted()
>>>>>
>>>>>  arch/powerpc/platforms/powernv/memtrace.c |  23 ++---
>>>>>  drivers/acpi/acpi_memhotplug.c            |  19 +---
>>>>>  drivers/base/memory.c                     | 120 +++++++++++++---------
>>>>>  drivers/base/node.c                       |   8 +-
>>>>>  include/linux/memory.h                    |   5 +-
>>>>>  include/linux/memory_hotplug.h            |   2 -
>>>>>  include/linux/mmzone.h                    |   4 +-
>>>>>  include/linux/node.h                      |   7 --
>>>>>  mm/memory_hotplug.c                       |  57 +---------
>>>>>  mm/sparse.c                               |  12 +--
>>>>>  10 files changed, 106 insertions(+), 151 deletions(-)
>>>>>
>>>>
>>>> This series leaves a few machines unable to boot, triggering endless
>>>> soft lockups. Reverting those commits fixed the issue.
>>>>
>>>> 97f4217d1da0 Revert "mm/memory_hotplug: rename walk_memory_range() and
>>>> pass
>>>> start+size instead of pfns"
>>>> c608eebf33c6 Revert "mm-memory_hotplug-rename-walk_memory_range-and-pass-
>>>> startsize-instead-of-pfns-fix"
>>>> 34b5e4ab7558 Revert "mm/memory_hotplug: move and simplify
>>>> walk_memory_blocks()"
>>>> 59a9f3eec5d1 Revert "drivers/base/memory.c: Get rid of
>>>> find_memory_block_hinted()"
>>>> 5cfcd52288b6 Revert "drivers-base-memoryc-get-rid-of-
>>>> find_memory_block_hinted-
>>>> v3"
>>>>
>>>> [    4.582081][    T1] ACPI FADT declares the system doesn't support PCIe
>>>> ASPM,
>>>> so disable it
>>>> [    4.590405][    T1] ACPI: bus type PCI registered
>>>> [    4.592908][    T1] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem
>>>> 0x80000000-0x8fffffff] (base 0x80000000)
>>>> [    4.601860][    T1] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff]
>>>> reserved in
>>>> E820
>>>> [    4.601860][    T1] PCI: Using configuration type 1 for base access
>>>> [   28.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
>>>> [swapper/0:1]
>>>> [   28.671351][   C16] Modules linked in:
>>>> [   28.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Not tainted 5.2.0-
>>>> rc5-
>>>> next-20190621+ #1
>>>> [   28.681366][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
>>>> DL385
>>>> Gen10, BIOS A40 03/09/2018
>>>> [   28.691334][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
>>>> [   28.701334][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53
>>>> 48 8b
>>>> 55 08 48 89 fb 48 8d 7f 18 e8 4c 89 7d ff 48 89 df e8 94 f9 7d ff 41 54 9d
>>>> <65>
>>>> ff 0d c2 44 8d 48 5b 41 5c 5d c3 0f 1f 44 00 00 0f 1f 44 00 00
>>>> [   28.711354][   C16] RSP: 0018:ffff888205b27bf8 EFLAGS: 00000246
>>>> ORIG_RAX:
>>>> ffffffffffffff13
>>>> [   28.721372][   C16] RAX: 0000000000000000 RBX: ffff8882053d6138 RCX:
>>>> ffffffffb6f2a3b8
>>>> [   28.731371][   C16] RDX: 1ffff11040a7ac27 RSI: dffffc0000000000 RDI:
>>>> ffff8882053d6138
>>>> [   28.741371][   C16] RBP: ffff888205b27c08 R08: ffffed1040a7ac28 R09:
>>>> ffffed1040a7ac27
>>>> [   28.751334][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
>>>> 0000000000000246
>>>> [   28.751370][   C16] R13: ffff888205b27c98 R14: ffff8884504d0a20 R15:
>>>> 0000000000000000
>>>> [   28.761368][   C16] FS:  0000000000000000(0000)
>>>> GS:ffff888454500000(0000)
>>>> knlGS:0000000000000000
>>>> [   28.771373][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [   28.781334][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
>>>> 00000000001406a0
>>>> [   28.791333][   C16] Call Trace:
>>>> [   28.791374][   C16]  klist_next+0xd8/0x1c0
>>>> [   28.791374][   C16]  subsys_find_device_by_id+0x13b/0x1f0
>>>> [   28.801334][   C16]  ? bus_find_device_by_name+0x20/0x20
>>>> [   28.801370][   C16]  ? kobject_put+0x23/0x250
>>>> [   28.811333][   C16]  walk_memory_blocks+0x6c/0xb8
>>>> [   28.811353][   C16]  ? write_policy_show+0x40/0x40
>>>> [   28.821334][   C16]  link_mem_sections+0x7e/0xa0
>>>> [   28.821369][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
>>>> [   28.831353][   C16]  ? __register_one_node+0x3bd/0x600
>>>> [   28.831353][   C16]  topology_init+0xbf/0x126
>>>> [   28.841364][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
>>>> [   28.841368][   C16]  do_one_initcall+0xfe/0x45a
>>>> [   28.851334][   C16]  ? initcall_blacklisted+0x150/0x150
>>>> [   28.851353][   C16]  ? kasan_check_write+0x14/0x20
>>>> [   28.861333][   C16]  ? up_write+0x75/0x140
>>>> [   28.861369][   C16]  kernel_init_freeable+0x619/0x6ac
>>>> [   28.871333][   C16]  ? rest_init+0x188/0x188
>>>> [   28.871353][   C16]  kernel_init+0x11/0x138
>>>> [   28.881363][   C16]  ? rest_init+0x188/0x188
>>>> [   28.881363][   C16]  ret_from_fork+0x22/0x40
>>>> [   56.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
>>>> [swapper/0:1]
>>>> [   56.671352][   C16] Modules linked in:
>>>> [   56.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
>>>> G             L    5.2.0-rc5-next-20190621+ #1
>>>> [   56.681357][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
>>>> DL385
>>>> Gen10, BIOS A40 03/09/2018
>>>> [   56.691356][   C16] RIP: 0010:subsys_find_device_by_id+0x168/0x1f0
>>>> [   56.701334][   C16] Code: 48 85 c0 74 3e 48 8d 78 58 e8 14 77 ca ff 4d
>>>> 8b 7e
>>>> 58 4d 85 ff 74 2c 49 8d bf a0 03 00 00 e8 bf 75 ca ff 45 39 a7 a0 03 00 00
>>>> <75>
>>>> c9 4c 89 ff e8 0e 89 ff ff 48 85 c0 74 bc 48 89 df e8 21 3b 24
>>>> [   56.721333][   C16] RSP: 0018:ffff888205b27c68 EFLAGS: 00000287
>>>> ORIG_RAX:
>>>> ffffffffffffff13
>>>> [   56.721370][   C16] RAX: 0000000000000000 RBX: ffff888205b27c90 RCX:
>>>> ffffffffb74c9dc1
>>>> [   56.731370][   C16] RDX: 0000000000000003 RSI: dffffc0000000000 RDI:
>>>> ffff8888774ec3e0
>>>> [   56.741371][   C16] RBP: ffff888205b27cf8 R08: ffffed1040a7ac28 R09:
>>>> ffffed1040a7ac27
>>>> [   56.751335][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
>>>> 0000000000085c1b
>>>> [   56.761334][   C16] R13: 1ffff11040b64f8e R14: ffff888450de4a20 R15:
>>>> ffff8888774ec040
>>>> [   56.761372][   C16] FS:  0000000000000000(0000)
>>>> GS:ffff888454500000(0000)
>>>> knlGS:0000000000000000
>>>> [   56.771374][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [   56.781370][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
>>>> 00000000001406a0
>>>> [   56.791373][   C16] Call Trace:
>>>> [   56.791373][   C16]  ? bus_find_device_by_name+0x20/0x20
>>>> [   56.801334][   C16]  ? kobject_put+0x23/0x250
>>>> [   56.801334][   C16]  walk_memory_blocks+0x6c/0xb8
>>>> [   56.811333][   C16]  ? write_policy_show+0x40/0x40
>>>> [   56.811353][   C16]  link_mem_sections+0x7e/0xa0
>>>> [   56.811353][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
>>>> [   56.821333][   C16]  ? __register_one_node+0x3bd/0x600
>>>> [   56.831333][   C16]  topology_init+0xbf/0x126
>>>> [   56.831355][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
>>>> [   56.841334][   C16]  do_one_initcall+0xfe/0x45a
>>>> [   56.841334][   C16]  ? initcall_blacklisted+0x150/0x150
>>>> [   56.851333][   C16]  ? kasan_check_write+0x14/0x20
>>>> [   56.851354][   C16]  ? up_write+0x75/0x140
>>>> [   56.861333][   C16]  kernel_init_freeable+0x619/0x6ac
>>>> [   56.861333][   C16]  ? rest_init+0x188/0x188
>>>> [   56.861369][   C16]  kernel_init+0x11/0x138
>>>> [   56.871333][   C16]  ? rest_init+0x188/0x188
>>>> [   56.871354][   C16]  ret_from_fork+0x22/0x40
>>>> [   64.601362][   C16] rcu: INFO: rcu_sched self-detected stall on CPU
>>>> [   64.611335][   C16] rcu: 	16-....: (5958 ticks this GP)
>>>> idle=37e/1/0x4000000000000002 softirq=27/27 fqs=3000 
>>>> [   64.621334][   C16] 	(t=6002 jiffies g=-1079 q=25)
>>>> [   64.621334][   C16] NMI backtrace for cpu 16
>>>> [   64.621374][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
>>>> G             L    5.2.0-rc5-next-20190621+ #1
>>>> [   64.631372][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
>>>> DL385
>>>> Gen10, BIOS A40 03/09/2018
>>>> [   64.641371][   C16] Call Trace:
>>>> [   64.651337][   C16]  <IRQ>
>>>> [   64.651376][   C16]  dump_stack+0x62/0x9a
>>>> [   64.651376][   C16]  nmi_cpu_backtrace.cold.0+0x2e/0x33
>>>> [   64.661337][   C16]  ? nmi_cpu_backtrace_handler+0x20/0x20
>>>> [   64.661337][   C16]  nmi_trigger_cpumask_backtrace+0x1a6/0x1b9
>>>> [   64.671353][   C16]  arch_trigger_cpumask_backtrace+0x19/0x20
>>>> [   64.681366][   C16]  rcu_dump_cpu_stacks+0x18b/0x1d6
>>>> [   64.681366][   C16]  rcu_sched_clock_irq.cold.64+0x368/0x791
>>>> [   64.691336][   C16]  ? kasan_check_read+0x11/0x20
>>>> [   64.691354][   C16]  ? __raise_softirq_irqoff+0x66/0x150
>>>> [   64.701336][   C16]  update_process_times+0x2f/0x60
>>>> [   64.701362][   C16]  tick_periodic+0x38/0xe0
>>>> [   64.711334][   C16]  tick_handle_periodic+0x2e/0x80
>>>> [   64.711353][   C16]  smp_apic_timer_interrupt+0xfb/0x370
>>>> [   64.721367][   C16]  apic_timer_interrupt+0xf/0x20
>>>> [   64.721367][   C16]  </IRQ>
>>>> [   64.721367][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
>>>> [   64.731370][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 
>>>>
>>>
>>> @Qian Cai, unfortunately I can't reproduce.
>>>
>>> If you get the chance, it would be great if you could retry with
>>>
>>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>>> index 972c5336bebf..742f99ddd148 100644
>>> --- a/drivers/base/memory.c
>>> +++ b/drivers/base/memory.c
>>> @@ -868,6 +868,9 @@ int walk_memory_blocks(unsigned long start, unsigned
>>> long size,
>>>         unsigned long block_id;
>>>         int ret = 0;
>>>
>>> +       if (!size)
>>> +               return 0;
>>> +
>>>         for (block_id = start_block_id; block_id <= end_block_id;
>>> block_id++) {
>>>                 mem = find_memory_block_by_id(block_id);
>>>                 if (!mem)
>>>
>>>
>>>
>>> If both start and size are 0, we would get a veeeery long loop. This
>>> would mean that we have an online node that does not span any pages at
>>> all (pgdat->node_start_pfn = 0, start_pfn + pgdat->node_spanned_pages = 0).
>>>
>>
>>
>> ...trying to reproduce with QEMU (setting 0MB for the second node):
>>
>> qemu-system-x86_64 --enable-kvm -m 4G,maxmem=20G,slots=2 \
>> 	-smp sockets=2,cores=1 \
>> 	-numa node,nodeid=0,cpus=0,mem=4G \
>> 	-numa node,nodeid=1,cpus=1,mem=0 ...
>>
>> I can indeed see that the node is online and
>> "pgdat->node_start_pfn == 0 && start_pfn + pgdat->node_spanned_pages == 0".
>>
>> However, the kernel segfaults in an unrelated code path, so I can't
>> verify if this solves this problem:
>>
>> [    0.313284] BUG: kernel NULL pointer dereference, address: 00000000000000a0
>> [    0.313479] #PF: supervisor read access in kernel mode
>> [    0.313479] #PF: error_code(0x0000) - not-present page
>> [    0.313479] PGD 0 P4D 0 
>> [    0.313479] Oops: 0000 [#1] SMP PTI
>> [    0.313479] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-next-
>> 20190620+ #56
>> [    0.313479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
>> [    0.313479] RIP: 0010:bus_add_device+0x59/0x110
>> [    0.313479] Code: 20 48 89 df e8 f8 b4 ff ff 41 89 c4 85 c0 0f 85 81 00 00
>> 00 48 8b 53 50 48 85 d2 75 03 48 8b 135
>> [    0.313479] RSP: 0000:ffffb4a6c0013e20 EFLAGS: 00010246
>> [    0.313479] RAX: 0000000000000000 RBX: ffff8b61bac23800 RCX:
>> 0000000000000000
>> [    0.313479] RDX: ffff8b61bac29038 RSI: ffff8b61bac23800 RDI:
>> ffff8b61bac23800
>> [    0.313479] RBP: ffffffff9d2f4500 R08: 0000000000000000 R09:
>> 0000000000000001
>> [    0.313479] R10: 0000000000000000 R11: ffff8b61bad20878 R12:
>> 0000000000000000
>> [    0.313479] R13: 0000000000000000 R14: 0000000000000000 R15:
>> 0000000000000000
>> [    0.313479] FS:  0000000000000000(0000) GS:ffff8b61bba00000(0000)
>> knlGS:0000000000000000
>> [    0.313479] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [    0.313479] CR2: 00000000000000a0 CR3: 0000000013c24000 CR4:
>> 00000000000006f0
>> [    0.313479] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> [    0.313479] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
>> 0000000000000400
>> [    0.313479] Call Trace:
>> [    0.313479]  device_add+0x304/0x660
>> [    0.313479]  ? __init_waitqueue_head+0x31/0x50
>> [    0.313479]  __register_one_node+0x67/0x170
>> [    0.313479]  __try_online_node.cold+0x3e/0x78
>> [    0.313479]  try_online_node+0x25/0x40
>> [    0.313479]  do_cpu_up+0x36/0xc0
>> [    0.313479]  smp_init+0x59/0xb3
>> [    0.313479]  kernel_init_freeable+0x11a/0x247
>> [    0.313479]  ? rest_init+0x23f/0x23f
>> [    0.313479]  kernel_init+0x5/0xf1
>> [    0.313479]  ret_from_fork+0x3a/0x50
>> [    0.313479] Modules linked in:
>>
>> Figuring out what goes wrong here (maybe QEMU creating a weird
>> system configuration) is a different journey :)
>>
> 
> That is a separate issue. You need to revert
> 
> "x86, numa: always initialize all possible nodes"
> 
> and then, you should be able to reproduce.
> 

Thanks, reproduced and verified that this is the fix.
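
For illustration only, here is a minimal userspace sketch of why an empty
range produces the very long loop described above. It assumes the end block
id is derived from "start + size - 1" (the quoted hunk only shows the loop
bounds, not how end_block_id is computed). BLOCK_SIZE_BYTES,
phys_to_block_id(), end_block_id_unguarded(), blocks_to_walk() and
demo_check() are hypothetical names for this sketch, not the kernel's
definitions.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical memory block size (128 MiB), chosen only for illustration. */
#define BLOCK_SIZE_BYTES (128UL << 20)

/* Hypothetical helper: map a physical address to a memory block id. */
unsigned long phys_to_block_id(unsigned long phys)
{
	return phys / BLOCK_SIZE_BYTES;
}

/*
 * Assumed computation of the last block id without any guard.
 * With start == 0 and size == 0, "start + size - 1" wraps around to
 * ULONG_MAX, so the loop "block_id <= end_block_id" would visit almost
 * every representable block id.
 */
unsigned long end_block_id_unguarded(unsigned long start, unsigned long size)
{
	return phys_to_block_id(start + size - 1);
}

/*
 * Number of block ids the loop visits when an empty range is rejected
 * first, mirroring the "if (!size)" check in the proposed hunk.
 */
unsigned long blocks_to_walk(unsigned long start, unsigned long size)
{
	unsigned long start_block_id, end_block_id;

	if (!size)
		return 0;

	start_block_id = phys_to_block_id(start);
	end_block_id = phys_to_block_id(start + size - 1);
	return end_block_id - start_block_id + 1;
}

/* Small self-check of the two cases discussed in the thread. */
void demo_check(void)
{
	/* Unguarded: the empty range at 0 ends at the largest block id. */
	assert(end_block_id_unguarded(0, 0) == ULONG_MAX / BLOCK_SIZE_BYTES);
	/* Guarded: the empty range visits no blocks at all. */
	assert(blocks_to_walk(0, 0) == 0);
}
```

Under these assumptions, a node with node_start_pfn == 0 and
node_spanned_pages == 0 is exactly the start == 0, size == 0 case, which
matches the soft lockups reported inside walk_memory_blocks().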

-- 

Thanks,

David / dhildenb


* Re: [PATCH v3 0/6] mm: Further memory block device cleanups
  2019-06-21 18:24   ` David Hildenbrand
  2019-06-21 18:56     ` David Hildenbrand
@ 2019-06-21 19:29     ` Qian Cai
  2019-06-21 23:42     ` Andrew Morton
  2 siblings, 0 replies; 16+ messages in thread
From: Qian Cai @ 2019-06-21 19:29 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Dan Williams, Andrew Morton, linuxppc-dev, linux-acpi, linux-mm,
	Andrew Banman, Anshuman Khandual, Arun KS, Baoquan He,
	Benjamin Herrenschmidt, Greg Kroah-Hartman, Johannes Weiner,
	Juergen Gross, Keith Busch, Len Brown, Mel Gorman,
	Michael Ellerman, Michael Neuling, Michal Hocko, Mike Rapoport,
	mike.travis, Oscar Salvador, Oscar Salvador, Paul Mackerras,
	Pavel Tatashin, Pavel Tatashin, Pavel Tatashin,
	Rafael J. Wysocki, Rafael J. Wysocki, Rashmica Gupta,
	Stephen Rothwell, Thomas Gleixner, Vlastimil Babka, Wei Yang

On Fri, 2019-06-21 at 20:24 +0200, David Hildenbrand wrote:
> On 21.06.19 17:15, Qian Cai wrote:
> > On Thu, 2019-06-20 at 20:31 +0200, David Hildenbrand wrote:
> > > @Andrew: Only patch 1, 4 and 6 changed compared to v1.
> > > 
> > > Some further cleanups around memory block devices. Especially, clean up
> > > and simplify walk_memory_range(). Including some other minor cleanups.
> > > 
> > > Compiled + tested on x86 with DIMMs under QEMU. Compile-tested on ppc64.
> > > 
> > > v2 -> v3:
> > > - "mm/memory_hotplug: Rename walk_memory_range() and pass start+size .."
> > > -- Avoid warning on ppc.
> > > - "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
> > > -- Fixup a comment regarding hinted devices.
> > > 
> > > v1 -> v2:
> > > - "mm: Section numbers use the type "unsigned long""
> > > -- "unsigned long i" -> "unsigned long nr", in one case -> "int i"
> > > - "drivers/base/memory.c: Get rid of find_memory_block_hinted("
> > > -- Fix compilation error
> > > -- Get rid of the "hint" parameter completely
> > > 
> > > David Hildenbrand (6):
> > >   mm: Section numbers use the type "unsigned long"
> > >   drivers/base/memory: Use "unsigned long" for block ids
> > >   mm: Make register_mem_sect_under_node() static
> > >   mm/memory_hotplug: Rename walk_memory_range() and pass start+size
> > >     instead of pfns
> > >   mm/memory_hotplug: Move and simplify walk_memory_blocks()
> > >   drivers/base/memory.c: Get rid of find_memory_block_hinted()
> > > 
> > >  arch/powerpc/platforms/powernv/memtrace.c |  23 ++---
> > >  drivers/acpi/acpi_memhotplug.c            |  19 +---
> > >  drivers/base/memory.c                     | 120 +++++++++++++---------
> > >  drivers/base/node.c                       |   8 +-
> > >  include/linux/memory.h                    |   5 +-
> > >  include/linux/memory_hotplug.h            |   2 -
> > >  include/linux/mmzone.h                    |   4 +-
> > >  include/linux/node.h                      |   7 --
> > >  mm/memory_hotplug.c                       |  57 +---------
> > >  mm/sparse.c                               |  12 +--
> > >  10 files changed, 106 insertions(+), 151 deletions(-)
> > > 
> > 
> > This series leaves a few machines unable to boot, triggering endless soft
> > lockups. Reverting those commits fixed the issue.
> > 
> > 97f4217d1da0 Revert "mm/memory_hotplug: rename walk_memory_range() and pass
> > start+size instead of pfns"
> > c608eebf33c6 Revert "mm-memory_hotplug-rename-walk_memory_range-and-pass-
> > startsize-instead-of-pfns-fix"
> > 34b5e4ab7558 Revert "mm/memory_hotplug: move and simplify
> > walk_memory_blocks()"
> > 59a9f3eec5d1 Revert "drivers/base/memory.c: Get rid of
> > find_memory_block_hinted()"
> > 5cfcd52288b6 Revert "drivers-base-memoryc-get-rid-of-
> > find_memory_block_hinted-
> > v3"
> > 
> > [    4.582081][    T1] ACPI FADT declares the system doesn't support PCIe
> > ASPM,
> > so disable it
> > [    4.590405][    T1] ACPI: bus type PCI registered
> > [    4.592908][    T1] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem
> > 0x80000000-0x8fffffff] (base 0x80000000)
> > [    4.601860][    T1] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved
> > in
> > E820
> > [    4.601860][    T1] PCI: Using configuration type 1 for base access
> > [   28.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
> > [swapper/0:1]
> > [   28.671351][   C16] Modules linked in:
> > [   28.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-
> > next-20190621+ #1
> > [   28.681366][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
> > DL385
> > Gen10, BIOS A40 03/09/2018
> > [   28.691334][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> > [   28.701334][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 48
> > 8b
> > 55 08 48 89 fb 48 8d 7f 18 e8 4c 89 7d ff 48 89 df e8 94 f9 7d ff 41 54 9d
> > <65>
> > ff 0d c2 44 8d 48 5b 41 5c 5d c3 0f 1f 44 00 00 0f 1f 44 00 00
> > [   28.711354][   C16] RSP: 0018:ffff888205b27bf8 EFLAGS: 00000246 ORIG_RAX:
> > ffffffffffffff13
> > [   28.721372][   C16] RAX: 0000000000000000 RBX: ffff8882053d6138 RCX:
> > ffffffffb6f2a3b8
> > [   28.731371][   C16] RDX: 1ffff11040a7ac27 RSI: dffffc0000000000 RDI:
> > ffff8882053d6138
> > [   28.741371][   C16] RBP: ffff888205b27c08 R08: ffffed1040a7ac28 R09:
> > ffffed1040a7ac27
> > [   28.751334][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
> > 0000000000000246
> > [   28.751370][   C16] R13: ffff888205b27c98 R14: ffff8884504d0a20 R15:
> > 0000000000000000
> > [   28.761368][   C16] FS:  0000000000000000(0000) GS:ffff888454500000(0000)
> > knlGS:0000000000000000
> > [   28.771373][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   28.781334][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
> > 00000000001406a0
> > [   28.791333][   C16] Call Trace:
> > [   28.791374][   C16]  klist_next+0xd8/0x1c0
> > [   28.791374][   C16]  subsys_find_device_by_id+0x13b/0x1f0
> > [   28.801334][   C16]  ? bus_find_device_by_name+0x20/0x20
> > [   28.801370][   C16]  ? kobject_put+0x23/0x250
> > [   28.811333][   C16]  walk_memory_blocks+0x6c/0xb8
> > [   28.811353][   C16]  ? write_policy_show+0x40/0x40
> > [   28.821334][   C16]  link_mem_sections+0x7e/0xa0
> > [   28.821369][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
> > [   28.831353][   C16]  ? __register_one_node+0x3bd/0x600
> > [   28.831353][   C16]  topology_init+0xbf/0x126
> > [   28.841364][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
> > [   28.841368][   C16]  do_one_initcall+0xfe/0x45a
> > [   28.851334][   C16]  ? initcall_blacklisted+0x150/0x150
> > [   28.851353][   C16]  ? kasan_check_write+0x14/0x20
> > [   28.861333][   C16]  ? up_write+0x75/0x140
> > [   28.861369][   C16]  kernel_init_freeable+0x619/0x6ac
> > [   28.871333][   C16]  ? rest_init+0x188/0x188
> > [   28.871353][   C16]  kernel_init+0x11/0x138
> > [   28.881363][   C16]  ? rest_init+0x188/0x188
> > [   28.881363][   C16]  ret_from_fork+0x22/0x40
> > [   56.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s! [swapper/0:1]
> > [   56.671352][   C16] Modules linked in:
> > [   56.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted: G             L    5.2.0-rc5-next-20190621+ #1
> > [   56.681357][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 03/09/2018
> > [   56.691356][   C16] RIP: 0010:subsys_find_device_by_id+0x168/0x1f0
> > [   56.701334][   C16] Code: 48 85 c0 74 3e 48 8d 78 58 e8 14 77 ca ff 4d 8b 7e 58 4d 85 ff 74 2c 49 8d bf a0 03 00 00 e8 bf 75 ca ff 45 39 a7 a0 03 00 00 <75> c9 4c 89 ff e8 0e 89 ff ff 48 85 c0 74 bc 48 89 df e8 21 3b 24
> > [   56.721333][   C16] RSP: 0018:ffff888205b27c68 EFLAGS: 00000287 ORIG_RAX: ffffffffffffff13
> > [   56.721370][   C16] RAX: 0000000000000000 RBX: ffff888205b27c90 RCX: ffffffffb74c9dc1
> > [   56.731370][   C16] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff8888774ec3e0
> > [   56.741371][   C16] RBP: ffff888205b27cf8 R08: ffffed1040a7ac28 R09: ffffed1040a7ac27
> > [   56.751335][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12: 0000000000085c1b
> > [   56.761334][   C16] R13: 1ffff11040b64f8e R14: ffff888450de4a20 R15: ffff8888774ec040
> > [   56.761372][   C16] FS:  0000000000000000(0000) GS:ffff888454500000(0000) knlGS:0000000000000000
> > [   56.771374][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   56.781370][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4: 00000000001406a0
> > [   56.791373][   C16] Call Trace:
> > [   56.791373][   C16]  ? bus_find_device_by_name+0x20/0x20
> > [   56.801334][   C16]  ? kobject_put+0x23/0x250
> > [   56.801334][   C16]  walk_memory_blocks+0x6c/0xb8
> > [   56.811333][   C16]  ? write_policy_show+0x40/0x40
> > [   56.811353][   C16]  link_mem_sections+0x7e/0xa0
> > [   56.811353][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
> > [   56.821333][   C16]  ? __register_one_node+0x3bd/0x600
> > [   56.831333][   C16]  topology_init+0xbf/0x126
> > [   56.831355][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
> > [   56.841334][   C16]  do_one_initcall+0xfe/0x45a
> > [   56.841334][   C16]  ? initcall_blacklisted+0x150/0x150
> > [   56.851333][   C16]  ? kasan_check_write+0x14/0x20
> > [   56.851354][   C16]  ? up_write+0x75/0x140
> > [   56.861333][   C16]  kernel_init_freeable+0x619/0x6ac
> > [   56.861333][   C16]  ? rest_init+0x188/0x188
> > [   56.861369][   C16]  kernel_init+0x11/0x138
> > [   56.871333][   C16]  ? rest_init+0x188/0x188
> > [   56.871354][   C16]  ret_from_fork+0x22/0x40
> > [   64.601362][   C16] rcu: INFO: rcu_sched self-detected stall on CPU
> > [   64.611335][   C16] rcu: 	16-....: (5958 ticks this GP) idle=37e/1/0x4000000000000002 softirq=27/27 fqs=3000
> > [   64.621334][   C16] 	(t=6002 jiffies g=-1079 q=25)
> > [   64.621334][   C16] NMI backtrace for cpu 16
> > [   64.621374][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted: G             L    5.2.0-rc5-next-20190621+ #1
> > [   64.631372][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 03/09/2018
> > [   64.641371][   C16] Call Trace:
> > [   64.651337][   C16]  <IRQ>
> > [   64.651376][   C16]  dump_stack+0x62/0x9a
> > [   64.651376][   C16]  nmi_cpu_backtrace.cold.0+0x2e/0x33
> > [   64.661337][   C16]  ? nmi_cpu_backtrace_handler+0x20/0x20
> > [   64.661337][   C16]  nmi_trigger_cpumask_backtrace+0x1a6/0x1b9
> > [   64.671353][   C16]  arch_trigger_cpumask_backtrace+0x19/0x20
> > [   64.681366][   C16]  rcu_dump_cpu_stacks+0x18b/0x1d6
> > [   64.681366][   C16]  rcu_sched_clock_irq.cold.64+0x368/0x791
> > [   64.691336][   C16]  ? kasan_check_read+0x11/0x20
> > [   64.691354][   C16]  ? __raise_softirq_irqoff+0x66/0x150
> > [   64.701336][   C16]  update_process_times+0x2f/0x60
> > [   64.701362][   C16]  tick_periodic+0x38/0xe0
> > [   64.711334][   C16]  tick_handle_periodic+0x2e/0x80
> > [   64.711353][   C16]  smp_apic_timer_interrupt+0xfb/0x370
> > [   64.721367][   C16]  apic_timer_interrupt+0xf/0x20
> > [   64.721367][   C16]  </IRQ>
> > [   64.721367][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> > [   64.731370][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 
> > 
> 
> @Qian Cai, unfortunately I can't reproduce.
> 
> If you get the chance, it would be great if you could retry with
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 972c5336bebf..742f99ddd148 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -868,6 +868,9 @@ int walk_memory_blocks(unsigned long start, unsigned long size,
>         unsigned long block_id;
>         int ret = 0;
> 
> +       if (!size)
> +               return;
> +
>         for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
>                 mem = find_memory_block_by_id(block_id);
>                 if (!mem)
> 
> 
> 
> If both start and size are 0, we would get a very long loop. That
> would mean that we have an online node that does not span any pages at
> all (pgdat->node_start_pfn = 0, start_pfn + pgdat->node_spanned_pages = 0).
> 

It works fine here.
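
To see why a zero-length walk spins for so long, here is a minimal
user-space sketch of the bounds arithmetic. It assumes that the last block
id is derived from start + size - 1, which the hunk quoted above does not
show. SKETCH_BLOCK_SIZE and both helper names are made up for illustration
and are not kernel symbols.

```c
#include <limits.h>

/*
 * Assumed block size, for illustration only (128 MiB). The real value is
 * architecture dependent and is not given in this thread.
 */
#define SKETCH_BLOCK_SIZE (128UL << 20)

/* Hypothetical stand-in for the address-to-block-id mapping. */
static inline unsigned long sketch_block_id(unsigned long phys)
{
	return phys / SKETCH_BLOCK_SIZE;
}

/*
 * Last block id covered by [start, start + size), under the assumption
 * stated above. Unsigned arithmetic wraps, so start == 0 and size == 0
 * yields 0 + 0 - 1 == ULONG_MAX before the division.
 */
static inline unsigned long sketch_last_block_id(unsigned long start,
						 unsigned long size)
{
	return sketch_block_id(start + size - 1);
}
```

Under these assumptions sketch_last_block_id(0, 0) evaluates to
ULONG_MAX / SKETCH_BLOCK_SIZE, so a loop of the form
"block_id <= end_block_id" would try every id up to that value, one lookup
per id. That would fit the repeated soft lockups below
walk_memory_blocks() in the log above.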


* Re: [PATCH v3 0/6] mm: Further memory block device cleanups
  2019-06-21 18:24   ` David Hildenbrand
  2019-06-21 18:56     ` David Hildenbrand
  2019-06-21 19:29     ` Qian Cai
@ 2019-06-21 23:42     ` Andrew Morton
  2 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2019-06-21 23:42 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Qian Cai, linux-kernel, Dan Williams, linuxppc-dev, linux-acpi,
	linux-mm, Andrew Banman, Anshuman Khandual, Arun KS, Baoquan He,
	Benjamin Herrenschmidt, Greg Kroah-Hartman, Johannes Weiner,
	Juergen Gross, Keith Busch, Len Brown, Mel Gorman,
	Michael Ellerman, Michael Neuling, Michal Hocko, Mike Rapoport,
	mike.travis, Oscar Salvador, Oscar Salvador, Paul Mackerras,
	Pavel Tatashin, Pavel Tatashin, Pavel Tatashin,
	Rafael J. Wysocki, Rafael J. Wysocki, Rashmica Gupta,
	Stephen Rothwell, Thomas Gleixner, Vlastimil Babka, Wei Yang

On Fri, 21 Jun 2019 20:24:59 +0200 David Hildenbrand <david@redhat.com> wrote:

> @Qian Cai, unfortunately I can't reproduce.
> 
> If you get the chance, it would be great if you could retry with
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 972c5336bebf..742f99ddd148 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -868,6 +868,9 @@ int walk_memory_blocks(unsigned long start, unsigned long size,
>         unsigned long block_id;
>         int ret = 0;
> 
> +       if (!size)
> +               return;
> +
>         for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
>                 mem = find_memory_block_by_id(block_id);
>                 if (!mem)
> 
> 
> 
> If both start and size are 0, we would get a very long loop. That
> would mean that we have an online node that does not span any pages at
> all (pgdat->node_start_pfn = 0, start_pfn + pgdat->node_spanned_pages = 0).

I think I'll make that a `return 0' and I won't drop patches 4-6 for
now, as we appear to have this fixed.



From: David Hildenbrand <david@redhat.com>
Subject: drivers-base-memoryc-get-rid-of-find_memory_block_hinted-v3-fix

handle zero-length walks

Link: http://lkml.kernel.org/r/1c2edc22-afd7-2211-c4c7-40e54e5007e8@redhat.com
Reported-by: Qian Cai <cai@lca.pw>
Tested-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/base/memory.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/drivers/base/memory.c~drivers-base-memoryc-get-rid-of-find_memory_block_hinted-v3-fix
+++ a/drivers/base/memory.c
@@ -866,6 +866,9 @@ int walk_memory_blocks(unsigned long sta
 	unsigned long block_id;
 	int ret = 0;
 
+	if (!size)
+		return 0;
+
 	for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
 		mem = find_memory_block_by_id(block_id);
 		if (!mem)
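
For readers who want to try the guard outside the kernel, the following is
a rough user-space sketch of a block walk with the zero-length check. It is
not the kernel implementation: sketch_walk_blocks(), sketch_walk_fn,
sketch_count_block() and SKETCH_BLOCK_SIZE are hypothetical names, the
block size is an assumed 128 MiB, and the lookup of the memory block
device (including the "if (!mem)" case) is left out.

```c
/* Assumed block size, for illustration only; not taken from this thread. */
#define SKETCH_BLOCK_SIZE (128UL << 20)

/* Hypothetical callback type: a non-zero return value stops the walk. */
typedef int (*sketch_walk_fn)(unsigned long block_id, void *arg);

static inline int sketch_walk_blocks(unsigned long start, unsigned long size,
				     sketch_walk_fn func, void *arg)
{
	unsigned long first, last, id;
	int ret = 0;

	/*
	 * Nothing to walk. The function returns int, so it has to return
	 * a value here; 0 reports "no error".
	 */
	if (!size)
		return 0;

	first = start / SKETCH_BLOCK_SIZE;
	/* Safe now: size >= 1, so the subtraction cannot wrap on its own. */
	last = (start + size - 1) / SKETCH_BLOCK_SIZE;

	for (id = first; id <= last; id++) {
		ret = func(id, arg);
		if (ret)
			break;
	}
	return ret;
}

/* Example callback: counts the visited blocks in *arg. */
static inline int sketch_count_block(unsigned long block_id, void *arg)
{
	(void)block_id;
	(*(unsigned long *)arg)++;
	return 0;
}
```

Calling sketch_walk_blocks(0, 0, sketch_count_block, &n) returns 0 without
invoking the callback, while a range of three block sizes starting at 0
invokes it three times.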





Thread overview: 16+ messages
2019-06-20 18:31 [PATCH v3 0/6] mm: Further memory block device cleanups David Hildenbrand
2019-06-20 18:31 ` [PATCH v3 1/6] mm: Section numbers use the type "unsigned long" David Hildenbrand
2019-06-20 18:31 ` [PATCH v3 2/6] drivers/base/memory: Use "unsigned long" for block ids David Hildenbrand
2019-06-20 18:31 ` [PATCH v3 3/6] mm: Make register_mem_sect_under_node() static David Hildenbrand
2019-06-20 18:31 ` [PATCH v3 4/6] mm/memory_hotplug: Rename walk_memory_range() and pass start+size instead of pfns David Hildenbrand
2019-06-20 18:31 ` [PATCH v3 5/6] mm/memory_hotplug: Move and simplify walk_memory_blocks() David Hildenbrand
2019-06-21 15:26   ` David Hildenbrand
2019-06-20 18:31 ` [PATCH v3 6/6] drivers/base/memory.c: Get rid of find_memory_block_hinted() David Hildenbrand
2019-06-21 15:15 ` [PATCH v3 0/6] mm: Further memory block device cleanups Qian Cai
2019-06-21 15:22   ` David Hildenbrand
2019-06-21 18:24   ` David Hildenbrand
2019-06-21 18:56     ` David Hildenbrand
2019-06-21 19:07       ` Qian Cai
2019-06-21 19:25         ` David Hildenbrand
2019-06-21 19:29     ` Qian Cai
2019-06-21 23:42     ` Andrew Morton
